And now, the long-promised explanation of the CMS distributed computing system. (I know, you have been on the edge of your seats all this time.)
Let’s start by considering boundary conditions. First, the LHC will produce a lot of data. Every year, the CMS detector will produce something like a petabyte of raw data. A petabyte is a million gigabytes, and if I did the calculation right, if stored on a set of DVD’s, they would stack up twice as high as the Nebraska state capitol, a famously tall building (if you know your Nebraska). This data needs to be processed (which usually means adding more information to it, making it bigger), stored and analyzed. On top of that there is an even larger amount of simulated data — if you are looking for new physics, you have to simulate it first so you know exactly what detector signatures you are looking for. Thus, we are talking many petabytes of data per year that we must work with.
Second, you may not notice this while tapping on your laptop, but computers require a significant amount of power and cooling for their operation. This has become a constraint on operating data centers; last year I went to a conference on computing in high-energy physics, and the whole week ended up being about power and cooling. (Yes, I was able to stay awake.) No single site can deploy enough power and cooling to support all of the computing needed for CMS data processing and analysis.
So, our answer is to run a highly-distributed computing system, with centers distributed around the globe. Now, this does present significant organizational challenges, but it also allows us to make use of computing expertise in every CMS country, and also gives people a sense of ownership — my vice-chancellor for research was much more interested in helping to pay for computers in Nebraska than he would have been to send computers to Switzerland.
To keep the system manageable, we’ve imposed a tiered hierarchy on it. Different computing centers are given different responsibilities, and are designed to meet those responsibilities. (“Design” here means how much CPU or disk they have, and what sort of networking requirements, etc.) A too-cool-for-school graphic showing how the whole thing works can be found here. The Tier-0 facility at CERN receives data directly from the detector, and it reconstructs events and writes a copy of the output to tape. This may not sound like much, but it saturates the resources that are available at CERN.
Data is then transferred to Tier-1 centers. CMS has seven of these, in the US (at Fermilab), the UK, France, Spain, Italy, Germany and Taiwan. These centers store some fraction of the data that come from CERN, and as we gain a better understanding of our detector behavior and of how we want to reconstruct the data, they also re-reconstruct their fraction of the data every now and then. They also make “skims” of these events — a particular physics measurement typically relies on only a portion of all the collisions that we record, so we split the data into different subsamples that will each be enriched in certain kinds of events.
Note that in all this no one has yet made a plot that will appear in a journal publication! This starts to happen at Tier-2 sites; that’s where skims get placed for general users to analyze them. There are about forty of these sites spread over five continents, and they are also responsible for generating all of that simulated data mentioned earlier. This makes the Tier-2 sites very diverse and dynamic facilities — they are responsible to many different people trying to do many different things.
I have surely rambled on enough for a single posting, so some other time I will write about some of the particular challenges we face in making this system work. Suffice it to say that I spend a lot of time thinking about it. I try not to let it keep me up at night, but sometimes the title turns out to be true. Sorry, I needed to come up with a title for this post, and while “Trail of tiers” was more appropriate, it also has negative connotations in Native American history.























I understand that LHC is a big challenge, but gone are days when HEP data rates and amounts ruled the world. Nowadays kids download movies which is few Gigs each in a day via torrents and internet routers are coping with ever increasing tremendous traffic, so don’t scare us with grids and tiers.
thanks for news
It’s true, twobyte, that a lot of people are moving a lot of data around these days. But that’s only part of the issue for us — we also have to move jobs to the data, and we want to do it in a way that doesn’t involve a lot of administrative overhead (e.g. every person in CMS getting a computing account on every machine that might possibly host their data). This is one thing that the grid technology gets us. As for the data movement thing, I must admit that I am often frustrated by how difficult it can be for us to do it — although there too I will note that there is a lot more bookkeeping overhead involved than the kid downloading movies has. Still, just yesterday we moved 29 TB from Fermilab to Nebraska in under 24 hours, looks like an average transfer rate of about 500 MB/s = 0.5 GB/s. This would be a lot of movies.