And now, the long-promised explanation of the CMS distributed computing system. (I know, you have been on the edge of your seats all this time.)
Let’s start by considering boundary conditions. First, the LHC will produce a lot of data. Every year, the CMS detector will produce something like a petabyte of raw data. A petabyte is a million gigabytes, and if I did the calculation right, a year’s worth stored on DVDs would stack up roughly twice as high as the Nebraska State Capitol, a famously tall building (if you know your Nebraska). This data needs to be processed (which usually means adding more information to it, making it bigger), stored, and analyzed. On top of that there is an even larger amount of simulated data — if you are looking for new physics, you have to simulate it first so you know exactly what detector signatures you are looking for. Thus, we are talking about many petabytes of data per year that we must work with.
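For the skeptical, here is a quick back-of-the-envelope check of that DVD claim. The inputs are my own assumptions, not CMS numbers: 4.7 GB per single-layer DVD, 1.2 mm per disc, and about 122 m for the Nebraska State Capitol.

```python
# Back-of-the-envelope check of the DVD-stack comparison above.
# Assumed figures (my own, not from CMS): 4.7 GB per single-layer DVD,
# 1.2 mm disc thickness, and roughly 122 m for the Nebraska State Capitol.

PETABYTE_GB = 1.0e6          # 1 PB expressed in GB
DVD_CAPACITY_GB = 4.7        # single-layer DVD capacity
DVD_THICKNESS_MM = 1.2       # thickness of one disc
CAPITOL_HEIGHT_M = 122.0     # Nebraska State Capitol, about 400 ft

n_dvds = PETABYTE_GB / DVD_CAPACITY_GB
stack_height_m = n_dvds * DVD_THICKNESS_MM / 1000.0

print(f"DVDs needed for 1 PB: {n_dvds:,.0f}")
print(f"Stack height: {stack_height_m:.0f} m "
      f"(~{stack_height_m / CAPITOL_HEIGHT_M:.1f} state capitols)")
```

Run it and you get a stack of a bit over 200,000 discs, around 255 m tall, or roughly two capitols.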
Second, you may not notice this while tapping on your laptop, but computers require a significant amount of power and cooling for their operation. This has become a constraint on operating data centers; last year I went to a conference on computing in high-energy physics, and the whole week ended up being about power and cooling. (Yes, I was able to stay awake.) No single site can deploy enough power and cooling to support all of the computing needed for CMS data processing and analysis.
So, our answer is to run a highly distributed computing system, with centers spread around the globe. Now, this does present significant organizational challenges, but it also allows us to make use of computing expertise in every CMS country, and also gives people a sense of ownership — my vice-chancellor for research was much more interested in helping to pay for computers in Nebraska than he would have been in sending computers to Switzerland.
To keep the system manageable, we’ve imposed a tiered hierarchy on it. Different computing centers are given different responsibilities, and are designed to meet those responsibilities. (“Design” here means how much CPU and disk they have, what sort of network connectivity they need, and so forth.) A too-cool-for-school graphic showing how the whole thing works can be found here. The Tier-0 facility at CERN receives data directly from the detector, reconstructs the events, and writes a copy of the output to tape. This may not sound like much, but it saturates the resources that are available at CERN.
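To make the Tier-0’s job a bit more concrete, here is a deliberately simplified sketch of that receive-reconstruct-archive loop. This is not real CMS software; every name and data structure in it is a made-up placeholder.

```python
# A toy sketch of the Tier-0 workflow described above: receive raw events,
# reconstruct them, archive a copy, and queue the output for the Tier-1s.
# None of this is actual CMS software; all names here are hypothetical.

def reconstruct(raw_event: dict) -> dict:
    """Stand-in for reconstruction: adds derived information to the raw event."""
    return {**raw_event, "tracks": [], "clusters": []}  # output is bigger than input

def run_tier0(detector_stream, tape_archive: list, transfer_queue: list) -> None:
    for raw_event in detector_stream:
        reco_event = reconstruct(raw_event)
        tape_archive.append((raw_event, reco_event))  # archival copy stays at CERN
        transfer_queue.append(reco_event)             # to be shipped to a Tier-1

# Tiny usage example with fake events
fake_stream = ({"event_id": i, "raw_hits": [0.1 * i]} for i in range(3))
tape, to_tier1 = [], []
run_tier0(fake_stream, tape, to_tier1)
print(len(tape), "events archived,", len(to_tier1), "queued for transfer")
```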
Data is then transferred to the Tier-1 centers. CMS has seven of these, in the US (at Fermilab), the UK, France, Spain, Italy, Germany, and Taiwan. These centers store some fraction of the data that comes from CERN, and as we gain a better understanding of our detector behavior and of how we want to reconstruct the data, they also re-reconstruct their fraction every now and then. They also make “skims” of these events — a particular physics measurement typically relies on only a portion of all the collisions that we record, so we split the data into different subsamples that will each be enriched in certain kinds of events.
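Skimming is easier to see with a toy example. The event fields and selection criteria below are invented for illustration; they are not actual CMS selections.

```python
# Toy illustration of skimming: one dataset is split into subsamples, each
# enriched in a particular kind of event. The event fields ("n_muons", "met")
# and the selections below are invented, not real CMS criteria.

def skim(events, selections):
    """Return one list of events per named selection."""
    subsamples = {name: [] for name in selections}
    for event in events:
        for name, passes in selections.items():
            if passes(event):
                subsamples[name].append(event)
    return subsamples

selections = {
    "dimuon":   lambda e: e["n_muons"] >= 2,   # events with at least two muons
    "high_met": lambda e: e["met"] > 100.0,    # events with large missing energy
}

events = [
    {"n_muons": 2, "met": 15.0},
    {"n_muons": 0, "met": 250.0},
    {"n_muons": 1, "met": 30.0},
]

for name, subset in skim(events, selections).items():
    print(f"{name}: {len(subset)} of {len(events)} events selected")
```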
Note that in all of this, no one has yet made a plot that will appear in a journal publication! This starts to happen at the Tier-2 sites; that is where the skims are placed for general users to analyze. There are about forty of these sites spread over five continents, and they are also responsible for generating all of that simulated data mentioned earlier. This makes the Tier-2 sites very diverse and dynamic facilities — they are answerable to many different people trying to do many different things.
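To pull the whole hierarchy together, here is a rough who-does-what summary of the tier model as I have described it in this post. Treat it as a reader’s-eye sketch rather than the official CMS computing model.

```python
# A rough summary of the tier responsibilities described in this post.
# This is an informal sketch, not an official statement of the CMS computing model.

TIER_RESPONSIBILITIES = {
    "Tier-0 (CERN)": [
        "receive raw data directly from the detector",
        "first-pass reconstruction of events",
        "write an archival copy of the output to tape",
    ],
    "Tier-1 (seven national centers)": [
        "store a share of the data transferred from CERN",
        "periodically re-reconstruct that share",
        "produce skims enriched in particular kinds of events",
    ],
    "Tier-2 (about forty sites on five continents)": [
        "host skims for general users to analyze",
        "generate the simulated data",
    ],
}

for tier, duties in TIER_RESPONSIBILITIES.items():
    print(tier)
    for duty in duties:
        print("  -", duty)
```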
I have surely rambled on enough for a single posting, so some other time I will write about some of the particular challenges we face in making this system work. Suffice it to say that I spend a lot of time thinking about it. I try not to let it keep me up at night, but sometimes the title turns out to be true. Sorry, I needed to come up with a title for this post, and while “Trail of tiers” would have been more apt, it carries negative connotations from Native American history.