Zachary Marshall | USLHC | USA

It’s Just Like Work!

Several bloggers have already talked about the LHC Computing Grid. As physicists, we use a lot of computing resources. The WLCG homepage has some nice information about the Grid, including cool pictures of what’s online now.

There’s a wonderful thing that comes along with using all these computing resources. Frequently, I’ll set up some task and set it off to run on a few hundred computers somewhere. It feels like I’m working hard – even if the computers are doing all of the heavy lifting!! It also lets you justify a long coffee break: “I’m working right now! The Grid is whirring away because of me!!”

I’ve spent a lot of my time working on improving the ATLAS software (usually trying to make it faster). Most computers these days use around 80 watts of electricity – about as much as a bright light bulb (or one of those light bulbs you might find in a dimming lamp). That means, if we leave them on and running year-round, we spend about $100 for the electricity for each computer we have. The Grids that ATLAS uses (there are three, actually) have about 30,000 computers on them, which means that we spend about $3M a year for the electricity to run the computers on the Grid.

Of course, you have to cool all those machines, and most of the buildings that they live in are not the most elegant, modern, energy-efficient buildings that you might construct today. So you can guess that we spend about the same amount on air conditioning – another $3M (that is actually pretty close to right, based on CERN’s experience). Recently, ATLAS changed the operating system that we run our applications on – like an upgrade from Windows XP to Vista, or Mac OS X 10.5 (Leopard) to 10.6 (Snow Leopard). The operating system we use is called “Scientific Linux,” and we moved from Scientific Linux 4 to Scientific Linux 5. Because of a few of the fancy new tricks that came along with that change, our software suddenly runs 20% faster. Take 20% off the roughly $6M we spend on electricity and cooling, and an operating system upgrade just saved us $1.2M a year!!

Actually, that’s not quite true. The electricity for computers on the Grid is pretty cheap compared to some of the other parts of the budget. So rather than turning off the computers, we run them more, and we can process more data in the same amount of time. Still, it’s a nice thought! And little calculations like this make me think they should give me a bonus whenever I make our software a little bit faster…
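For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal sketch in Python. The inputs are the round figures from this post (80 watts, $100 per computer per year, 30,000 computers, cooling equal to electricity, a 20% speed-up). The function and variable names are made up for illustration, and the price per kilowatt-hour is not something I quoted; it is simply what the $100 figure implies.

```python
# Rough reproduction of the electricity estimate in this post.
# All inputs are the round numbers quoted in the text; names are illustrative.

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_energy_kwh(watts):
    """Energy used by one machine running all year, in kilowatt-hours."""
    return watts / 1000 * HOURS_PER_YEAR


def implied_price_per_kwh(annual_cost_usd, watts):
    """Electricity price implied by a yearly cost and a constant power draw."""
    return annual_cost_usd / annual_energy_kwh(watts)


WATTS_PER_COMPUTER = 80        # from the text
COST_PER_COMPUTER_USD = 100    # from the text, per year
N_COMPUTERS = 30_000           # from the text
SPEEDUP_FRACTION = 0.20        # "20% faster", treated as 20% less running time

electricity_usd = N_COMPUTERS * COST_PER_COMPUTER_USD
cooling_usd = electricity_usd  # the text's guess: cooling costs about the same
total_usd = electricity_usd + cooling_usd
savings_usd = SPEEDUP_FRACTION * total_usd

print(f"Energy per computer: {annual_energy_kwh(WATTS_PER_COMPUTER):.1f} kWh/year")
print(f"Implied price: ${implied_price_per_kwh(COST_PER_COMPUTER_USD, WATTS_PER_COMPUTER):.3f}/kWh")
print(f"Electricity: ${electricity_usd:,}  Cooling: ${cooling_usd:,}")
print(f"Hypothetical savings from a 20% speed-up: ${savings_usd:,.0f}")
```

An 80-watt machine running all year uses about 700 kWh, so the $100 figure corresponds to a price of roughly 14 cents per kWh. The savings line treats "20% faster" as 20% fewer computer-hours, which is the same simplification the estimate in the text makes.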

Richard asked about LHC@home after my last post. You can read all about it at their website. That’s a neat project, and we’ve talked about different ways to use it to our advantage. There are a few problems, though, that are perhaps interesting to mention (note: I’m a mere blogger – this is just one fellow’s opinion).

LHC@home Screensaver

One problem is that the software we use is pretty big. A typical installation is around 7 GB, and runs natively on Linux machines. On top of that, the data files are typically a few more GB. There aren’t a whole lot of people who are willing to blow 15 GB of their hard drive space on a nice screen saver, so we have to think carefully about whether there is a slimmed-down version that we can send out and run on Windows or Apple computers (to reach a broader audience).

Another problem is that our data is still “sensitive.” In order to make full use of our friends’ computers, we would want to give them full access to our data. But we want to be the first to publish results with that data! So it is a bit nervous-making to just send the data to whoever asks for it. More likely, there would be someone out there who would try to use the data, but wouldn’t really understand it, and so would end up misidentifying something interesting. Then we’d have to spend our time trying to fix the things they’d done wrong. There was an interesting discussion about that at a conference I attended a few years ago. Someone asked that all LHC data be made publicly available. Of course, we raised this objection then (that they wouldn’t understand what we were giving them). And then a person asked a very nice question: “The data from the previous experiment at CERN (called LEP) is publicly available. Has anyone looked at it?” No one outside the experiments had. So one more reason to not try to make our data public.

One more problem is dealing with “conditions.” What we get out of the detector depends on the state that the detector is in at the time – which pieces are on or off, what temperature those pieces are, what voltage is being used, and so on. All that information (called “conditions”) is put in a big database at CERN. Whenever we want to use the data, we have to read some of that information, and so we have to access the database. If more than a few thousand people tried to connect at the same time, it would bring down the database, and no one would be able to use it! We have duplicates set up around the world now to ease that problem. On top of that, we now have caching servers set up near those. When you ask one of those servers for information, it checks whether it already has a copy, and only if it doesn’t will it go back to the original database. That way we make the most frequently used conditions available all over. But I am not sure that we have the infrastructure to allow that many new people to request conditions information! And it would be risky to launch a program that might bring our work to a halt, just as the LHC is getting up and running!!
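The caching idea described above can be sketched in a few lines of Python. This is only a toy model of the general technique (often called a read-through cache), not the actual ATLAS setup: the class names, the keys, and the values are all invented for illustration.

```python
# Toy read-through cache: a caching server answers from its own copy when it
# can, and only goes back to the origin database on a miss.
# Every name and value here is hypothetical, not the real ATLAS software.


class ConditionsDatabase:
    """Stand-in for the central conditions database."""

    def __init__(self, records):
        self._records = dict(records)
        self.queries = 0  # how many times the origin was actually contacted

    def lookup(self, key):
        self.queries += 1
        return self._records[key]


class CachingServer:
    """Stand-in for a caching server that sits in front of the database."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}

    def get(self, key):
        # Only contact the origin if we do not already hold a copy.
        if key not in self._cache:
            self._cache[key] = self._origin.lookup(key)
        return self._cache[key]


# Made-up conditions for one made-up run.
origin = ConditionsDatabase({
    ("run_1", "voltage"): 1500.0,
    ("run_1", "temperature"): -7.0,
})
server = CachingServer(origin)

# A thousand requests for the same two pieces of information...
for i in range(1000):
    quantity = "voltage" if i % 2 == 0 else "temperature"
    server.get(("run_1", quantity))

# ...reach the origin database only once per distinct key.
print(origin.queries)  # prints 2
```

The sketch leaves out everything that makes the real problem hard, such as cache expiry, many caching servers sharing one origin, and what happens when thousands of different keys are requested at once, which is exactly the situation a flood of new volunteers could create.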

Of course, Moore’s Law continues to hold, and computers continue to get cheaper. So by 2020 this might all be easy, and everyone might be running our software as a screen saver. But for now, it’s quite a challenge!

–Zach