Given the tedium of what I deal with day to day on the computing side, what is it that makes computing interesting? Let me make a comparison with what is going on in the collision halls. My colleagues underground at CERN are working very hard as we head towards LHC startup. There are some very tight time constraints at this point, and they are working with very complex systems that are pushing the limits of their technologies. And as we head into these final weeks, the separate systems that have been under development for years must be integrated into one large experiment. It’s a tremendous task, and I don’t want to take anything away from what they are doing.
However, they are starting to get out of the woods. The door to the collision hall will be shut at some point, and very little can be changed after that. And the number of people who will interact directly with those systems is relatively small: a team of experts who will continue to put a lot of effort into making their hardware work and keeping it running happily. Most of their work will be hidden from the world; physicists will be happy to see lots of silicon hits on tracks, but they will have only a vague idea of how much labor went into that. (I’ll say again, the hardware guys are under-appreciated!)
In contrast, just about everyone on CMS will interact with the computing in some way, which means that my problems are just beginning. Everyone will want to know where the datasets are. Everyone will be trying to submit jobs. Everyone will be trying to make plots. Performance will be documented and updated regularly on Web pages. This means that everyone will have an opinion on what works well and what doesn’t, and they won’t hesitate to voice it. And all the computers are above ground, and software can be modified with a few keystrokes; we can tweak things endlessly, and we might well be called upon to do so.
So in fact this is a very human enterprise: we are building a system that 2,000 motivated, smart and creative people will be using every day. We need to make it work for each of them as individuals, while also making sure that the group as a whole is not harmed. And while ultimately we have to build good systems, there is a lot of psychology and sociology involved too. Everyone needs to actually buy in to the idea of distributed computing for it to work, which might be hard while we are still working through all the kinks, and everyone will need to trust that they are being treated fairly. One of my mentors said to me once, “If all of our problems were physics problems, this job would be easy.” She was of course referring to the fact that we must work with people every step of the way. Physics equations and plots are interesting, but the human aspect of the work adds an extra dimension.
It is on my mind today because I have been corresponding with some users who are having trouble running jobs on our site. It sounds like there could be any number of things going on…many of which may have nothing to do with the performance of the cluster here. But it doesn’t matter; I’m invested in getting the entire chain working, because we have to build confidence. More to come, I’m sure.