During my brief time participating in the wide world of High Energy Physics (HEP) I have learned many, many things. But above all, if there is one thing I’ve come to understand, it’s that there will never be enough:
While some people may concern themselves with blood alcohol content, I spend my time thinking about blood caffeine content. I’ve become thoroughly addicted as a grad student, and without my daily (or sometimes hourly) “fix,” I doubt I would get anything done.
But caffeine isn’t just my own vice (or at least that’s the addict in me talking). I’ve come to think it’s a necessary evil within all fields of research. As an example, there are not one, not two, but four coffee pots on my floor of the Physics & Chemistry building, and I’m not even counting the chemistry side (or those that may be found in offices).
The coffee pot that I contribute to is filled twice a day (at least). We go through several containers of half & half every week, along with a tub of, say, Maxwell House coffee. We rely on everyone to contribute to keep this stream of liquid productivity flowing.
My own coffee mug has come to be known as “The Soup Bowl” among the grad students & professors on my floor. I maintain that it is a coffee mug; however, I’ve been fighting a losing battle ever since the start of last spring semester. But whether it’s a mug for drinking coffee or a bowl for holding chicken noodle soup, I would get a whole lot less done in a day without this beautiful piece of ceramic:
And even though this mug fits a gigantic amount of coffee, I’ve come to think that it’s never enough.
Hours in a Day
While I need coffee to get through the hours of my day, I just really wish there were more of them.
My day begins between 8 and 10 am (usually depending on when I got home the night before); I usually end up working until 8 or 9 pm (or sometimes even midnight) to accomplish what I need to for the day. I spend my time corresponding with other physicists via email, attending meetings, reading papers, and computer programming. It’s a lot of work, but I enjoy what I do. However, I am of the opinion that the sunrise and sunset should be a bit farther apart.
It’s been my experience that every single analysis in CMS can always benefit from more people becoming involved.
To give you an idea of what tasks are involved in an analysis, here’s a generic outline most conform to:
- Define experimental techniques
- Perform measurements
- Determine backgrounds
- Analyze experimental/theoretical uncertainties
- Obtain approval (each of the LHC’s Collaborations undergoes an internal peer-review process before submitting for publication in an external peer-reviewed journal).
These tasks take time, and above all, they need warm bodies (who sometimes have more in common with Zombies, sans coffee that is).
But HEP is a collaborative science. Within a given experiment (such as CMS or ATLAS) we all work together to make sure research is conducted precisely and promptly. Each individual within the CMS Collaboration is usually juggling a series of different analyses. The time they invest in each of these analyses varies. However, each researcher usually has one “pet project” that occupies the majority of their time.
But needless to say, HEP is a massive undertaking, and it seems like there are never enough Physicists/Grad Students involved.
What’s the difference between one inverse femtobarn (fb⁻¹) of data and, say, ten or a hundred? Only a series of discoveries that will forever change our understanding of the universe. You know, nothing major.
Humor aside, the experiments at the LHC have collected over 1 fb⁻¹ of data this past year. And there have been several times in which we collected more data in a day than we did in all of 2010 (which I find astounding):
But what’s the big deal? Well, one of the rules of thumb in particle physics says: to claim a discovery, you need a statistical significance of five sigma over your current theory/background. Simply put, the chance that your discovery is a statistical fluke must be less than about 0.00003% (roughly 1 in 3.5 million).
While this may seem a bit ad hoc, it is actually necessary. Three sigma effects come and go in particle physics.
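For the curious, here is a minimal Python sketch of how a "number of sigma" translates into a fluke probability. It assumes a purely Gaussian background fluctuation and the one-sided convention that is common for discovery claims; the function name `one_sided_p_value` is my own illustration, not something from an experiment's software.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Probability that a standard normal variable fluctuates above n_sigma.

    Assumption: Gaussian statistics and a one-sided (upper-tail) test.
    """
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    for n_sigma in (3, 5):
        p = one_sided_p_value(n_sigma)
        print(f"{n_sigma} sigma: p = {p:.3g} (about 1 in {1 / p:,.0f})")
```

Under these assumptions a three sigma effect is a roughly one-in-a-thousand fluctuation, while five sigma is a few parts in ten million, which is why the bar for a discovery sits so much higher.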
But because of this stringent requirement we are always asking for more. We always wish for our colliding beams to have a higher luminosity. We always want the time between crossings of particles in the detector to be minimized. In short, we always want more data, and there is never enough!
Who knows what is on the horizon of tomorrow’s proton collisions? I for one have no idea, but I avidly look forward to the coming “more glorious dawn.”
I’m sure my colleagues have differing opinions on what is and is not needed in high energy physics. But I adamantly believe there are two things all of us would agree on. We always need more data, and we always need more CPUs.
Cluster computing is the name of the game. There are rooms at HEP labs that can usually be heard from “miles away” (or at least a few meters). They literally hum with activity. To me it sounds like raw science. To someone more “normal,” it probably sounds like hundreds of fans all operating at once (which is exactly what it is). These rooms are filled with racks upon racks of computers, all linked in some fashion. Users all over the country/world submit hundreds of thousands of “jobs,” or research tasks, to these clusters. In each of these jobs, a piece of the cluster is given some software a researcher has developed, and uses this software to analyze data.
As an example, I perform a relatively small analysis (with respect to the scope of LHC Physics), but I run between 7.5K and 14K computing jobs a week. The number of jobs is a bit arbitrary, though, because the user specifies how large each job is. To be a bit more concrete, the size of all the data & simulated samples I need for my work is over 80 terabytes.
So how do I, and other physicists, analyze all this data? With jobs!
And here’s how it works: one of my data sets has roughly 35 million events. If I attempt to process this data all at once, with one computer (even recent Jeopardy! champion Watson), it will take forever. Instead, I break the task of data analysis up into many, many tasks (aka jobs). Each job will analyze 25-50K events. In this manner high energy physics makes use of “parallel computing” and saves time.
But why do we need this job system? How long would it really take to process that data in one shot? Well, assuming a spherical cow, each of my jobs takes ~12 hours. To run over those 35 million events I mentioned, I need 3836 jobs. So at 12 hours a job, it would take Watson ~5.3 years to process all the data if it were done in one job.
So much for getting my degree in less than a decade (and heaven forbid I make a mistake!).
But the irony of having so many physicists participating in a HEP experiment is that not everyone will have all of their jobs running at a time. Each cluster has a finite number of CPUs, and a seemingly infinite number of jobs submitted to it (continually). What usually happens is a person will have anywhere between 6 and 600 of their jobs running at a time (depending on who else is using the cluster).
So to analyze data, it could take anywhere from a night to a week. And in this regard, I believe we will never have enough CPUs.
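To make the idea of "breaking a data set into jobs" a little more concrete, here is a toy Python sketch. It is not the actual CMS submission machinery; `split_into_jobs` is a hypothetical helper that simply carves a range of event numbers into fixed-size chunks, one chunk per job.

```python
def split_into_jobs(n_events: int, events_per_job: int) -> list[tuple[int, int]]:
    """Return half-open (first_event, last_event) ranges, one per job.

    Hypothetical helper for illustration; real workflows typically
    split by files or luminosity sections rather than raw event counts.
    """
    if n_events < 0 or events_per_job <= 0:
        raise ValueError("need n_events >= 0 and events_per_job > 0")
    return [
        (start, min(start + events_per_job, n_events))
        for start in range(0, n_events, events_per_job)
    ]


if __name__ == "__main__":
    n_events = 35_000_000  # the data set size quoted above
    for events_per_job in (25_000, 50_000):
        jobs = split_into_jobs(n_events, events_per_job)
        print(f"{events_per_job} events/job -> {len(jobs)} jobs")
```

Each range can then be handed to a separate machine, which is all "parallel computing" really means here: the jobs do not need to talk to each other, so they can all run at once.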
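The back-of-the-envelope arithmetic is short enough to check in a few lines of Python. This is just a sketch using the figures quoted above (3836 jobs at roughly 12 hours each); the constant and function names are my own, and the year length of 365.25 days is an assumption.

```python
N_JOBS = 3836          # jobs needed for the 35 million event data set
HOURS_PER_JOB = 12.0   # rough run time of a single job
HOURS_PER_YEAR = 24 * 365.25  # assumed average year length


def serial_years(n_jobs: int, hours_per_job: float) -> float:
    """Years needed if every job ran back to back on a single machine."""
    return n_jobs * hours_per_job / HOURS_PER_YEAR


if __name__ == "__main__":
    years = serial_years(N_JOBS, HOURS_PER_JOB)
    print(f"Running everything in series would take about {years:.2f} years")
```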
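As a rough illustration of why the number of simultaneously running jobs matters so much, here is a deliberately idealized Python sketch. It assumes every job takes the same time and that jobs run in complete "waves" that fill all of your available slots; real clusters are far messier, and `wall_clock_hours` is a made-up name for this example.

```python
import math


def wall_clock_hours(n_jobs: int, concurrent_slots: int,
                     hours_per_job: float = 12.0) -> float:
    """Idealized turnaround time when jobs run in full waves.

    Assumptions: identical job lengths, no queue wait, no failures.
    """
    if n_jobs < 0 or concurrent_slots <= 0:
        raise ValueError("need n_jobs >= 0 and concurrent_slots > 0")
    waves = math.ceil(n_jobs / concurrent_slots)
    return waves * hours_per_job


if __name__ == "__main__":
    # 3836 jobs with 600 running at a time, the upper end quoted above
    hours = wall_clock_hours(3836, 600)
    print(f"About {hours / 24:.1f} days of wall-clock time")
```

Halve the number of slots you get and the turnaround roughly doubles, which is the whole argument for more CPUs in one line.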
Until next time,