Today CMS and ATLAS, the two large experiments operating at the Large Hadron Collider (LHC), have each reached five inverse femtobarns of data, the goal set for 2011.
Having more data is crucial. All phenomena we study follow statistical laws and are therefore subject to statistical fluctuations. Earlier this summer, we observed small excesses that could have been seen as the first signs of the Higgs boson. Over time, such small excesses can grow, shrink or disappear. The only thing we can do is analyze more data to get a definitive answer. In time, either the signal will emerge unambiguously if it was real, or it will vanish if it was only a statistical fluctuation.
Fortunately, statistics is on our side: the error bar, the margin for statistical fluctuations, shrinks as one over the square root of the sample size. Double the data and the uncertainty drops by a factor of about 1.4; collect five times more and it drops by a factor of about 2.2. This is why we are always trying to collect more data, to reduce the size of possible statistical fluctuations. We now have five inverse femtobarns of data per experiment, five times more than was available in July.
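To see where these numbers come from, here is a minimal sketch in Python, with made-up event counts, of how the relative statistical uncertainty of a simple counting experiment shrinks as the sample grows:

```python
import math

# Toy illustration: in a simple counting experiment the statistical uncertainty
# on N observed events is about sqrt(N), so the *relative* uncertainty
# shrinks like 1/sqrt(N) as the data sample grows. Event counts are made up.
for n_events in [100, 200, 500, 1000]:
    relative_uncertainty = math.sqrt(n_events) / n_events   # = 1/sqrt(N)
    print(f"{n_events:5d} events -> relative uncertainty {relative_uncertainty:.1%}")

# Doubling the sample shrinks the uncertainty by sqrt(2), about 1.4;
# five times more data shrinks it by sqrt(5), about 2.2.
```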
One may think that once the analysis is defined, it is just a matter of passing all the newly accumulated data through those selection criteria to extract the type of events we want to study. That would be too easy…
Producing new results requires an incredible number of checks and cross-checks.
Our analysis technique is fairly simple: we use a theoretical model to predict new phenomena and particles, and with complex simulation methods, we reproduce what our detector's response to such events would be. We do the same for all known processes, that is, for the various types of collisions we already expect to come out of the LHC. The simulated events look just like the events we collect in our detectors, except they are fabricated from all our knowledge of what can be produced when protons collide in the LHC.
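To give a flavour of what a simulated event is, here is a deliberately oversimplified sketch, nothing like the real ATLAS or CMS simulation chains: it generates toy events for an invented signal and a flat background, then smears the measured mass to mimic detector resolution. Every number in it is an assumption made for illustration.

```python
import random

def simulate_event(process):
    """Generate one toy event: a 'true' mass smeared by detector resolution.
    All numbers here are invented for illustration, not real physics inputs."""
    if process == "signal":
        true_mass = 125.0                         # hypothetical new particle mass, in GeV
    else:
        true_mass = random.uniform(80.0, 160.0)   # flat toy background
    resolution = 2.0                              # assumed detector smearing, in GeV
    return {"process": process, "mass": random.gauss(true_mass, resolution)}

# A small simulated sample: mostly background, with a handful of signal events mixed in.
simulated_sample = [simulate_event("background") for _ in range(10000)]
simulated_sample += [simulate_event("signal") for _ in range(50)]
print(f"Simulated {len(simulated_sample)} toy events")
```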
The next step is to define a series of selection criteria with the sole purpose of spotting the needle in a barn full of haystacks. For this, we study in detail the characteristics of the events we are interested in and compare them with those of the known processes. At this stage, the name of the game is to isolate the signal from all other types of events, those we refer to as background.
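In code, a selection is often just a chain of simple requirements applied to every event. The sketch below applies one invented cut, a mass window around a hypothetical signal, to the same kind of toy events as above:

```python
import random

def passes_selection(event):
    """Toy selection: keep events whose measured mass falls in a window
    where the hoped-for signal would pile up. The cut values are invented."""
    return 120.0 <= event["mass"] <= 130.0

# A toy sample: a flat background plus a few 'signal-like' events near 125 GeV.
toy_events = [{"mass": random.uniform(80.0, 160.0)} for _ in range(10000)]
toy_events += [{"mass": random.gauss(125.0, 2.0)} for _ in range(50)]

selected = [event for event in toy_events if passes_selection(event)]
print(f"{len(selected)} of {len(toy_events)} toy events survive the selection")
```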
Most of the time, the background constitutes the bulk of all collected events. This is normal since the events we know best are the ones that are produced copiously and we have already had a chance to study them in depth in previous experiments.
The final step consists of comparing the data we collect with the sum of all simulated known processes that survive our selection criteria. We check whether we select more events than expected from all backgrounds combined, and whether any excess bears a resemblance to the theoretical model under test.
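Schematically, this last step boils down to counting: how many events survive in data versus how many the background simulations predict, and how large any excess is compared with a statistical fluctuation. Here is a crude sketch with invented numbers; the real statistical treatment is far more sophisticated.

```python
import math

# Invented numbers for illustration only.
expected_background = 100.0   # events predicted by the sum of all known processes
observed_in_data = 125        # events actually surviving the selection in real data

excess = observed_in_data - expected_background
# Very naive significance estimate: excess divided by the expected Poisson
# fluctuation of the background. Real analyses use far more careful statistics.
significance = excess / math.sqrt(expected_background)
print(f"Excess of {excess:.0f} events, roughly {significance:.1f} sigma")
```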
And here is where all our time and effort goes: cross-checking that everything is done properly at each step. We constantly look at our simulated events and compare them with the real events collected in our detector. Since we are also trying to improve both our reconstruction algorithms and our simulations, every time something is modified, we need to cross-check it against real data.
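One very common type of cross-check is to histogram some quantity in both real and simulated events and verify that the two distributions agree within uncertainties. A bare-bones sketch, with invented bin contents and a naive chi-square comparison:

```python
import math

def chi_square(data_counts, simulated_counts):
    """Crude bin-by-bin comparison of a data histogram with a simulated one,
    assuming sqrt(N) uncertainties. Real cross-checks use far more refined tools."""
    total = 0.0
    for data_bin, simulated_bin in zip(data_counts, simulated_counts):
        if simulated_bin > 0:
            total += (data_bin - simulated_bin) ** 2 / simulated_bin
    return total

# Invented histogram contents for one kinematic variable.
data_histogram      = [98, 153, 210, 190, 140, 95]
simulated_histogram = [100, 150, 205, 195, 145, 90]
print(f"chi-square = {chi_square(data_histogram, simulated_histogram):.1f} over {len(data_histogram)} bins")
```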
The more data we collect, the more precise these comparisons become, making the cross-checks increasingly stringent. In the end, the goal is to produce absolutely trustworthy results, free of flaws, bugs and oversights.
Should we expect big announcements soon? It is hard to tell, but we can all hope. We are tracking elusive particles that have escaped detection so far. With lots of work, extreme rigor and huge computing facilities like the Grid, it can be done. And if we do not find new particles right away, we will at the very least show in detail where we have searched, mapping out the territory where these particles can no longer hide, and set limits that theorists can take into account to draw a better picture of the world we live in. The more data we accumulate, the closer we get to this goal.
Pauline Gagnon
To be alerted of new postings, follow me on Twitter: @GagnonPauline or sign up on this mailing list to receive an e-mail notification.