Anadi Canepa | TRIUMF | Canada

Data analysis in simple words

And here I am, back at CERN! In the past weeks I spent most of my time traveling and attending conferences, which is one of the exciting parts of our job. IFAE (the Italian conference on high energy physics) was held in Bari, in the south of Italy.

Bari is famous not only for its wonderful cathedral but, above all, for the “orecchiette”, Puglia’s traditional ear-shaped pasta. As is well known, you can find “orecchiette” while walking through the narrow, charming streets of the old town!

(These nice pictures come from good friends of mine, whom I had the pleasure of seeing again.)

The conference covered a broad range of physics results, from astrophysics to nuclear physics and, finally, particle physics. Let me take some time now to explain how we actually carry out a data analysis and produce our results. The process is long and complex, involving data taking, high-level programming and, finally, extensive work to understand the data. First, the events are recorded on tape: all signals from all sub-detectors are saved and used to “reconstruct” the objects in the event (for instance, from a set of hits recorded in a muon chamber we can infer that a muon passed through that chamber). At this stage we know the nature of each object (muon, electron, etc.) with a high level of confidence. Once we have such a picture of every event, we select the events that resemble the one we are looking for. If the particle we hunt for decayed into two muons and two neutrinos, we would select only events with two muons and missing transverse energy (in detector language, neutrinos show up as missing transverse energy, because they escape without leaving a signal).

However, the particle we are looking for is not the only one that decays into two muons and two neutrinos: many other (uninteresting) processes produce the same signature, and they generally happen at a much higher rate than the interesting ones! We might be left with millions of candidate events while we expect our particle to contribute only hundreds of events (or fewer). How do we dig these events out? For any given event, we do not know which process produced it; we only know the rates of the processes. Our approach therefore has to be probabilistic: we look for deviations from the rates we expect. Typically we measure the rates of the background processes that populate our pool of candidate events. These rates are known only within some uncertainty, and in most current searches this uncertainty is larger than the signal itself. The plot below gives you an example.
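The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not experiment code: the `Event` record, its fields and the thresholds `MIN_MUON_PT` and `MIN_MET` are all hypothetical, and real reconstructed events carry far more information.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Hypothetical, highly simplified reconstructed event."""
    muon_pts: List[float] = field(default_factory=list)  # muon transverse momenta [GeV]
    met: float = 0.0  # missing transverse energy [GeV]


# Illustrative thresholds (assumptions, not values from a real analysis).
MIN_MUON_PT = 20.0  # GeV
MIN_MET = 25.0  # GeV


def passes_selection(event: Event) -> bool:
    """Keep events with exactly two muons above threshold and large missing E_T."""
    good_muons = [pt for pt in event.muon_pts if pt > MIN_MUON_PT]
    return len(good_muons) == 2 and event.met > MIN_MET


def select(events: List[Event]) -> List[Event]:
    """Return the candidate events that pass the selection."""
    return [e for e in events if passes_selection(e)]


events = [
    Event(muon_pts=[45.0, 32.0], met=60.0),  # two muons + missing E_T: kept
    Event(muon_pts=[45.0], met=60.0),  # only one muon: rejected
    Event(muon_pts=[45.0, 32.0], met=5.0),  # little missing E_T: rejected
    Event(muon_pts=[45.0, 32.0, 28.0], met=40.0),  # three muons: rejected
]
candidates = select(events)
print(len(candidates))  # prints 1
```

Note that nothing in this selection can tell signal from background: any process that yields two muons and large missing transverse energy passes it, which is exactly why the probabilistic treatment is needed.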

[Plot: number of candidate events observed in data compared with the background expectation]

The histogram shows the number of candidate events observed in data (black markers) compared with the number of events expected from our background model (the meaning of the x-axis is not crucial here). The dashed area indicates the uncertainty on the prediction. If we focus on the first bin, we expect between 2800 and 4000 events, and we observe 3600. If the signal caused a deviation of, say, 50 events, we would not be able to see it by simple counting: 50 events are tiny compared with the 1200-event spread of the prediction.
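A back-of-the-envelope calculation makes the point about the first bin quantitative. It assumes, for illustration, that the 2800 to 4000 range is the expected background plus or minus one standard deviation, and it uses the common rule-of-thumb significance s/sqrt(b + sigma_b^2), in which Poisson fluctuations (sqrt(b)) and the systematic uncertainty sigma_b are added in quadrature. This is a simplification, not the statistical procedure the experiments actually use.

```python
import math

# First bin of the plot. Assumption: 2800-4000 is the expectation +/- 1 sigma.
b_low, b_high = 2800.0, 4000.0
b = (b_low + b_high) / 2  # expected background: 3400 events
sigma_b = (b_high - b_low) / 2  # uncertainty on the prediction: 600 events
n_obs = 3600  # observed events
s = 50.0  # hypothetical signal size from the text


def z_score(excess: float, b: float, sigma_b: float) -> float:
    """Excess divided by the total uncertainty (sqrt(b) and sigma_b in quadrature)."""
    return excess / math.sqrt(b + sigma_b ** 2)


print(f"observed excess: {z_score(n_obs - b, b, sigma_b):.2f} sigma")  # prints 0.33
print(f"50-event signal: {z_score(s, b, sigma_b):.2f} sigma")  # prints 0.08
print(f"signal, no systematics: {z_score(s, b, 0.0):.2f} sigma")  # prints 0.86
```

With these assumptions the 50-event signal amounts to less than a tenth of a standard deviation, and even the observed 200-event excess over the central expectation is only about a third of one. Even with no systematic uncertainty at all, the signal would stay below one standard deviation.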

To overcome these experimental limitations, advanced analysis techniques have been studied and, after careful consideration, deployed in the searches. These techniques are not new, but they were imported into the field fairly recently. They are machine learning tools, ranging from neural networks to boosted decision trees. Let me steal a concise description from Wikipedia: “machine learning is the sub-field of artificial intelligence that is concerned with the design and development of algorithms that allow computers to improve their performance over time based on data, such as from sensor data or databases. A major focus of machine learning research is to automatically produce (induce) models, such as rules and patterns, from data. Hence, machine learning is closely related to fields such as data mining, statistics, inductive reasoning, pattern recognition, and theoretical computer science.”
The basic idea is to teach an artificial brain to distinguish the signal from the background at a level the experiments could not otherwise reach. This gave a boost to the sensitivity of the current experiments at the Tevatron. Take the top quark: it is mainly produced in pairs, at a high rate, through the strong interaction (mediated by gluons); however, it can also be produced singly, in processes involving the exchange of a W boson. In the first case we end up with high-energy events containing a large number of jets (here, the experimental manifestation of quarks) and leptons; in the second case the amount of energy is smaller and the number of objects is reduced (one top quark decays instead of two). As a consequence, the second process is extremely challenging from an experimental point of view. The Tevatron experiments, CDF and D0, spent the past years looking for it!
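To give a feeling for how such a classifier learns, here is a minimal boosted decision tree in plain Python: AdaBoost with depth-1 trees (“stumps”), each of which is a single cut on one variable. The toy events are invented for illustration and use two hypothetical variables inspired by the description above, the number of jets and the total energy, with single-top-like events (label +1) having fewer jets and less energy than top-pair-like events (label -1). Real analyses use dedicated toolkits (ROOT's TMVA is one widely used example), deeper trees and many more input variables.

```python
import math
import random
from typing import List, Tuple

# (features, label): features = [number of jets, total energy in GeV] (hypothetical),
# label +1 = single-top-like "signal", -1 = top-pair-like "background".
Sample = Tuple[List[float], int]
Stump = Tuple[float, int, float, int]  # (alpha, feature index, threshold, sign)


def make_toy_data(n: int, rng: random.Random) -> List[Sample]:
    """Invented toy events: signal has fewer jets and less energy than background."""
    data: List[Sample] = []
    for _ in range(n):
        data.append(([rng.randint(2, 4), rng.gauss(180.0, 50.0)], +1))
        data.append(([rng.randint(3, 6), rng.gauss(320.0, 70.0)], -1))
    return data


def stump_predict(x: List[float], f: int, thr: float, sign: int) -> int:
    """A depth-1 tree: predict `sign` if feature f is below the cut, else -sign."""
    return sign if x[f] <= thr else -sign


def fit_stump(data: List[Sample], weights: List[float]) -> Tuple[float, int, float, int]:
    """Scan all cuts; return (weighted error, feature, threshold, sign) of the best one."""
    best = (math.inf, 0, 0.0, 1)
    for f in range(len(data[0][0])):
        for thr in sorted({x[f] for x, _ in data}):
            for sign in (+1, -1):
                err = sum(w for (x, y), w in zip(data, weights)
                          if stump_predict(x, f, thr, sign) != y)
                if err < best[0]:
                    best = (err, f, thr, sign)
    return best


def train_adaboost(data: List[Sample], n_rounds: int = 10) -> List[Stump]:
    """AdaBoost: each round fits a stump, then up-weights misclassified events."""
    weights = [1.0 / len(data)] * len(data)
    ensemble: List[Stump] = []
    for _ in range(n_rounds):
        err, f, thr, sign = fit_stump(data, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, f, thr, sign))
        weights = [w * math.exp(-alpha * y * stump_predict(x, f, thr, sign))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def score(ensemble: List[Stump], x: List[float]) -> float:
    """Boosted discriminant: positive looks signal-like, negative background-like."""
    return sum(alpha * stump_predict(x, f, thr, sign) for alpha, f, thr, sign in ensemble)


rng = random.Random(42)
train_data = make_toy_data(100, rng)
test_data = make_toy_data(100, rng)
model = train_adaboost(train_data, n_rounds=10)
accuracy = sum((score(model, x) > 0) == (y > 0) for x, y in test_data) / len(test_data)
print(f"test accuracy: {accuracy:.2f}")
```

Each round picks the cut with the lowest weighted error and then increases the weight of the events it got wrong, so later cuts concentrate on the hard, signal-like background events. The final discriminant `score` is a weighted vote of all the cuts: instead of cutting on the raw variables, the analysis can then cut on (or fit) this single number.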
Teams of physicists analyzed the data produced in proton-antiproton collisions to build the background model and construct a solid “single top” search. Depending on how the single top quark decays, they looked for electrons or muons, or for jets and missing transverse energy. Each decay mode has to be separated from different background sources, arising from other uninteresting processes or from detector mis-measurements. Finally, the separate analyses were combined into a single sensitive search using machine learning techniques. The observation was announced in March, 15 years after top quark pair production was first observed! The measured rate of single top production is in agreement with the Standard Model expectation.
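The idea of combining separate channels can be illustrated with a deliberately simple method: a likelihood ratio for independent counting experiments. For a channel with n observed events, expected background b and expected signal s, the Poisson likelihoods give -2 ln Q = 2[s - n ln(1 + s/b)], where Q = L(s+b)/L(b); because the channels are independent, the likelihoods multiply and the -2 ln Q values simply add. This is a swapped-in textbook technique, not the machine-learning combination used by CDF and D0, and the channel names and numbers below are invented.

```python
import math
from typing import List, NamedTuple


class Channel(NamedTuple):
    """One independent counting channel (all numbers hypothetical)."""
    name: str
    n_obs: int  # observed events
    b: float  # expected background
    s: float  # expected signal


def minus_two_ln_q(ch: Channel) -> float:
    """-2 ln[L(s+b)/L(b)] for Poisson counts; negative values favor signal+background."""
    return 2.0 * (ch.s - ch.n_obs * math.log(1.0 + ch.s / ch.b))


def combine(channels: List[Channel]) -> float:
    """Independent channels: likelihoods multiply, so -2 ln Q values add."""
    return sum(minus_two_ln_q(ch) for ch in channels)


channels = [
    Channel("electron + jets", n_obs=120, b=100.0, s=20.0),
    Channel("muon + jets", n_obs=95, b=80.0, s=15.0),
    Channel("missing E_T + jets", n_obs=210, b=200.0, s=10.0),
]
for ch in channels:
    print(f"{ch.name:>20}: {minus_two_ln_q(ch):+.2f}")
print(f"{'combined':>20}: {combine(channels):+.2f}")
```

In this toy each channel on its own gives only a modest preference for the signal-plus-background hypothesis, while the combined value is larger in magnitude than any single term, which is the point of combining.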
