In the last few posts, I’ve been talking about jets. I’ve also touched on ways to identify a specific type of jet: the b-Jet. Recall that a b-Jet is a jet produced by the hadronization of a b quark or an anti-b quark (the anti-b is termed bbar; both are often referred to simply as b).
I also outlined some properties of B-Hadrons (see the second link above). So let’s start to put these properties to good use and flesh out one of the standard B-Taggers used by high energy physicists, namely the Track Counting (TC) Algorithm.
One Track, Two Track, Three Track, Four!?
In my previous post I stated that a B-Hadron will produce roughly five charged particles per decay. Each of these charged particles will then leave a track within the Silicon Tracker of CMS. So if a jet is a b-jet, it will have more high impact parameter tracks than a jet produced from the hadronization of a light quark or gluon.
The Track Counting approach is actually rather simple: physicists require a jet to have at least N high impact parameter tracks (for some integer N). In CMS we usually take N to be 3, and I will explain why in more detail below. However, we don’t use all of a jet’s tracks in the TC Algorithm. We require the tracks used to be of “high quality”.
But what does that mean? How do you judge the “quality” of a track? The answer is information: how much the detector knows about the particle that made that track. This comes in the form of how many hits the particle left within the silicon tracker as it traveled (these hits are then used to make the track). The more hits a particle left in the tracker, the higher the track quality. Here’s an example of two tracks: the one on the left has more than 10 hits in the tracker, while the one on the right has only 5. So, the left track is of higher quality.
Here the blue dot at the start of each track represents the location of the primary vertex (the point where the proton-proton collision occurred). The track itself is represented by the green line. The track’s hits in the Silicon Tracker of CMS are represented by the blue rectangles (each rectangle is a piece of the Silicon Tracker).
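To make the idea of a hit-based quality cut concrete, here is a minimal sketch in Python. The `Track` class, the `select_high_quality` function, and the `min_hits=8` threshold are all hypothetical names and values chosen for illustration; the real CMS selection is not spelled out in this post and uses more than just the hit count.

```python
from dataclasses import dataclass


@dataclass
class Track:
    n_hits: int  # number of silicon tracker hits used to build the track


def select_high_quality(tracks, min_hits=8):
    """Keep tracks with at least `min_hits` hits (illustrative cut, not the official one)."""
    return [t for t in tracks if t.n_hits >= min_hits]


# The two example tracks from the picture: one with 12 hits, one with only 5.
tracks = [Track(n_hits=12), Track(n_hits=5)]
good = select_high_quality(tracks)
print(len(good))  # prints 1
```

Only the 12-hit track survives the cut, which matches the picture: the left track is the high quality one.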
A Measurement of Impact
Now that we have a collection of high quality tracks belonging to a jet, how do we use them to test whether the jet is a b-Jet or not? We look at something called the impact parameter (IP): the distance between the primary vertex and the track’s point of closest approach to it. A visualization will help with understanding this:
The track is represented by the dotted blue line, and it belongs to a jet (with a direction given by the green arrow). This jet direction represents the direction of the jet’s cone within the detector (see the first link above to get an idea of what a jet cone is like in CMS).
The Impact Parameter is represented by the red line, which is drawn from the primary vertex to the track. Notice that a right angle is formed at the point where the IP touches the track; this is how the point of closest approach is identified. This point is also unique: the IP always makes a right angle with the track, and there is only one IP per track.
However, the error on the IP measurement can sometimes be large. To account for this, physicists divide the IP by its error, and this new value is called the IP-Significance (IP-Sig).
We also have a sign convention for this IP-Sig value. If the cosine of the angle between the track and the jet axis (marked as θ in the diagram above) is positive, the IP-Sig is a positive number (the track is said to be “downstream” of the jet axis). If the cosine of this angle is negative, the IP-Sig is negative (and the track is said to be “upstream” of the jet axis).
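The right-angle construction can be sketched in a few lines of Python. This is only a geometric illustration under a simplifying assumption: the track is treated as a straight line near the primary vertex (real tracks curve in the CMS magnetic field), and the function name `closest_approach` is made up for this example.

```python
import math


def closest_approach(primary_vertex, point_on_track, track_direction):
    """Return (impact parameter, point of closest approach) for a straight-line track."""
    norm = math.sqrt(sum(c * c for c in track_direction))
    d = [c / norm for c in track_direction]  # unit vector along the track
    rel = [v - p for v, p in zip(primary_vertex, point_on_track)]
    # Project (vertex - point) onto the track to slide to the nearest point.
    t = sum(r * c for r, c in zip(rel, d))
    closest = [p + t * c for p, c in zip(point_on_track, d)]
    ip_vector = [c - v for c, v in zip(closest, primary_vertex)]
    return math.sqrt(sum(c * c for c in ip_vector)), closest


# Primary vertex at the origin; a track running along y that passes through (1, -2, 0).
ip, closest = closest_approach((0.0, 0.0, 0.0), (1.0, -2.0, 0.0), (0.0, 1.0, 0.0))
print(ip, closest)  # prints 1.0 [1.0, 0.0, 0.0]
```

The line from the vertex to `closest` points along x while the track runs along y, so the two are perpendicular, which is exactly the right angle in the diagram.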
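As a rough sketch, the signed IP-Sig could be computed like this in Python. The function name is hypothetical, the value of cos θ is assumed to have been worked out already from the angle in the diagram, and treating cos θ = 0 as positive is an arbitrary choice made only for this example.

```python
def signed_ip_significance(ip, ip_error, cos_theta):
    """Signed IP-Sig: |IP| / error, with the sign taken from cos(theta)."""
    if ip_error <= 0:
        raise ValueError("ip_error must be positive")
    # cos_theta == 0 is treated as positive here; an arbitrary choice of this sketch.
    sign = 1.0 if cos_theta >= 0 else -1.0
    return sign * abs(ip) / ip_error


# An IP of 300 micrometers measured with an error of 100 micrometers.
downstream = signed_ip_significance(ip=300.0, ip_error=100.0, cos_theta=0.8)
upstream = signed_ip_significance(ip=300.0, ip_error=100.0, cos_theta=-0.8)
print(downstream, upstream)  # prints 3.0 -3.0
```

Dividing by the error means a 300 micrometer IP that is poorly measured counts for less than the same IP measured precisely.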
Discriminating against non-b-Jets
The goal of all B-Tagging algorithms is to create what is called a discriminator. A discriminator is some number that is calculated from a jet’s properties. As the value of a jet’s discriminator increases, the likelihood that the jet is a b-Jet also increases. It’s a very simple approach, and it works beautifully.
In the TC Algorithm, the discriminator is the signed IP-Sig value mentioned above. The reason we use the signed IP-Sig value is best summarized as:
Prompt tracks from the primary vertex have small IP values while tracks from decays of B hadrons have rather large IP values because of the B hadron lifetime.
So b-jets will have several tracks with large IP values. But as I mentioned above, we convert these IP values to signed IP-Sig values to minimize the impact of the measurement’s error on our discriminator. In summary, if a jet has many tracks with small signed IP-Sig values, it is most likely not from the hadronization of a b quark/anti-quark, while a jet originating from a b quark/anti-quark will have tracks with large signed IP-Sig values, because they will be “displaced” from the primary vertex.
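To get a feel for how far a B-Hadron travels before it decays, here is a back-of-the-envelope estimate in Python using the mean flight distance L = βγcτ, with βγ = p/m. The inputs are my own round-number assumptions rather than values taken from the references: a B-Hadron mass of about 5.28 GeV, a lifetime of about 1.5 picoseconds, and a hypothetical momentum of 50 GeV.

```python
C_LIGHT_M_PER_S = 299_792_458.0  # speed of light in m/s (exact by definition)


def mean_flight_distance_mm(momentum_gev, mass_gev, lifetime_ps):
    """Mean flight distance L = beta*gamma*c*tau in millimeters."""
    beta_gamma = momentum_gev / mass_gev  # relativistic boost factor p/m
    c_tau_mm = C_LIGHT_M_PER_S * lifetime_ps * 1e-12 * 1e3  # c*tau converted to mm
    return beta_gamma * c_tau_mm


# Assumed round numbers: p = 50 GeV, m = 5.28 GeV, tau = 1.5 ps.
distance = mean_flight_distance_mm(50.0, 5.28, 1.5)
print(round(distance, 1))  # a few millimeters
```

A flight distance of a few millimeters is why the decay products point back to a spot visibly displaced from the primary vertex, rather than to the vertex itself.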
This again ties back to my previous post which outlined the properties of B Hadrons (second link at the start of this post). And it was these B-Hadron properties that motivated the creation of the TC Algorithm years ago.
But this raises a new question. Each of the high quality tracks within a jet has a signed IP-Sig value, so which of these IP-Sig values do we use in B-Tagging?
To answer this we first order all of a jet’s high quality tracks by decreasing signed IP-Sig value. We then look at the Nth track in this ordering for every jet under study (remember how I said N was usually 3 above?). If the Nth track has a signed IP-Sig value greater than some number Y, then the jet has at least N tracks above Y, and it has some chance X of being a b-Jet. As the number Y increases, the chance X of being a b-Jet also increases. Here are some plots that will let us get a better understanding of this:
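The ordering step is easy to sketch in Python. The function names, the example IP-Sig values, and the threshold Y = 2.0 below are all made up for illustration; this is a toy version of the idea, not the code CMS actually runs.

```python
def track_counting_discriminator(signed_ip_sigs, n=3):
    """Return the signed IP-Sig of the Nth track after sorting in decreasing order.

    Returns None if the jet has fewer than n selected tracks.
    """
    if len(signed_ip_sigs) < n:
        return None
    ordered = sorted(signed_ip_sigs, reverse=True)
    return ordered[n - 1]


def is_b_tagged(signed_ip_sigs, threshold, n=3):
    """Tag the jet if its Nth-track discriminator exceeds the threshold Y."""
    disc = track_counting_discriminator(signed_ip_sigs, n)
    return disc is not None and disc > threshold


# Made-up signed IP-Sig values for the high quality tracks of two jets.
b_like_jet = [0.8, 12.4, -0.3, 5.1, 7.9]
light_like_jet = [0.2, 9.0, -0.5, 0.7]

print(track_counting_discriminator(b_like_jet))      # prints 5.1
print(track_counting_discriminator(light_like_jet))  # prints 0.2
print(is_b_tagged(b_like_jet, threshold=2.0))        # prints True
print(is_b_tagged(light_like_jet, threshold=2.0))    # prints False
```

Notice that the second jet has one track with a very large IP-Sig (9.0) but still fails the tag: a single displaced track is not enough when N is 3.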
From left to right: signed IP-Sig for all selected tracks, 1st, and 3rd track, in selected jets found in proton-proton collisions recorded by the CMS Detector in 2010.
In the above plots, CMS physicists have plotted the signed IP-Sig values for (from left to right): all high quality tracks within all jets, the track with the highest signed IP-Sig in each jet (the 1st track), and the track with the third-highest signed IP-Sig in each jet (the 3rd track). The x-axis in each case is the value of the signed IP-Sig of the jet’s track(s). The y-axis represents the number of jets found with one or more tracks at that signed IP-Sig value.
The black dots in each of the colored distributions represent the signed IP-Sig values of jets found in actual collision data, whereas the colored distributions represent the values found in simulation for light jets (blue), c-jets (green), and b-jets (red). Recall that when I say a jet is a light-jet or a c-jet, I mean the jet was created by the hadronization of a light quark/anti-quark (or gluon), or a c quark/anti-quark.
The distributions below the colored distributions represent how well the simulation compares to actual data. If the simulation matches data, the black points there should be at one, or close to one. For the most part, the simulation describes the data well, and we are constantly improving our simulation so that the agreement becomes better and better.
What’s interesting to note is what happens when we look at the high quality track with the third-highest IP-Sig value in a jet. We see that as this value increases positively, the distribution (far right) is completely dominated by b-jets. In the other two distributions, by contrast, there is still a reasonable contribution of light jets at all values of signed IP-Sig.
The discriminator shown in this far right distribution is the one used by the Track Counting High Purity (TCHP) Algorithm. CMS physicists use this algorithm to search for b-Jets in many different research areas: from top quark physics and precision QCD measurements to supersymmetric searches, this algorithm is one of the major tools employed by high energy physicists as a whole.
Because of B-Hadron properties, physicists have come up with a way to identify b-Jets: require the jet to have tracks with high IP-Sig values.
Recall that this TC Algorithm makes use of two facts: B-Hadrons decay into many charged particles, and B-Hadrons have a long lifetime. This long lifetime ensures that particles produced by decaying B-Hadrons will have tracks with large IP values (which then translate into large IP-Sig values). All of these things are illustrated in the three distributions shown above.
Until next time,
 CMS Collaboration, “Performance of track and vertex reconstruction and b-tagging studies with CMS in pp collisions at sqrt(s) = 7 TeV,” Proceedings of Science, Kruger National Park, Mpumalanga, South Africa, December 2010.
 CMS Collaboration, “Commissioning of b-jet identification with pp collisions at sqrt(s) = 7 TeV,” CMS Physics Analysis Summary, CMS-PAS-BTV-10-001, http://cdsweb.cern.ch/record/1279144?ln=en.