Whenever we come across a new result, one of the first things we ask is “How many sigma is it?!” It’s a strange question, and one that deserves a good answer. What is a sigma? How do sigmas get (mis)used? How many sigmas is enough?

The name “sigma” refers to the symbol for the standard deviation, σ. When someone says “It’s a one sigma result!” what they really mean is “If you drew the underlying model and then a curve one standard deviation away from it, this result would sit on that curve.” Or to use a simple analogy: the mean height for adult men in the USA is 178cm, with a standard deviation of 8cm. If a man measured 170cm tall he would be one standard deviation from the mean, and we could say that he’s a one sigma effect. As you can probably guess, saying something is a one sigma effect is not very impressive. We need to know a bit more about sigmas before we can say anything meaningful.
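The height analogy is just a z-score calculation; here it is as a quick sketch, using the figures from the analogy above:

```python
# How many sigma is a 170 cm man from the 178 cm mean with an
# 8 cm standard deviation? (Figures taken from the analogy above.)
mean_cm, sigma_cm = 178.0, 8.0
height_cm = 170.0
z = (height_cm - mean_cm) / sigma_cm
print(z)  # -1.0, i.e. one sigma below the mean
```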

The term sigma is usually used for the Gaussian (or normal) distribution, and the normal distribution looks like this:

The area under the curve tells us the population in that region. We can color in the region that is more than one sigma away from the mean on the high side like this:

This accounts for about one sixth of the total, so the probability of getting a one sigma fluctuation up is about 16%. If we include the downward fluctuations (on the low side of the peak) as well then this becomes about 32%.

If we color in a few more sigmas, we can see that the probability of getting a two, three, four, or five sigma effect above the underlying distribution is 2%, 0.1%, 0.003%, and 0.00003%, respectively. To say that we have a five sigma result is much more than five times as impressive as a one sigma result!
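These tail probabilities are easy to check for yourself. A minimal sketch using only the Python standard library, where the one-sided upper tail of a Gaussian is \(\mathrm{erfc}(n/\sqrt{2})/2\):

```python
# One-sided Gaussian tail: the probability of a fluctuation at least
# n sigma above the mean, P(X > n*sigma) = erfc(n / sqrt(2)) / 2.
import math

def upper_tail(n_sigma):
    """Probability of an upward fluctuation of at least n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

for n in range(1, 6):
    print(f"{n} sigma: {upper_tail(n):.7%}")
# roughly 16%, 2.3%, 0.13%, 0.003%, and 0.00003% for 1 through 5 sigma
```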

When confronted with a result that is (for example) three sigma above what we expect we have to accept one of two conclusions:

- the distribution shows a fluctuation that has about a one in 740 chance of happening
- there is some effect that is not accounted for in the model (eg a new particle exists, perhaps a massive scalar boson!)

Unfortunately it’s not as simple as that, since we have to ask ourselves “What is the probability of getting a one sigma effect somewhere in the distribution?” rather than “What is the probability of getting a one sigma effect for a single data point?”. Let’s say we have a spectrum with 100 data points. The probability that every single one of those data points will be within the one sigma band (upward and downward fluctuations) is 68% to the power 100, or \(2\times 10^{-17}\), a tiny number! In fact, we should be expecting one sigma effects in every plot we see! By comparison, the probability that every point falls within the three sigma band is 76%, and for five sigma it’s so close to 100% it’s not even worth writing out.
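The “whole spectrum” numbers above come from raising the band coverage to the power of the number of data points. A sketch, again assuming the 100 points are statistically independent:

```python
# Probability that ALL N independent data points fall inside the
# +/- n sigma band: (two-sided coverage)^N.
import math

def prob_all_inside(n_sigma, n_points=100):
    coverage = math.erf(n_sigma / math.sqrt(2))  # ~68% for one sigma
    return coverage ** n_points

print(prob_all_inside(1))  # of order 1e-17: a one sigma outlier is all but guaranteed
print(prob_all_inside(3))  # ~0.76: a three sigma outlier is genuinely rare
```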

A typical distribution with a one sigma band drawn on it looks like the plot below. There are plenty of one and two sigma deviations. So whenever you hear someone say “It’s an X sigma effect!” ask them how many data points there are. Ask them what the probability of seeing an X sigma effect is. Three sigma is unlikely for 100 data points. Five sigma is pretty much unheard of for that many data points!

So far we’ve only looked at statistical effects, and found the probability of getting an X sigma deviation due to fluctuations. Let’s consider what happens with systematic uncertainties. Suppose we have a spectrum that looks like this:

It seems like we have a two-to-three sigma effect at the fourth data point. But if we look more closely we can see that the fifth data point looks a little low. We can draw one of three conclusions here:

- the distribution shows a fluctuation that has a one in 50 chance of happening (when we take all the data points into account)
- there is some effect that is not accounted for in the model
- the model is correct, but something is causing events from one data point to “migrate” to another data point

In many cases the third conclusion will be correct. There are all kinds of non-trivial effects which can change the shape of the data points, push events around from one data point to another, and create false peaks where there is really nothing to discover. In fact I generated the distribution randomly and then manually moved 20 events from the 5th data point to the 4th data point. The correct distribution looks like this:

So when we throw around sigmas in conversation we should also ask people what the shape of the data points looks like. If there is a suspicious downward fluctuation in the vicinity of an upward fluctuation be careful! Similarly, if someone points to an upward fluctuation while ignoring a similarly sized downward fluctuation, be careful! Fluctuations happen all the time, because of statistical effects and systematic effects. Take X sigma with a pinch of salt. Ask for more details and look at the whole spectrum available. Ask for a probability that the effect is due to the underlying model.
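The migration effect described above is easy to mock up. The sketch below uses toy numbers (not the actual distribution from the post): a flat spectrum with Gaussian-approximated Poisson fluctuations, with 20 events then moved by hand from the 5th data point to the 4th.

```python
# Toy migration systematic: flat spectrum of ~Poisson counts, then 20
# events moved by hand from the 5th data point to the 4th. All numbers
# here are illustrative, not the ones used in the post.
import math
import random

random.seed(42)
expected = 100.0                    # expected events per data point
sigma = math.sqrt(expected)         # Poisson counts: sigma ~ sqrt(N)

# Gaussian approximation to Poisson fluctuations (fine for N ~ 100)
counts = [round(random.gauss(expected, sigma)) for _ in range(10)]
counts[3] += 20                     # "migrate" 20 events: 5th point -> 4th
counts[4] -= 20

for i, c in enumerate(counts, start=1):
    pull = (c - expected) / sigma   # deviation in units of sigma
    print(f"point {i:2d}: {c:4d} events, pull = {pull:+.1f} sigma")
```

Moving 20 events shifts each of the two points by two sigma in opposite directions, which is exactly the suspicious bump-next-to-a-dip signature to watch for.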

Most of the time it’s a matter of “A sigma here, a sigma there, it all balances out in the end.” It’s only when the sigmas continue to pile up as we add more data that we should start to take things seriously. Right now I’d say we’re at the point where a potential Higgs discovery could go either way. There’s a good chance that there is a Higgs at 125GeV, but there’s also a reasonable chance that it’s just a fluctuation. We’ve seen so many bumps and false alarms over the years that another one would not be a big surprise. Keep watching those sigmas! The magic number is five.

Tags: data analysis, sigma, Statistics

Thank you! It’s the first time I’ve seen a clear explanation of what sigma really means!

But… 118 cm (46.5 in) for the US male adult average height, really? Did you mean 178 cm (70.1 in)?

Please triple-check your maths and remember Mars Climate Orbiter!

Thanks for the comment! And good catch, that error’s fixed. (Sloppy handwriting on my part!)

118 cm average height with an 8 cm sigma? Wow, that makes me a 9 sigma freak!

to Aidan:

I wonder a little about the “68% to the power 100”. The calculation is based on the assumption that the 100 data points are statistically independent. Perhaps this is obviously true in the contexts you have in mind.

A helpful and enjoyable post.

Roger
(Roger Purves)

Hi Roger, thanks for your comment! I’ve tried to keep the treatment as simple as possible, and the point about 68% to the power 100 was made assuming that we only had to deal with statistical uncertainties. As you point out, if the data points are not statistically independent then we start to run into problems. As long as we have one entry per physics event then it’s usually safe to assume that the data points are statistically independent, since each physics event is independent of the other physics events.

Things start to get very complicated if there is an overall fluctuation (up or down) of the total number of events summed over the whole distribution. When this happens we need to make our best estimate of the shape of the distribution, and the significance of individual fluctuations. There can also be problems in a distribution that has a large gradient, where a fluctuation at one data point can cause a dramatic shift in the overall shape.

To take uncertainties into account where the data points are not statistically independent (eg if we have several entries per event) we can either choose to weight the contributions to each data point, or to include a covariance matrix that takes the correlations between the data points into account. A good example of a distribution where the data points are not statistically independent would be the transverse momentum of leptons. Suppose we have a Z boson decaying to two leptons. We can then find the relationship between the transverse momenta of the two leptons and we find that when we get one high transverse momentum lepton we also get one low transverse momentum lepton. If we don’t take this into account we can end up with all kinds of unwanted correlations. We usually separate out the “leading” from the “subleading” lepton for this reason.
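The lepton example can be illustrated with a toy calculation (the numbers below are invented for illustration, not real detector values): if the two leptons share a roughly fixed total momentum scale set by the Z, the leading and subleading momenta come out anticorrelated, which shows up as a negative covariance between the two data points.

```python
# Toy anticorrelation between leading and subleading lepton momenta:
# the two leptons share a total scale set by the Z, so a harder leading
# lepton tends to come with a softer subleading one. Toy numbers only.
import random

random.seed(7)
pairs = []
for _ in range(10000):
    total = random.gauss(91.0, 5.0)     # rough Z mass scale (illustrative)
    split = random.uniform(0.5, 0.9)    # leading lepton's share of the total
    pairs.append((split * total, (1.0 - split) * total))

n = len(pairs)
mean_lead = sum(p[0] for p in pairs) / n
mean_sub = sum(p[1] for p in pairs) / n
cov = sum((p[0] - mean_lead) * (p[1] - mean_sub) for p in pairs) / n
print(cov)  # negative: the two data points are anticorrelated
```

A negative off-diagonal entry like this is exactly what the covariance matrix mentioned above would record.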

I hope that answers your question!

Aidan:

Thank you, but Quantum Diaries’ mission is not Facebook, and it is not meant to be a teaching class, especially one for teaching basic statistics and probability theory! Unless there are extremely important results coming out of CERN’s experiments that merit setting up teaching webclasses or teacher blogs to clarify why or why not, how or how not, and set up strong arguments, etc… Such would then assist the science community to absorb a specific complicated issue.

Anyway, good try.

Hi, thanks for your comment, but I must disagree with what you’ve said. The tagline of Quantum Diaries is “Thoughts on work and life from particle physicists from around the world.” and as an author it’s up to me to interpret what that means. Now if you already understand the basics of statistical analysis then good for you, and if you felt this post was below your abilities then that’s too bad for both of us.

However, not everyone has this knowledge, and if someone is genuinely interested in what we’re doing at CERN, but they get put off by the jargon that we use then there is a need to explain these terms. It’s always preferable to show someone how to make their own minds up about the jargon we use, rather than being spoon fed a result, which is often what I see in a lot of articles.

Not every post can be as exciting as we’d like, and some of the posts are necessary in order to help explain the more exciting posts. In fact, quite a large number of posts on here are laying the groundwork for something grander, and we refer people back to those posts as we need to. It’s a fine line to walk, and this feedback helps a lot. I hope you keep reading my posts, I’ll be back with more exciting posts!

Aidan:

Hi, I like your positive argument. I am pleased and did not expect less. Will read more of your blogs. Thanx

“Suppose we have a spectrum that looks like this:” http://blogs.discovermagazine.com/badastronomy/files/2011/12/atlas_higgs_plot.gif

“But if we look more closely we can see that the” 114 GeV unparticle “looks a little low.” You can’t have it both ways. If you accept the mathematics of statistics, the Higgs “signal” is not significant.

Hi Al, I agree and I’ve already come out of the closet as a Higgs skeptic (http://www.quantumdiaries.org/2011/09/08/higgs-skeptic/). My mind is slowly being changed as the data come in and the peak seems to get higher, but as I say, there’s still a good chance that there is no Higgs. We’ll see once one of two things happens: either we get a five sigma excess, or every point is excluded to 95% confidence.

Hi

You got me lined up behind you!

Higgs does not exist as an independent particle. As I explained in many responses on this topic, allow me to label the Higgs as ISM, “Invisible State of Matter”, because it exists as an ultra-rapid state-to-state transition of matter at speeds several folds the speed of light. The ISM state lifespan is less than \(1.0\times 10^{-33}\) s, which is technologically infeasible to detect.

I confidently conjecture and predict that CERN’s scientists will one day agree on such findings from their CMS and ATLAS experiments, possibly also from others.

Well! One way to know is to wait and see.

I am a high school student and I’m very interested in Physics. I have very limited (high school level) knowledge of statistics and the sigma thing has always confused me, but this post clarified a lot!

Thank you!

There is a certain asymmetry: surely if you require 5 sigma to accept, then 5 sigma should be required to reject?

Hi Gavin, good question! Usually we’re not so stringent about a non-discovery, but we do have sigmas for non-discovery too. If you take a look at http://blogs.discovermagazine.com/badastronomy/files/2011/12/atlas_higgs_plot.gif you can see that there’s a data point at 135GeV that is two sigma away from the line at 1 (this line is 1 times the Standard Model expected number of events). This tells us that at 135GeV, if the Higgs exists at this mass point, then there’s a two sigma fluctuation down with respect to the Standard Model expectation. Usually we just require ourselves to be 95% confident that a new particle doesn’t exist at a given point. We can never exclude it entirely, so we have to choose a cut-off point. It turns out that 95% confidence is a good compromise between minimizing the probability of getting what are known as Type I and Type II errors, but that’s a whole blog post in itself!

Also, on a more politically motivated point, there are no Nobel prizes for non-discovery!


Hi,

I am an undergrad student and very much interested in joining a grad school. Thank you for the very clear explanation of sigma. I enjoyed reading it very much. I was wondering if you could also explain the meaning of chi-square, delta chi-square and likelihood in future blogs. These are some of the terms that have always confused me.

Thanks very much!

Hi Rick, sure! Keep an eye on the blog and I’ll try to write a post like that. At the moment we have a very busy conference season and there are already lots of topics and breaking results that need to be covered, so it may take a week or two before I get to it.

[...] of a hypothesis? Aidan Randle-Conde told us about it in “A sigma here, a sigma there…,” Quantum Diaries, 9 May 2012. The word “sigma” refers to the standard deviation, denoted by the Greek letter [...]

[...] “A sigma here, a sigma there…” | AIDAN RANDLE-CONDE | Quantum Diaries [...]
