Last week’s press release from Fermilab about the latest Higgs search results, describing the statistical significance of the excess events, said:
Physicists claim evidence of a new particle only if the probability that the data could be due to a statistical fluctuation is less than 1 in 740, or three sigmas. A discovery is claimed only if that probability is less than 1 in 3.5 million, or five sigmas.
This actually contains a rather common error — not in how we present scientific results, but in how we explain them to the public. Here’s the issue:
Wrong: “the probability that the data could be due to a statistical fluctuation”
Right: “the probability that, were there no Higgs at all, a statistical fluctuation that could explain our data would occur”
Obviously the first sentence fragment is easier to read — sorry! — but, really, what’s the difference? Well, if the only goal is to give a qualitative idea of the statistical power of the measurement, it likely doesn’t matter at all. But technically it’s not the same, and in unusual cases things could be quite different. My edited (“right”) sentence fragment is only a statement about what could happen in a particular model of reality (in this case, the Standard Model without the Higgs boson). The mistaken fragment implies that we know the likelihood of different possible models actually being true, based on our measurement. But there’s no way to make such a statement based on only one measurement; we’d need to include some of our prior knowledge of which models are likely to be right.
Why is that? Well, consider the difference between two measurements, one of which observed the top quark with 5 sigma significance and the other of which found that neutrinos go faster than light with 5 sigma significance. If “5 sigma significance” really meant “the probability that the data could be due to a statistical fluctuation,” then we would logically find both analyses equally believable if they were done equally carefully. But that’s not how those two measurements were received, because the real interpretation of “5 sigma” is as the likelihood that we would get a measurement like this if the conclusion were false. We were expecting the top quark, so it’s a lot more believable that the excess is associated with the top quark than with an incredibly unlikely fluctuation. But we have many reasons to believe neutrinos can’t go faster than light, so we would sooner believe that an incredibly unlikely fluctuation had happened than that the measurement was correct.
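The asymmetry between those two 5-sigma results can be made concrete with a toy Bayes-rule calculation. Everything numerical here is invented for illustration: the priors are made-up stand-ins for "we expected the top quark" and "faster-than-light neutrinos contradict relativity," and the chance of seeing the data if the effect is real is crudely set to 1.

```python
import math

# One-sided Gaussian tail probability for a 5 sigma excess,
# i.e. the chance of such a fluctuation if the null hypothesis is true.
p_value = 0.5 * math.erfc(5 / math.sqrt(2))  # roughly 3 in 10 million

def posterior(prior, p_value):
    """Toy Bayes update: probability the effect is real, crudely
    assuming the data are certain to occur if the effect exists."""
    return prior / (prior + (1 - prior) * p_value)

# Invented priors, purely for illustration:
prior_top = 0.5   # the top quark was widely expected to exist
prior_ftl = 1e-9  # superluminal neutrinos clash with relativity

print(posterior(prior_top, p_value))  # essentially 1: believe the signal
print(posterior(prior_ftl, p_value))  # well under 1%: suspect a fluke
```

The same 5-sigma measurement comes out overwhelmingly convincing in one case and almost certainly a fluke (or a mistake) in the other, which is exactly why one measurement alone can't tell us which model of reality is true.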
Isn’t it bad that we’d let our prior beliefs bias whether we think measurements are right or not? No, not as long as we don’t let them bias the results we present. It’s perfectly fair to say, as OPERA did, that they were compelled to publish their results but thought they were likely wrong. Ultimately, the scientific community does reach conclusions about which “reality” is more correct on a particular question — but one measurement usually can’t do it alone.
 For what it’s worth, I actually spent a while thinking and chatting about how to make the second sentence fragment simpler, while preserving the essential difference between the two. In this quest for simplicity, I’ve left off any mention of gaussian distributions, the fact that we really give the chance of a statistical fluctuation as large or larger than our excess, the phrase “null hypothesis,” and doubtless other things as well. I can only hope I’ve hit that sweet spot where experts think I’ve oversimplified to the point of incorrectness, while non-expert readers still think it’s completely unreadable.
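For readers who do want the gaussian details: the numbers in the press release correspond to one-sided tails of a normal distribution, the chance of a fluctuation at least that many standard deviations above the mean. A few lines of Python reproduce them:

```python
import math

def tail_probability(n_sigma):
    """One-sided Gaussian tail: the chance, under the null hypothesis,
    of a statistical fluctuation n_sigma or more above the mean."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

for n in (3, 5):
    p = tail_probability(n)
    print(f"{n} sigma: p = {p:.3g}, i.e. about 1 in {1 / p:,.0f}")
```

Three sigma works out to about 1 in 740 and five sigma to about 1 in 3.5 million, matching the press release's figures.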
 The consensus among experimental particle physicists is that it’s not wise to include prior knowledge explicitly in the statistical conclusions of our papers. Not everyone agrees; this is the debate between frequentist and Bayesian statistics, and a detailed discussion is beyond the scope of both this blog entry and my own knowledge. A wider discussion of the issues in this entry, from a Bayesian perspective, can be found in this preprint by G. D’Agostini. I certainly don’t agree with all of the preprint, but I do owe it a certain amount of thanks for help in clarifying my thinking.
 A systematic mistake in the result, or in the calculation of uncertainties, would be an even likelier suspect.