A colleague of mine is an avid fan of the New York Yankees baseball team. At a meeting a few years ago, when the Yankees had finished first in the American League regular season, I pointed out to him that the result was not statistically significant. He did not take kindly to the suggestion. He actually got rather angry! A person who in his professional life would scorn anyone for publishing a one-sigma effect was crowing about a one-sigma effect for his favorite sports team. But then most people do ignore the effect of statistical fluctuations in sports.
In sports, there is a random effect in who wins or loses. The best team does not always win. In baseball, where two teams will frequently play each other four games in a row over three or four days, it is relatively uncommon for one team to win all four games. Similarly, a team at the top of the standings does not always beat a team lower down. As they say in sports: on any given day, anything can happen. Indeed it can and frequently does.
Let us return to American baseball. Each team plays 162 games during the regular season. If the results were purely statistical, with each team having a 50% chance of winning any given game, then we would expect a normal distribution of the results with a spread of sigma=6.3 games. The actual spread, or standard deviation, for the last few seasons is closer to 11 games. Thus slightly more than half the spread in games won and lost is due to statistical fluctuations. Moving from the collective spread to the performance of individual teams, if a team wins the regular season by six games, or one sigma, as with the Yankees above, there is a one in three chance that it is purely a statistical fluke. For a two-sigma effect a team would have to win by twelve games, and for a three-sigma effect by eighteen games. The latter would give over 99% confidence that the winner won justly and not because of a statistical fluctuation. When was the last time any team won by eighteen games? For particle physics we require an even higher standard: a five-sigma effect to claim a discovery. Thus a team would have to lead by 30 games to meet this criterion. Now my colleague from the first paragraph suggested that by including more seasons the results become more significant. He was right, of course. If the Yankees finished ahead by six games for thirty-four seasons in a row, that would be a five-sigma effect. From this we can also see why sports results are never published in Physical Review, with its five-sigma threshold for a discovery: there has yet to be such a discovery. To make things worse for New York Yankees fans, they have already lost their chance for an undefeated season this year.
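For readers who want to check the arithmetic, here is a minimal sketch of the coin-flip model described above. It assumes every game is an independent 50/50 toss and treats the results as normally distributed; the function names chance_sigma and two_sided_tail are my own labels for illustration, not anything standard.

```python
import math


def chance_sigma(games: int, p_win: float = 0.5) -> float:
    """Standard deviation of games won when each game is an
    independent toss won with probability p_win (binomial model)."""
    return math.sqrt(games * p_win * (1.0 - p_win))


def two_sided_tail(n_sigma: float) -> float:
    """Chance that a normally distributed result lies at least
    n_sigma away from the mean, in either direction."""
    return math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    sigma = chance_sigma(162)  # 162-game baseball season
    print(f"chance-only spread over 162 games: {sigma:.2f} games")
    for n in (1, 2, 3, 5):
        print(f"{n} sigma: lead of about {n * sigma:.0f} games, "
              f"fluke probability {two_sided_tail(n):.2g}")
```

The same chance_sigma function can be called with any season length, so the comparison with shorter seasons in other sports is a one-line change.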
In other sports the statistics are even worse. In the National Hockey League (NHL) teams play eighty-two games, and the spread in wins and losses expected from pure chance is sigma=4.5 games. The actual spread for last year was sigma=6.3 games. The signal due to the difference in the individual teams’ ability is all in that 1.8-game difference. Perhaps there is more parity in the NHL than in Major League Baseball. Or perhaps there are not enough statistics to tell. Speaking of not being able to tell: last year the Vancouver Canucks finished with the best record for the regular season, two games ahead of the New York Rangers and three games ahead of the St. Louis Blues. Only a fool or a Vancouver Canucks fan would think this ordering was significant and not just a statistical fluctuation. In the National Football League last year, 14 of the 32 teams were within two sigma of the top. Again, much of the spread was statistical. It was purely a statistical fluke that the New England Patriots did not win the Super Bowl as they should have.
Playoffs are even worse (this is why the Canucks have never won a Stanley Cup). Consider a best-of-seven series. Even if the two teams are equal, we would expect the series to end in four games only once in every eight (two cubed) series. When a series goes the full seven games, one might as well flip a coin. Rare events, like one team winning the first three games and losing the last four, are expected to happen once in every sixty-four series, and considering the number of series being played it is not surprising that we see them occasionally.
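The series odds quoted above can be checked by brute force. The sketch below assumes two exactly equal teams, so all 128 ways of filling out seven games are equally likely, and it simply counts. The team labels "A" and "B" and the function names are illustrative choices of mine.

```python
from fractions import Fraction
from itertools import product


def series_length(outcomes):
    """Number of games played before one team reaches four wins."""
    wins = {"A": 0, "B": 0}
    for played, winner in enumerate(outcomes, start=1):
        wins[winner] += 1
        if wins[winner] == 4:
            return played
    raise ValueError("seven games always decide a best-of-seven series")


def equal_team_series():
    """Exact odds for a best-of-seven between two 50/50 teams."""
    weight = Fraction(1, 2 ** 7)  # each 7-game sequence is equally likely
    length_prob = {n: Fraction(0) for n in (4, 5, 6, 7)}
    comeback = Fraction(0)  # win the first three, then lose four straight
    for outcomes in product("AB", repeat=7):
        length_prob[series_length(outcomes)] += weight
        if outcomes in (tuple("AAABBBB"), tuple("BBBAAAA")):
            comeback += weight
    return length_prob, comeback


if __name__ == "__main__":
    lengths, comeback = equal_team_series()
    for n, prob in lengths.items():
        print(f"series ends in {n} games: {prob}")
    print(f"3-0 lead followed by four straight losses: {comeback}")
```

Exact fractions are used instead of floating point so that the one in eight and one in sixty-four figures come out exactly, not as rounded decimals.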
Probably the worst example of playoff madness is the American college basketball tournament called, appropriately enough, March Madness. Starting with 64 teams, or 68 depending on how you count, the playoffs proceed through a single-elimination tournament. With more than 60 games it is not surprising that strange things happen. One of the strangest would be that the best team wins. To win the title the best team would have to win six straight games. If the best team has on average a 70% chance of winning each game, it would have only a 12% chance of winning the tournament. Perhaps it would be better if they just voted on who is best.
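The 12% figure is just 0.7 multiplied by itself six times. A minimal sketch follows, assuming the best team has the same independent chance of winning every round (a simplification, since the opponents get stronger as the tournament goes on). The function names are hypothetical.

```python
def title_probability(p_game: float, rounds: int = 6) -> float:
    """Chance of winning `rounds` straight games, each won
    independently with probability p_game."""
    return p_game ** rounds


def per_game_needed(p_title: float, rounds: int = 6) -> float:
    """Per-game win probability required to reach a given
    chance of winning all `rounds` games."""
    return p_title ** (1.0 / rounds)


if __name__ == "__main__":
    print(f"70% per game -> {title_probability(0.7):.1%} chance of the title")
    print(f"even odds of the title needs {per_game_needed(0.5):.1%} per game")
```

Under these assumptions a team would need to be close to a nine in ten favorite in every single game just to have even odds of taking the title.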
But, you say, they would never decide a national championship based on a vote. Consider American college football. Now that is a multimillion-dollar enterprise! Nobel Laureates do not get paid as much as US college football coaches. They do not generate as much money either. So what is more important to American universities: sports or science?
In the past, the US college national football champions were decided by a vote of some combination of sports writers, coaches and computers. Now that combination only decides who will play in the championship game. The national champion is ultimately decided by who wins that one final game. Is that better than the old system? It is more exciting, but as they say: on any given day, anything can happen. Besides, sports is more about deciding winners and losers than about deciding who is best.
To receive a notice of future posts follow me on Twitter: @musquod.
 With the expected frequency of course.
 Not two to the fourth power because one of the two teams has to win the first game and that team has to win the next three games.