Zeynep Isvan | Brookhaven | USA


Whose data is it, anyway?

When I was in elementary school we read a story that went something like this. The protagonist’s grandmother was baking a cake for his tenth birthday. Before the big day she told him that she had a surprise: she was going to serve a cake prepared by a thousand people! The kid could hardly wait to see this enormous cake, baked by all those people in honor of his coming of age. When the family arrived, grandma brought out the cake: a nine-inch layer cake with ten candles. Disappointed, the kid protested that this was not the thousand-baker cake he had been led to expect. Then comes the moral of the story, where grandma explains that a thousand people did indeed contribute to the making of this cake. Someone had to grow the wheat, and another had to mill it into flour. Then there was the milk and the butter and the sugar. Then the people who built the mixer and the oven and the cookware. You get the idea.

This is partly the reason why high energy physics collaborations’ publications have author lists that run to thousands of people. There is usually over a decade’s worth of work by hundreds of people from conception to the beginning of an experiment. Then these experiments run anywhere from a few years to tens of years. Generations of graduate students complete their dissertation work on an experiment. Each experiment has a few people in the control room 24/7, monitoring the operation of every component. Countless technicians maintain the hardware. There are vast computing resources at labs and universities. Everyone I listed who contributes to the experiment is an author on its publications, and each gets to comment on what those publications state and how it is phrased. (This last part is no fun. Perhaps more on that later when the paper I’m writing is submitted.)

The tradition of high energy physics, you’ll say, is very admirable since it credits all of these individuals who collaborate on the science. True, HEP (and science in general) has a very strict understanding of giving credit where it’s due and respecting intellectual property. However, we also have a very strong sense of possession when it comes to our data. We, the thousand-some collaborators, designed, built, and ran this experiment. We will analyze every last bit of data and publish the results with our names on them. We might even calculate what not to publish, so that other scientists don’t read our journal article and proceed to do an analysis that we haven’t yet done but intend to do in the future.

This is not entirely analogous to the cake story above, but I’ll try to draw the parallel that I find interesting. In HEP, the thousand collaborators are the grandma who bakes the cake. And they all get credit, always. The unnamed contributors whose roles are indirect and difficult to quantify are the rest of us. Universities are knowledge and research hubs that enhance science as a whole, independent of the actual number of professors and students working on one particular experiment. Without the rest of the university community, the handful of people in one field in one department, such as high energy physics, wouldn’t really exist. The same goes for national labs. It further applies to smaller universities overseas that produce quite a lot of the researchers who work on these large experiments. We’re a large community in the knowledge-making business whose boundaries are blurry. So who owns the vast amount of data we generate every day?

This is a somewhat controversial subject (one which an untenured scientist wiser than myself might avoid), but I find it necessary to debate the ownership of data. All of these experiments are funded by the government, and therefore by the taxpayer. Science benefits from scrutiny and from transparency. On the other hand, science values being the first to discover something above all else. And science needs expertise, something that those who designed and ran an experiment for years will have a lot more of than a distant colleague looking at an unfamiliar set of data.

What do we do then? Do we want to be the best baker in town with the most sought-after cake at all costs? Do we take no interest in who bakes the cake as long as it’s the best cake possible? Is a compromise possible? I think the metric should be the quality of the science itself and the speed with which it progresses. Familiarity with an experiment and expertise in it are highly important, but ownership shouldn’t be.
