Data preservation survey October 30, 2008

Today I found twenty free minutes to fill out a questionnaire about data preservation on the CERN web site.

I have always considered the issue of data archiving very important, even crucial for the advancement of Science. I find it appalling that huge sums of money are invested in building and operating large particle physics experiments with no clear plan for what to do with the data once the experiments close down.

There are several reasons why one needs to ensure that data is preserved.

  1. First and foremost, we do not burn books. Why should we dispose of data files that took so much care, years of effort, and the work of thousands of people to put together?
  2. Old data is potentially crucial to confirm new results, to disprove others, or simply to compare with them. I hope to give a very clear example of what I mean this evening, when I discuss a new result by CDF which is potentially groundbreaking, and which might be tested with older data from CDF itself as well as from other hadron collider experiments.
  3. Old data may be invaluable as a laboratory to train new scientists. The number of Ph.D. students working on LHC experiments who have never ever worked with real data is disturbing. Can we train them on Monte Carlo simulations? Sure, but it is not the same thing, not really.
  4. The data is ultimately a world heritage. I maintain that it is not the property of this or that collaboration. The people lucky enough to have been given the privilege of analyzing data from high-energy physics experiments have done so thanks to funds provided by whole countries. The data, being a form of distilled knowledge, is owned by the people, and I am sorry if I sound like a communist here. If you do not agree, it is you who look like a fool to me.

So, if you wish to provide your input on this important issue, why don’t you take the time to fill out the survey? You can find it at http://cern.ch/data-preservation-survey