Well, my blogging frequency has fallen off quite a bit recently, and it all has to do with the impending doom of having to finish my analysis and graduate sometime soon!
Needless to say, this is a very trying time in any graduate student’s life, and it is proving to be a hard one for me too.
One interesting thing that is simply taken for granted in the world of particle physics, and that I thought might be of general interest to the rest of the world, is just how hard it can be to do something as simple as “Get to the data.”
Granted, I am currently part of a very well-aged collaboration with a lot of on-site expertise, and plenty of students have gone down this path before me…however, the same problem still persists…NO ONE WRITES ANYTHING DOWN!
Documentation, it would seem, has not been and never will be a physicist’s strong suit. A process that sounds simple when put into words, “I just need to access the full data set and then run my analysis scripts over it”, is fraught with danger. Some of the CDF code is FORTRAN wrapped in C++ that uses ac++ to access the data, which is written to tape and then put into a handy form known as an NTuple. All of this then has to be checked for quality and “known” bugs (I say ‘known’ because often it isn’t written anywhere…you just have to know), then validated, corrected, and checked again.
I found an image describing the data flow for us end users…DO YOU SEE ALL THE ARROWS!!!
So what does it take to get to the data:
1) Expertise, which on CDF I am very lucky to have a lot of around me; I can’t thank enough all those who respond to my worried emails.
2) The ability to trawl through code looking for the right module or magical incantation to say so that your code compiles.
3) Sample code from those who went before you…it’s true that they found out what worked and what didn’t. But they never wrote it down or commented their code, and you are left to stare at line after line saying things like “Why the hell did they do it like that?!?!”, never knowing that this arrangement is the only way the thing works.
4) And a lot of time. I think it was Feynman who said that what you need in physics is a lot of uninterrupted time to think very deeply about things. This is just as true when melding your code into something useful and tracking down every error and library link you need…lots of uninterrupted time.
So, time to go back to more of that…I wonder if other scientists from other fields suffer the way particle physicists tend to suffer when trying to read the data collected by our amazing and complex machines?!