I had the idea to live blog from the ATLAS control room this past weekend since I was going to be there on shift. But since there was going to be actual work to do and I should be doing it instead of blogging, I decided to not post it live but instead to just type notes into my laptop as I had a chance. I cleaned up the notes a little today, but below is basically what I typed while I was there.
I was supposed to be on shift from 3pm-9pm Geneva time in the ATLAS control room at the Liquid Argon Calorimeter desk. The plan was to detect and record data with the ATLAS detector on muons from cosmic rays. Muons are constantly created in the atmosphere through collisions of cosmic ray particles from space with particles in the atmosphere. The muons travel from the atmosphere all the way down underground to the ATLAS detector and we can detect them. We have been doing this for many months now, with more and more of the ATLAS detector as it is installed. It is a nice test for our equipment before the LHC starts colliding protons inside our detector in a few months, and nice practice for everyone here to operate the detector. Anyway, without further ado…
Saturday, May 24, 2008:
3pm: Start of shift. There are 3 shifters working together which is probably too many in the long term, but for now it’s not so bad for training purposes since a lot of people have little/no experience in the control room.
The 3 of us arrive and meet the 3 people that have been there since 9am on the previous shift. So actually there are 6 people here for a little while.
3:01pm: There is an ongoing problem and the 6 of us will try to figure it out. The problem is that we can see, on one of the monitoring displays, that there is no data coming from one of the parts of the detector. It shows up as a blank spot in a plot that shows the average energy recorded in every channel.
3:10pm: After some investigation it looks like everything was okay yesterday, and some time between midnight and 4 am the data started to be missing. Nobody was here overnight, and as far as we know nobody was working at the time to mess things up, so it’s a mystery.
3:30pm: The old shift crew gives us the rest of the story about the previous shift. They tell us that there has been on-and-off data taking all day, with stops to adjust the configuration from time to time.
3:40pm: New run starts. We are recording data again!
4pm: The old shift crew leaves, with an invitation to join them at karaoke later in the evening.
The other two people on shift with me are another postdoc and a graduate student. Being on shift is a nice chance to get to know your collaborators a little better.
We go through some checklists at the start of the shift to make sure we don’t forget to check on all of the things we are responsible for keeping an eye on. We have several monitors here which display status information on various detectors. Usually green means good, red means bad. If something is red, usually a few clicks brings up more specific information.
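The status displays work like a traffic light. As a rough sketch of the idea (hypothetical names and values, not the actual ATLAS monitoring software), you can think of it as each subsystem reporting a color, with anything non-green flagged for a closer look:

```python
# Hypothetical sketch of a traffic-light status summary, loosely modeled
# on the control-room displays described above -- not the real ATLAS code.

# Each subsystem reports a status color; green means good, red means bad.
statuses = {
    "LAr Barrel A": "green",
    "LAr Barrel C": "green",
    "LAr Endcap A": "red",   # e.g. a partition reporting a problem
    "Tile": "green",
}

def problems(statuses):
    """Return the subsystems that need a closer look (a few more clicks)."""
    return [name for name, color in statuses.items() if color != "green"]

print(problems(statuses))  # -> ['LAr Endcap A']
```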
4:15pm: The data quality shifter spots the same problem that was spotted by the previous LAr shift crew. We suspect a database problem.
4:30pm: We are searching the log files to figure out the problem.
Meanwhile, we are checking data quality plots to try to see how the data we are currently taking looks. Whenever we are recording data, it is possible to make plots immediately to get instant feedback that the data looks as it should.
We have some bad data from a few Front-End boards here and there. Almost the whole detector looks good.
5pm: We get in touch with the expert who can tell us where to find the log file with the information we need to understand our problem. It turns out we have a misconfiguration of the monitoring software. In order to save memory, only select channels are retrieved from the database, and the channels we need for a piece of the detector weren’t being retrieved.
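To illustrate the kind of misconfiguration this was (all names here are made up for illustration; this is not the real monitoring code): if the configured subset of channels omits the ones a detector region needs, that region's data simply never gets loaded, and its plots show up blank.

```python
# Hypothetical illustration of the misconfiguration described above:
# to save memory, the monitoring software only retrieves a configured
# subset of channels from the database. Channels outside that subset
# appear as "missing" data, even though the detector itself is fine.

ALL_CHANNELS = {f"channel_{i}" for i in range(16)}

# The configured subset accidentally omits channels 12-15,
# which belong to one piece of the detector.
configured = {f"channel_{i}" for i in range(12)}

def retrieved(requested, configured):
    """Only configured channels are actually fetched from the database."""
    return requested & configured

missing = ALL_CHANNELS - retrieved(ALL_CHANNELS, configured)
print(sorted(missing))  # these channels show up blank on the displays
```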
5:10pm: ATLAS stops taking data to change some configuration in the muon system.
We fill out an entry in the electronic logbook with summary information from the run that ended.
5:30pm: There are some people in Toronto who are helping us out today remotely by looking at the data we are recording, trying to spot any problems that we could then fix here.
5:57pm: We go back and check, and it turns out the misconfiguration problem we found earlier has been there for a few weeks, and nobody noticed!

6pm: It is a little bizarre that the data quality shifter and the liquid argon shifter noticed the problem independently, at almost exactly the same time, even though it had been there for weeks.
6:17pm: We are still learning our way around here in the control room…we find that we can look at plots of data being taken right now (from what we call the current “run”), but we don’t know how to go back and see plots from the previous run that just ended.
6:20pm: A new run has started! ATLAS is recording data again, after a 1 hour, 10 minute pause. Eventually, when everything works well, the pause between runs should be a few minutes at most.
6:40pm: The missing data from earlier is no longer missing in the current run, so that problem is fixed.
6:45pm: We look at monitoring plots for the current run. It looks even better than the previous one, with fewer data errors reported.
We are recording about 20 events every second.
7pm: The data that was taken at 5pm has already been fully reconstructed, less than 2 hours later. That’s pretty quick. Reconstruction is software that takes the raw data, things like energy recorded per channel, and reconstructs what happened in the event, i.e. it tells you how many electrons and muons and other particles there were and details about them. This is supposed to happen automatically here, at a farm of computers. Apparently it is working well.
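To give a feel for what reconstruction does (a toy sketch only; real ATLAS reconstruction is vastly more sophisticated, and all names here are invented), you can picture it as turning a list of raw per-channel energies into clusters, each cluster standing in for one particle:

```python
# Toy illustration of "reconstruction": raw per-channel energies go in,
# and groups of contiguous above-threshold channels ("clusters") come
# out, each standing in for one reconstructed particle. Not real ATLAS
# software -- just a sketch of the idea.

def reconstruct(raw_energies, threshold=0.5):
    """Group contiguous above-threshold channels into clusters."""
    clusters, current = [], []
    for channel, energy in enumerate(raw_energies):
        if energy > threshold:
            current.append((channel, energy))
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters

# A particle crossing the calorimeter might leave a short trail of energy:
raw = [0.1, 0.0, 1.2, 1.5, 0.9, 0.0, 0.2, 2.1, 0.1]
for cluster in reconstruct(raw):
    total = sum(e for _, e in cluster)
    channels = [ch for ch, _ in cluster]
    print(f"cluster at channels {channels}, total energy {total:.1f}")
```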
7:20pm: There is now a discussion on the phone about some alarms we saw earlier on the alarm display concerning the dew point in the counting house, where lots of sensitive electronics are.
7:30pm: ATLAS stops the run to put in a new trigger configuration. This can change what kind and how much data we record.
We see evidence of muons in the histograms we look at!
7:45pm: Dinner time! turkey sandwich and coke (almost the only soda available in Europe) at the desk…
8:10pm: Start new run, just 40 minutes between runs that time.
There is some discussion with the project leader for this system, who stopped by to ask how things are going, about the dew point rising in the counting room.
8:25pm: There is a conversation with the run control people about wanting to take a run with a higher trigger rate (they want to try to process data faster to test the data acquisition system). They may need us to make some configuration changes. They want to do that in a little while. They didn’t know that we are supposed to leave at 9pm. They are here until midnight.
8:30pm: One of the muon systems doesn’t see evidence of muons in their monitoring plots, but we do!
9:30pm: Still recording data. The Tile calorimeter people see evidence of muons on one side of their detector, but maybe not the other.
9:40pm: The run ends. It turns out the muon detector plots didn’t show muons because they are looking at plots gathered from data after the lowest level trigger, which is not very pure (mostly noise and not real muons).
9:50pm: A calorimeter expert calls in from his office where he was looking at some data recorded earlier to report some problems he sees in the data. We compare notes and it turns out we had already noted the same problems in our logbook here, so our monitoring is working pretty well.
10:20pm: Filling out the electronic logbook entry for the shift.
10:30pm: We notice the high voltage tripped on a part of our detector. We consult the boss on the phone and he decides to leave it until tomorrow.
10:40pm: We can go home! It has been a very productive shift, and it felt good to be back in a control room recording data, even if it was not collider data.
I hope this was interesting. I think I’d like to try it again sometime, maybe even really live.