Real-time data analysis at the LHC: present and future
Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning, PMLR 42:1-18, 2015.
The Large Hadron Collider (LHC), which collides protons at an energy of 14 TeV, produces hundreds of exabytes of data per year, making it one of the largest sources of data in the world today. At present it is not possible to even transfer most of this data from the four main particle detectors at the LHC to “offline” data facilities, much less to permanently store it for future processing. For this reason the LHC detectors are equipped with real-time analysis systems, called triggers, which process this volume of data and select the most interesting proton-proton (pp) collisions. The LHC experiment triggers reduce the data produced by the LHC by between 1/1000 and 1/100000, to tens of petabytes per year, allowing its economical storage and further analysis. The bulk of the data-reduction is performed by custom electronics which ignores most of the data in its decision making, and is therefore unable to exploit the most powerful known data analysis strategies. I cover the present status of real-time data analysis at the LHC, before explaining why the future upgrades of the LHC experiments will increase the volume of data which can be sent off the detector and into off-the-shelf data processing facilities (such as CPU or GPU farms) to tens of exabytes per year. This development will simultaneously enable a vast expansion of the physics programme of the LHC’s detectors, and make it mandatory to develop and implement a new generation of real-time multivariate analysis tools in order to fully exploit this new potential of the LHC. I explain what work is ongoing in this direction and motivate why more effort is needed in the coming years.