- title: 'Preface'
abstract: 'This is the preface'
volume: 42
URL: https://proceedings.mlr.press/v42/edit14b.html
PDF: http://proceedings.mlr.press/v42/edit14b.pdf
edit: https://github.com/mlresearch//v42/edit/gh-pages/_posts/2015-08-27-edit14b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning'
publisher: 'PMLR'
author:
- given: The
family: Editors
editor:
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
address: Montreal, Canada
page: i-v
id: edit14b
issued:
date-parts:
- 2015
- 8
- 27
firstpage: i
lastpage: v
published: 2015-08-27 00:00:00 +0000
- title: 'Real-time data analysis at the LHC: present and future'
abstract: 'The Large Hadron Collider (LHC), which collides protons at an energy of 14 TeV, produces hundreds of exabytes of data per year, making it one of the largest sources of data in the world today. At present it is not possible to even transfer most of this data from the four main particle detectors at the LHC to “offline” data facilities, much less to permanently store it for future processing. For this reason the LHC detectors are equipped with real-time analysis systems, called triggers, which process this volume of data and select the most interesting proton-proton (pp) collisions. The LHC experiment triggers reduce the data produced by the LHC by a factor of between 1000 and 100000, to tens of petabytes per year, allowing its economical storage and further analysis. The bulk of the data-reduction is performed by custom electronics which ignores most of the data in its decision making, and is therefore unable to exploit the most powerful known data analysis strategies. I cover the present status of real-time data analysis at the LHC, before explaining why the future upgrades of the LHC experiments will increase the volume of data which can be sent off the detector and into off-the-shelf data processing facilities (such as CPU or GPU farms) to tens of exabytes per year. This development will simultaneously enable a vast expansion of the physics programme of the LHC’s detectors, and make it mandatory to develop and implement a new generation of real-time multivariate analysis tools in order to fully exploit this new potential of the LHC. I explain what work is ongoing in this direction and motivate why more effort is needed in the coming years.'
volume: 42
URL: https://proceedings.mlr.press/v42/glig14.html
PDF: http://proceedings.mlr.press/v42/glig14.pdf
edit: https://github.com/mlresearch//v42/edit/gh-pages/_posts/2015-08-27-glig14.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning'
publisher: 'PMLR'
author:
- given: Vladimir
family: Gligorov
editor:
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
address: Montreal, Canada
page: 1-18
id: glig14
issued:
date-parts:
- 2015
- 8
- 27
firstpage: 1
lastpage: 18
published: 2015-08-27 00:00:00 +0000
- title: 'The Higgs boson machine learning challenge'
abstract: 'The Higgs Boson Machine Learning Challenge (HiggsML or the Challenge for short) was organized to promote collaboration between high energy physicists and data scientists. The ATLAS experiment at CERN provided simulated data that has been used by physicists in a search for the Higgs boson. The Challenge was organized by a small group of ATLAS physicists and data scientists. It was hosted by Kaggle at https://www.kaggle.com/c/higgs-boson; the challenge data is now available on \url\opendataLink. This paper provides the physics background, explains the challenge setting and design, and analyzes its results.'
volume: 42
URL: https://proceedings.mlr.press/v42/cowa14.html
PDF: http://proceedings.mlr.press/v42/cowa14.pdf
edit: https://github.com/mlresearch//v42/edit/gh-pages/_posts/2015-08-27-cowa14.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning'
publisher: 'PMLR'
author:
- given: Claire
family: Adam-Bourdarios
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
editor:
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
address: Montreal, Canada
page: 19-55
id: cowa14
issued:
date-parts:
- 2015
- 8
- 27
firstpage: 19
lastpage: 55
published: 2015-08-27 00:00:00 +0000
- title: 'Dissecting the Winning Solution of the HiggsML Challenge'
abstract: 'The recent Higgs Machine Learning Challenge pitted one of the largest crowds seen in machine learning contests against one another. In this paper, we present the winning solution and investigate the effect of extra features, the choice of neural network activation function, regularization and data set size. We demonstrate improved classification accuracy using a very similar network architecture on the permutation invariant MNIST benchmark. Furthermore, we advocate the use of a simple method that lies on the boundary between bagging and cross-validation to both estimate the generalization error and improve accuracy.'
volume: 42
URL: https://proceedings.mlr.press/v42/meli14.html
PDF: http://proceedings.mlr.press/v42/meli14.pdf
edit: https://github.com/mlresearch//v42/edit/gh-pages/_posts/2015-08-27-meli14.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning'
publisher: 'PMLR'
author:
- given: Gábor
family: Melis
editor:
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
address: Montreal, Canada
page: 57-67
id: meli14
issued:
date-parts:
- 2015
- 8
- 27
firstpage: 57
lastpage: 67
published: 2015-08-27 00:00:00 +0000
- title: 'Higgs Boson Discovery with Boosted Trees'
abstract: 'The discovery of the Higgs boson is remarkable for its importance in modern physics research. The next step for physicists is to discover more about the Higgs boson from the data of the Large Hadron Collider (LHC). A fundamental and challenging task is to extract the signal of the Higgs boson from background noise. Machine learning is an important component in solving this problem. In this paper, we propose to solve the Higgs boson classification problem with a gradient boosting approach. Our model learns an ensemble of boosted trees that makes a careful tradeoff between classification error and model complexity. Physically meaningful features are further extracted to improve the classification accuracy. Our final solution obtained an AMS of 3.71885 on the private leaderboard, placing us in the top 2% of the Higgs boson challenge.'
volume: 42
URL: https://proceedings.mlr.press/v42/chen14.html
PDF: http://proceedings.mlr.press/v42/chen14.pdf
edit: https://github.com/mlresearch//v42/edit/gh-pages/_posts/2015-08-27-chen14.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning'
publisher: 'PMLR'
author:
- given: Tianqi
family: Chen
- given: Tong
family: He
editor:
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
address: Montreal, Canada
page: 69-80
id: chen14
issued:
date-parts:
- 2015
- 8
- 27
firstpage: 69
lastpage: 80
published: 2015-08-27 00:00:00 +0000
- title: 'Deep Learning, Dark Knowledge, and Dark Matter'
abstract: 'Particle colliders are the primary experimental instruments of high-energy physics. By creating conditions that have not occurred naturally since the Big Bang, collider experiments aim to probe the most fundamental properties of matter and the universe. These costly experiments generate very large amounts of noisy data, creating important challenges and opportunities for machine learning. In this work we use deep learning to greatly improve the statistical power on three benchmark problems involving: (1) Higgs bosons; (2) supersymmetric particles; and (3) Higgs boson decay modes. This approach increases the expected discovery significance over traditional shallow methods by 50%, 2%, and 11%, respectively. In addition, we explore the use of model compression to transfer information (dark knowledge) from deep networks to shallow networks.'
volume: 42
URL: https://proceedings.mlr.press/v42/sado14.html
PDF: http://proceedings.mlr.press/v42/sado14.pdf
edit: https://github.com/mlresearch//v42/edit/gh-pages/_posts/2015-08-27-sado14.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning'
publisher: 'PMLR'
author:
- given: Peter
family: Sadowski
- given: Julian
family: Collado
- given: Daniel
family: Whiteson
- given: Pierre
family: Baldi
editor:
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
address: Montreal, Canada
page: 81-87
id: sado14
issued:
date-parts:
- 2015
- 8
- 27
firstpage: 81
lastpage: 87
published: 2015-08-27 00:00:00 +0000
- title: 'Consistent optimization of AMS by logistic loss minimization'
abstract: 'In this paper, we theoretically justify an approach popular among participants of the Higgs Boson Machine Learning Challenge to optimize approximate median significance (AMS). The approach is based on the following two-stage procedure. First, a real-valued function f is learned by minimizing a surrogate loss for binary classification, such as logistic loss, on the training sample. Then, given f, a threshold $\hat{\theta}$ is tuned on a separate validation sample, by direct optimization of AMS. We show that the regret of the resulting classifier (obtained from thresholding f on $\hat{\theta}$), measured with respect to the squared AMS, is upper bounded by the regret of f measured with respect to the logistic loss. Hence, we prove that minimizing the logistic surrogate is a consistent method of optimizing AMS.'
volume: 42
URL: https://proceedings.mlr.press/v42/kotl14.html
PDF: http://proceedings.mlr.press/v42/kotl14.pdf
edit: https://github.com/mlresearch//v42/edit/gh-pages/_posts/2015-08-27-kotl14.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning'
publisher: 'PMLR'
author:
- given: Wojciech
family: Kotłowski
editor:
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
address: Montreal, Canada
page: 99-108
id: kotl14
issued:
date-parts:
- 2015
- 8
- 27
firstpage: 99
lastpage: 108
published: 2015-08-27 00:00:00 +0000
- title: 'Optimization of AMS using Weighted AUC optimized models'
abstract: 'In this paper, we present an approach to deal with the maximization of the approximate median discovery significance (AMS) in high energy physics. This paper proposes the maximization of the Weighted AUC as a criterion to train different models and the subsequent creation of an ensemble that maximizes the AMS. The algorithm described in this paper was our solution for the Higgs Boson Machine Learning Challenge, and we complement this paper by describing the preprocessing of the dataset, the training procedure, and the experimental results that our model obtained in the challenge. This approach proved its good performance by finishing in ninth place among the solutions of 1785 teams.'
volume: 42
URL: https://proceedings.mlr.press/v42/diaz14.html
PDF: http://proceedings.mlr.press/v42/diaz14.pdf
edit: https://github.com/mlresearch//v42/edit/gh-pages/_posts/2015-08-27-diaz14.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning'
publisher: 'PMLR'
author:
- given: Roberto
family: Díaz-Morales
- given: Ángel
family: Navia-Vázquez
editor:
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
address: Montreal, Canada
page: 109-127
id: diaz14
issued:
date-parts:
- 2015
- 8
- 27
firstpage: 109
lastpage: 127
published: 2015-08-27 00:00:00 +0000
- title: 'Weighted Classification Cascades for Optimizing Discovery Significance in the HiggsML Challenge'
abstract: 'We introduce a minorization-maximization approach to optimizing common measures of discovery significance in high energy physics. The approach alternates between solving a weighted binary classification problem and updating class weights in a simple, closed-form manner. Moreover, an argument based on convex duality shows that an improvement in weighted classification error on any round yields a commensurate improvement in discovery significance. We complement our derivation with experimental results from the 2014 Higgs boson machine learning challenge.'
volume: 42
URL: https://proceedings.mlr.press/v42/mack14.html
PDF: http://proceedings.mlr.press/v42/mack14.pdf
edit: https://github.com/mlresearch//v42/edit/gh-pages/_posts/2015-08-27-mack14.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning'
publisher: 'PMLR'
author:
- given: Lester
family: Mackey
- given: Jordan
family: Bryan
- given: Man Yue
family: Mo
editor:
- given: Glen
family: Cowan
- given: Cécile
family: Germain
- given: Isabelle
family: Guyon
- given: Balázs
family: Kégl
- given: David
family: Rousseau
address: Montreal, Canada
page: 129-134
id: mack14
issued:
date-parts:
- 2015
- 8
- 27
firstpage: 129
lastpage: 134
published: 2015-08-27 00:00:00 +0000