LOB-Bench: Benchmarking Generative AI for Finance - an Application to Limit Order Book Data

Peer Nagy, Sascha Yves Frey, Kang Li, Bidipta Sarkar, Svitlana Vyetrenko, Stefan Zohren, Ani Calinescu, Jakob Nicolaus Foerster
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:45437-45460, 2025.

Abstract

While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in Python, designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB) in the LOBSTER format. Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data, supporting flexible multivariate statistical evaluation. The benchmark also features commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times, along with scores from a trained discriminator network. Lastly, LOB-Bench contains "market impact metrics", i.e. the cross-correlations and price response functions for specific events in the data. We benchmark generative autoregressive state-space models, a (C)GAN, as well as a parametric LOB model and find that the autoregressive GenAI approach beats traditional model classes.
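As a rough illustration of what "measuring distributional differences" in a one-dimensional LOB statistic can look like, the sketch below compares the bid-ask spread of real and generated book snapshots. It is not the LOB-Bench API: the function names, the choice of distances (a histogram-based total variation distance and an empirical Wasserstein-1 distance), and the bin count are all assumptions made for this example, since the abstract does not specify them.

```python
from typing import List, Sequence


def spreads(best_bids: Sequence[float], best_asks: Sequence[float]) -> List[float]:
    """Bid-ask spread for each book snapshot (hypothetical helper)."""
    return [ask - bid for bid, ask in zip(best_bids, best_asks)]


def histogram(values: Sequence[float], lo: float, hi: float, n_bins: int) -> List[float]:
    """Normalised histogram on [lo, hi] with n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that the right edge falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def histogram_distance(real: Sequence[float], generated: Sequence[float],
                       n_bins: int = 20) -> float:
    """Total variation distance (half the L1 difference) between the two
    histograms, computed on shared bins. Result lies in [0, 1]."""
    lo = min(min(real), min(generated))
    hi = max(max(real), max(generated))
    if hi == lo:  # all values identical, so the distributions coincide
        return 0.0
    p = histogram(real, lo, hi, n_bins)
    q = histogram(generated, lo, hi, n_bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def wasserstein_1d(real: Sequence[float], generated: Sequence[float]) -> float:
    """Empirical Wasserstein-1 distance for two samples of equal size:
    the mean absolute difference between sorted values."""
    if len(real) != len(generated):
        raise ValueError("this simplified version needs equal sample sizes")
    pairs = zip(sorted(real), sorted(generated))
    return sum(abs(a - b) for a, b in pairs) / len(real)


if __name__ == "__main__":
    # Toy snapshots; real use would read best bid/ask from LOBSTER order book files.
    real_spread = spreads([100.00, 100.01, 100.02], [100.01, 100.03, 100.03])
    fake_spread = spreads([100.00, 100.01, 100.02], [100.02, 100.03, 100.05])
    print(histogram_distance(real_spread, fake_spread))
    print(wasserstein_1d(real_spread, fake_spread))
```

The same two distance functions can be applied to any scalar statistic (order imbalance, inter-arrival times, volumes); a conditional variant would first group the samples by a conditioning variable and then compare the distributions within each group.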

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-nagy25a,
  title = {{LOB}-Bench: Benchmarking Generative {AI} for Finance - an Application to Limit Order Book Data},
  author = {Nagy, Peer and Frey, Sascha Yves and Li, Kang and Sarkar, Bidipta and Vyetrenko, Svitlana and Zohren, Stefan and Calinescu, Ani and Foerster, Jakob Nicolaus},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {45437--45460},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/nagy25a/nagy25a.pdf},
  url = {https://proceedings.mlr.press/v267/nagy25a.html},
  abstract = {While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in python, designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB) in the LOBSTER format. Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data, supporting flexible multivariate statistical evaluation. The benchmark also includes features commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times, along with scores from a trained discriminator network. Lastly, LOB-Bench contains "market impact metrics", i.e. the cross-correlations and price response functions for specific events in the data. We benchmark generative autoregressive state-space models, a (C)GAN, as well as a parametric LOB model and find that the autoregressive GenAI approach beats traditional model classes.}
}
Endnote
%0 Conference Paper
%T LOB-Bench: Benchmarking Generative AI for Finance - an Application to Limit Order Book Data
%A Peer Nagy
%A Sascha Yves Frey
%A Kang Li
%A Bidipta Sarkar
%A Svitlana Vyetrenko
%A Stefan Zohren
%A Ani Calinescu
%A Jakob Nicolaus Foerster
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-nagy25a
%I PMLR
%P 45437--45460
%U https://proceedings.mlr.press/v267/nagy25a.html
%V 267
%X While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in python, designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB) in the LOBSTER format. Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data, supporting flexible multivariate statistical evaluation. The benchmark also includes features commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times, along with scores from a trained discriminator network. Lastly, LOB-Bench contains "market impact metrics", i.e. the cross-correlations and price response functions for specific events in the data. We benchmark generative autoregressive state-space models, a (C)GAN, as well as a parametric LOB model and find that the autoregressive GenAI approach beats traditional model classes.
APA
Nagy, P., Frey, S.Y., Li, K., Sarkar, B., Vyetrenko, S., Zohren, S., Calinescu, A. & Foerster, J.N. (2025). LOB-Bench: Benchmarking Generative AI for Finance - an Application to Limit Order Book Data. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:45437-45460. Available from https://proceedings.mlr.press/v267/nagy25a.html.

Related Material