TempoQL: A Readable, Precise, and Portable Query System for Electronic Health Record Data

Ziyong Ma, Richard D. Boyce, Adam Perer, Venkatesh Sivaraman
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:399-423, 2026.

Abstract

Electronic health record (EHR) data is an essential data source for machine learning for health, but researchers and clinicians face steep barriers in extracting and validating EHR data for modeling. Existing tools incur trade-offs between expressivity and usability and are typically specialized to a single data standard, making it difficult to write temporal queries that are ready for modern model-building pipelines and adaptable to new datasets. This paper introduces TempoQL, a Python-based toolkit designed to lower these barriers. TempoQL provides a simple, human-readable language for temporal queries; support for multiple EHR data standards, including OMOP, MEDS, and others; and an interactive notebook-based query interface with optional large language model (LLM) authoring assistance. Through a performance evaluation and two use cases on different datasets, we demonstrate that TempoQL simplifies the creation of cohorts for machine learning while maintaining precision, speed, and reproducibility.

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-ma26a,
  title = {{TempoQL}: A Readable, Precise, and Portable Query System for Electronic Health Record Data},
  author = {Ma, Ziyong and Boyce, Richard D. and Perer, Adam and Sivaraman, Venkatesh},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = {399--423},
  year = {2026},
  editor = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = {297},
  series = {Proceedings of Machine Learning Research},
  month = {13--14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/ma26a/ma26a.pdf},
  url = {https://proceedings.mlr.press/v297/ma26a.html},
  abstract = {Electronic health record ({EHR}) data is an essential data source for machine learning for health, but researchers and clinicians face steep barriers in extracting and validating {EHR} data for modeling. Existing tools incur trade-offs between expressivity and usability and are typically specialized to a single data standard, making it difficult to write temporal queries that are ready for modern model-building pipelines and adaptable to new datasets. This paper introduces {TempoQL}, a Python-based toolkit designed to lower these barriers. {TempoQL} provides a simple, human-readable language for temporal queries; support for multiple {EHR} data standards, including {OMOP}, {MEDS}, and others; and an interactive notebook-based query interface with optional large language model ({LLM}) authoring assistance. Through a performance evaluation and two use cases on different datasets, we demonstrate that {TempoQL} simplifies the creation of cohorts for machine learning while maintaining precision, speed, and reproducibility.}
}
Endnote
%0 Conference Paper
%T TempoQL: A Readable, Precise, and Portable Query System for Electronic Health Record Data
%A Ziyong Ma
%A Richard D. Boyce
%A Adam Perer
%A Venkatesh Sivaraman
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-ma26a
%I PMLR
%P 399--423
%U https://proceedings.mlr.press/v297/ma26a.html
%V 297
%X Electronic health record (EHR) data is an essential data source for machine learning for health, but researchers and clinicians face steep barriers in extracting and validating EHR data for modeling. Existing tools incur trade-offs between expressivity and usability and are typically specialized to a single data standard, making it difficult to write temporal queries that are ready for modern model-building pipelines and adaptable to new datasets. This paper introduces TempoQL, a Python-based toolkit designed to lower these barriers. TempoQL provides a simple, human-readable language for temporal queries; support for multiple EHR data standards, including OMOP, MEDS, and others; and an interactive notebook-based query interface with optional large language model (LLM) authoring assistance. Through a performance evaluation and two use cases on different datasets, we demonstrate that TempoQL simplifies the creation of cohorts for machine learning while maintaining precision, speed, and reproducibility.
APA
Ma, Z., Boyce, R. D., Perer, A. & Sivaraman, V. (2026). TempoQL: A Readable, Precise, and Portable Query System for Electronic Health Record Data. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:399-423. Available from https://proceedings.mlr.press/v297/ma26a.html.

Related Material