TempoQL: A Readable, Precise, and Portable Query System for Electronic Health Record Data
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:399-423, 2026.
Abstract
Electronic health record (EHR) data is an essential data source for machine learning for health, but researchers and clinicians face steep barriers in extracting and validating EHR data for modeling. Existing tools incur trade-offs between expressivity and usability and are typically specialized to a single data standard, making it difficult to write temporal queries that are ready for modern model-building pipelines and adaptable to new datasets. This paper introduces TempoQL, a Python-based toolkit designed to lower these barriers. TempoQL provides a simple, human-readable language for temporal queries; support for multiple EHR data standards, including OMOP, MEDS, and others; and an interactive notebook-based query interface with optional large language model (LLM) authoring assistance. Through a performance evaluation and two use cases on different datasets, we demonstrate that TempoQL simplifies the creation of cohorts for machine learning while maintaining precision, speed, and reproducibility.