NOVA: An Agentic Framework for Automated Histopathology Analysis and Discovery

Anurag J. Vaidya, Felix Meissen, Daniel C. Castro, Shruthi Bannur, Tristan Lazard, Drew F. K. Williamson, Faisal Mahmood, Javier Alvarez-Valle, Stephanie L. Hyland, Kenza Bouzid
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:310-349, 2026.

Abstract

Histopathology image analysis involves time-intensive and specialized workflows, limiting its accessibility. We introduce Nova, an agentic framework that translates scientific queries into executable analysis pipelines by iteratively generating and running Python code. Nova integrates 49 domain-specific tools (e.g., nuclei segmentation, whole-slide encoding) built on open-source software, and can also create new tools ad hoc. To evaluate such systems, we present SlideQuest, a 90-question benchmark, verified by pathologists and biomedical scientists, spanning data processing, quantitative analysis, and hypothesis testing. Unlike prior biomedical benchmarks focused on knowledge recall or diagnostic QA, SlideQuest demands multi-step reasoning, iterative coding, and computational problem solving. Quantitative evaluation shows Nova outperforms coding-agent baselines, and a pathologist-verified case study links morphology to prognostically relevant PAM50 subtypes, demonstrating its discovery potential.

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-vaidya26a,
  title = {{NOVA}: An Agentic Framework for Automated Histopathology Analysis and Discovery},
  author = {Vaidya, Anurag J. and Meissen, Felix and Castro, Daniel C. and Bannur, Shruthi and Lazard, Tristan and Williamson, Drew F. K. and Mahmood, Faisal and Alvarez-Valle, Javier and Hyland, Stephanie L. and Bouzid, Kenza},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = {310--349},
  year = {2026},
  editor = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = {297},
  series = {Proceedings of Machine Learning Research},
  month = {13--14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/vaidya26a/vaidya26a.pdf},
  url = {https://proceedings.mlr.press/v297/vaidya26a.html},
  abstract = {Histopathology image analysis involves time-intensive and specialized workflows, limiting its accessibility. We introduce Nova, an agentic framework that translates scientific queries into executable analysis pipelines by iteratively generating and running Python code. Nova integrates 49 domain-specific tools (e.g., nuclei segmentation, whole-slide encoding) built on open-source software, and can also create new tools ad hoc. To evaluate such systems, we present SlideQuest, a 90-question benchmark, verified by pathologists and biomedical scientists, spanning data processing, quantitative analysis, and hypothesis testing. Unlike prior biomedical benchmarks focused on knowledge recall or diagnostic QA, SlideQuest demands multi-step reasoning, iterative coding, and computational problem solving. Quantitative evaluation shows Nova outperforms coding-agent baselines, and a pathologist-verified case study links morphology to prognostically relevant PAM50 subtypes, demonstrating its discovery potential.}
}
Endnote
%0 Conference Paper
%T NOVA: An Agentic Framework for Automated Histopathology Analysis and Discovery
%A Anurag J. Vaidya
%A Felix Meissen
%A Daniel C. Castro
%A Shruthi Bannur
%A Tristan Lazard
%A Drew F. K. Williamson
%A Faisal Mahmood
%A Javier Alvarez-Valle
%A Stephanie L. Hyland
%A Kenza Bouzid
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-vaidya26a
%I PMLR
%P 310--349
%U https://proceedings.mlr.press/v297/vaidya26a.html
%V 297
%X Histopathology image analysis involves time-intensive and specialized workflows, limiting its accessibility. We introduce Nova, an agentic framework that translates scientific queries into executable analysis pipelines by iteratively generating and running Python code. Nova integrates 49 domain-specific tools (e.g., nuclei segmentation, whole-slide encoding) built on open-source software, and can also create new tools ad hoc. To evaluate such systems, we present SlideQuest, a 90-question benchmark, verified by pathologists and biomedical scientists, spanning data processing, quantitative analysis, and hypothesis testing. Unlike prior biomedical benchmarks focused on knowledge recall or diagnostic QA, SlideQuest demands multi-step reasoning, iterative coding, and computational problem solving. Quantitative evaluation shows Nova outperforms coding-agent baselines, and a pathologist-verified case study links morphology to prognostically relevant PAM50 subtypes, demonstrating its discovery potential.
APA
Vaidya, A.J., Meissen, F., Castro, D.C., Bannur, S., Lazard, T., Williamson, D.F.K., Mahmood, F., Alvarez-Valle, J., Hyland, S.L. & Bouzid, K. (2026). NOVA: An Agentic Framework for Automated Histopathology Analysis and Discovery. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:310-349. Available from https://proceedings.mlr.press/v297/vaidya26a.html.
