Group-Sparse Manifold-Aware Integrated Gradients for Multimodal Transformers on EHR Trajectories

Ali Amirahmadi; Farzaneh Etminani; Mattias Ohlsson

Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:740-758, 2026.

Abstract

Integrated Gradients (IG) is a popular method for explaining clinical deep models, including widely used multimodal, pretrained Transformers, but its utility on EHR code sequences is hampered by (i) the lack of principled baselines for sequences of discrete tokens and (ii) dense, hard-to-interpret attributions. To address both, we first introduce a manifold-aware baseline: the mean input embedding (computed on the validation set), which keeps IG’s interpolated points close to typical sequences in embedding space. Second, we introduce GS-IG, which preserves the straight path geometry but re-parameterizes the schedule $\alpha(t) = t^\theta$ and selects $\theta$ per input by minimizing a token-level $\ell_{2,1}$ (group-sparsity) objective, producing concise, practitioner-friendly explanations. On MIMIC-IV (incident heart failure) and MDC (early mortality), the manifold-aware baseline improves faithfulness (higher Comprehensiveness, lower Sufficiency), and GS-IG reduces token-level $\ell_{2,1}$ by 9–18% with negligible change in those metrics on the manifold-aware baseline. The method is lightweight and yields faithful, sparse, and actionable explanations.
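
As an illustration only, the two ideas in the abstract can be sketched in Python with NumPy. This is not the authors' implementation. The toy scorer `model`, its analytic gradient `grad`, the helper names, the grid of candidate `thetas`, the midpoint quadrature, and the choice to average validation embeddings over all tokens into a single vector are all assumptions made for this sketch.

```python
import numpy as np

def model(E, v):
    """Hypothetical toy scorer on a (tokens, dim) embedding matrix."""
    z = (np.tanh(E) @ v).sum()
    return 1.0 / (1.0 + np.exp(-z))

def grad(E, v):
    """Analytic gradient of the toy scorer with respect to E, shape (tokens, dim)."""
    p = model(E, v)
    return p * (1.0 - p) * (1.0 - np.tanh(E) ** 2) * v

def mean_embedding_baseline(val_embeddings):
    """Assumed baseline: mean over all validation tokens, one vector of size dim."""
    return val_embeddings.reshape(-1, val_embeddings.shape[-1]).mean(axis=0)

def ig(E, baseline, v, theta=1.0, steps=32):
    """Straight-path IG with schedule alpha(t) = t**theta (midpoint rule in alpha)."""
    alpha = np.linspace(0.0, 1.0, steps + 1) ** theta
    delta = E - baseline  # baseline broadcasts across token positions
    total = np.zeros_like(E)
    for a_lo, a_hi in zip(alpha[:-1], alpha[1:]):
        a_mid = 0.5 * (a_lo + a_hi)
        total += grad(baseline + a_mid * delta, v) * (a_hi - a_lo)
    return delta * total

def l21(attr):
    """Token-level l_{2,1}: sum over tokens of the l2 norm of each token's row."""
    return np.linalg.norm(attr, axis=1).sum()

def gs_ig(E, baseline, v, thetas=(0.5, 1.0, 2.0, 3.0), steps=32):
    """Pick theta per input by grid search on the l_{2,1} objective (assumed search)."""
    scored = [(l21(ig(E, baseline, v, th, steps)), th) for th in thetas]
    best = min(scored)[1]
    return best, ig(E, baseline, v, best, steps)

rng = np.random.default_rng(0)
val = rng.normal(size=(50, 6, 4))   # synthetic stand-in for validation embeddings
E = rng.normal(size=(6, 4))         # one synthetic input sequence
v = rng.normal(size=4)
theta, attr = gs_ig(E, mean_embedding_baseline(val), v)
print(theta, l21(attr))
```

In this sketch the schedule only changes where the quadrature nodes fall along the same straight path, so the differences between candidate values of $\theta$ shrink as `steps` grows.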

Cite this Paper

BibTeX

@InProceedings{pmlr-v297-amirahmadi26a,
  title = 	 {Group-Sparse Manifold-Aware Integrated Gradients for Multimodal Transformers on EHR Trajectories},
  author =       {Amirahmadi, Ali and Etminani, Farzaneh and Ohlsson, Mattias},
  booktitle = 	 {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = 	 {740--758},
  year = 	 {2026},
  editor = 	 {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = 	 {297},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--14 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v297/main/assets/amirahmadi26a/amirahmadi26a.pdf},
  url = 	 {https://proceedings.mlr.press/v297/amirahmadi26a.html},
  abstract = 	 {Integrated Gradients ({IG}) is a popular method for explaining clinical deep models, including widely used multimodal, pretrained Transformers, but its utility on {EHR} code sequences is hampered by (i) the lack of principled baselines for sequences of discrete tokens and (ii) dense, hard-to-interpret attributions. To address both, we first introduce a manifold-aware baseline: the mean input embedding (computed on the validation set), which keeps {IG}’s interpolated points close to typical sequences in embedding space. Second, we introduce {GS-IG}, which preserves the straight path geometry but re-parameterizes the schedule $\alpha(t) = t^\theta$ and selects $\theta$ per input by minimizing a token-level $\ell_{2,1}$ (group-sparsity) objective, producing concise, practitioner-friendly explanations. On {MIMIC-IV} (incident heart failure) and {MDC} (early mortality), the manifold-aware baseline improves faithfulness (higher Comprehensiveness, lower Sufficiency), and {GS-IG} reduces token-level $\ell_{2,1}$ by 9–18\% with negligible change in those metrics on the manifold-aware baseline. The method is lightweight and yields faithful, sparse, and actionable explanations.}
}

Endnote

%0 Conference Paper
%T Group-Sparse Manifold-Aware Integrated Gradients for Multimodal Transformers on EHR Trajectories
%A Ali Amirahmadi
%A Farzaneh Etminani
%A Mattias Ohlsson
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-amirahmadi26a
%I PMLR
%P 740--758
%U https://proceedings.mlr.press/v297/amirahmadi26a.html
%V 297
%X Integrated Gradients (IG) is a popular method for explaining clinical deep models, including widely used multimodal, pretrained Transformers, but its utility on EHR code sequences is hampered by (i) the lack of principled baselines for sequences of discrete tokens and (ii) dense, hard-to-interpret attributions. To address both, we first introduce a manifold-aware baseline: the mean input embedding (computed on the validation set), which keeps IG’s interpolated points close to typical sequences in embedding space. Second, we introduce GS-IG, which preserves the straight path geometry but re-parameterizes the schedule $\alpha(t) = t^\theta$ and selects $\theta$ per input by minimizing a token-level $\ell_{2,1}$ (group-sparsity) objective, producing concise, practitioner-friendly explanations. On MIMIC-IV (incident heart failure) and MDC (early mortality), the manifold-aware baseline improves faithfulness (higher Comprehensiveness, lower Sufficiency), and GS-IG reduces token-level $\ell_{2,1}$ by 9–18% with negligible change in those metrics on the manifold-aware baseline. The method is lightweight and yields faithful, sparse, and actionable explanations.

APA

Amirahmadi, A., Etminani, F. & Ohlsson, M. (2026). Group-Sparse Manifold-Aware Integrated Gradients for Multimodal Transformers on EHR Trajectories. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:740-758. Available from https://proceedings.mlr.press/v297/amirahmadi26a.html.
