Extend and Explain: Interpreting Very Long Language Models

Joel Stremmel; Brian L. Hill; Jeffrey Hertzberg; Jaime Murillo; Llewelyn Allotey; Eran Halperin


Proceedings of the 2nd Machine Learning for Health symposium, PMLR 193:218-258, 2022.

Abstract

While Transformer language models (LMs) are state-of-the-art for information extraction, long text introduces computational challenges requiring suboptimal preprocessing steps or alternative model architectures. Sparse attention LMs can represent longer sequences, overcoming performance hurdles. However, it remains unclear how to explain predictions from these models, as not all tokens attend to each other in the self-attention layers, and long sequences pose computational challenges for explainability algorithms when runtime depends on document length. These challenges are severe in the medical context where documents can be very long, and machine learning (ML) models must be auditable and trustworthy. We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction, apply MSP in the context of predicting diagnoses from medical text, and validate our approach with a blind review by two clinicians. Our method identifies $\approx 1.7\times$ more clinically informative text blocks than the previous state-of-the-art, runs up to $100\times$ faster, and is tractable for generating important phrase pairs. MSP is particularly well-suited to long LMs but can be applied to any text classifier. We provide a general implementation here. https://github.com/Optum/long-medical-document-lms
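The abstract does not spell out how masked sampling works, so the following is only a minimal sketch of a generic masking-based attribution scheme, not the authors' MSP (their implementation is in the linked repository). It assumes fixed-size text blocks, independent random masking of each block, and scoring by the mean drop in predicted probability. The function name, every parameter, and the toy classifier are hypothetical.

```python
import random
from collections import defaultdict


def masked_sampling_importance(tokens, predict_proba, block_size=3,
                               mask_prob=0.2, n_samples=200,
                               mask_token="[MASK]", seed=0):
    """Rank fixed-size text blocks by the mean drop in predicted probability
    observed across random samples in which the block was masked.

    Hypothetical illustration only; not the procedure from the paper.
    """
    rng = random.Random(seed)
    n_blocks = (len(tokens) + block_size - 1) // block_size
    baseline = predict_proba(tokens)
    drops = defaultdict(list)

    for _ in range(n_samples):
        # Choose each block independently with probability mask_prob.
        masked_blocks = [b for b in range(n_blocks) if rng.random() < mask_prob]
        if not masked_blocks:
            continue
        masked = list(tokens)
        for b in masked_blocks:
            start = b * block_size
            end = min(start + block_size, len(tokens))
            masked[start:end] = [mask_token] * (end - start)
        # Credit the observed drop to every block masked in this sample.
        drop = baseline - predict_proba(masked)
        for b in masked_blocks:
            drops[b].append(drop)

    scores = {b: sum(v) / len(v) for b, v in drops.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-in for a text classifier (assumption): confident only if the
# keyword survives masking.
def toy_predict_proba(toks):
    return 0.9 if "pneumonia" in toks else 0.1


tokens = ("patient presents with cough and fever chest xray shows "
          "pneumonia in left lobe").split()
ranking = masked_sampling_importance(tokens, toy_predict_proba)
print(ranking[0])  # highest-scoring (block_index, mean_drop) pair
```

In this sketch the number of model calls is set by `n_samples` rather than by document length, which is one way an attribution method can avoid the length-dependent runtime the abstract mentions.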

Cite this Paper

BibTeX


@InProceedings{pmlr-v193-stremmel22a,
  title = 	 {Extend and Explain: Interpreting Very Long Language Models},
  author =       {Stremmel, Joel and Hill, Brian L. and Hertzberg, Jeffrey and Murillo, Jaime and Allotey, Llewelyn and Halperin, Eran},
  booktitle = 	 {Proceedings of the 2nd Machine Learning for Health symposium},
  pages = 	 {218--258},
  year = 	 {2022},
  editor = 	 {Parziale, Antonio and Agrawal, Monica and Joshi, Shalmali and Chen, Irene Y. and Tang, Shengpu and Oala, Luis and Subbaswamy, Adarsh},
  volume = 	 {193},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v193/stremmel22a/stremmel22a.pdf},
  url = 	 {https://proceedings.mlr.press/v193/stremmel22a.html},
  abstract = 	 {While Transformer language models (LMs) are state-of-the-art for information extraction, long text introduces computational challenges requiring suboptimal preprocessing steps or alternative model architectures. Sparse attention LMs can represent longer sequences, overcoming performance hurdles. However, it remains unclear how to explain predictions from these models, as not all tokens attend to each other in the self-attention layers, and long sequences pose computational challenges for explainability algorithms when runtime depends on document length. These challenges are severe in the medical context where documents can be very long, and machine learning (ML) models must be auditable and trustworthy. We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction, apply MSP in the context of predicting diagnoses from medical text, and validate our approach with a blind review by two clinicians. Our method identifies $\approx 1.7\times$ more clinically informative text blocks than the previous state-of-the-art, runs up to $100\times$ faster, and is tractable for generating important phrase pairs. MSP is particularly well-suited to long LMs but can be applied to any text classifier. We provide a general implementation here. https://github.com/Optum/long-medical-document-lms}
}

Endnote

%0 Conference Paper
%T Extend and Explain: Interpreting Very Long Language Models
%A Joel Stremmel
%A Brian L. Hill
%A Jeffrey Hertzberg
%A Jaime Murillo
%A Llewelyn Allotey
%A Eran Halperin
%B Proceedings of the 2nd Machine Learning for Health symposium
%C Proceedings of Machine Learning Research
%D 2022
%E Antonio Parziale
%E Monica Agrawal
%E Shalmali Joshi
%E Irene Y. Chen
%E Shengpu Tang
%E Luis Oala
%E Adarsh Subbaswamy
%F pmlr-v193-stremmel22a
%I PMLR
%P 218--258
%U https://proceedings.mlr.press/v193/stremmel22a.html
%V 193
%X While Transformer language models (LMs) are state-of-the-art for information extraction, long text introduces computational challenges requiring suboptimal preprocessing steps or alternative model architectures. Sparse attention LMs can represent longer sequences, overcoming performance hurdles. However, it remains unclear how to explain predictions from these models, as not all tokens attend to each other in the self-attention layers, and long sequences pose computational challenges for explainability algorithms when runtime depends on document length. These challenges are severe in the medical context where documents can be very long, and machine learning (ML) models must be auditable and trustworthy. We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction, apply MSP in the context of predicting diagnoses from medical text, and validate our approach with a blind review by two clinicians. Our method identifies $\approx 1.7\times$ more clinically informative text blocks than the previous state-of-the-art, runs up to $100\times$ faster, and is tractable for generating important phrase pairs. MSP is particularly well-suited to long LMs but can be applied to any text classifier. We provide a general implementation here. https://github.com/Optum/long-medical-document-lms

APA


Stremmel, J., Hill, B. L., Hertzberg, J., Murillo, J., Allotey, L., & Halperin, E. (2022). Extend and Explain: Interpreting Very Long Language Models. Proceedings of the 2nd Machine Learning for Health symposium, in Proceedings of Machine Learning Research 193:218-258. Available from https://proceedings.mlr.press/v193/stremmel22a.html.
