Addressing Sample Size Challenges in Linked Data Through Data Fusion

Srikesh Arunajadai, Lulu Lee, Tom Haskell
Proceedings of the 5th Machine Learning for Healthcare Conference, PMLR 126:352-375, 2020.

Abstract

Linking secondary clinical data with patient-reported data at the patient level provides a comprehensive view of the patient, but sample sizes can be a challenge. This study demonstrates the fusion of patient-reported outcomes from surveys with clinical data from claims, enabling the study of associations between quality of life and disease-treatment interactions at scale, especially for rare diseases. We show that data fusion can be implemented in a disease-agnostic way, which enables the use of more advanced machine learning algorithms on larger data sets while still allowing the resulting fused data to be used for disease-specific analysis. This contrasts with the usual approach, in which data fusion is attempted on disease-specific data sets that can be too small to be amenable to analysis by advanced methods. By taking advantage of the subset of the data that can be linked across the two data sources, the proposed methodology circumvents some of the assumptions typically imposed on the data fusion process, assumptions that are untestable and usually invalid.

Cite this Paper


BibTeX
@InProceedings{pmlr-v126-arunajadai20a,
  title = {Addressing Sample Size Challenges in Linked Data Through Data Fusion},
  author = {Arunajadai, Srikesh and Lee, Lulu and Haskell, Tom},
  booktitle = {Proceedings of the 5th Machine Learning for Healthcare Conference},
  pages = {352--375},
  year = {2020},
  editor = {Doshi-Velez, Finale and Fackler, Jim and Jung, Ken and Kale, David and Ranganath, Rajesh and Wallace, Byron and Wiens, Jenna},
  volume = {126},
  series = {Proceedings of Machine Learning Research},
  month = {07--08 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v126/arunajadai20a/arunajadai20a.pdf},
  url = {https://proceedings.mlr.press/v126/arunajadai20a.html},
  abstract = {Linking secondary clinical data with patient-reported data at the patient-level brings together a comprehensive view of the patient but sample sizes can be a challenge. This study demonstrates the fusion of Patient Reported Outcomes in surveys with clinical data in claims enabling the study of associations between quality of life and disease-treatment interactions at scale especially for rare diseases. In this work, we show the ability to implement data fusion in a disease agnostic way thereby enabling the use of more advanced machine learning algorithms on larger data sets, while still being able to use the resulting fused data to perform disease specific analysis. This is in contrast to usual approaches where the data fusion might be attempted on disease specific data sets which can be too small to be amenable to analysis by advanced methods. The proposed data fusion methodology circumvents some of the assumptions typically imposed on the data fusion process that are untestable and usually invalid by taking advantage of the subset of the data that can be linked in the two data sources.}
}
Endnote
%0 Conference Paper
%T Addressing Sample Size Challenges in Linked Data Through Data Fusion
%A Srikesh Arunajadai
%A Lulu Lee
%A Tom Haskell
%B Proceedings of the 5th Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Finale Doshi-Velez
%E Jim Fackler
%E Ken Jung
%E David Kale
%E Rajesh Ranganath
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v126-arunajadai20a
%I PMLR
%P 352--375
%U https://proceedings.mlr.press/v126/arunajadai20a.html
%V 126
%X Linking secondary clinical data with patient-reported data at the patient-level brings together a comprehensive view of the patient but sample sizes can be a challenge. This study demonstrates the fusion of Patient Reported Outcomes in surveys with clinical data in claims enabling the study of associations between quality of life and disease-treatment interactions at scale especially for rare diseases. In this work, we show the ability to implement data fusion in a disease agnostic way thereby enabling the use of more advanced machine learning algorithms on larger data sets, while still being able to use the resulting fused data to perform disease specific analysis. This is in contrast to usual approaches where the data fusion might be attempted on disease specific data sets which can be too small to be amenable to analysis by advanced methods. The proposed data fusion methodology circumvents some of the assumptions typically imposed on the data fusion process that are untestable and usually invalid by taking advantage of the subset of the data that can be linked in the two data sources.
APA
Arunajadai, S., Lee, L., & Haskell, T. (2020). Addressing Sample Size Challenges in Linked Data Through Data Fusion. Proceedings of the 5th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 126:352-375. Available from https://proceedings.mlr.press/v126/arunajadai20a.html.
