Learning from Few Subjects with Large Amounts of Voice Monitoring Data

Jose Javier Gonzalez Ortiz, Daryush D. Mehta, Jarrad H. Van Stan, Robert Hillman, John V. Guttag, Marzyeh Ghassemi
Proceedings of the 4th Machine Learning for Healthcare Conference, PMLR 106:704-720, 2019.

Abstract

Recently, researchers have started training high-complexity machine learning models on clinical tasks, often improving upon previous benchmarks. However, more often than not, these methods require large amounts of supervision to provide good generalization guarantees. When applied to data coming from small cohorts and long monitoring periods, these models are prone to overfit to subject-identifying features. Since obtaining large amounts of labels is usually not practical in many scenarios, incorporating expert-driven knowledge of the task is a common technique to prevent overfitting. We present a two-step learning approach that is able to generalize under these circumstances when applied to a voice monitoring dataset. Our approach decouples the feature learning stage and performs it in an unsupervised manner, removing the need for laborious feature engineering. We show the effectiveness of our proposed model on two voice monitoring-related tasks. We evaluate the extracted features for classifying between patients with vocal fold nodules and controls. We also demonstrate that the features capture pathology-relevant information by showing that models trained on them are more accurate at predicting vocal use for patients than for controls. Our proposed method is able to generalize to unseen subjects and across learning tasks while matching state-of-the-art results.
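The abstract describes the approach only at a high level. As a rough illustration of the decoupled two-step setup and of subject-wise evaluation, the sketch below uses PCA as a stand-in unsupervised feature learner and logistic regression as the downstream classifier on synthetic data; every component choice, data shape, and name here is an assumption for illustration, not the paper's actual pipeline.

# Minimal sketch of a decoupled two-step pipeline with subject-wise evaluation.
# All models and data here are illustrative stand-ins, not the paper's method.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(0)

# Hypothetical data: one feature vector per monitoring window, plus a subject id
# and a subject-level label (1 = vocal fold nodules, 0 = matched control).
n_subjects, windows_per_subject, n_features = 20, 500, 64
X = rng.normal(size=(n_subjects * windows_per_subject, n_features))
subjects = np.repeat(np.arange(n_subjects), windows_per_subject)
y = np.repeat(np.arange(n_subjects) % 2, windows_per_subject)

# Step 1: unsupervised feature learning, fit without any labels.
features = PCA(n_components=10).fit_transform(X)

# Step 2: a small supervised classifier on the learned features, evaluated with
# subject-wise splits so no subject appears in both the train and test folds.
aucs = []
cv = StratifiedGroupKFold(n_splits=5)
for train_idx, test_idx in cv.split(features, y, groups=subjects):
    clf = LogisticRegression(max_iter=1000).fit(features[train_idx], y[train_idx])
    scores = clf.predict_proba(features[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], scores))
print(f"Mean held-out-subject AUC: {np.mean(aucs):.3f}")

Splitting by subject id in this way is one standard check that a model generalizes to unseen subjects rather than latching onto subject-identifying features, which is the failure mode the abstract highlights.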

Cite this Paper

BibTeX
@InProceedings{pmlr-v106-ortiz19a, title = {Learning from Few Subjects with Large Amounts of Voice Monitoring Data}, author = {Ortiz, Jose Javier Gonzalez and Mehta, Daryush D. and Van Stan, Jarrad H. and Hillman, Robert and Guttag, John V. and Ghassemi, Marzyeh}, booktitle = {Proceedings of the 4th Machine Learning for Healthcare Conference}, pages = {704--720}, year = {2019}, editor = {Doshi-Velez, Finale and Fackler, Jim and Jung, Ken and Kale, David and Ranganath, Rajesh and Wallace, Byron and Wiens, Jenna}, volume = {106}, series = {Proceedings of Machine Learning Research}, month = {09--10 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v106/ortiz19a/ortiz19a.pdf}, url = {https://proceedings.mlr.press/v106/ortiz19a.html}, abstract = {Recently, researchers have started training high-complexity machine learning models on clinical tasks, often improving upon previous benchmarks. However, more often than not, these methods require large amounts of supervision to provide good generalization guarantees. When applied to data coming from small cohorts and long monitoring periods, these models are prone to overfit to subject-identifying features. Since obtaining large amounts of labels is usually not practical in many scenarios, incorporating expert-driven knowledge of the task is a common technique to prevent overfitting. We present a two-step learning approach that is able to generalize under these circumstances when applied to a voice monitoring dataset. Our approach decouples the feature learning stage and performs it in an unsupervised manner, removing the need for laborious feature engineering. We show the effectiveness of our proposed model on two voice monitoring-related tasks. We evaluate the extracted features for classifying between patients with vocal fold nodules and controls. We also demonstrate that the features capture pathology-relevant information by showing that models trained on them are more accurate at predicting vocal use for patients than for controls. Our proposed method is able to generalize to unseen subjects and across learning tasks while matching state-of-the-art results.} }
Endnote
%0 Conference Paper %T Learning from Few Subjects with Large Amounts of Voice Monitoring Data %A Jose Javier Gonzalez Ortiz %A Daryush D. Mehta %A Jarrad H. Van Stan %A Robert Hillman %A John V. Guttag %A Marzyeh Ghassemi %B Proceedings of the 4th Machine Learning for Healthcare Conference %C Proceedings of Machine Learning Research %D 2019 %E Finale Doshi-Velez %E Jim Fackler %E Ken Jung %E David Kale %E Rajesh Ranganath %E Byron Wallace %E Jenna Wiens %F pmlr-v106-ortiz19a %I PMLR %P 704--720 %U https://proceedings.mlr.press/v106/ortiz19a.html %V 106 %X Recently, researchers have started training high-complexity machine learning models on clinical tasks, often improving upon previous benchmarks. However, more often than not, these methods require large amounts of supervision to provide good generalization guarantees. When applied to data coming from small cohorts and long monitoring periods, these models are prone to overfit to subject-identifying features. Since obtaining large amounts of labels is usually not practical in many scenarios, incorporating expert-driven knowledge of the task is a common technique to prevent overfitting. We present a two-step learning approach that is able to generalize under these circumstances when applied to a voice monitoring dataset. Our approach decouples the feature learning stage and performs it in an unsupervised manner, removing the need for laborious feature engineering. We show the effectiveness of our proposed model on two voice monitoring-related tasks. We evaluate the extracted features for classifying between patients with vocal fold nodules and controls. We also demonstrate that the features capture pathology-relevant information by showing that models trained on them are more accurate at predicting vocal use for patients than for controls. Our proposed method is able to generalize to unseen subjects and across learning tasks while matching state-of-the-art results.
APA
Ortiz, J.J.G., Mehta, D.D., Van Stan, J.H., Hillman, R., Guttag, J.V. & Ghassemi, M. (2019). Learning from Few Subjects with Large Amounts of Voice Monitoring Data. Proceedings of the 4th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 106:704-720. Available from https://proceedings.mlr.press/v106/ortiz19a.html.