[edit]
Multitask-Guided Self-Supervised Tabular Learning for Patient-Specific Survival Prediction
Proceedings of the 18th Machine Learning in Computational Biology meeting, PMLR 240:10-22, 2024.
Abstract
Survival prediction, central to the analysis of clinical trials, has the potential to be transformed by the availability of RNA-seq data as it reveals the underlying molecular and genetic mechanisms for disease and outcomes. However, the amount of RNA-seq samples available for understudied or rare diseases is often limited. To address this, leveraging data across different cancer types can be a viable solution, necessitating the application of self-supervised learning techniques. Yet, this wealth of data often comes in a tabular format without a known structure, hindering the development of a generally effective augmentation method for survival prediction. While traditional methods have been constrained by a one cancer-one model philosophy or have relied solely on a single modality, our approach, Guided-STab, on the contrary, offers a comprehensive approach through pretraining on all available RNA-seq data from various cancer types while guiding the representation by incorporating sparse clinical features as auxiliary tasks. With a multitask-guided self-supervised representation learning framework, we maximize the potential of vast unlabeled datasets from various cancer types, leading to genomic-driven survival predictions. These auxiliary clinical tasks then guide the learned representations to enhance critical survival factors. Extensive experiments reinforce the promise of our approach, as Guided-STab consistently outperforms established benchmarks on TCGA dataset.