CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation

Kevin Lam, William Daniels, Maxwell Douglas, Daniel Lai, Samuel Aparicio, Benjamin Bloem-Reddy, Yongjin Park
Proceedings of the 20th Machine Learning in Computational Biology meeting, PMLR 311:65-80, 2025.

Abstract

Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.

Cite this Paper


BibTeX
@InProceedings{pmlr-v311-lam25a,
  title = {CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation},
  author = {Lam, Kevin and Daniels, William and Douglas, Maxwell and Lai, Daniel and Aparicio, Samuel and Bloem-Reddy, Benjamin and Park, Yongjin},
  booktitle = {Proceedings of the 20th Machine Learning in Computational Biology meeting},
  pages = {65--80},
  year = {2025},
  editor = {Knowles, David A and Koo, Peter K},
  volume = {311},
  series = {Proceedings of Machine Learning Research},
  month = {10--11 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v311/main/assets/lam25a/lam25a.pdf},
  url = {https://proceedings.mlr.press/v311/lam25a.html},
  abstract = {Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.}
}
Endnote
%0 Conference Paper
%T CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation
%A Kevin Lam
%A William Daniels
%A Maxwell Douglas
%A Daniel Lai
%A Samuel Aparicio
%A Benjamin Bloem-Reddy
%A Yongjin Park
%B Proceedings of the 20th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2025
%E David A Knowles
%E Peter K Koo
%F pmlr-v311-lam25a
%I PMLR
%P 65--80
%U https://proceedings.mlr.press/v311/lam25a.html
%V 311
%X Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.
APA
Lam, K., Daniels, W., Douglas, M., Lai, D., Aparicio, S., Bloem-Reddy, B., & Park, Y. (2025). CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation. Proceedings of the 20th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 311:65-80. Available from https://proceedings.mlr.press/v311/lam25a.html.
