Model-based imputation enables improved resolution for identifying differential chromatin contacts in single-cell Hi-C data

Neda Shokraneh Kenari, Megan Andrews, Max Libbrecht
Proceedings of the 18th Machine Learning in Computational Biology meeting, PMLR 240:176-193, 2024.

Abstract

Recent advances in single-cell Hi-C (scHi-C) assays allow the study of chromatin conformation at the resolution of a single cell or a cluster of cells. A key question is how to identify changes in contact strength between two cell types, known as differential chromatin contacts (DCCs). While existing statistical methods can identify changes in contact strength in bulk Hi-C data, they cannot be applied effectively to scHi-C data because of its severe sparsity. Thus, it is necessary to develop methods for identifying differential chromatin contacts in scHi-C data. Recently developed scHi-C imputation approaches can mitigate the issue of sparsity. We propose an approach for identifying differential chromatin contacts using these imputation approaches. We build upon the existing SnapHiC-D method by replacing its imputation step with recent learning-based imputation approaches. Through analysis of real scHi-C datasets with different coverages and at different resolutions, we show that imputation approaches that consider the spatial correlation between bin pairs, namely Higashi and random walk with restart, outperform other approaches. Furthermore, we show that care is needed when imputation is done as a preprocessing step, as it may invalidate downstream statistical approaches. Finally, our results indicate that model-based imputation greatly improves performance when analyzing chromatin contacts at moderate resolution (100kb); however, current imputation approaches fall short in both accuracy and computational cost when applied to high-resolution (10kb) scHi-C data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v240-shokraneh-kenari24a,
  title = {Model-based imputation enables improved resolution for identifying differential chromatin contacts in single-cell Hi-C data},
  author = {Shokraneh Kenari, Neda and Andrews, Megan and Libbrecht, Max},
  booktitle = {Proceedings of the 18th Machine Learning in Computational Biology meeting},
  pages = {176--193},
  year = {2024},
  editor = {Knowles, David A. and Mostafavi, Sara},
  volume = {240},
  series = {Proceedings of Machine Learning Research},
  month = {30 Nov--01 Dec},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v240/shokraneh-kenari24a/shokraneh-kenari24a.pdf},
  url = {https://proceedings.mlr.press/v240/shokraneh-kenari24a.html},
  abstract = {Recent advances in single-cell Hi-C (scHi-C) assays allow studying the chromatin conformation at the resolution of a single cell or a cluster of cells. A key question is to identify changes in the contact strength between two cell types, known as differential chromatin contacts (DCCs). While existing statistical methods can identify changes in contact strength in bulk Hi-C data, these methods cannot be effectively applied to scHi-C data due to its severe sparsity. Thus it is necessary to develop methods for identifying differential chromatin contacts in scHi-C data. Recently-developed scHi-C imputation approaches can mitigate the issue of sparsity. We propose an approach for identifying differential chromatin contacts using these imputation approaches. We build upon the existing SnapHiC-D method by replacing its imputation step with recent learning-based imputation approaches. We show that, via analysis of real scHi-C datasets with different coverages and at different resolutions, imputation approaches that consider the spatial correlation between bin pairs, Higashi, and random walk with restart, outperform other approaches. Furthermore, we show that careful considerations are needed when imputation is done in preprocessing steps as it may invalidate downstream statistical approaches. Finally, our results indicate that model-based imputations greatly improve performance when analyzing chromatin contacts at moderate resolution (100kb); however, current imputation approaches are inefficient in terms of both accuracy and computational complexity when being applied to high-resolution scHi-C resolution (10kb).}
}
Endnote
%0 Conference Paper
%T Model-based imputation enables improved resolution for identifying differential chromatin contacts in single-cell Hi-C data
%A Neda Shokraneh Kenari
%A Megan Andrews
%A Max Libbrecht
%B Proceedings of the 18th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2024
%E David A. Knowles
%E Sara Mostafavi
%F pmlr-v240-shokraneh-kenari24a
%I PMLR
%P 176--193
%U https://proceedings.mlr.press/v240/shokraneh-kenari24a.html
%V 240
%X Recent advances in single-cell Hi-C (scHi-C) assays allow studying the chromatin conformation at the resolution of a single cell or a cluster of cells. A key question is to identify changes in the contact strength between two cell types, known as differential chromatin contacts (DCCs). While existing statistical methods can identify changes in contact strength in bulk Hi-C data, these methods cannot be effectively applied to scHi-C data due to its severe sparsity. Thus it is necessary to develop methods for identifying differential chromatin contacts in scHi-C data. Recently-developed scHi-C imputation approaches can mitigate the issue of sparsity. We propose an approach for identifying differential chromatin contacts using these imputation approaches. We build upon the existing SnapHiC-D method by replacing its imputation step with recent learning-based imputation approaches. We show that, via analysis of real scHi-C datasets with different coverages and at different resolutions, imputation approaches that consider the spatial correlation between bin pairs, Higashi, and random walk with restart, outperform other approaches. Furthermore, we show that careful considerations are needed when imputation is done in preprocessing steps as it may invalidate downstream statistical approaches. Finally, our results indicate that model-based imputations greatly improve performance when analyzing chromatin contacts at moderate resolution (100kb); however, current imputation approaches are inefficient in terms of both accuracy and computational complexity when being applied to high-resolution scHi-C resolution (10kb).
APA
Shokraneh Kenari, N., Andrews, M., & Libbrecht, M. (2024). Model-based imputation enables improved resolution for identifying differential chromatin contacts in single-cell Hi-C data. Proceedings of the 18th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 240:176-193. Available from https://proceedings.mlr.press/v240/shokraneh-kenari24a.html.
