Transformer-based models have revolutionized single-cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here, a novel single-cell multimodal/multi-task transformer (scmFormer) is proposed to fill the existing gap in integrating single-cell proteomics with other omics data. Systematic benchmarking demonstrates that scmFormer excels at integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving both the information shared across batches and distinct biological information. In transferring cell-type labels from single-cell transcriptomics to proteomics data, scmFormer achieves a 54.5% higher average F1 score than the second-best method. Using COVID-19 datasets, it is shown that scmFormer successfully integrates over 1.48 million cells on a personal computer. Moreover, scmFormer outperforms existing methods at generating the unmeasured modality and is well suited to spatial multi-omics data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.

scmFormer, a Transformer-based model, employs multi-task learning for single-cell multi-omics integration and for generating unmeasured data. It preserves shared information across diverse datasets and achieves a 54.5% higher average F1 score in cell-type label transfer. scmFormer is also scalable: it integrates more than a million cells on a personal computer, outperforms existing methods at generating unmeasured modalities, and performs well on spatial multi-omics data.