摘要:Nowadays, biomedical data are generated exponentially, creating datasets for analysis with ultra-high dimensionality and complexity. An indicative example is emerging single-cell RNA-sequencing (scRNA-seq) technology, which isolates and measures individual cells. The analysis of scRNA-seq data consists of a major challenge because of its ultra-high dimensionality and complexity. Towards this direction, we study the generalization of the MRPV, a recently published ensemble classification algorithm, which combines multiple ultra-low dimensional random projected spaces with a voting scheme, while exposing its ability to enhance the performance of base classifiers. We empirically showed that we can design a reliable ensemble classification technique using random projected subspaces in an extremely small fixed number of dimensions, without following the restrictions of the classical random projection method. Therefore, the MPRV acquires the ability to efficiently and rapidly perform classification tasks even for data with extremely high dimensionality. Furthermore, through the experimental analysis in six scRNA-seq data, we provided evidence that the most critical advantage of MRPV is the dramatic reduction in data dimensionality that allows for the utilization of computational demanding classifiers that are considered as non-practical in real-life applications. The scalability, the simplicity, and the capabilities of our proposed framework render it as a tool-guide for single-cell RNA-seq data which are characterized by ultra-high dimensionality. MRPV is available on GitHub in MATLAB implementation.
关键词:ensemble classification; random projections; big biomedical data; single-cell RNA-seq; high dimensional data ensemble classification ; random projections ; big biomedical data ; single-cell RNA-seq ; high dimensional data