Publisher: Vilnius University, University of Latvia, Latvia University of Agriculture, Institute of Mathematics and Informatics of University of Latvia
Abstract: Owing to dramatic progress in high-throughput sequencing technologies and the widespread use
of microarray assays over the last decade, gene expression data has been accumulating at an
accelerating pace. As a result, gene expression profiling is now used extensively as a powerful
technique for phenotype classification in many biological studies. However, it is not always
possible to replicate a particular experiment across various organisms or tissues to reach a sample
size large enough to meet the assumptions of the classical statistical methods used to
deliver reliable classification results. A small number of samples can also
be a problem when reusing data that other researchers have submitted to public databases
from their experiments. In this paper we introduce a two-step classification method for the specific
task of phenotype identification, which first clusters the data and then performs classification within
each cluster. We apply this method to a real dataset for bacterial gene expression
analysis and present the results.
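
The two-step idea described in the abstract (cluster first, then classify within each cluster) can be illustrated with a minimal sketch. The abstract does not say which clustering or classification algorithms are used, so the k-means clustering and the nearest-centroid classifier below are assumptions chosen only for illustration, not the authors' method. The function names (`fit_two_step`, `predict_two_step`), the phenotype labels and the toy data are likewise hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of profiles."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def nearest(centers, x):
    """Index of the center closest to profile x."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], x))


def kmeans(X, k, iters=50, seed=0):
    """Plain k-means (assumed clustering step); returns the k centers."""
    rng = random.Random(seed)
    centers = rng.sample(X, k)
    for _ in range(iters):
        groups = defaultdict(list)
        for x in X:
            groups[nearest(centers, x)].append(x)
        # An empty cluster keeps its previous center.
        new_centers = [mean(groups[i]) if groups[i] else centers[i]
                       for i in range(k)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def fit_two_step(X, y, k, seed=0):
    """Step 1: cluster the samples. Step 2: fit one classifier per cluster."""
    centers = kmeans(X, k, seed=seed)
    by_cluster = defaultdict(lambda: defaultdict(list))
    for x, label in zip(X, y):
        by_cluster[nearest(centers, x)][label].append(x)
    # Assumed per-cluster classifier: nearest class centroid.
    models = {c: {label: mean(pts) for label, pts in labels.items()}
              for c, labels in by_cluster.items()}
    return centers, models


def predict_two_step(model, x):
    """Route x to its cluster, then classify with that cluster's model."""
    centers, models = model
    # Only clusters that received training samples have a model.
    c = min(models, key=lambda i: sq_dist(centers[i], x))
    class_centroids = models[c]
    return min(class_centroids,
               key=lambda label: sq_dist(class_centroids[label], x))


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated sample groups in which
    # the phenotype depends on the second feature in opposite ways.
    X = [[0, 0], [1, 0], [0, 3], [1, 3],
         [100, 3], [101, 3], [100, 0], [101, 0]]
    y = ["p1", "p1", "p2", "p2", "p1", "p1", "p2", "p2"]
    model = fit_two_step(X, y, k=2)
    print(predict_two_step(model, [0.4, 0.5]))
    print(predict_two_step(model, [100.2, 2.6]))
```

In the toy data the relationship between the second feature and the phenotype is reversed between the two sample groups, so a single global nearest-centroid classifier would have almost identical class centroids. Fitting a separate classifier inside each cluster is what lets the sketch tell the phenotypes apart.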