文章基本信息

标题：A Constrained Gaussian Mixture Model for Correlation-Based Cluster Analysis of Gene Expression Data
本地全文：下载
作者：Naoto Yukinawa ; Taku Yoshioka ; Kazuo Kobayashi 等
期刊名称：Information and Media Technologies
电子版ISSN：1881-0896
出版年度：2009
卷号：4
期号：4
页码：753-768
DOI：10.11185/imt.4.753
出版社：Information and Media Technologies Editorial Board
摘要：Clustering is a practical data analysis step in gene expression-based studies. Model-based clusterings, which are based on probabilistic generative models, have two advantages: the number of clusters can be determined based on statistical criteria, and the clusters are robust against the observation noises in data. Many existing approaches assume multi-variate Gaussian mixtures as generative models, which are analogous to the use of Euclidean or Mahalanobis type distance as the similarity measure. However, these types of similarity measures often fail to detect co-expressed gene groups. We propose a novel probabilistic model for cluster analyses based on the correlation between gene expression patterns. We also propose a “meta” cluster analysis method to eliminate the dependence of the clustering result on initial values of the clustering algorithm. In empirical studies with a time course gene expression dataset of Bacillus subtilis during sporulation, our method acquires more stable and informative results than the ordinary Gaussian mixture model-based clustering, k -means clustering and hierarchical clustering algorithms, which are widely used in this field. In addition, with the meta-cluster analysis, biologically-meaningful expression patterns are extracted from a set of clustering results. The constraints in our model worked more efficiently than those in the previous studies. In our experiment, such constraints contributed to the stability of the clustering results. Moreover, the clustering based on the Bayesian inference was found to be more stable than those by the conventional maximum likelihood estimation.