Microarrays have made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. Identifying coexpressed genes and coherent patterns is the central goal of microarray (gene expression) data analysis and an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying coexpressed genes and biologically relevant groupings of genes and samples. In this paper we propose an algorithm, Automatic Generation of Merge Factor for ISODATA (AGMFI), to cluster microarray data on the basis of ISODATA. The main idea of AGMFI is to generate initial values for the merge factor and the maximum merge times automatically, instead of selecting them heuristically as in ISODATA. One significant advantage of AGMFI over K-means is that the initial clusters may be merged or split, so the final number of clusters may differ from the number specified as part of the input. We evaluate its performance by applying it to well-known, publicly available microarray data sets and to a simulated data set [3], and we compare the results with those of K-means clustering. The experiments indicate that the proposed AGMFI algorithm increases the enrichment of genes of similar function within clusters.
Bioinformatics, microarray gene expression data, coexpressed genes, clustering, K-means, ISODATA, AGMFI