Abstract: Problem statement: Microarray techniques make it possible to monitor the expression levels of thousands of genes simultaneously. One challenge is how to derive meaningful insights from the expression data. This can be done with clustering techniques such as hierarchical and k-means clustering, but most clustering techniques are largely heuristic and leave some issues unresolved, such as how to fix the precise number of clusters and how to visualize the results in pictorial form. Approach: Determine the accurate number of clusters from gene expression data and validate the results using the correctness ratio and sum of squares criteria. A new approach is suggested to address the primary issue of the k-means clustering algorithm, namely that the number of clusters must be predefined. This approach provides an accurate number of clusters by minimizing the squared error function and maximizing the correctness ratio. Results: The experimental results show the efficiency of the method, obtained by calculating and comparing the sum of squares for different values of k. The number of clusters was found to be accurate at the minimum sum of squares and the maximum correctness ratio. Conclusion: The results show that this new approach improves cluster quality and performance.
Keywords: Microarray; expectation maximization; clustering technique; squared error function
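
As an illustration of the sum of squares comparison mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It runs a plain k-means (Lloyd's algorithm) for several values of k on synthetic data and reports the within-cluster sum of squared errors for each. The function name `kmeans_sse`, the synthetic data, and all parameter values are assumptions made for this example. The correctness ratio is not defined in the abstract, so it is left out.

```python
import numpy as np


def kmeans_sse(data, k, n_restarts=10, n_iters=100, seed=0):
    """Return the lowest within-cluster sum of squares found over several restarts.

    Hypothetical helper for illustration; plain Lloyd's k-means, not the
    method proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_restarts):
        # Initialise centroids from k distinct data points.
        centroids = data[rng.choice(len(data), size=k, replace=False)]
        for _ in range(n_iters):
            # Squared Euclidean distance from every point to every centroid.
            dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Move each centroid to the mean of its points (keep it if empty).
            new_centroids = np.array([
                data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        # Squared error of each point to its nearest final centroid.
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, dists.min(axis=1).sum())
    return float(best)


# Synthetic stand-in for an expression matrix (assumption): 90 "genes",
# 5 "conditions", drawn from three well-separated groups.
rng = np.random.default_rng(42)
data = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(30, 5))
    for center in (0.0, 5.0, 10.0)
])

sse_by_k = {k: kmeans_sse(data, k) for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(f"k={k}: sum of squares = {sse:.2f}")
```

On its own, the within-cluster sum of squares generally keeps shrinking as k grows, so it is usually read together with a second criterion. In the abstract that second criterion is the correctness ratio.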