Article information

Title: Minimum Spanning Tree-based Clustering Applied to Protein Sequences in Early Cancer Diagnosis
Authors: Dr. T. Karthikeyan; S. John Peter; B. Praburaj et al.
Journal: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Online ISSN: 0976-8491
Year of publication: 2012
Volume: III
Issue: I – Ver 4
Publisher: Ayushmaan Technologies
Abstract: Efficient discovery of cancer molecular patterns is essential in molecular diagnostics. The number of amino acid sequences in protein databases is increasing very rapidly, but structures are available in the Protein Data Bank for only some of them. An important problem in genomics is therefore the automatic clustering of homogeneous protein sequences when only sequence information is available. The characteristics of protein expression data challenge traditional unsupervised classification algorithms. In this paper we use a Minimum Spanning Tree based clustering algorithm to cluster amino acid sequences. A similarity graph is defined, and a cluster in that graph corresponds to a connected subgraph. Cluster analysis seeks to group amino acid sequences into subsets based on the Euclidean distance between pairs of sequences. Our goal is to find disjoint subsets, called clusters, such that two criteria are satisfied: homogeneity (sequences in the same cluster are highly similar to each other) and separation (sequences in different clusters have low similarity to each other). A thorough understanding of genes depends on having adequate information about the proteins. Solving protein-related problems has become one of the most important challenges in bioinformatics. The number of known protein sequences exceeds half a million, and a meaningful partition of them is needed in order to detect their functions. A method that can enhance the structural recognition, classification and interpretation of proteins would be advantageous. Many methods have been adopted to solve such bioinformatics problems. Our Minimum Spanning Tree based clustering algorithm is a useful and efficient method for the collective study of protein subsets. The key feature of the algorithm is its ability to predict the 3D structure of an unknown protein sequence.
Keywords: Euclidean Minimum Spanning Tree; Subtree; Eccentricity; Center; Hierarchical Clustering; Cluster Validity; Cluster Separation
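
The general idea named in the abstract (build a Euclidean minimum spanning tree over the sequences, then read clusters off as connected subgraphs) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the record does not spell out. The sketch makes several assumptions: sequences are turned into vectors by amino acid composition (a hypothetical featurization, `composition`), the tree is built with Prim's algorithm, and clusters are obtained by cutting the k - 1 longest tree edges, a common textbook variant. All function names here are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Hypothetical featurization: fraction of each of the 20 amino acids."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def prim_mst(points):
    """Return the MST edges (weight, i, j) of the complete Euclidean graph."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known distance from the tree to each vertex
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, k):
    """Cut the k - 1 longest MST edges and label the connected components."""
    edges = sorted(prim_mst(points))
    kept = edges[: max(len(edges) - (k - 1), 0)]
    root = list(range(len(points)))   # union-find forest over the kept edges

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]
```

For example, `mst_clusters([composition(s) for s in seqs], k=2)` returns one integer label per sequence. Short tree edges stay inside a cluster (homogeneity) and the cut edges are the long ones between clusters (separation), which is why removing the longest edges matches the two criteria in the abstract.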