Journal: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Electronic ISSN: 1694-0814
Publication year: 2012
Volume: 9
Issue: 4
Publisher: IJCSI Press
Abstract: Bioinformatics combines biology, mathematics and information technology. It is the study of the management and analysis of deoxyribonucleic acid (DNA), ribonucleic acid (RNA) and protein sequence data. Motif finding is one of the most popular problems in bioinformatics, with many applications in disease diagnosis, drug design and protein classification. An efficient technique is therefore essential for discovering sequence motifs in protein sequences, and data mining is one such technique. Bioinformatics datasets frequently contain a large volume of segments generated from protein sequences. However, not all of the generated protein segments yield potential motif patterns, and the segments carry no labels or classes. Hence, an unsupervised segment selection method must be applied to select the potential segments. In this paper, two novel unsupervised segment selection methods are proposed for the first time, one based on Shannon entropy and the other on Singular Value Decomposition (SVD) entropy. The proposed methods are evaluated using the benchmark K-Means clustering method. The results show that SVD-entropy-based segment selection produces a larger number of clusters with high structural similarity, from which significant motif patterns can be generated.
Keywords: Clustering; Data mining; Protein sequence; Motif; SVD; Entropy
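
The abstract names Shannon entropy as one of the two scores used for unsupervised segment selection but does not say how a segment is encoded or which entropy values mark a segment as "potential". The following is a minimal sketch under stated assumptions, not the authors' method: each segment is treated as a plain amino-acid string, entropy is taken over its residue frequencies, and segments are only ranked, with the cut-off left to the reader. The function names `shannon_entropy` and `rank_segments` and the toy segments are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue distribution in one segment.

    Assumption: the segment is a plain string of residue letters. The paper
    may use a different encoding (for example a frequency profile).
    """
    if not segment:
        return 0.0
    counts = Counter(segment)
    total = len(segment)
    # H = -sum(p * log2(p)) over the residues that occur in the segment.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def rank_segments(segments):
    """Return (segment, entropy) pairs sorted from lowest to highest entropy.

    Assumption: ranking only. Which end of the ranking holds the useful
    segments, and where to cut, is not stated in the abstract.
    """
    scored = [(s, shannon_entropy(s)) for s in segments]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical toy segments, not data from the paper.
    demo = ["AAAAAAAAA", "ACDEFGHIK", "AAAACCCCC"]
    for seg, h in rank_segments(demo):
        print(f"{seg}  H = {h:.3f} bits")
```

A segment made of a single repeated residue scores zero, and a segment in which every residue is different scores the maximum for its length, so the ranking separates uniform segments from diverse ones before any clustering step.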