期刊名称:International Journal of Computer Science and Network Security
印刷版ISSN:1738-7906
出版年度:2008
卷号:8
期号:4
页码:225-231
出版社:International Journal of Computer Science and Network Security
摘要:Cluster analysis is the study of techniques for finding the most representative cluster prototypes. Linear relation of two sequences can be modeled perfectly through the classical linear regression model. Protein sequence clustering has many applications such as helps in classifying a new sequence, predicting the protein structure of unknown sequence and finding the family and subfamily relationships of protein sequences. To cluster a repository of protein sequences into groups where sequences have strong linear relationship with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we have proposed a new technique named General Regression Model Technique (GRMT1) to test the linearity of the sequences. Later we have applied General Regression Model Technique Clustering Algorithm (GRMTCA) to cluster the protein sequences. The performance of the algorithm was evaluated with 50 protein sequences. We used BLAST to annotate the clusters obtained by GRMTCA. It is observed that the clusters have biological significance.
关键词:Clustering; BLAST; General Regression Model; Protein Sequences