Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Publication year: 2018
Volume: 7
Issue: 1
Pages: 86-90
Publisher: Shri Pannalal Research Institute of Technology
Abstract: The number of research articles published across fields of research is growing exponentially. Finding a particular article in a research repository is therefore a demanding and time-consuming task. Classifying research articles by domain helps researchers retrieve them quickly. Keyword search, a popular search mechanism, has been applied to retrieve relevant articles, documents, texts, graphs and even relational databases. When documents from new domains are added to the repository, their keywords must be identified and assigned to the corresponding domains for proper classification. The numerical statistic TF-IDF is used to determine the relevance of a word to a document in a corpus. Three clustering algorithms, Hierarchical, K-Means and Fuzzy C-Means, are used to cluster articles based on the TF-IDF relevance factor. The strength of the Fuzzy C-Means clustering is validated using the Silhouette cluster validation technique. Finally, performance is evaluated using Precision, Recall and F-measure, showing that Fuzzy C-Means clustering achieves better accuracy than K-Means and Hierarchical clustering.
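Note: The abstract names TF-IDF as the relevance factor and Precision, Recall and F-measure as the evaluation metrics, but gives no formulas. The following is a minimal sketch of the textbook definitions, not the authors' implementation. It assumes whitespace tokenization, term frequency normalized by document length, and an unsmoothed idf of log(N / df); the function names `tf_idf` and `precision_recall_f1` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed textbook variant (not taken from the paper):
      tf  = term count / document length
      idf = log(N / df), df = number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document it occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F-measure (F1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy documents, invented for illustration only.
    docs = ["clustering articles", "keyword articles"]
    for doc, w in zip(docs, tf_idf(docs)):
        print(doc, w)
    print(precision_recall_f1(tp=8, fp=2, fn=2))
```

In this variant a term that occurs in every document receives weight zero, so only domain-specific keywords contribute to the vectors that a clustering algorithm would then group.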