Article Information

  • Title: Effectiveness of Different Similarity Measures for Text Classification and Clustering
  • Authors: Komal Maher; Madhuri S. Joshi
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Publication year: 2016
  • Volume: 7
  • Issue: 4
  • Pages: 1715-1720
  • Publisher: TechScience Publications
  • Abstract: Nowadays, people deal with large amounts of data on a regular basis. The generated data serves only immediate needs, and no attempt is made to organize it for efficient later retrieval. Data mining is the process of extracting knowledge from such enormous amounts of data. There are many techniques to classify and cluster data that exists in a structured format, based on the similarity between documents in the text processing field. Clustering algorithms require a metric to quantify how different two given documents are. This difference is often measured by a distance measure such as Euclidean distance, cosine similarity, Jaccard correlation, or the Similarity Measure for Text Processing, to name a few. In this research work, we experiment with Euclidean distance, cosine similarity, and the Similarity Measure for Text Processing. The effectiveness of these three measures is evaluated on a real-world data set for text classification and clustering problems. The results show that the performance obtained by the Similarity Measure for Text Processing is better than that achieved by the other measures.
  • Keywords: document classification; document clustering; entropy; accuracy; classifiers; clustering algorithms
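
For readers unfamiliar with two of the measures named in the abstract, the following is a minimal sketch of Euclidean distance and cosine similarity computed on raw term-frequency vectors. It is not the authors' implementation: the helper names (`term_vector`, `euclidean_distance`, `cosine_similarity`), the whitespace tokenization, and the two sample sentences are assumptions made purely for illustration, and the Similarity Measure for Text Processing is not covered here.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text (hypothetical helper)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors; 0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative documents (not taken from the paper's data set).
doc_a = "data mining extracts knowledge from data"
doc_b = "clustering groups similar data"

# Shared vocabulary so both vectors have the same dimensions.
vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))
vec_a = term_vector(doc_a, vocabulary)
vec_b = term_vector(doc_b, vocabulary)

print("Euclidean distance:", euclidean_distance(vec_a, vec_b))
print("Cosine similarity:", cosine_similarity(vec_a, vec_b))
```

Note that Euclidean distance grows as documents differ, whereas cosine similarity grows as they become more alike, so a clustering algorithm would need to treat the two in opposite directions.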