
Article Information

  • Title: Text Clustering for Information Retrieval System Using Supplementary Information
  • Authors: Chitra Kalyanasundaram; Snehal Ahire; Gaurav Jain
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year of publication: 2015
  • Volume: 6
  • Issue: 2
  • Pages: 1613-1615
  • Publisher: TechScience Publications
  • Abstract: Text clustering spans a wide range of applications, from information retrieval systems, pattern recognition, and search engines to social networks and other digital collections. The text data involved in such applications usually has ample unused data associated with it. The paper focuses on handling this unused data, referred to as supplementary information, to generate effective clusters. The supplementary information may include document provenance information, links in a document, index terms used within a document, or any other data that is not generally used for clustering. In this paper, we perform document clustering using supplementary information along with the content to generate clusters with higher purity. We also identify the use of such supplementary information for clustering in applications involving other file types such as audio, image, and video. The clustering performance may degrade if the supplementary information associated with pure content is noisy. Taking this into consideration, we use a partitioning-based clustering algorithm and a probabilistic model. We present experimental results to justify the approach.
  • Keywords: Clustering; Information Retrieval System; partition-based clustering algorithm; Probabilistic model; Supplementary information; Similarity measure