Article Information

  • Title: A NOVEL APPROACH FOR TEXT CLUSTERING USING MUST LINK AND CANNOT LINK ALGORITHM
  • Authors: J.DAFNI ROSE ; DIVYA D. DEV ; C.R.RENE ROBIN
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Year of publication: 2014
  • Volume: 60
  • Issue: 1
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Text clustering groups documents that are highly similar to one another, and it is applied in many areas of text mining and information retrieval. The volume of digital data now available has grown enormously, and retrieving useful information from it is a major challenge. Text clustering helps to organize this data and to extract useful information from the available corpus. In this paper, we propose a novel method for clustering text documents. In the first phase, features are selected using a genetic-based method. In the next phase, the extracted keywords are clustered using a hybrid algorithm, and the resulting clusters are classed under meaningful topics. The hybrid algorithm, Must Link and Cannot Link (MLCL), works in three phases. First, the linked keywords produced by the genetic-based extraction method are identified using must-link and cannot-link relations. Second, the MLCL algorithm forms the initial clusters. Finally, the clusters are optimized using Gaussian parameters. The proposed method is tested on datasets such as Reuters-21578 and the Brown Corpus. The experimental results show that the proposed method performs better than the fuzzy self-constructing feature clustering algorithm.
  • Keywords: Genetic Algorithm; Keyword Extraction; Text Clustering; MLCL Algorithm.
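
The abstract does not give the details of the MLCL algorithm, so the sketch below only illustrates the general idea of forming initial clusters from must-link and cannot-link constraints. It is not the authors' method. The function name `initial_clusters`, the sample keywords, and the use of a union-find structure are all assumptions made for illustration. The genetic keyword extraction and the Gaussian optimization step are not covered.

```python
from collections import defaultdict


def initial_clusters(keywords, must_link, cannot_link):
    """Group keywords into initial clusters from pairwise constraints.

    Hypothetical helper: a generic constrained-grouping sketch,
    not the MLCL algorithm from the paper.
    """
    parent = {k: k for k in keywords}

    def find(k):
        # Follow parent pointers to the representative, halving the path.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    # Merge every must-link pair into the same group.
    for a, b in must_link:
        parent[find(a)] = find(b)

    # A cannot-link pair inside one group means the constraints conflict.
    for a, b in cannot_link:
        if find(a) == find(b):
            raise ValueError(f"conflicting constraints for {a!r} and {b!r}")

    groups = defaultdict(list)
    for k in keywords:
        groups[find(k)].append(k)
    return sorted(sorted(g) for g in groups.values())


# Made-up keywords and constraints, for illustration only.
keywords = ["bank", "loan", "credit", "wheat", "grain"]
must_link = [("bank", "loan"), ("loan", "credit"), ("wheat", "grain")]
cannot_link = [("bank", "wheat")]
clusters = initial_clusters(keywords, must_link, cannot_link)
print(clusters)  # [['bank', 'credit', 'loan'], ['grain', 'wheat']]
```

Must-link pairs are transitive here, so "bank" and "credit" end up together even though they are never linked directly. A cannot-link pair that falls inside one group is reported as an error rather than silently resolved, which is one possible design choice among several.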