
Article information

  • Title: EFFECTIVE FEATURE EXTRACTION FOR DOCUMENT CLUSTERING TO ENHANCE SEARCH ENGINE USING XML
  • Authors: P. AJITHA; DR. G. GUNASEKARAN
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Year of publication: 2014
  • Volume: 68
  • Issue: 1
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Clustering is the task of grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups. Here, clustering is performed with the Lingo algorithm on the content extracted from the documents. The data is stored in XML, which can handle large volumes of data. Lingo combines several existing methods and, apart from identifying document similarities, puts special emphasis on meaningful cluster descriptions. The process involves building the term-document matrix and then extracting frequent phrases using suffix arrays. Readable and unambiguous descriptions of the thematic groups are an important factor in the overall quality of clustering. The Lingo algorithm consists of five phases: pre-processing, frequent phrase extraction, cluster label induction, cluster content discovery, and final cluster formation.
  • Keywords: Lingo; XML; SVD; LSI; Phrase matrix.
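
The two steps named in the abstract, building a term-document matrix and extracting frequent phrases with a suffix array, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the function names (`tokenize`, `term_document_matrix`, `frequent_phrases`), the thresholds, and the sample documents are all assumptions made for illustration. The suffix array is built naively by sorting, and the later phases (SVD-based label induction, cluster content discovery) are left out.

```python
from collections import Counter
from itertools import groupby

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "and", "is", "for"}

def tokenize(text):
    """Pre-processing: lowercase, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (terms, matrix); matrix[i][j] is the count of term i in document j."""
    counts = [Counter(tokenize(d)) for d in docs]
    terms = sorted({t for c in counts for t in c})
    return terms, [[c[t] for c in counts] for t in terms]

def frequent_phrases(docs, min_count=2, max_len=3):
    """Find phrases of 2..max_len words that occur at least min_count times,
    by scanning a naively sorted word-level suffix array."""
    words = []
    for k, d in enumerate(docs):
        words.extend(tokenize(d))
        words.append(f"\x00{k}")  # unique sentinel marks a document boundary
    # Suffix array: start positions sorted by the first max_len words of each suffix.
    suffix_array = sorted(range(len(words)), key=lambda i: words[i:i + max_len])
    found = {}
    for n in range(2, max_len + 1):
        # Suffixes sharing the same first n words are adjacent in sorted order.
        prefixes = (tuple(words[i:i + n]) for i in suffix_array)
        for phrase, group in groupby(prefixes):
            count = sum(1 for _ in group)
            crosses_boundary = any(w.startswith("\x00") for w in phrase)
            if len(phrase) == n and count >= min_count and not crosses_boundary:
                found[" ".join(phrase)] = count
    return found

# Made-up sample documents, used only to exercise the functions.
docs = [
    "Clustering of search results",
    "Search results clustering with Lingo",
    "Lingo labels search results",
]
terms, matrix = term_document_matrix(docs)
print(terms)
print(frequent_phrases(docs))  # {'search results': 3}
```

In Lingo, the frequent phrases serve as candidate cluster labels, and the term-document matrix is the input to the SVD step listed in the keywords. This sketch stops before that step.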