
Article Information

  • Title: Explore Multidocument Text Clustering With Supervised And Unsupervised Constraints
  • Authors: V. Shanmugapriya; S. Krishnaveni
  • Journal: International Journal of Engineering and Computer Science
  • Print ISSN: 2319-7242
  • Year: 2014
  • Volume: 3
  • Issue: 10
  • Pages: 8821-8822
  • Publisher: IJECS
  • Abstract: Clustering techniques are used to automatically organize or summarize large collections of text, and many clustering approaches have been proposed. For the purposes of this work, we are particularly interested in two of them: coclustering and constrained clustering. This thesis proposes a novel constrained coclustering method with two goals. First, it combines information-theoretic coclustering and constrained clustering to improve clustering performance. Second, it adopts both supervised and unsupervised constraints to demonstrate the effectiveness of the algorithm. The unsupervised constraints are derived automatically from existing knowledge sources, which saves the effort and cost of manually labeling constraints. To achieve the first goal, we develop a two-sided hidden Markov random field (HMRF) model that represents both document and word constraints, and we optimize the model with an alternating expectation-maximization (EM) algorithm. We also propose two novel methods that construct and incorporate document and word constraints to support unsupervised constrained clustering: (1) automatic construction of document constraints and (2) automatic construction of word constraints. The evaluation results demonstrate the superiority of our approaches over a number of existing approaches. Unlike existing approaches, this thesis applies stop-word removal, stemming, and synonym replacement to capture semantic similarity between words in the documents. In addition, content can be retrieved from plain-text files, HTML pages, and XML pages. Tags are removed from HTML files; in XML files, attribute names and values are treated as ordinary paragraph words, after which the same preprocessing (stop-word removal, stemming, and synonym replacement) is applied.
  • Keywords: Constrained clustering; coclustering; unsupervised constraints; text clustering
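The input handling and preprocessing described in the abstract (tag removal for HTML, attribute names and values treated as words for XML, then stop-word removal, stemming, and synonym replacement) might look roughly like the following Python sketch. It is not the authors' code: `STOP_WORDS`, `SYNONYMS`, the crude suffix-stripping `stem` function, and the helper names `html_to_text`, `xml_to_text`, and `preprocess` are all hypothetical. Applying synonym replacement before stemming, so the dictionary can hold surface forms, is also our assumption.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "the", "to", "with"}
SYNONYMS = {"automobile": "car", "picture": "image"}  # surface word -> canonical word
SUFFIXES = ("ing", "ed", "es", "s")  # crude suffix stripping, NOT the Porter stemmer


class _TextExtractor(HTMLParser):
    """Keeps character data, drops every tag and the contents of script/style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def xml_to_text(xml: str) -> str:
    """Treats attribute names, attribute values and element text as plain words."""
    words = []
    for elem in ET.fromstring(xml).iter():
        for name, value in elem.attrib.items():
            words.extend([name, value])
        if elem.text:
            words.append(elem.text)
        if elem.tail:
            words.append(elem.tail)
    return " ".join(words)


def stem(word: str) -> str:
    # Strip the first matching suffix, keeping at least three characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    tokens = [SYNONYMS.get(t, t) for t in tokens]        # synonym replacement
    return [stem(t) for t in tokens]                     # stemming
```

Attribute names and values are kept as words, as the abstract describes, so `<doc lang="en">` contributes `lang` and `en` alongside the element text. A real pipeline would replace the toy dictionaries with a full stop-word list, a proper stemmer, and a thesaurus.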
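The abstract does not give enough detail to reproduce the two-sided HMRF model or its alternating EM optimization. As a much simpler, one-sided illustration of how pairwise document constraints can steer clustering, the Python sketch below runs k-means over term-frequency vectors with cosine distance and adds a fixed penalty `weight` for each violated must-link or cannot-link pair, reassigning one document at a time (iterated conditional modes) before recomputing centroids. The name `constrained_kmeans`, its parameters, the seeding rule, and the penalty scheme are assumptions for illustration. It clusters documents only, not words, and is not the authors' algorithm.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; an all-zero vector is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def constrained_kmeans(docs, k, must_link=(), cannot_link=(), weight=1.0, iters=20):
    """Hypothetical penalized k-means; returns one cluster label per document.

    docs: equal-length term-frequency vectors.
    must_link / cannot_link: (i, j) pairs of document indices.
    """
    n = len(docs)
    neighbors = {i: [] for i in range(n)}
    for kind, pairs in (("must", must_link), ("cannot", cannot_link)):
        for i, j in pairs:
            neighbors[i].append((j, kind))
            neighbors[j].append((i, kind))

    # Deterministic initialization (an assumption): the first k documents are
    # the seeds, and every document starts in its nearest seed's cluster.
    centroids = [list(docs[c]) for c in range(k)]
    labels = [min(range(k), key=lambda c: cosine_distance(d, centroids[c])) for d in docs]

    for _ in range(iters):
        # Update step: each centroid becomes the mean of its member vectors.
        for c in range(k):
            members = [docs[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

        # Assignment step (iterated conditional modes): distance to the
        # centroid plus `weight` for every constraint the choice would violate.
        changed = False
        for i in range(n):
            best, best_cost = labels[i], float("inf")
            for c in range(k):
                cost = cosine_distance(docs[i], centroids[c])
                for j, kind in neighbors[i]:
                    if (kind == "must" and labels[j] != c) or (kind == "cannot" and labels[j] == c):
                        cost += weight
                if cost < best_cost:
                    best, best_cost = c, cost
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

For example, with `docs = [[3, 0, 0], [0, 0, 3], [2, 1, 0], [0, 1, 2]]` and `k=2`, the unconstrained call groups documents 0 and 2 together, while adding `cannot_link=[(0, 2)]` with `weight=5.0` forces them into different clusters. In this sketch the penalty weight plays the role of constraint confidence: a small weight lets strong content similarity override a constraint, and a large one makes constraints nearly hard.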