
Basic article information

  • Title: Efficient Preprocessing and Patterns Identification Approach for Text Mining
  • Authors: Pattan Kalesha; M. Babu Rao; Ch. Kavitha
  • Journal: International Journal of Computer Trends and Technology
  • Electronic ISSN: 2231-2803
  • Publication year: 2013
  • Volume: 6
  • Issue: 2
  • Publisher: Seventh Sense Research Group
  • Abstract: Due to the rapid expansion of digital data, knowledge discovery and data mining have attracted significant attention as ways of turning such data into useful information and knowledge. Text categorization remains one of the most researched NLP problems because of the ever-increasing volume of electronic documents and digital libraries. We present a novel text categorization method that combines decisions over multiple attributes. Because most existing text mining methods adopt term-based approaches, they suffer from the problems of polysemy and synonymy. An existing pattern discovery technique includes the processes of pattern deploying and pattern evolving, which improve the effectiveness of using and updating discovered patterns to find relevant and interesting information. However, current association rule methods have two shortcomings when applied to pattern classification. One is that they ignore information about a word's frequency in a text. The other is that they require rule pruning whenever a large number of rules is generated. In the proposed work, documents are preprocessed before pattern discovery is applied. The document dataset is preprocessed using tokenization, stemming, and probability filtering. The proposed approach yields better decision rules than the existing approach.
  • Keywords: Patterns; Rules; Stemming; Probability
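
The preprocessing stages named in the abstract (tokenization, stemming, probability filtering) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define "probability filtering", so the sketch assumes it means dropping stems whose relative frequency across the corpus falls below a threshold. The names `tokenize`, `stem`, `preprocess`, `SUFFIXES`, and `min_prob` are hypothetical, and the suffix-stripping stemmer is a deliberately naive stand-in for an established algorithm such as Porter's.

```python
import re
from collections import Counter

# Hypothetical suffix list for a deliberately naive stemmer; a real system
# would more likely use an established algorithm such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(documents, min_prob=0.05):
    """Tokenize and stem each document, then filter rare stems.

    Assumption: "probability filtering" is read here as keeping only stems
    whose relative frequency over the whole corpus is at least min_prob.
    """
    stemmed_docs = [[stem(t) for t in tokenize(doc)] for doc in documents]
    counts = Counter(s for doc in stemmed_docs for s in doc)
    total = sum(counts.values())
    if total == 0:
        return stemmed_docs
    keep = {s for s, c in counts.items() if c / total >= min_prob}
    return [[s for s in doc if s in keep] for doc in stemmed_docs]


docs = [
    "Mining patterns from text documents",
    "Pattern mining discovers rules in documents",
]
print(preprocess(docs, min_prob=0.1))
```

Filtering on corpus-wide relative frequency is one plausible reading; the paper itself may filter on a different probability, so the threshold and its direction should be treated as placeholders.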