
Article Information

  • Title: Frequency and Compactness for Text Categorization
  • Authors: Sumayya Hasan Osmani; T. Naresh Kumar
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year of publication: 2011
  • Volume: 2
  • Issue: 5
  • Pages: 1947-1950
  • Publisher: TechScience Publications
  • Abstract: Text categorization is the task of assigning predefined categories to natural language text. With the widely used "bag-of-words" representation, previous research usually assigns a word values that express whether the word appears in the document concerned or how frequently it appears. Although these values are useful for text categorization, we also use novel values assigned to a word, called distributional features, which include the compactness of the appearances of a word and the position of its first appearance. However, experiments show that the position of the first appearance alone is not enough to categorize text, because in some documents the last appearance of a word can be more important than the first. The different features are combined using an ensemble learning technique. Further analysis shows that the distributional features are especially useful when documents are long and the writing style is casual.
  • Keywords: Text categorization; machine learning; distributional feature
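
The distributional features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function name `distributional_features`, the whitespace tokenization, the normalization of positions by document length, and the two compactness measures (distance between first and last appearance, and variance of the positions). It is not the authors' implementation.

```python
from statistics import pvariance


def distributional_features(tokens, word):
    """Return illustrative distributional features of `word` in `tokens`.

    Hypothetical helper: the feature definitions are assumptions,
    not the formulas used in the paper.
    """
    n = len(tokens)
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    if not positions:
        return None  # the word does not appear in this document

    # Normalize positions by document length so that long and short
    # documents yield comparable values.
    rel = [p / n for p in positions]
    return {
        "frequency": len(positions),              # classic bag-of-words value
        "first_position": rel[0],                 # where the word first appears
        "last_position": rel[-1],                 # where the word last appears
        "first_last_distance": rel[-1] - rel[0],  # small value = compact
        "position_variance": pvariance(rel),      # small value = compact
    }


doc = "the match was long but the match ended well".split()
features = distributional_features(doc, "match")
print(features)
```

A word whose appearances cluster in one part of a document gets a small `first_last_distance` and `position_variance`, while a word spread across the whole document gets larger values. Each feature could then feed a separate classifier whose outputs are combined, in the spirit of the ensemble learning the abstract mentions.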