Journal:International Journal of Computer Science and Information Technologies
Electronic ISSN:0975-9646
Publication year:2011
Volume:2
Issue:5
Pages:1947-1950
Publisher:TechScience Publications
Abstract:Text categorization is the task of assigning predefined categories to natural language text. With the widely used "bag-of-words" representation, previous research usually assigns each word a value that expresses whether the word appears in the document concerned or how frequently it appears. Although these values are useful for text categorization, we also use novel values assigned to a word, called distributional features, which include the compactness of the appearances of a word and the position of its first appearance. However, experiments show that the position of the first appearance alone is not enough to categorize the text, because in some documents the last appearance of a word can be more important than the first. The different features are combined using an ensemble learning technique. Further analysis shows that the distributional features are especially useful when documents are long and the writing style is casual.
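As a rough illustration of what distributional features might look like in practice, the following Python sketch splits a tokenized document into equal parts and records where a word first and last appears and how many parts contain it. The abstract does not give the exact definitions, so the function names, the `num_parts` parameter, and the two compactness measures used here (number of parts containing the word, and distance between first and last appearance) are assumptions made for illustration, not the paper's own formulas.

```python
from typing import Dict, List, Optional


def split_into_parts(tokens: List[str], num_parts: int) -> List[List[str]]:
    """Split a token list into num_parts contiguous, roughly equal parts."""
    size, extra = divmod(len(tokens), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` parts each take one leftover token.
        end = start + size + (1 if i < extra else 0)
        parts.append(tokens[start:end])
        start = end
    return parts


def distributional_features(
    tokens: List[str], word: str, num_parts: int = 4
) -> Optional[Dict[str, int]]:
    """Return assumed distributional features of `word`, or None if absent.

    All feature definitions below are illustrative assumptions.
    """
    parts = split_into_parts(tokens, num_parts)
    # Indices of the parts in which the word occurs at least once.
    hits = [i for i, part in enumerate(parts) if word in part]
    if not hits:
        return None
    return {
        "first_part": hits[0],                      # position of first appearance
        "last_part": hits[-1],                      # position of last appearance
        "parts_with_word": len(hits),               # fewer parts = more compact
        "first_last_distance": hits[-1] - hits[0],  # smaller = more compact
    }


tokens = "cat sat mat cat dog ran far cat".split()
print(distributional_features(tokens, "cat"))
print(distributional_features(tokens, "dog"))
```

In this toy document "cat" is spread across the whole text while "dog" is confined to a single part, which is the kind of difference a plain frequency count cannot express. Feature vectors of this sort could then be fed to separate classifiers whose outputs are combined, in the spirit of the ensemble step the abstract mentions.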