期刊名称:International Journal of Computer Science Issues
印刷版ISSN:1694-0784
电子版ISSN:1694-0814
出版年度:2012
卷号:9
期号:4
出版社:IJCSI Press
摘要:The process of text categorization assigns labels or categories to each text document according to the semantic content of the document. The traditional approaches to text categorization used features from the text like: words, phrases, and concepts hierarchies to represent and reduce the dimensionality of the documents. Recently, researchers addressed this brittleness by incorporating background knowledge into document representation by using some external knowledge base for example WordNet, Open Project Directory (OPD) and Wikipedia. In this paper we have tried to enhance text categorization by integrating knowledge from Wikitology. Wikitology is a knowledge repository which extracts knowledge from Wikipedia in structured/unstructured forms with a warping of ontological structure. We have augmented text document by exploring Wikitology fields like: {Bag of Words, titles, redirects, entity types, categories and linked entities}. We also propose and evaluate different text representations and text enrichment technique. The classification is performed by using Support Vector Machine (SVM and we have validated this experiment on 4-fold cross-validation.