Article Information

  • Title: Content-based Text Categorization using Wikitology
  • Authors: Muhammad Rafi; Sundus Hassan; Muhammad Shahid Shaikh
  • Journal: International Journal of Computer Science Issues
  • Print ISSN: 1694-0784
  • Electronic ISSN: 1694-0814
  • Publication year: 2012
  • Volume: 9
  • Issue: 4
  • Publisher: IJCSI Press
  • Abstract: Text categorization assigns labels or categories to each text document according to its semantic content. Traditional approaches use features drawn from the text, such as words, phrases, and concept hierarchies, to represent documents and reduce their dimensionality. Recently, researchers have addressed the brittleness of these representations by incorporating background knowledge from an external knowledge base, for example WordNet, the Open Directory Project (ODP), or Wikipedia. In this paper we try to enhance text categorization by integrating knowledge from Wikitology, a knowledge repository that extracts knowledge from Wikipedia in structured and unstructured forms and wraps it in an ontological structure. We augment text documents using Wikitology fields: bag of words, titles, redirects, entity types, categories, and linked entities. We also propose and evaluate different text representations and text enrichment techniques. Classification is performed with a Support Vector Machine (SVM), and the experiments are validated using 4-fold cross-validation.
  • Keywords: Text Categorization; Machine Learning; Wikitology; Support Vector Machine; 20 Newsgroups; Reuters-21578