
Article Information

  • Title: Ontology Based Document Clustering Using MapReduce
  • Authors: Abdelrahman Elsayed; Hoda M. O. Mokhtar; Osama Ismail
  • Journal: International Journal of Database Management Systems
  • Print ISSN: 0975-5985
  • Electronic ISSN: 0975-5705
  • Publication year: 2015
  • Volume: 7
  • Issue: 2
  • Page: 1
  • DOI: 10.5121/ijdms.2015.7201
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: Document clustering is now considered a data-intensive task because of the rapid, dramatic increase in the number of available documents. Moreover, the feature sets that represent those documents are very large. The most common method for representing documents is the vector space model, which represents document features as a bag of words and does not capture semantic relations between words. In this paper we introduce a distributed implementation of bisecting k-means using the MapReduce programming model. The aim of our proposed implementation is to solve the problem of clustering data-intensive document collections. In addition, we propose integrating the WordNet ontology with bisecting k-means in order to utilize the semantic relations between words to enhance document clustering results. Our experimental results show that using lexical categories for nouns only enhances the internal evaluation measures of document clustering and decreases the document features from thousands to tens of features. Our experiments were conducted using Amazon Elastic MapReduce to deploy the bisecting k-means algorithm.
  • Keywords: Document clustering; Ontology; Text Mining; Distributed Computing
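
The abstract describes bisecting k-means expressed in the MapReduce model. As a rough illustration only, the sketch below phrases each k-means round as a map step (assign every vector to its nearest centroid) and a reduce step (average the vectors per centroid), then repeatedly bisects a cluster until k clusters exist. It runs in a single Python process rather than on Hadoop or Amazon Elastic MapReduce, and it omits the WordNet step. All function names are hypothetical, and splitting the largest cluster first is an assumption; the record does not say which selection rule the authors use.

```python
import random
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def map_phase(points, centroids):
    """Map: emit (centroid_index, (vector, 1)) for every point."""
    for pt in points:
        yield nearest(pt, centroids), (pt, 1)

def reduce_phase(pairs):
    """Reduce: sum vectors and counts per key, then average into centroids."""
    sums, counts = {}, defaultdict(int)
    for key, (vec, n) in pairs:
        if key in sums:
            sums[key] = [a + b for a, b in zip(sums[key], vec)]
        else:
            sums[key] = list(vec)
        counts[key] += n
    return {k: [x / counts[k] for x in s] for k, s in sums.items()}

def two_means(points, iterations=10, seed=0):
    """Split one cluster in two using k-means (k=2) as map/reduce rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, 2)  # two data points as initial centroids
    for _ in range(iterations):
        new = reduce_phase(map_phase(points, centroids))
        centroids = [new.get(i, centroids[i]) for i in range(2)]
    halves = [[], []]
    for pt in points:
        halves[nearest(pt, centroids)].append(pt)
    return halves

def bisecting_kmeans(points, k, seed=0):
    """Bisect the largest cluster (an assumed rule) until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        if len(largest) < 2:  # nothing left to split
            clusters.append(largest)
            break
        left, right = two_means(largest, seed=seed)
        if not left or not right:  # degenerate split, e.g. duplicate points
            clusters.append(largest)
            break
        clusters += [left, right]
    return clusters
```

For example, `bisecting_kmeans(vectors, 3)` on a list of equal-length numeric tuples returns three lists that together contain every input vector. In a real MapReduce job the map and reduce functions would run on separate workers, with the current centroids shipped to each mapper between rounds.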