Article Information

  • Title: An Approach of Semantic Similarity Measure between Documents Based on Big Data
  • Authors: Mohammed Erritali; Abderrahim Beni-Hssane; Marouane Birjali
  • Journal: International Journal of Electrical and Computer Engineering
  • Electronic ISSN: 2088-8708
  • Publication year: 2016
  • Volume: 6
  • Issue: 5
  • Pages: 2454-2461
  • DOI:10.11591/ijece.v6i5.pp2454-2461
  • Language: English
  • Publisher: Institute of Advanced Engineering and Science (IAES)
  • Abstract: Semantic indexing and document similarity are important information retrieval problems in Big Data, with broad applications. In this paper, we investigate the MapReduce programming model as a framework for managing distributed processing over a large number of documents. We then survey the state of the art in approaches for computing document similarity. Finally, we propose an approach to semantic similarity measurement that uses WordNet as an external semantic network resource. For evaluation, we compare the proposed approach with previously presented approaches using our new MapReduce algorithm. Experimental results reveal that the proposed approach outperforms state-of-the-art approaches in running time and improves the measurement of semantic similarity.
  • Keywords: Distributed processing; Hadoop cluster; HDFS; Big Data; Semantic similarity; Parallel algorithm; MapReduce programming; WordNet