
Article Information

  • Title: Big Data Full-Text Search Index Minimization Using Text Summarization
  • Authors: Waheed Iqbal; Waqas Ilyas Malik; Faisal Bukhari
  • Journal: European Integration Studies
  • Print ISSN: 2335-8831
  • Publication year: 2021
  • Volume: 50
  • Issue: 2
  • Pages: 375-389
  • DOI: 10.5755/j01.itc.50.2.25470
  • Language: English
  • Publisher: Kaunas University of Technology
  • Abstract: An efficient full-text search is achieved by indexing the raw data with an additional 20 to 30 percent storage cost. In the context of Big Data, this additional storage space is huge and introduces challenges to entertain full-text search queries with good performance. It also incurs overhead to store, manage, and update the large-size index. In this paper, we propose and evaluate a method to minimize the index size to offer full-text search over Big Data using an automatic extractive-based text summarization method. To evaluate the effectiveness of the proposed approach, we used two real-world datasets. We indexed actual and summarized datasets using Apache Lucene and studied average simple overlapping, Spearman's rho correlation, and average ranking score measures of search results obtained using different search queries. Our experimental evaluation shows that automatic text summarization is an effective method to reduce the index size significantly. We obtained a maximum of 82% reduction in index size with 42% higher relevance of the search results using the proposed solution to minimize the full-text index size.
  • Keywords: Big Data; Indexing; Searching; Index Minimization; Text Summarization