Article Information

  • Title: Efficient Data Replication Scheme based on Hadoop Distributed File System
  • Authors: Jungha Lee; Jaehwa Chung; Daewon Lee
  • Journal: International Journal of Software Engineering and Its Applications
  • Print ISSN: 1738-9984
  • Publication year: 2015
  • Volume: 9
  • Issue: 12
  • Pages: 177-186
  • DOI: 10.14257/ijseia.2015.9.12.16
  • Publisher: SERSC
  • Abstract: The Hadoop Distributed File System (HDFS), designed to store huge data sets reliably, has been widely used for processing massive-scale data in parallel. In HDFS, data locality is one of the critical problems that degrade file system performance. To address it, we propose an efficient data replication scheme based on access count prediction in a Hadoop framework. From previous data access counts, the proposed scheme predicts the next access count of data files using Lagrange's interpolation. It then uses the predicted access count to determine the replication factor, selectively choosing whether to generate a new replica or to use the loaded data as a cache. As a result, the proposed scheme improves data locality. In a performance evaluation against Hadoop's default data replication setting, the proposed scheme reduces task completion time in the map phase by 8.9% on average. Regarding data locality, it increases node locality by 6.6% and decreases rack and rack-off locality by 38.9% and 56.5%, respectively.
  • Keywords: Hadoop; Data locality; Access Prediction; Data Replication; Data Placement
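
The prediction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`lagrange_predict`, `plan_replication`), the per-replica threshold, and the decision rule are all hypothetical, chosen only to show how Lagrange's interpolation can extrapolate the next access count from earlier ones and how that prediction might drive a choice between a new replica and cached data.

```python
from fractions import Fraction


def lagrange_predict(times, counts, t_next):
    """Evaluate the Lagrange polynomial through (times[i], counts[i]) at t_next.

    Inputs are integers; Fractions keep the arithmetic exact.
    """
    if not times or len(times) != len(counts):
        raise ValueError("times and counts must be non-empty and equal in length")
    if len(set(times)) != len(times):
        raise ValueError("sample times must be distinct")

    total = Fraction(0)
    for i, (t_i, c_i) in enumerate(zip(times, counts)):
        # Basis polynomial L_i(t_next) = prod_{j != i} (t_next - t_j) / (t_i - t_j)
        term = Fraction(c_i)
        for j, t_j in enumerate(times):
            if j != i:
                term *= Fraction(t_next - t_j, t_i - t_j)
        total += term
    return total


def plan_replication(predicted_count, current_factor, accesses_per_replica=5):
    """Hypothetical rule: add a replica only when the predicted demand exceeds
    what the current replicas are assumed to serve; otherwise reuse cached data.
    The threshold `accesses_per_replica` is an illustrative assumption.
    """
    demand = max(Fraction(0), Fraction(predicted_count))  # extrapolation can go negative
    if demand > current_factor * accesses_per_replica:
        return "new_replica"
    return "use_cache"


if __name__ == "__main__":
    history_times = [1, 2, 3]      # observation intervals
    history_counts = [1, 4, 9]     # access counts seen in each interval
    predicted = lagrange_predict(history_times, history_counts, 4)
    print(predicted, plan_replication(predicted, current_factor=3))
```

Polynomial extrapolation can overshoot when the history is noisy, so a practical version would likely cap the number of past samples used and bound the resulting replication factor.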