Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2017
Volume: 8
Issue: 2
Pages: 153-158
Publisher: TechScience Publications
Abstract: The Hadoop Distributed File System (HDFS) component of Apache Hadoop provides distributed storage of big data on a cluster of commodity hardware. HDFS ensures availability of data by replicating it to different nodes. However, the replication policy of HDFS does not consider the popularity of data, and the popularity of files tends to change over time. Hence, maintaining a fixed replication factor affects the storage efficiency of HDFS. In this paper we propose an efficient dynamic data replication management system that considers the popularity of files stored in HDFS before replication. This strategy dynamically classifies files as hot data or cold data based on their popularity, increasing the number of replicas of hot data while applying erasure coding to cold data. The experimental results show that the proposed method reduces storage utilization by up to 40% without affecting availability and fault tolerance in HDFS.
Keywords: Big Data; Hadoop Distributed File System; Dynamic data replication.
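
The hot/cold strategy summarized in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption and is not taken from the paper: the access-count threshold, the boosted replication factor for hot files, the RS(6,3) erasure-coding layout for cold files, and the names `FileStats`, `classify`, `stored_size_mb`, and `storage_report`. The sketch only compares raw storage footprints; it does not call HDFS and does not reproduce the paper's experiment.

```python
from dataclasses import dataclass

# Assumed parameters for illustration only; none come from the paper.
DEFAULT_REPLICATION = 3   # HDFS default replication factor (baseline)
HOT_REPLICATION = 4       # hypothetical boosted factor for hot files
EC_DATA_UNITS = 6         # assumed Reed-Solomon RS(6,3): 6 data units
EC_PARITY_UNITS = 3       # ... plus 3 parity units (1.5x overhead)
HOT_THRESHOLD = 100       # hypothetical accesses per observation window


@dataclass
class FileStats:
    path: str
    size_mb: float
    access_count: int  # accesses seen in the current observation window


def classify(f: FileStats, threshold: int = HOT_THRESHOLD) -> str:
    """Label a file 'hot' or 'cold' from its access count."""
    return "hot" if f.access_count >= threshold else "cold"


def stored_size_mb(f: FileStats, threshold: int = HOT_THRESHOLD) -> float:
    """Raw storage used: extra replicas if hot, erasure coding if cold."""
    if classify(f, threshold) == "hot":
        return f.size_mb * HOT_REPLICATION
    return f.size_mb * (EC_DATA_UNITS + EC_PARITY_UNITS) / EC_DATA_UNITS


def storage_report(files, threshold: int = HOT_THRESHOLD):
    """Return (baseline_mb, dynamic_mb, fractional_saving)."""
    baseline = sum(f.size_mb * DEFAULT_REPLICATION for f in files)
    dynamic = sum(stored_size_mb(f, threshold) for f in files)
    saving = 1 - dynamic / baseline if baseline else 0.0
    return baseline, dynamic, saving


# Made-up workload: one popular file, two rarely read files.
files = [
    FileStats("/logs/today.log", 100, 500),
    FileStats("/archive/2015.tar", 400, 3),
    FileStats("/archive/2016.tar", 500, 10),
]
baseline, dynamic, saving = storage_report(files)
print(f"baseline={baseline} MB, dynamic={dynamic} MB, saving={saving:.1%}")
```

The saving in such a model depends entirely on the assumed hot/cold mix and parameters, so the figure printed for this made-up workload says nothing about the paper's measured result.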