Publisher: International Institute for Science, Technology and Education
Abstract: Hadoop, an open-source implementation of MapReduce for processing big data, is widely used for short jobs that require low response times. Companies such as Facebook, Yahoo, and Google use Hadoop to process more than 15 terabytes of new data per day. MapReduce gathers results across multiple nodes and returns a single result or result set, and the fault tolerance offered by the MapReduce platform is entirely transparent to programmers. HDFS (Hadoop Distributed File System), one of the core components of Hadoop, is a single-master, multiple-slave framework designed to store and process large files. It performs poorly when faced with large numbers of small files, because huge numbers of small files place a heavy metadata burden on the NameNode and degrade the overall performance of HDFS. This paper introduces HDFS, describes the small file problem, and surveys ways to deal with it.
Keywords: Hadoop; Hadoop Distributed File System; MapReduce; small files
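One commonly cited way to relieve the NameNode pressure described above is to pack many small files into a single larger container file. The sketch below illustrates that idea using Hadoop's SequenceFile API: each small file becomes one key/value record (file name as key, raw bytes as value), so the NameNode tracks one large file rather than one metadata entry per small file. The class name SmallFilePacker and the command-line paths are hypothetical, and the code assumes a Hadoop 2.x or later client API; it is a minimal illustrative sketch, not the specific method evaluated in this paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical illustration: pack a directory of small files into one
// SequenceFile so that HDFS stores a single large file instead of one
// NameNode metadata entry per small file.
public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inDir = new Path(args[0]);    // directory of small files
        Path outFile = new Path(args[1]);  // resulting packed SequenceFile

        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(outFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class));
        try {
            for (FileStatus status : fs.listStatus(inDir)) {
                if (!status.isFile()) continue;
                byte[] contents = new byte[(int) status.getLen()];
                FSDataInputStream in = fs.open(status.getPath());
                try {
                    in.readFully(0, contents);  // read the whole small file
                } finally {
                    in.close();
                }
                // key = original file name, value = raw file bytes
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(contents));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}

A MapReduce job can then consume the packed file through SequenceFileInputFormat, reading many logical records from a single HDFS file instead of opening thousands of tiny ones.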