
Article Information

  • Title: Efficient Ways to Improve the Performance of HDFS for Small Files
  • Authors: Parth Gohil; Bakul Panchal
  • Journal: Computer Engineering and Intelligent Systems
  • Print ISSN: 2222-1727
  • Online ISSN: 2222-2863
  • Year: 2014
  • Volume: 5
  • Issue: 1
  • Pages: 45-49
  • Language: English
  • Publisher: International Institute for Science, Technology and Education
  • Abstract: Hadoop, an open-source implementation of MapReduce for processing big data, is widely used for short jobs that require low response time. Facebook, Yahoo, Google, and others use Hadoop to process more than 15 terabytes of new data per day. MapReduce gathers results across multiple nodes and returns a single result or result set. Fault tolerance is provided by the MapReduce platform and is entirely transparent to the programmer. HDFS (Hadoop Distributed File System) is a single-master, multiple-slave framework. It is one of the core components of Hadoop, and it does not perform well for small files: huge numbers of small files place a heavy burden on the NameNode of HDFS and degrade its performance. HDFS is a distributed file system that can process large amounts of data. It is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. This paper introduces HDFS, the small files problem, and ways to deal with it.
  • Keywords: Hadoop; Hadoop Distributed File System; MapReduce; small files
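
The abstract refers to "ways to deal with" the small files problem. One widely used remedy in the Hadoop ecosystem is to consolidate many small files into a single SequenceFile keyed by file name, so the NameNode tracks one large file instead of one metadata entry per small file. The sketch below is a minimal illustration of that general idea using the standard Hadoop Java API; it is not necessarily the specific method evaluated in the paper, and the class name SmallFilePacker and its command-line arguments are assumptions made for the example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical example: pack a directory of small files into one SequenceFile.
public class SmallFilePacker {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]); // directory containing many small files
        Path packed = new Path(args[1]);   // single consolidated SequenceFile

        // Key = original file name, value = raw file contents. The NameNode then
        // holds metadata for one large file instead of thousands of small ones.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {

            for (FileStatus status : fs.listStatus(inputDir)) {
                if (!status.isFile()) {
                    continue; // skip subdirectories in this simple sketch
                }
                byte[] contents = new byte[(int) status.getLen()];
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    in.readFully(contents);
                }
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(contents));
            }
        }
    }
}
```

Other remedies commonly discussed alongside SequenceFiles include Hadoop Archives (HAR), which reduce NameNode metadata by packing files into an archive, and CombineFileInputFormat, which reduces map-task overhead by grouping several small files into one input split.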