
Article information

  • Title: Measuring the Performance of Data Placement Structures for MapReduce-based Data Warehousing Systems
  • Authors: S. Kami Makki; M. Rakibul Hasan
  • Journal: International Journal of New Computer Architectures and their Applications
  • Print ISSN: 2220-9085
  • Publication year: 2018
  • Volume: 8
  • Issue: 1
  • Pages: 11-20
  • DOI: 10.17781/P002371
  • Publisher: Society of Digital Information and Wireless Communications
  • Abstract: The exponential growth of data requires systems that provide a scalable, fault-tolerant infrastructure for efficiently storing and processing vast amounts of data. Hive is a MapReduce-based data warehouse for data aggregation and query analysis. It can organize millions of rows of data into tables, and its data placement structures play a significant role in its performance. Hive also provides an SQL-like language called HiveQL, whose queries are compiled into MapReduce jobs on Hadoop. In this paper, we measure the efficiency of these data placement structures, Record Columnar File (RCFile) and Optimized Row Columnar File (ORCFile), in terms of data loading, storage, and query processing using the MapReduce framework. The experimental results show the effectiveness of these data placement structures for Hive data warehousing systems.