Article Information

Title: Measuring the Performance of Data Placement Structures for MapReduce-based Data Warehousing Systems
Authors: S. Kami Makki; M. Rakibul Hasan
Journal: International Journal of New Computer Architectures and their Applications
Print ISSN: 2220-9085
Publication year: 2018
Volume: 8
Issue: 1
Pages: 11-20
DOI: 10.17781/P002371
Publisher: Society of Digital Information and Wireless Communications
Abstract: The exponential growth of data requires systems that provide a scalable and fault-tolerant infrastructure for efficiently storing and processing vast amounts of data. Hive is a MapReduce-based data warehouse for data aggregation and query analysis. This data warehousing system can arrange millions of rows of data into tables, and its data placement structures play a significant role in increasing its performance. Hive also provides an SQL-like language called HiveQL, whose queries are compiled into MapReduce jobs on Hadoop. In this paper, we measure the efficiency of these data placement structures, Record Columnar File (RCFile) and Optimized Row Columnar File (ORCFile), in terms of data loading, storage, and query processing using the MapReduce framework. The experimental results show the effectiveness of these data placement structures for Hive data warehousing systems.