Journal: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Online ISSN: 0976-8491
Year of publication: 2015
Volume: 6
Issue: 3
Pages: 75-81
Language: English
Publisher: Ayushmaan Technologies
Abstract: Big data refers to collections of very large datasets, and managing such datasets is very difficult, so new techniques are required to handle them. The challenge is to collect or extract the data from multiple sources, process or transform it according to the analytical need, and then load it for analysis; this process is known as "Extract, Transform and Load" (ETL). In this paper, Hadoop is first set up in pseudo-distributed mode, and Hive is then implemented on top of Hadoop to analyze a large dataset. The data come from the Book-Crossing dataset, of which only the BX-Books.csv file is used. Hive queries are executed from the command line over this file to calculate the number of books published in each year. The Hive code is then compared with the corresponding MapReduce code. Finally, the paper shows how Hive is better than MapReduce.
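Illustration: the per-year book count described in the abstract can be sketched in plain Python as a map step followed by a reduce step. This is a minimal sketch, not the paper's Hive or MapReduce code. The semicolon-delimited, double-quoted file layout, the column name "Year-Of-Publication", and the sample rows are all assumptions made for illustration, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Invented sample rows. The semicolon-delimited, double-quoted layout and the
# column name "Year-Of-Publication" are assumptions about BX-Books.csv.
SAMPLE = '''"ISBN";"Book-Title";"Book-Author";"Year-Of-Publication";"Publisher"
"0000000001";"Sample Title A";"Author A";"2001";"Publisher X"
"0000000002";"Sample Title B";"Author B";"1999";"Publisher Y"
"0000000003";"Sample Title C";"Author C";"2001";"Publisher X"
'''


def map_years(lines):
    """Map step: emit a (year, 1) pair for every well-formed book record."""
    reader = csv.DictReader(lines, delimiter=";", quotechar='"')
    for row in reader:
        year = (row.get("Year-Of-Publication") or "").strip()
        if year.isdigit():  # skip rows whose year field is missing or malformed
            yield year, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted ones for each year."""
    counts = Counter()
    for year, one in pairs:
        counts[year] += one
    return dict(counts)


def books_per_year(lines):
    """Run the map step and feed its output to the reduce step."""
    return reduce_counts(map_years(lines))


if __name__ == "__main__":
    # Print one "year count" line per year, in ascending year order.
    for year, n in sorted(books_per_year(io.StringIO(SAMPLE)).items()):
        print(year, n)
```

In HiveQL the same aggregation would typically be a single query that groups by the year column and counts rows, which is the kind of brevity the abstract's comparison with MapReduce refers to.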