Basic article information

Title: Extracting Structure Data From UnStructured Data Through HiveQL
Authors: K. Balakrishna; Smt. S. Jessica Saritha; C. Penchalaiah; et al.
Journal: International Journal of Engineering and Computer Science
Print ISSN: 2319-7242
Publication year: 2015
Volume: 4
Issue: 4
Pages: 11322-11331
Publisher: IJECS
Abstract: An RDBMS can store structured data only up to some GB of data. Processing larger data sets is difficult and time-consuming. Hadoop is used to overcome these problems. Apache Hadoop is a framework for big data management and analysis. The Hadoop core stores structured, unstructured, and semi-structured data in the Hadoop Distributed File System (HDFS) and provides a simple MapReduce programming model to process and analyze, in parallel, the data stored in this distributed system. Apache Hive is a data warehouse built on top of Hadoop that lets users query and manage large data sets in distributed storage using a SQL-like language called HiveQL; Hive translates queries into a series of MapReduce jobs. In the existing system, unstructured data stored in HDFS cannot be retrieved in a structured format through HiveQL. This project converts Twitter data into a structured format by using HiveQL with a SerDe. The Twitter data is stored in HDFS through a data streaming process.
Keywords: Big Data; Apache Hadoop; HDFS; MapReduce; Data Streaming process; HiveQL; SerDe.
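
Illustrative sketch (not from the paper): the abstract describes using a SerDe to expose raw Twitter data as structured rows. The Python below imitates only the deserialization half of that idea on JSON lines. It is not Hive code and not the authors' implementation. The column list `COLUMNS`, the field names, and the functions `deserialize` and `read_table` are hypothetical, chosen for illustration.

```python
import json

# Hypothetical schema: (column name, path of keys inside the JSON record).
# A Hive table definition would declare something comparable as typed columns.
COLUMNS = [
    ("id", ("id",)),
    ("created_at", ("created_at",)),
    ("screen_name", ("user", "screen_name")),
    ("text", ("text",)),
]


def deserialize(line):
    """Turn one raw JSON line into a fixed-width row; missing fields become None."""
    record = json.loads(line)
    row = []
    for _name, path in COLUMNS:
        value = record
        for key in path:
            # Walk nested objects; stop with None if the path does not exist.
            value = value.get(key) if isinstance(value, dict) else None
        row.append(value)
    return tuple(row)


def read_table(lines):
    """Deserialize every parseable line, skipping blank or malformed ones."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rows.append(deserialize(line))
        except json.JSONDecodeError:
            continue  # assumption: malformed records are dropped, not fatal
    return rows
```

Once every record has the same columns, row-oriented queries of the kind HiveQL expresses (filtering, grouping, counting) become straightforward, which is the role the abstract assigns to the SerDe.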