Article Information

  • Title: Extracting Structue Data From UnStructured Data Through HiveQL
  • Authors: K. Balakrishna; Smt. S. Jessica Saritha; C. Penchalaiah
  • Journal: International Journal of Engineering and Computer Science
  • Print ISSN: 2319-7242
  • Year: 2015
  • Volume: 4
  • Issue: 4
  • Pages: 11322–11331
  • Publisher: IJECS
  • Abstract: A relational database management system (RDBMS) can store structured data only up to a few gigabytes, and processing larger volumes is difficult and time-consuming. Hadoop is used to overcome these problems. Apache Hadoop is a framework for big data management and analysis. The Hadoop core stores structured, unstructured, and semi-structured data in the Hadoop Distributed File System (HDFS). It also provides a simple MapReduce programming model to process and analyze that data in parallel. Apache Hive is a data warehouse built on top of Hadoop that lets you query and manage large datasets in distributed storage using a SQL-like language called HiveQL. Hive translates queries into a series of MapReduce jobs. In the existing system, unstructured data stored in HDFS cannot be retrieved in a structured format through HiveQL. This project converts Twitter data into a structured format using HiveQL with a SerDe (serializer/deserializer). The Twitter data is stored in HDFS through a data streaming process.
  • Keywords: Big Data; Apache Hadoop; HDFS; MapReduce; Data Streaming Process; HiveQL; SerDe.
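The abstract does not describe the authors' SerDe or queries. The Python sketch below is only a rough, hypothetical illustration of the two ideas it names:

* **SerDe-style deserialization.** Raw JSON tweet lines are turned into rows with a fixed column schema. Missing fields become None, which plays the role of Hive's NULL, and malformed lines are skipped.
* **Map, shuffle, and reduce.** The rows are aggregated in the three phases of the kind Hive generates for a GROUP BY query.

The column names, the JSON paths (`id`, `user.screen_name`, `text`, `entities.hashtags`), and the `tweets` table in the docstring are assumptions loosely modeled on Twitter's JSON layout, not taken from the paper. The code uses no Hive or Hadoop APIs.

```python
import json
from collections import defaultdict
from typing import Any, Iterable, Iterator

# Hypothetical table schema: column name -> key path into the tweet JSON.
# Loosely modeled on Twitter's JSON layout; not taken from the paper.
SCHEMA: dict[str, tuple[str, ...]] = {
    "id": ("id",),
    "screen_name": ("user", "screen_name"),
    "text": ("text",),
    "hashtags": ("entities", "hashtags"),
}


def lookup(record: dict[str, Any], path: tuple[str, ...]) -> Any:
    """Follow a nested key path; return None (Hive's NULL) if any key is missing."""
    value: Any = record
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def deserialize(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """SerDe-style read: turn each raw JSON line into a row with fixed columns."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the whole scan
        if not isinstance(record, dict):
            continue  # a tweet must be a JSON object
        row = {col: lookup(record, path) for col, path in SCHEMA.items()}
        # Flatten hashtag objects to lowercase strings: [{"text": "Hadoop"}] -> ["hadoop"].
        tags = row["hashtags"] if isinstance(row["hashtags"], list) else []
        row["hashtags"] = [
            t["text"].lower() for t in tags if isinstance(t, dict) and "text" in t
        ]
        yield row


def count_hashtags(rows: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Map/shuffle/reduce equivalent of the (hypothetical) HiveQL query:
    SELECT tag, COUNT(*) FROM tweets LATERAL VIEW explode(hashtags) t AS tag GROUP BY tag;
    """
    # Map phase: emit (tag, 1) for every hashtag in every row.
    pairs = ((tag, 1) for row in rows for tag in row["hashtags"])
    # Shuffle phase: group the emitted values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for tag, one in pairs:
        groups[tag].append(one)
    # Reduce phase: sum the values collected for each key.
    return {tag: sum(ones) for tag, ones in groups.items()}


if __name__ == "__main__":
    raw = [
        '{"id": 1, "user": {"screen_name": "alice"}, "text": "Learning #Hadoop",'
        ' "entities": {"hashtags": [{"text": "Hadoop"}]}}',
        '{"id": 2, "user": {"screen_name": "bob"}, "text": "#hive and #hadoop",'
        ' "entities": {"hashtags": [{"text": "hive"}, {"text": "hadoop"}]}}',
        '{"id": 3, "text": "no user or entities"}',
        "not json",
    ]
    rows = list(deserialize(raw))
    for row in rows:
        print(row)
    print(count_hashtags(rows))
```

In Hive itself, the deserialization step would be done by a SerDe class declared in the table definition, and the aggregation by the MapReduce jobs Hive compiles from the query. The sketch only mimics those two stages in one process.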