Basic Article Information

Title: Data Mining Over large Datasets using HADOOP in Cloud Environment
Authors: V.Nappinna lakshmi ; N. Revathi
Journal: International Journal of Computer Science and Communication Networks
Electronic ISSN: 2249-5789
Year of publication: 2013
Volume: 3
Issue: 2
Pages: 73-78
Publisher: Technopark Publications
Abstract: Data in web applications and social networks is growing drastically, and such data is referred to as Big Data. Hive queries integrated with Hadoop are used to generate report analyses over thousands of datasets, but retrieving those datasets takes a great deal of time, and the approach lacks performance analysis. To overcome this problem, Market Basket Analysis, a very popular data mining algorithm, is used in the Amazon cloud environment by integrating it with the Hadoop ecosystem and HBase. The objective is to store the data persistently along with the past history of the dataset and to perform report analysis on it. The main aim of the system is to improve performance by parallelizing operations such as loading the data, building indexes, and evaluating queries. The performance analysis is done with a minimum of three nodes within the Amazon cloud environment. HBase is an open-source, non-relational, distributed database model that runs on top of Hadoop. It consists of a single key with multiple values. Looping is avoided when retrieving a particular item from huge datasets, so less time is spent on execution. The HDFS file system is used to store the data after the MapReduce operations are performed, and execution time decreases as the number of nodes increases. The performance analysis is tuned with parameters such as the HBase heap memory and the caching parameter.
Keywords: HBase; Cloud computing; Hadoop ecosystem; mining algorithm