Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Publication year: 2015
Volume: 3
Issue: 5
DOI: 10.15680/ijircce.2015.0305074
Publisher: S&S Publications
Abstract: A web log file is a log file automatically created and maintained by a web server. Analyzing web server access logs offers valuable insight into website usage. Because of the tremendous growth in web usage, web log files are growing rapidly and becoming very large. Processing these fast-growing log files with relational database technology has hit a bottleneck. Analyzing such large datasets requires a parallel processing system and a reliable data storage mechanism. Hadoop handles big data by processing massive quantities of information on a cluster of commodity hardware. In this paper, building on the architecture of the Hadoop Distributed File System, the Hadoop MapReduce framework, and the HiveQL query language, we present the methodology used to preprocess a huge volume of web log files, compute website statistics, and learn user behavior.
Keywords: big data; Hadoop; MapReduce; web server logs; log analysis; Hive