Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Year: 2013
Volume: 1
Issue: 2
Publisher: S&S Publications
Abstract: Data mining, the analysis phase of the knowledge discovery process, is the computational process of discovering patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The classical goal of data mining and machine learning is to extract information from a data set and transform it into an understandable structure for further use. Besides the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Web usage mining is a data mining technique that discovers interesting usage patterns in web data in order to better serve the needs of web-based applications. Usage data captures the identity or origin of web users along with their browsing behavior at a web site. Web usage mining may be classified further by the kind of usage data considered: web server data, application server data, and application-level data. Web server data corresponds to the user logs collected at the web server; typical data collected and saved there include IP addresses, page references, and access times of the users. This paper proposes a new technique for discovering the web usage patterns of websites from server log files, based on clustering and an improved Apriori algorithm.
Keywords: Apriori algorithm; association rule mining; clustering; rule learning; web server log data; web usage mining
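
Illustrative sketch (not part of the original record): the abstract names web server log fields (IP address, page reference, access time) and Apriori-style pattern discovery, but gives no details of the paper's clustering step or of its improvements to Apriori. The Python below is therefore only a minimal sketch of the classical Apriori algorithm, applied to pages grouped per IP address. The sample records, the grouping rule (one transaction per IP address), and the names `build_transactions` and `apriori` are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical, already-parsed log records: (ip_address, page, access_time).
records = [
    ("10.0.0.1", "/home", 1), ("10.0.0.1", "/products", 2), ("10.0.0.1", "/cart", 3),
    ("10.0.0.2", "/home", 1), ("10.0.0.2", "/products", 2),
    ("10.0.0.3", "/home", 1), ("10.0.0.3", "/cart", 2),
    ("10.0.0.4", "/products", 1), ("10.0.0.4", "/cart", 2), ("10.0.0.4", "/home", 3),
]


def build_transactions(records):
    """Group visited pages by IP address (assumption: one transaction per IP)."""
    by_ip = {}
    for ip, page, _access_time in records:
        by_ip.setdefault(ip, set()).add(page)
    return list(by_ip.values())


def apriori(transactions, min_support):
    """Classical Apriori: return {itemset: support_count} for frequent itemsets."""
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


transactions = build_transactions(records)
frequent = apriori(transactions, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In a real pipeline the transactions would come from sessionized server logs rather than a hard-coded list, and the support threshold would be tuned to the traffic volume; both choices here are placeholders.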