Article Information

Title: AN OVERVIEW OF PREPROCESSING OF WEB LOG FILES FOR WEB USAGE MINING
Authors: C.P. SUMATHI; R. PADMAJA VALLI; T. SANTHANAM et al.
Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2011
Volume: 34
Issue: 2
Pages: 178-185
Publisher: Journal of Theoretical and Applied
Abstract: With Internet usage gaining popularity and the steady growth of users, the World Wide Web has become a huge repository of data and serves as an important platform for the dissemination of information. The users' accesses to Web sites are stored in Web server logs. However, the data stored in the log files do not present an accurate picture of the users' accesses to the Web site. Hence, preprocessing of the Web log data is an essential and prerequisite phase before it can be used for knowledge-discovery or mining tasks. The preprocessed Web data can then be suitable for the discovery and analysis of useful information, referred to as Web mining. Web usage mining, a classification of Web mining, is the application of data mining techniques to discover usage patterns from clickstream and associated data stored in one or more Web servers. This paper presents an overview of the various steps involved in the preprocessing stage.
Keywords: Web Server; Data Cleaning; User Identification; Session Identification; Path Completion