Article Information

Title: A Comparative Study of Mining Web Usage Patterns Using Variants of k-Means Clustering Algorithm
Authors: Zahid Ansari; A. Vinaya Babu; Waseem Ahmed; Mohammed Fazle Azeem et al.
Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2011
Volume: 2
Issue: 4
Pages: 1407-1413
Publisher: TechScience Publications
Abstract: The explosive growth in the information available on the Web has prompted the need for developing Web personalization systems that understand and exploit user preferences to dynamically serve customized content to individual users [1]. To reveal information about user preferences from Web usage data, Data Mining techniques can be naturally applied, leading to the so-called Web Usage Mining (WUM) [2]. Clustering is widely used in WUM to capture similar interests and trends among users accessing a Web site [3]. k-Means clustering is a popular clustering algorithm based on the partitioning of data. However, it has two drawbacks: it requires the user to specify the number of clusters in advance, and it is sensitive to the initial selection of cluster centres. The global k-Means algorithm proposed by Likas [4] provides an incremental approach to clustering by dynamically adding one cluster centre at a time through a deterministic global search procedure. It does not depend on any initial conditions and considerably outperforms the k-Means algorithm, but it carries a heavy computational cost. A faster version of the global k-Means algorithm substantially reduces the execution time by improving the way the next cluster centre is created. We implemented and tested these algorithms against Web usage data in order to discover clusters of user navigational sessions. In this paper we present the implementation details of each of the above-mentioned k-Means clustering techniques along with the underlying mathematical foundations. The results are presented with a comparison of the different techniques. Our results show that the fast global k-Means clustering algorithm significantly reduces the computational time without affecting the performance of the global k-Means algorithm. It also outperforms the global k-Means algorithm in terms of the validity measure.
Keywords: web usage mining; k-Means clustering; global k-Means clustering; fast global k-Means clustering.
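
Illustrative sketch: the abstract describes the global k-Means algorithm as adding one cluster centre at a time through a deterministic search, and the fast variant as a cheaper way of creating the next centre. The Python sketch below is one possible reading of that description and is not the authors' implementation. The function names (sq_dist, centroid, kmeans, global_kmeans, fast_global_kmeans) and the toy data are hypothetical. The candidate-selection rule in fast_global_kmeans (pick the point with the largest guaranteed error reduction) is the commonly cited formulation of the fast variant; that the paper uses exactly this rule is an assumption.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, centres, max_iter=100):
    """Plain k-Means (Lloyd iterations) from the given starting centres.

    Returns (centres, sse), where sse is the sum of squared distances
    from each point to its nearest centre.
    """
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            j = min(range(len(centres)), key=lambda i: sq_dist(p, centres[i]))
            clusters[j].append(p)
        # Update step: move each centre to the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centres = [centroid(m) if m else c for m, c in zip(clusters, centres)]
        if new_centres == centres:
            break
        centres = new_centres
    sse = sum(min(sq_dist(p, c) for c in centres) for p in points)
    return centres, sse


def global_kmeans(points, k):
    """Incremental search: solve for 1 centre, then 2, ... up to k.

    To add a centre, every data point is tried as its starting position
    and the k-Means run with the lowest error is kept.
    """
    centres, sse = kmeans(points, [centroid(points)])
    for _ in range(2, k + 1):
        best = None
        for candidate in points:
            trial = kmeans(points, centres + [candidate])
            if best is None or trial[1] < best[1]:
                best = trial
        centres, sse = best
    return centres, sse


def fast_global_kmeans(points, k):
    """Cheaper variant: run k-Means once per added centre.

    Assumption: the new centre starts at the data point that maximises
    the guaranteed error reduction
    sum_j max(d_j - ||x - x_j||^2, 0), with d_j the squared distance
    from x_j to its nearest existing centre.
    """
    centres, sse = kmeans(points, [centroid(points)])
    for _ in range(2, k + 1):
        d = [min(sq_dist(p, c) for c in centres) for p in points]

        def gain(x):
            return sum(max(dj - sq_dist(x, p), 0.0) for p, dj in zip(points, d))

        candidate = max(points, key=gain)
        centres, sse = kmeans(points, centres + [candidate])
    return centres, sse


# Toy data (hypothetical): three well-separated groups of 2-D points.
DEMO_POINTS = [(0, 0), (0, 1), (1, 0),
               (10, 10), (10, 11), (11, 10),
               (20, 0), (20, 1), (21, 0)]

if __name__ == "__main__":
    print(global_kmeans(DEMO_POINTS, 3))
    print(fast_global_kmeans(DEMO_POINTS, 3))
```

With N data points, global_kmeans runs k-Means N times for every added centre, while fast_global_kmeans runs it once after an O(N^2) scoring pass. This matches the trade-off the abstract reports between computational effort and clustering quality. Real Web usage sessions would first have to be encoded as numeric vectors, which this sketch does not cover.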