摘要:Problem statement: Clustering is one of the most important research areas in the field of data mining. Clustering means creating groups of objects based on their features in such a way that the objects belonging to the same groups are similar and those belonging to different groups are dissimilar. Clustering is an unsupervised learning technique. The main advantage of clustering is that interesting patterns and structures can be found directly from very large data sets with little or none of the background knowledge. Clustering algorithms can be applied in many domains. Approach: In this research, the most representative algorithms K-Means and K-Medoids were examined and analyzed based on their basic approach. The best algorithm in each category was found out based on their performance. The input data points are generated by two ways, one by using normal distribution and another by applying uniform distribution. Results: The randomly distributed data points were taken as input to these algorithms and clusters are found out for each algorithm. The algorithms were implemented using JAVA language and the performance was analyzed based on their clustering quality. The execution time for the algorithms in each category was compared for different runs. The accuracy of the algorithm was investigated during different execution of the program on the input data points. Conclusion: The average time taken by K-Means algorithm is greater than the time taken by K-Medoids algorithm for both the case of normal and uniform distributions. The results proved to be satisfactory.
关键词:K-Means clustering; K-Medoids clustering; data clustering; cluster analysis