Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2014
Volume: 68
Issue: 1
Publisher: Journal of Theoretical and Applied
Abstract: Clustering is the process of grouping a set of physical objects into classes of similar objects. Real-world objects are described by both numerical and categorical data. Categorical data cannot be analyzed in the same way as numerical data because they lack an inherent ordering. This paper describes an occurrence-based categorical data clustering (OBCDC) technique built on the cosine similarity measure and the simple binary matching similarity measure. The OBCDC system consists of four modules: data pre-processing, similarity matrix generation, cluster formation, and validation. Similarity matrix generation uses three functions, namely FrequencyComputation, OccurranceBasedCosine and OccurranceBasedSBMS. The time complexity of the algorithms is discussed, and their performance on real-world data is measured using accuracy and error rate.
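
The abstract names the similarity measures but does not give their definitions, so the following Python sketch is only an illustration of the general idea, not the paper's method. It assumes that each record is a tuple of categorical values, that simple matching is the fraction of attributes on which two records agree, and that the occurrence-based cosine weights each one-hot entry by the relative frequency of its value in the data set. The function names `frequency_table`, `simple_matching`, `occurrence_cosine` and `similarity_matrix` are hypothetical and do not correspond to the paper's FrequencyComputation, OccurranceBasedCosine or OccurranceBasedSBMS.

```python
from collections import Counter
from math import sqrt


def frequency_table(records):
    """Count how often each value occurs in each attribute column."""
    return [Counter(column) for column in zip(*records)]


def simple_matching(a, b):
    """Fraction of attributes on which two records agree (assumed definition)."""
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def occurrence_cosine(a, b, freqs, n):
    """Cosine similarity of frequency-weighted one-hot vectors (assumed definition).

    Each record is viewed as a one-hot vector whose non-zero entry for
    attribute j carries the relative frequency of the record's value.
    Only attributes where both records hold the same value contribute
    to the dot product.
    """
    weight_a = [freqs[j][x] / n for j, x in enumerate(a)]
    weight_b = [freqs[j][y] / n for j, y in enumerate(b)]
    dot = sum(wa * wb
              for wa, wb, x, y in zip(weight_a, weight_b, a, b)
              if x == y)
    norm_a = sqrt(sum(w * w for w in weight_a))
    norm_b = sqrt(sum(w * w for w in weight_b))
    return dot / (norm_a * norm_b)


def similarity_matrix(records, measure):
    """Build the full pairwise similarity matrix for a given measure."""
    return [[measure(a, b) for b in records] for a in records]


# Toy data set, invented purely for illustration.
records = [("red", "small"), ("red", "large"), ("blue", "large")]
freqs = frequency_table(records)
n = len(records)

sbm_matrix = similarity_matrix(records, simple_matching)
cos_matrix = similarity_matrix(
    records, lambda a, b: occurrence_cosine(a, b, freqs, n)
)
```

Either matrix could then be passed to a clustering step. Building it takes a quadratic number of pairwise comparisons in the number of records, each linear in the number of attributes.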