Article Information

Title: Integrating Correlation Clustering and Agglomerative Hierarchical Clustering for Holistic Schema Matching
Authors: Basel Alshaikhdeeb; Kamsuriah Ahmad
Journal: Journal of Computer Science
Print ISSN: 1549-3636
Year: 2015
Volume: 11
Issue: 3
Pages: 484-489
DOI: 10.3844/jcssp.2015.484.489
Publisher: Science Publications
Abstract: Holistic schema matching is the process of taking multiple schemas as input and producing the correspondences among them. Matching a large number of schemas can be time-consuming and may yield poor-quality results. Therefore, several clustering approaches have been proposed to reduce the search space by partitioning the data into smaller portions, which facilitates the matching process. However, the partitioning mechanism still needs improvement, because the clustering process relies on random initial solutions (centroids), and such random solutions have a significant impact on the matching results. This study integrates correlation clustering and agglomerative hierarchical clustering to improve the effectiveness of holistic schema matching. The proposed integrated method avoids random initial solutions and does not require a predefined number of centroids. Several preprocessing steps were performed using auxiliary information (a domain dictionary). Experiments were carried out on the Airfare, Auto and Book datasets from the UIUC Web Integration Repository, and the proposed method was compared with the K-means and K-medoids clustering methods. The proposed method outperformed K-means and K-medoids, achieving accuracies of 0.9, 0.93 and 0.9 on Airfare, Auto and Book, respectively.
Keywords: Schema Integration; Holistic Schema Matching; Correlation Clustering; Agglomerative Hierarchical Clustering
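To illustrate why an agglomerative approach needs neither random initial centroids nor a preset number of clusters, here is a minimal sketch covering only the agglomerative half of the idea. It is not the authors' integrated method: the abstract does not specify their similarity measure or their correlation-clustering step, so this sketch swaps in token-level Jaccard similarity on attribute labels and average linkage with a similarity cut-off. The names `DOMAIN_DICT`, `tokens`, `jaccard`, `agglomerate` and the `threshold` parameter are hypothetical, and the dictionary contents are invented stand-ins for the "domain dictionary" preprocessing mentioned in the abstract.

```python
import re
from itertools import combinations

# Hypothetical domain dictionary (invented for illustration): maps variant
# tokens to a canonical token, standing in for the auxiliary information
# used during preprocessing.
DOMAIN_DICT = {"dept": "depart", "departure": "depart",
               "arr": "arrive", "arrival": "arrive"}


def tokens(label):
    """Lower-case a label, split on non-alphanumerics, canonicalize tokens."""
    raw = re.split(r"[^a-z0-9]+", label.lower())
    return frozenset(DOMAIN_DICT.get(t, t) for t in raw if t)


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(labels, threshold=0.5):
    """Average-linkage agglomerative clustering with a similarity cut-off.

    Starts from singleton clusters (no random centroids, no preset k) and
    repeatedly merges the most similar pair of clusters until no pair's
    average similarity reaches `threshold`. Ties go to the first pair found,
    so the result is deterministic.
    """
    toks = [tokens(label) for label in labels]
    clusters = [[i] for i in range(len(labels))]

    def avg_sim(c1, c2):
        total = sum(jaccard(toks[i], toks[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            s = avg_sim(clusters[a], clusters[b])
            if s > best:
                best, pair = s, (a, b)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        a, b = pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in c) for c in clusters]
```

For example, `agglomerate(["Departure City", "Dept City", "Arrival Date", "Arr Date", "Price"])` returns `[["Departure City", "Dept City"], ["Arr Date", "Arrival Date"], ["Price"]]`. The number of groups emerges from the threshold rather than being fixed in advance, which is the property the abstract contrasts with K-means and K-medoids.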