期刊名称:International Journal of Data Mining & Knowledge Management Process
印刷版ISSN:2231-007X
电子版ISSN:2230-9608
出版年度:2013
卷号:3
期号:5
DOI:10.5121/ijdkp.2013.3503
出版社:Academy & Industry Research Collaboration Center (AIRCC)
摘要:Data fusion in the virtual data integration environment starts after detecting and clustering duplicated records from the different integrated data sources. It refers to the process of selecting or fusing attribute values from the clustered duplicates into a single record representing the real world object. In this paper, a statistical technique for data fusion is introduced based on some probabilistic scores from both data sources and clustered duplicates