Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Publication year: 2012
Volume: 1
Issue: 8
Pages: 160-163
Publisher: Shri Pannalal Research Institute of Technology
Abstract: Deduplication is a key operation when integrating data from multiple sources. Data preprocessing is required to obtain higher-quality information and a simpler data representation, and data cleaning is one of the preprocessing steps. Data cleaning comprises parsing, data transformation, duplicate elimination, and statistical methods. Two records that represent the same real-world entity are called duplicate records, and the problem of detecting and eliminating them is called record deduplication. This paper presents an analysis of record deduplication techniques and algorithms that detect and remove duplicate records.
Keywords: Deduplication; Data cleaning; Data preprocessing; Record linkage; Record matching