Journal: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Electronic ISSN: 0976-8491
Publication year: 2014
Volume: 5
Issue: 1
Pages: 191-194
Language: English
Publisher: Ayushmaan Technologies
Abstract: Deduplication is a key operation when integrating data from multiple sources. Data preprocessing is required to obtain higher-quality information and a simpler data representation, and data cleaning is one of the preprocessing steps. Data cleaning comprises parsing, data transformation, duplicate elimination, and statistical methods. Two records that represent the same real-world entity are called duplicate records, and the problem of detecting and eliminating them is called record deduplication. This paper presents a new combined algorithm, Tabu Artificial Bee Colony, which improves optimization performance in detecting and removing duplicate records.
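
To make the definition of record deduplication concrete, the following is a minimal Python sketch of the general task only. It is not the Tabu Artificial Bee Colony algorithm from the paper, whose details are not given in this record. The function names (`normalize`, `similarity`, `deduplicate`), the sample records, and the 0.85 threshold are all illustrative assumptions; the string comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase each field and collapse runs of whitespace (assumed cleaning step)."""
    return tuple(" ".join(str(field).lower().split()) for field in record)


def similarity(a, b):
    """Mean per-field similarity ratio in [0, 1] between two normalized records."""
    scores = [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]
    # Records with no fields are treated as identical to avoid dividing by zero.
    return sum(scores) / len(scores) if scores else 1.0


def deduplicate(records, threshold=0.85):
    """Keep the first record of each group whose similarity reaches the threshold.

    The threshold is a hypothetical value chosen for illustration; in practice
    it would have to be tuned, which is the kind of choice an optimization
    method could search over.
    """
    kept = []
    for record in records:
        norm = normalize(record)
        is_duplicate = any(
            similarity(norm, normalize(existing)) >= threshold for existing in kept
        )
        if not is_duplicate:
            kept.append(record)
    return kept


# Hypothetical sample data: the first two rows describe the same entity.
records = [
    ("John Smith", "12 Main Street"),
    ("john  smith", "12 Main Street"),
    ("Mary Jones", "4 Oak Avenue"),
]
unique = deduplicate(records)
```

This pairwise comparison is quadratic in the number of records, so it only suits small inputs; it is meant to show what "same real-world entity" means operationally, not how to scale the search.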