Journal: International Journal of Engineering and Computer Science
Print ISSN: 2319-7242
Publication year: 2014
Volume: 3
Issue: 12
Pages: 9766-9773
Publisher: IJECS
Abstract: Duplicate detection consists of identifying multiple representations of the same object, for every object represented in a data source. It is relevant to data cleaning and data integration applications and has been studied extensively for relational data describing a single type of object in a single table. The main aim of this project is to detect duplicates in structured data. The proposed system focuses on a specific type of error, namely fuzzy duplicates, or duplicates for short. Detecting duplicate entities that describe the same real-world object is an important data cleansing task for improving data quality, and numerous solutions to this problem exist for data stored in a flat relation. As a subtask of data cleaning, duplicate detection identifies multiple representations of the same real-world object. Numerous approaches exist for relational and XML data; they aim either to improve the quality of the detected duplicates (effectiveness) or to save computation time (efficiency).
Keywords: Duplicate detection; record linkage; XML; Bayesian networks; data cleaning; Dogmatix