Journal: Journal of Emerging Trends in Computing and Information Sciences
Electronic ISSN: 2079-8407
Year of publication: 2011
Volume: 2
Issue: 11
Pages: 615-626
Publisher: ARPN Publishers
Abstract: Poor data quality management carries an inherent risk of capital losses and heightened exposure. Existing efforts, such as treating data as products, capturing metadata to manage data quality, statistical techniques, source calculus and algebra, data stewardship, and dimensional gap analysis, have all failed to incorporate the contextual factors, which are fuzzy in nature. The conventional way of using information requires discrete values that are precise and free of ambiguity. This is not realizable in practice, because human beings describe situations with imprecise expressions that carry a high level of uncertainty or have no clear boundaries, e.g., "I am very hungry" or "it is going to be cloudy today". The bulk of the challenges posed by dirty data can be seen to stem from "not missing, but wrong" data. Such data result from values that differ across databases, ambiguous data, the use of abbreviations or incomplete text, and non-standard data, which covers differing representations of compound data. This research employs a fuzzy model to facilitate retrieval despite these many dirty data problems.
Keywords: Dirty data; Fuzzy search; Fuzzy string matching; Data quality
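
The abstract names fuzzy string matching as the means of retrieving records despite abbreviations, misspellings, and non-standard representations. The sketch below illustrates that general idea only; it is not the model described in the paper. It assumes Python's standard-library difflib.SequenceMatcher as the similarity measure, and the function names, the 0.6 threshold, and the sample records are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_search(query: str, records: list[str], threshold: float = 0.6):
    """Return (score, record) pairs scoring at or above the threshold, best first.

    The threshold is an assumed cut-off for illustration, not a value from the paper.
    """
    scored = [(similarity(query, record), record) for record in records]
    matches = [pair for pair in scored if pair[0] >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Hypothetical "dirty" records: one full name, one abbreviated, one unrelated.
records = [
    "International Business Machines",
    "Intl. Business Machines",
    "Microsoft Corporation",
]

# A misspelled query still retrieves the intended record.
results = fuzzy_search("Internatinal Busness Machines", records)
for score, record in results:
    print(f"{score:.2f}  {record}")
```

With these sample records, the misspelled query ranks the correctly spelled full name first and the abbreviated form below it. A production system would likely need a similarity measure and threshold tuned to its own data.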