Article Information

Title: Automatic Data Cleansing Of Incorrect City Names In Spatial Databases Using LCS Algorithm
Authors: M. Ben Swarup; B. Leela Priyanka
Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Year: 2014
Volume: 3
Issue: 4
Pages: 1393-1396
Publisher: Shri Pannalal Research Institute of Technology
Abstract: Data cleansing algorithms can increase the quality of data while also reducing the overall effort of data collection. If data quality is poor, wrong conclusions may be drawn from the data, and the consequences may be tragic; poor data quality can therefore lead to completely unexpected results. In this paper, the Longest Common Subsequence (LCS) algorithm is implemented for automatic city name correction, cleansing a large spatial database without requiring human intervention or a priori knowledge of the context. The LCS problem is to find the longest subsequence common to all sequences in a set of sequences (often just two). The LCS-based approach achieves a precision of 90%, which is significantly better than an approach based on the traditional Levenshtein distance.
Keywords: Data Cleansing; LCS; Levenshtein; Distance
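The abstract does not state how LCS scores are turned into corrections. The following is a minimal Python sketch, assuming that each possibly misspelled name is compared against a list of correctly spelled reference cities and replaced by the one with the highest normalised LCS score. The normalisation (LCS length divided by the longer name's length), the 0.6 acceptance threshold, the function names and the sample city list are all hypothetical illustrations, not the authors' published method.

```python
from typing import List


def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b.

    Classic dynamic programming over the (len(a)+1) x (len(b)+1) table,
    keeping only the previous row, so extra memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the LCS of both shorter prefixes.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise take the better of dropping a char from either side.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical score: case-insensitive LCS length over the longer length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def correct_city(name: str, reference: List[str], threshold: float = 0.6) -> str:
    """Return the most LCS-similar reference city, or name if none is close enough.

    The reference list and the 0.6 threshold are assumptions for illustration.
    """
    if not reference:
        return name
    best = max(reference, key=lambda city: lcs_similarity(name, city))
    return best if lcs_similarity(name, best) >= threshold else name


if __name__ == "__main__":
    # Hypothetical reference list of correctly spelled city names.
    cities = ["Hyderabad", "Visakhapatnam", "Vijayawada", "Guntur"]
    for raw in ["Hydrabad", "Guntru", "Delhi"]:
        print(raw, "->", correct_city(raw, cities))
    # Hydrabad -> Hyderabad
    # Guntru -> Guntur
    # Delhi -> Delhi
```

Dividing by the longer name, rather than the shorter one, keeps a short fragment from looking like a perfect match: "Hyd" is a full subsequence of "Hyderabad", but scores only 3/9 here instead of 1.0. Names with no sufficiently similar reference entry, such as "Delhi" above, are left unchanged rather than forced onto the nearest city.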