Journal: International Journal of Computer Science, Engineering and Applications (IJCSEA)
Print ISSN: 2231-0088
Electronic ISSN: 2230-9616
Publication year: 2014
Volume: 4
Issue: 3
DOI: 10.5121/ijcsea.2014.4303
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: In a system with a large database, names are often misspelled or spelled differently from what a user expects, which degrades the quality of the data. It then becomes necessary to find the duplicates and merge them into a single entity. One difficulty in doing so is deciding how the strings should be compared. In such cases approximate string matching is preferable to exact matching. One approximate technique is phonetic matching, which compares names based on their pronunciation. Similar-sounding words can be retrieved from a large database using various phonetic matching algorithms, the best known of which is the Soundex algorithm. Phonetic matching is needed when people from many different cultures come together, because they either pronounce names differently or have different writing habits. This scenario is very common in India, which has many different languages such as Hindi, Gujarati, Marathi, and Tamil. In this research work, the Soundex algorithm is used for the Hindi and Gujarati languages and applied to names along with their variations in order to retrieve the output with minimum false hits.
Keywords: Phonetic matching; SoundEx algorithm; Name variations
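
For readers unfamiliar with the Soundex algorithm named in the abstract, the following is a minimal sketch of the classic English (American) Soundex coding for Latin-alphabet names. It is not the Hindi and Gujarati adaptation studied in the paper, whose letter-to-code mapping is not given in this record. The function name `soundex` and the table name `CODES` are illustrative choices, not taken from the source.

```python
# Classic American Soundex for Latin-alphabet names (illustrative sketch only;
# the paper's Hindi/Gujarati variant would need its own letter-to-code table).
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of `name` ("" if it has no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")      # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # H and W do not separate equal codes
        code = CODES.get(ch, "")          # vowels and Y map to "" (separators)
        if code and code != prev:
            result.append(code)
            if len(result) == 4:
                break
        prev = code
    return "".join(result).ljust(4, "0")  # pad short codes with zeros


if __name__ == "__main__":
    for n in ("Robert", "Rupert", "Smith", "Smyth", "Ashcraft"):
        print(n, soundex(n))
```

Spelling variants that sound alike, such as "Smith" and "Smyth" or "Robert" and "Rupert", receive the same code, which is what allows duplicate names to be grouped without an exact string match.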