Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Online ISSN: 2319-8753
Year: 2016
Volume: 5
Issue: 12
Page: 20618
DOI: 10.15680/IJIRSET.2016.0512087
Publisher: S&S Publications
Abstract: Many problems in natural language processing, data mining, and information retrieval can be formalized as string transformation, defined as follows: given an input string, the system generates the k strings most similar to it. We propose an approach to string transformation that is both accurate and efficient. The approach includes the use of the 0-1 knapsack problem, a method for training the model, and an algorithm for generating the nearest string, whether or not a predefined dictionary is available. The learning method employs maximum likelihood estimation for parameter estimation. The proposed method is applied to the correction of spelling errors in queries and to the reformulation of queries in web search. Experimental results on large-scale data show that the proposed approach is accurate and efficient, improving upon existing methods on both counts in different settings.
Keywords: Natural language processing; Levenshtein distance; Knapsack problem; edit distance
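
To make the task in the abstract concrete, the following is a minimal sketch of dictionary-based string transformation: given a query, return the k dictionary entries closest to it under plain unit-cost Levenshtein distance. It is an illustration only and not the paper's method, which the abstract describes as using the 0-1 knapsack problem and a model trained by maximum likelihood estimation. The function names `levenshtein` and `k_nearest` and the sample word list are hypothetical.

```python
import heapq


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) via dynamic programming."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def k_nearest(query: str, dictionary: list[str], k: int) -> list[str]:
    """Return the k dictionary words closest to query; ties break alphabetically."""
    return heapq.nsmallest(k, dictionary, key=lambda w: (levenshtein(query, w), w))


if __name__ == "__main__":
    # Hypothetical dictionary, used only to exercise the sketch.
    words = ["hello", "help", "world", "halo"]
    print(k_nearest("helo", words, 2))
```

This brute-force scan compares the query with every dictionary entry, so its cost grows linearly with dictionary size. It shows only the input/output contract of the task, not the efficiency the abstract claims for the proposed approach.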