Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2016
Volume: 87
Issue: 1
Publisher: Journal of Theoretical and Applied
Abstract: A stoplist is one of the inputs to an information retrieval system and can affect retrieval quality. Words that carry no meaning can degrade retrieval. A standard dictionary-based stoplist also has problems when applied to a domain-specific corpus. For example, the word "recipe" is not normally a stopword, but in the cuisine domain it appears in almost every document. We build a dynamic stoplist from Indonesian recipe documents, whose stopwords differ from those of a standard dictionary-based stoplist and are therefore interesting to study. This paper uses three methods to generate the stoplist: Poisson and binomial probability distribution approaches and a simple frequency distribution approach for classifying candidate stopwords. To measure the results, we also employ the recently proposed RAKE algorithm. All three methods share the same weakness: the stoplist can be generated appropriately only after the entire corpus vocabulary has been processed, unlike a dictionary-based stoplist, which can already detect stopwords at the pre-processing stage. The frequency distribution approach gives better results than the other methods, but it requires a longer process than the Poisson and negative binomial methods.
Keywords: Keyword Extraction; Indonesian Cuisine; Auto Generated Stoplist
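
The abstract's "recipe" example can be illustrated with a minimal sketch of a frequency-based stoplist. This is not the paper's implementation: the function name `build_stoplist`, the `min_doc_ratio` threshold, the whitespace tokenization, and the sample documents are all assumptions made for illustration. The sketch flags as a candidate stopword any word that occurs in at least a given share of the documents.

```python
from collections import Counter


def build_stoplist(documents, min_doc_ratio=0.8):
    """Return words that occur in at least min_doc_ratio of the documents.

    Hypothetical helper: the threshold and tokenization are illustrative
    assumptions, not values taken from the paper.
    """
    doc_freq = Counter()
    for text in documents:
        # Count each word once per document (document frequency).
        doc_freq.update(set(text.lower().split()))
    threshold = min_doc_ratio * len(documents)
    return sorted(word for word, df in doc_freq.items() if df >= threshold)


# Made-up sample corpus in the cuisine domain.
docs = [
    "recipe for fried rice with egg",
    "recipe for chicken soup",
    "recipe for spicy beef rendang",
    "grilled fish recipe with sambal",
]
stoplist = build_stoplist(docs, min_doc_ratio=0.75)
print(stoplist)
```

As the abstract notes for all three methods, a stoplist of this kind is only available after the whole corpus has been scanned, since the document frequencies must be complete before any word can be classified.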