Journal: International Journal of Advanced Computer Research
Print ISSN: 2249-7277
Electronic ISSN: 2277-7970
Publication year: 2016
Volume: 6
Issue: 25
Pages: 138-145
Publisher: Association of Computer Communication Education for National Triumph (ACCENT)
Abstract: Keyword extraction is an important task in text mining. This paper proposes a novel, unsupervised, domain-independent and language-independent approach for automatic keyword extraction from single documents. The approach uses each word's intermediate distance vector and its mean value to extract keywords. We compared it against the standard deviation of intermediate distances approach, taken as the baseline, and found heavy overlap between the results of the two approaches. The advantage of our approach is that it is faster, especially for long documents, because it removes the need to compute the standard deviation of the word intermediate distance vector. Two famous works, "Origin of Species" and "A Brief History of Time", were used to demonstrate the experimental results. Experiments show that the proposed approach works almost as well as the standard deviation approach, and that the overlap between the top 30 keywords extracted by the two approaches is more than 50%.
Keywords: Keyword extraction; Word means intermediate distance; Clustering; Standard deviation.
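
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a word's "intermediate distance vector" is the list of gaps, in token positions, between successive occurrences of that word, and it computes the mean and the standard deviation of that vector for each word. The function names, the `\w+` tokenizer, and the `min_occurrences` threshold are hypothetical choices made for illustration. The abstract does not say how the mean is turned into a keyword ranking, so the sketch leaves ranking to the caller and only shows how the percentage overlap between two top-k lists could be measured.

```python
import re
from statistics import mean, pstdev


def intermediate_distances(tokens):
    """Map each word to the gaps between its successive positions.

    Assumption: "intermediate distance" means the difference in token
    index between one occurrence of a word and its next occurrence.
    """
    last_seen = {}
    gaps = {}
    for pos, word in enumerate(tokens):
        if word in last_seen:
            gaps.setdefault(word, []).append(pos - last_seen[word])
        last_seen[word] = pos
    return gaps


def gap_statistics(text, min_occurrences=3):
    """Return {word: (mean gap, std of gaps)} for sufficiently frequent words.

    The tokenizer and the min_occurrences cutoff are illustrative
    assumptions, not taken from the paper.
    """
    tokens = re.findall(r"\w+", text.lower())
    stats = {}
    for word, d in intermediate_distances(tokens).items():
        if len(d) + 1 >= min_occurrences:
            stats[word] = (mean(d), pstdev(d))
    return stats


def top_k_overlap(ranking_a, ranking_b, k=30):
    """Percentage of words shared by the top-k entries of two rankings."""
    a, b = set(ranking_a[:k]), set(ranking_b[:k])
    return 100.0 * len(a & b) / k
```

Computing only the mean needs a single pass over the gaps, whereas the standard deviation also needs the squared deviations, which is consistent with the speed advantage the abstract claims for long documents.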