Journal: International Journal of Soft Computing & Engineering
E-ISSN: 2231-2307
Year: 2011
Volume: 1
Issue: 5
Pages: 367-371
Publisher: International Journal of Soft Computing & Engineering
Abstract: We propose a novel approach to classifying documents into categories using lexical chaining. The text categorization technique presented here extracts lexical features of the words occurring in a document. Two kinds of lexical chains, based on the WordNet and Wikipedia reference sources, are built from the semantic neighborhood of tokens. The strength of each lexical chain is determined from TF-IDF, category keyword strength, and the relative position of tokens in the document. Each category is then assigned a weight derived from the lexical chain computation. Fuzzy logic, via a triangular membership function, generates a range for each category, and a document is assigned to the category whose range criterion it satisfies. Lexical chaining is broadly applicable to tasks such as automated email spam filtering, topic spotting, and email routing.
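The abstract does not give the exact formulas, so the following is only a minimal Python sketch, under stated assumptions, of the scoring and fuzzy assignment steps. The names `chain_strength`, `triangular`, `FuzzyRange`, and `classify` are hypothetical. The linear weighting of TF-IDF, keyword strength, and position is also an illustrative choice, as is reading the "range criterion" as an alpha-cut (membership >= `alpha`). None of these should be taken as the authors' method. Building the WordNet/Wikipedia chains themselves is out of scope: each chain is passed in as precomputed per-token features.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL token features inside one lexical chain:
# (tfidf, category_keyword_strength, relative_position in [0, 1]).
Token = Tuple[float, float, float]


def chain_strength(tokens: List[Token],
                   w: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Hypothetical chain score: weighted sum over tokens, where tokens
    appearing earlier (smaller relative position) contribute more."""
    return sum(w[0] * tfidf + w[1] * kw + w[2] * (1.0 - pos)
               for tfidf, kw, pos in tokens)


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Standard triangular membership: 0 outside (a, c), 1 at the peak b,
    linear in between."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


@dataclass(frozen=True)
class FuzzyRange:
    """Triangular fuzzy set (a, b, c) over a category's weight scale."""
    a: float
    b: float
    c: float


def classify(weights: Dict[str, float], ranges: Dict[str, FuzzyRange],
             alpha: float = 0.5) -> Optional[str]:
    """Return the category whose weight falls inside its alpha-cut
    (membership >= alpha) with the highest membership, else None.
    ASSUMPTION: this alpha-cut test stands in for the paper's range criterion."""
    best: Optional[str] = None
    best_mu = -1.0
    for category, weight in weights.items():
        r = ranges[category]
        mu = triangular(weight, r.a, r.b, r.c)
        if mu >= alpha and mu > best_mu:
            best, best_mu = category, mu
    return best


if __name__ == "__main__":
    # Toy data: chains per category -> category weight -> fuzzy assignment.
    chains = {
        "sports": [[(0.4, 0.5, 0.1), (0.2, 0.3, 0.6)]],
        "politics": [[(0.1, 0.0, 0.9)]],
    }
    weights = {cat: sum(chain_strength(ch) for ch in chs)
               for cat, chs in chains.items()}
    ranges = {"sports": FuzzyRange(0.4, 0.8, 1.2),
              "politics": FuzzyRange(0.3, 0.7, 1.1)}
    print(classify(weights, ranges))  # → sports
```

Taking the category weight as the sum of its chains' strengths is the simplest aggregation consistent with the abstract. Returning `None` when no category satisfies its range leaves room for an "unclassified" outcome, which may or may not match the original system.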