Journal: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Electronic ISSN: 1694-0814
Publication year: 2011
Volume: 8
Issue: 4
Publisher: IJCSI Press
Abstract: Field Association (FA) words or phrases identify document fields from only a few specific words. Document fields can be decided efficiently if there are many rank 1 FA words (words that connect directly to terminal fields) and if their frequency rate is high. This paper proposes a new method for increasing the number of rank 1 FA words using declinable words and concurrent words that relate to narrow association categories and eliminate FA word ambiguity. Concurrent words become Concurrent Field Association words (CFA words) if there is little field overlap. Efficient CFA words are usually difficult to extract using frequency alone, so this paper proposes weighting concurrent words according to their degree of importance. The new weighting method significantly increases Precision and Recall, by 30% and 40% respectively, compared with using frequency alone. Moreover, combining CFA words with FA words allows the new system to automatically append around 28% of CFA words to the existing FA word dictionary. Furthermore, Recall improves by 21% over that of the traditional method.
Keywords: FA Words; Declinable Words; Concurrent Words; CFA Words; Recall; Precision