Article Information

Title: Domain Keyword Extraction Technique: A New Weighting Method Based on Frequency Analysis
Author: Rakhi Chakraborty
Journal: Computer Science & Information Technology
Online ISSN: 2231-5403
Publication year: 2013
Volume: 3
Issue: 2
Pages: 109-118
DOI: 10.5121/csit.2013.3211
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Online text documents are growing rapidly in volume along with the World Wide Web. To manage such a large amount of text, several text mining applications have come into existence. These applications, such as search engines, text categorization, summarization, and topic detection, are based on feature extraction. Extracting keywords or features manually is extremely time-consuming and difficult, so an automated process for extracting keywords or features needs to be established. This paper proposes a new domain keyword extraction technique that includes a new weighting method built on the conventional TF•IDF. Term frequency-inverse document frequency (TF•IDF) is widely used to express a document's feature weights, but it cannot reflect the distribution of terms in the document, and therefore cannot reflect the degree of significance or the differences between categories. This paper proposes a new weighting method in which a new weight is added to the original TF•IDF to express the differences between domains. The extracted features represent the content of the text better and have better discriminative ability.
Keywords: Text mining; Feature extraction; Weighting method; Term Frequency-Inverse Document Frequency (TF•IDF); Domain keyword extraction.
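
Illustrative sketch: the abstract describes adding a domain-sensitive weight on top of conventional TF•IDF but does not give the formula. The Python below is a minimal sketch of that general idea, not the paper's method. `tf_idf` is the standard weighting; `domain_factor` and `domain_keywords` are hypothetical helpers, and the ratio used for the domain weight is an assumption chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Conventional TF-IDF. docs is a list of non-empty token lists.

    Returns one {term: weight} dict per document.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def domain_factor(term, docs, labels, domain):
    """Hypothetical domain weight (an assumption, NOT the paper's formula).

    Ratio of the share of in-domain documents containing the term to the
    share of all documents containing it. Values above 1 mean the term is
    concentrated in the domain. Assumes the term occurs in at least one
    document and the domain has at least one document.
    """
    in_domain = [d for d, lab in zip(docs, labels) if lab == domain]
    p_domain = sum(term in d for d in in_domain) / len(in_domain)
    p_all = sum(term in d for d in docs) / len(docs)
    return p_domain / p_all


def domain_keywords(docs, labels, domain, top_k=3):
    """Rank terms of one domain by TF-IDF scaled by the domain factor."""
    scores = Counter()
    for doc, label, weights in zip(docs, labels, tf_idf(docs)):
        if label != domain:
            continue
        for term, w in weights.items():
            scores[term] += w * domain_factor(term, docs, labels, domain)
    return [term for term, _ in scores.most_common(top_k)]
```

Under this assumed factor, a term that appears equally often inside and outside a domain keeps its plain TF•IDF weight, while a term concentrated in the domain is boosted, which is one way to express "differences between domains" in the ranking.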