Article Information

  • Title: Corpus based Automatic Text Summarization System with HMM Tagger
  • Authors: M. Suneetha; S. Sameen Fatima
  • Journal: International Journal of Soft Computing & Engineering
  • E-ISSN: 2231-2307
  • Year: 2011
  • Volume: 1
  • Issue: 3
  • Pages: 118-123
  • Publisher: International Journal of Soft Computing & Engineering
  • Abstract: The rapid growth of data on the Internet has overloaded users with enormous amounts of information, making large volumes of documents difficult to access. Automatic text summarization is an important technique for analyzing high-volume text documents. Text summarization condenses the source text into a shorter version while preserving its information content and overall meaning. In this paper, a frequent-term-based text summarization technique with an HMM tagger is designed and implemented in Java. The proposed system generates a summary for a given input document by identifying and extracting its important sentences. The model consists of four stages. In the first stage, the system decomposes the given text into its constituent sentences, assigns a POS tag to each word, and stores the result in a table. The second stage removes stop words, stems the text, and applies lemmatization. Feature terms are identified in the third stage. Finally, each sentence is ranked according to its feature terms. This last stage reduces the number of sentences in the summary in order to produce a high-quality summary.
  • Keywords: Text Summarization; HMM Tagger; Brown Corpus; POS tagging.
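
The four stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's system is written in Java and uses an HMM tagger, whereas the sketch below is in Python, omits POS tagging entirely, and substitutes crude suffix stripping for stemming and lemmatization. The stop-word list, the function names (`split_sentences`, `normalize`, `summarize`), and the parameters `max_sentences` and `top_terms` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the paper's list is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on",
              "and", "to", "for", "it", "this", "that", "with"}


def split_sentences(text):
    # Stage 1 (partial): split after '.', '!' or '?' followed by whitespace.
    # The POS tagging step of the paper is NOT reproduced here.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(sentence):
    # Stage 2: lowercase, drop stop words, then strip a few common suffixes.
    # The suffix stripping is only a stand-in for real stemming/lemmatization.
    terms = []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in STOP_WORDS:
            continue
        for suffix in ("ing", "es", "s"):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        terms.append(word)
    return terms


def summarize(text, max_sentences=2, top_terms=5):
    sentences = split_sentences(text)
    term_lists = [normalize(s) for s in sentences]

    # Stage 3: treat the most frequent terms in the document as feature terms.
    counts = Counter(t for terms in term_lists for t in terms)
    feature_terms = {t for t, _ in counts.most_common(top_terms)}

    # Stage 4: score each sentence by how many feature terms it contains,
    # keep the best-scoring ones, and return them in document order.
    scores = [sum(1 for t in terms if t in feature_terms) for terms in term_lists]
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

Calling `summarize(document_text, max_sentences=3)` returns up to three sentences in their original order. Scoring by raw feature-term count favors longer sentences; whether the paper normalizes for sentence length is not stated in the abstract.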