Article information

Title: Corpus based Automatic Text Summarization System with HMM Tagger
Authors: M. Suneetha; S. Sameen Fatima
Journal: International Journal of Soft Computing & Engineering
Electronic ISSN: 2231-2307
Publication year: 2011
Volume: 1
Issue: 3
Pages: 118-123
Publisher: International Journal of Soft Computing & Engineering
Abstract: The rapid growth of data on the Internet has overloaded users with enormous amounts of information, making it more difficult to access huge volumes of documents. Automatic text summarization is an important activity in the analysis of high-volume text documents. Text summarization condenses the source text into a shorter version while preserving its information content and overall meaning. In this paper, a frequent-term-based text summarization technique with an HMM tagger is designed and implemented in Java. The proposed system generates a summary for a given input document based on the identification and extraction of important sentences in the document. The model consists of four stages. In the first stage, the system decomposes the given text into its constituent sentences, assigns a part-of-speech (POS) tag to each word in the text, and stores the result in a table. The second stage removes stop words, stems the text, and applies lemmatization. Feature terms are identified in the third stage. Finally, each sentence is ranked according to its feature terms. This stage reduces the number of sentences in the summary in order to produce a high-quality summary.
Keywords: Text Summarization; HMM Tagger; Brown Corpus; POS tagging.
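
The four-stage pipeline outlined in the abstract can be illustrated with a minimal Java sketch. This is not the authors' implementation: the HMM POS tagger, stemming, and lemmatization are omitted entirely, the stop-word list is a tiny illustrative one, and the class name `FrequentTermSummarizer`, its method names, and the parameters `k` (number of feature terms) and `n` (summary length in sentences) are all hypothetical. Scoring a sentence by how many feature-term occurrences it contains is likewise an assumption, since the abstract does not give the ranking formula.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a frequent-term extractive summarizer.
// POS tagging, stemming, and lemmatization are deliberately left out.
class FrequentTermSummarizer {
    // Tiny illustrative stop-word list (an assumption, not the paper's list).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "it", "was"));

    // Stage 1 (simplified): split at whitespace that follows '.', '!' or '?'.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    // Stage 2 (simplified): lowercase, tokenize on non-letters, drop stop words.
    static List<String> contentTerms(String sentence) {
        List<String> terms = new ArrayList<>();
        for (String tok : sentence.toLowerCase().split("[^a-z]+")) {
            if (!tok.isEmpty() && !STOP_WORDS.contains(tok)) terms.add(tok);
        }
        return terms;
    }

    // Stage 3: treat the k most frequent content terms as feature terms.
    static Set<String> featureTerms(List<String> sentences, int k) {
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences) {
            for (String t : contentTerms(s)) freq.merge(t, 1, Integer::sum);
        }
        List<String> terms = new ArrayList<>(freq.keySet());
        // Descending frequency; ties broken alphabetically for determinism.
        terms.sort((a, b) -> {
            int byFreq = Integer.compare(freq.get(b), freq.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new HashSet<>(terms.subList(0, Math.min(k, terms.size())));
    }

    // Stage 4: score sentences by feature-term occurrences, keep the top n,
    // and return them in their original document order.
    static List<String> summarize(String text, int k, int n) {
        List<String> sentences = splitSentences(text);
        Set<String> features = featureTerms(sentences, k);
        int[] score = new int[sentences.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            for (String t : contentTerms(sentences.get(i))) {
                if (features.contains(t)) score[i]++;
            }
            order.add(i);
        }
        // Stable sort: among equal scores, earlier sentences stay first.
        order.sort((a, b) -> Integer.compare(score[b], score[a]));
        List<Integer> keep = new ArrayList<>(order.subList(0, Math.min(n, order.size())));
        Collections.sort(keep);
        List<String> summary = new ArrayList<>();
        for (int i : keep) summary.add(sentences.get(i));
        return summary;
    }

    public static void main(String[] args) {
        String text = "Cats chase mice. The weather is nice. Mice fear cats. Dogs bark.";
        System.out.println(summarize(text, 2, 2));
    }
}
```

In this toy input, "cats" and "mice" are the two most frequent content terms, so the two sentences that mention them are selected. A fuller version would presumably use the POS tags from the first stage to restrict which words may become feature terms, but how the paper does this is not stated in the abstract.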