Basic Article Information

  • Title: Review on Query Focused Summarization using TF-IDF, K-Mean Clustering and HMM
  • Authors: Sonali Gandhi; Praveen Sharma
  • Journal: International Journal of Innovative Research in Computer and Communication Engineering
  • Print ISSN: 2320-9798
  • Electronic ISSN: 2320-9801
  • Year: 2017
  • Volume: 5
  • Issue: 4
  • Pages: 6974
  • DOI: 10.15680/IJIRCCE.2017.0504059
  • Publisher: S&S Publications
  • Abstract: Numerous approaches for identifying important content for automatic text summarization have been developed to date. In the representation approach to query-focused summarization, an intermediate representation of the text is first derived that captures the topics mentioned in the input; based on this representation of topics, the sentences in the input document are scored for importance. In contrast, machine-learning indicator-representation approaches represent the text by a diverse set of possible indicators of importance aimed at discovering interestingness. These indicators are measured and combined by various machine-learning techniques, which ultimately optimize the selection over the available choices and pick the most effective set of sentences to form a summary. Accordingly, in this scheme we propose an effective technique that combines TF-IDF, K-Mean Clustering and a Hidden Markov Model to produce an enhanced Query Focused Summarization Model for ready reference and perusal. In query-focused summarization, the importance of each sentence is determined by a combination of two factors: how relevant the sentence is to the user query, and how important the sentence is in the context of the input in which it appears. There are two classes of approaches to this problem. The first adapts techniques for generic summarization of news. For example, an approach using topic-signature words is extended for query-focused summarization by assuming that the words that should appear in a summary have the following probabilities: a word has probability zero of appearing in a summary for a user-defined topic if it neither appears in the user query nor is a topic-signature word for the input; the probability is five percent if it either appears in the user query or is a topic-signature word, but not both; and a word that is both in the user query and in the list of topic-signature words for the input is assigned the highest probability of appearing in the summary. These probabilities are arbitrarily chosen, but in practice they work well when used to assign each sentence a weight equal to the average probability of its words. Cluster-based approaches have also been adapted for query-focused summarization with technical modifications. In this scheme we propose a new mechanism, built on existing artifacts, for identifying relevant and salient sentences. (Illustrative sketches of the sentence-weighting and clustering steps appear after the keyword list.)
  • Keywords: Term Frequency – Inverse Document Frequency (TF-IDF); Machine Learning (ML); Web Mining; K-Mean Clustering; Hidden Markov Model (HMM).
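
The word-probability weighting described in the abstract lends itself to a short sketch. The code below is not from the paper: the tier values (the "both" tier in particular), the whitespace tokenizer, and all function names are assumptions chosen for illustration. As the abstract states, a sentence's weight is the average probability of its words.

```python
# Minimal sketch (not the authors' code): score sentences by the average
# probability of their words, where a word's probability depends on whether
# it appears in the user query, the topic-signature list, both, or neither.

from typing import Iterable, List, Set

# Probability tiers as described in the abstract; the "both" value is an
# assumption (the abstract only says this case gets the highest probability).
P_NEITHER = 0.0   # neither a query word nor a topic-signature word
P_EITHER = 0.05   # exactly one of the two ("five percent" in the abstract)
P_BOTH = 1.0      # both a query word and a topic-signature word (assumed)


def word_probability(word: str, query_words: Set[str], signature_words: Set[str]) -> float:
    in_query = word in query_words
    in_signature = word in signature_words
    if in_query and in_signature:
        return P_BOTH
    if in_query or in_signature:
        return P_EITHER
    return P_NEITHER


def sentence_score(sentence: str, query_words: Set[str], signature_words: Set[str]) -> float:
    """Weight of a sentence = average probability of its words."""
    words = sentence.lower().split()  # naive whitespace tokenizer (assumption)
    if not words:
        return 0.0
    return sum(word_probability(w, query_words, signature_words) for w in words) / len(words)


def rank_sentences(sentences: Iterable[str], query: str, signature_words: Set[str]) -> List[str]:
    """Order input sentences from most to least important for the query."""
    query_words = set(query.lower().split())
    return sorted(sentences, key=lambda s: sentence_score(s, query_words, signature_words), reverse=True)
```

With such a scorer, a summary would typically be built by taking top-ranked sentences until a length budget is reached.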
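The proposed pipeline combines TF-IDF, K-Mean Clustering and an HMM, but the abstract does not specify how the HMM stage is wired in, so the sketch below covers only the TF-IDF and K-Means steps. It assumes scikit-learn rather than the authors' implementation: sentences are vectorized with TF-IDF, grouped with K-Means, and the sentence in each cluster most similar to the query is kept.

```python
# Illustrative sketch (assumed tooling: scikit-learn) of the TF-IDF + K-Means
# stage of a query-focused summarizer: cluster sentences by their TF-IDF
# vectors and keep, per cluster, the sentence most similar to the query.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def summarize(sentences, query, n_clusters=3):
    vectorizer = TfidfVectorizer(stop_words="english")
    sentence_vectors = vectorizer.fit_transform(sentences)  # one TF-IDF row per sentence
    query_vector = vectorizer.transform([query])

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(sentence_vectors)

    relevance = cosine_similarity(sentence_vectors, query_vector).ravel()

    # From each cluster keep the sentence most relevant to the query,
    # then restore the original sentence order for readability.
    chosen = []
    for cluster in range(n_clusters):
        members = [i for i, label in enumerate(labels) if label == cluster]
        if members:
            chosen.append(max(members, key=lambda i: relevance[i]))
    return [sentences[i] for i in sorted(chosen)]
```

Picking one sentence per cluster is one reasonable way to trade redundancy against query relevance; here the number of clusters simply caps the summary length.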