Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2015
Volume: 76
Issue: 1
Publisher: Journal of Theoretical and Applied
Abstract: The diversity with which natural language represents meaning in documents is one cause of information overload in information retrieval. Headline generation is an automatic text summarization technique that can mitigate this problem. This paper reports an experimental study that identifies characteristics of Malay-language news documents. A Malay news corpus of 140 news documents was drawn from the BERNAMA news archive. Selection was limited to hard news items of 50 to 250 words, published between 2007 and 2012, in the economy, crime, education, or sports genres. Three Malay linguistic experts manually produced a reference headline for each news document. The experimental results identify three characteristics. First, the first two sentences of a news document are suitable candidates for its most important sentences. Second, sentences that contain an acronym definition are also potential candidates for the most important sentences. Third, the ideal headline length is six words. Taking these characteristics into account should help generate more intelligent headlines for Malay news.
Keywords: Headline Generation; Text Summarization; Malay News
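
The selection criteria and the first two characteristics in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. The document layout (a dict with `text`, `year`, `genre`, and `hard_news` keys), the function names, the naive sentence splitter, and the "(ACRONYM)" pattern used to detect an acronym definition are all assumptions made for illustration.

```python
import re

# Genres and limits as stated in the abstract.
ALLOWED_GENRES = {"economy", "crime", "education", "sports"}
IDEAL_HEADLINE_WORDS = 6  # third characteristic reported in the abstract


def meets_selection_criteria(doc):
    """Check a hypothetical document dict against the corpus criteria."""
    word_count = len(doc["text"].split())
    return (
        doc["hard_news"]
        and 50 <= word_count <= 250
        and 2007 <= doc["year"] <= 2012
        and doc["genre"] in ALLOWED_GENRES
    )


# Assumption: an acronym definition looks like "Long Name (ACRONYM)".
ACRONYM_DEFINITION = re.compile(r"\(([A-Z]{2,})\)")


def candidate_sentences(text):
    """Return the first two sentences plus any sentence defining an acronym."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        s for i, s in enumerate(sentences)
        if i < 2 or ACRONYM_DEFINITION.search(s)
    ]
```

A real system would need a proper Malay sentence splitter and a more robust acronym detector; the sketch only shows how the reported characteristics could narrow the set of sentences passed to a headline generator.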