Article Information

  • Title: MALAY TEXT FEATURES FOR AUTOMATIC NEWS HEADLINE GENERATION
  • Authors: MOHD SABRI HASAN ; SHAHRUL AZMAN MOHD NOAH ; NAZLENA MOHAMAD ALI
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2015
  • Volume: 76
  • Issue: 1
  • Publisher: Journal of Theoretical and Applied
  • Abstract: The diversity of natural language for meaning representation in documents is one of the causes of information overload in information retrieval. Headline generation is an automatic text summarization technique that can reduce or address such a problem. This research discusses an experimental study on the determination of Malay language characteristics from news genre documents. A Malay news corpus comprising 140 news documents was chosen from the BERNAMA news archive. The selection criteria were limited to hard news, a word count of 50 to 250 words, published between 2007 and 2012, and with news genres of economy, crime, education, or sports only. Three Malay linguistic experts were selected to produce a reference headline for each news document manually. Experiment results identify three characteristics. First, the first two sentences of a news document are suitable candidates for the most important sentences; second, sentences that contain an acronym definition also have the potential to become the most important sentences; and third, the ideal length of a headline is six words. Considering these characteristics will generate intelligent headlines for Malay news.
  • Keywords: Headline Generation; Text Summarization; Malay News
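
The three characteristics reported in the abstract (lead sentences, acronym definitions, a six-word headline length) could be turned into simple selection heuristics. The sketch below is only an illustration under stated assumptions and is not the authors' implementation: the sentence splitter, the `ACRONYM_DEF` pattern (an uppercase acronym in parentheses), and the function names are all hypothetical, and the sample text is invented for the example.

```python
import re

# Assumption: an acronym definition looks like "Full Name (ABC)".
# The paper's actual detection rule is not given in the abstract.
ACRONYM_DEF = re.compile(r"\(([A-Z]{2,})\)")

# Ideal headline length reported in the abstract.
IDEAL_HEADLINE_WORDS = 6


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def candidate_sentences(text):
    """Keep the first two sentences plus any sentence defining an acronym."""
    picked = []
    for index, sentence in enumerate(split_sentences(text)):
        if index < 2 or ACRONYM_DEF.search(sentence):
            picked.append(sentence)
    return picked


def headline_length_gap(headline):
    """Number of words above (+) or below (-) the six-word ideal."""
    return len(headline.split()) - IDEAL_HEADLINE_WORDS


# Invented sample text, used only to exercise the functions.
sample = (
    "Bank Negara Malaysia (BNM) mengekalkan kadar dasar semalaman. "
    "Keputusan itu dibuat hari ini. "
    "Pasaran bertindak balas positif. "
    "Kementerian Pendidikan Malaysia (KPM) turut mengeluarkan kenyataan."
)
candidates = candidate_sentences(sample)
```

Here `candidates` keeps the two lead sentences and the later sentence that defines an acronym, while the third sentence is dropped. A real system would still need a step that compresses a candidate sentence toward the six-word target, which this sketch does not attempt.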