Article Information

  • Title: Characteristics of Malay translated hadith corpus
  • Authors: Siti Syakirah Sazali; Nurazzah Abdul Rahman; Zainab Abu Bakar
  • Journal: Journal of King Saud University - Computer and Information Sciences
  • Print ISSN: 1319-1578
  • Publication year: 2022
  • Volume: 34
  • Issue: 5
  • Pages: 2151-2160
  • Language: English
  • Publisher: Elsevier
  • Abstract: Annotated corpora can greatly assist natural language processing. For example, they help computers capture more of a document's context, and they allow indexing and clustering in information retrieval to be done precisely, with little or no word ambiguity. However, only a few annotated corpora exist for the Malay language, and they are not publicly shared. In this paper, we analyse and annotate Malay translated hadith documents in terms of tagging and entities. The work has three phases: manual filtering and cleaning, analysing the corpus, and creating the benchmark. As a result, an analysis and benchmark of the Malay translated hadith corpus were produced in terms of part-of-speech and named-entity tags that follow Zipf's law.
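
The abstract's reference to Zipf's law can be illustrated with a minimal sketch. In its classic form, Zipf's law says that an item's frequency is roughly proportional to 1/rank, so a plot of log(frequency) against log(rank) is close to a straight line with slope near -1. The function name `zipf_slope` and the toy tag counts below are hypothetical and chosen only for illustration; they are not data or code from the paper, and this is not necessarily how the authors tested the distribution.

```python
import math
from collections import Counter


def zipf_slope(tags):
    """Return the least-squares slope of log(frequency) against log(rank).

    A slope close to -1 is consistent with the classic form of Zipf's law.
    """
    # Frequencies of each distinct tag, most frequent first (rank 1).
    freqs = sorted(Counter(tags).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tags to fit a slope")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]

    # Ordinary least-squares slope: covariance(x, y) / variance(x).
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical tag sequence (NOT from the paper): counts are 60 / rank.
toy_tags = (
    ["NOUN"] * 60 + ["VERB"] * 30 + ["ADJ"] * 20 + ["ADV"] * 15 + ["PRON"] * 12
)
slope = zipf_slope(toy_tags)
print(f"fitted log-log slope: {slope:.3f}")
```

On a real annotated corpus, the same function could be applied to the list of part-of-speech or named-entity tags; how close the slope must be to -1 to count as "following" Zipf's law is a judgment call that this sketch does not settle.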