Article Information

  • Title: A REVIEW OF NAMED ENTITY RECOGNITION AND CLASSIFICATION ON UNSTRUCTURED MALAY DATA
  • Authors: ROSMAYATI MOHEMAD ; NAZRATUL NAZIAH MOHD MUHAIT ; NOOR MAIZURA MOHAMAD NOOR
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2020
  • Volume: 98
  • Issue: 23
  • Pages: 3741-3756
  • Publisher: Journal of Theoretical and Applied
  • Abstract: In recent years, the emergence of various social network platforms has led to a massive amount of data being continuously generated and shared. The majority of this data is unstructured and contains information that could be crucial and valuable if analyzed. Making effective use of such unstructured data is a tedious and labor-intensive task. Information extraction is an ongoing research area that aims to extract potentially useful information from voluminous data. Several techniques and methods for information extraction have been proposed to understand the content and context of available unstructured data at the low-level structure. However, few studies have investigated the challenges of Named Entity Recognition and Classification (NERC), one of the main subtasks of information extraction, on unstructured Malay data. This paper therefore presents a comprehensive review of existing NERC techniques for processing unstructured Malay data, along with their limitations and challenges. The contributions of this paper are twofold. First, it presents an overview of prior studies on NERC techniques for unstructured Malay data. Second, it scrutinizes the limitations and challenges of these existing techniques that arise from the volume, dimensionality, and heterogeneity of unstructured Malay data. The findings show that most previous studies using a machine learning-based approach produce satisfactory results, in contrast to those using a rule-based approach. Furthermore, challenges such as the morphological differences between Malay and resource-rich languages such as English, the limited availability of Malay corpora and annotated Malay text, and ambiguities in Malay text could affect the performance of Malay NERC systems and should be carefully considered when designing such systems.
  • Keywords: Information Extraction; Malay Language; Named Entity Recognition and Classification; Unstructured Data