Journal: International Journal of Advances in Soft Computing and Its Applications
Print ISSN: 2074-8523
Publication year: 2020
Volume: 12
Issue: 1
Pages: 49-64
Publisher: International Center for Scientific Research and Studies
Abstract: A code-switching sentence contains a mixture of two or more languages within a single sentence. Code-switching is a new trend in language use that is widespread on open platforms such as blogs and social media. Consequently, code-switching has become a new challenge for natural language processing (NLP). The challenge stems from a limitation of existing NLP systems, which were designed for mono-lingual text. Therefore, a new NLP system is needed to deal with code-switching sentences. However, a system that segregates code-switching sentences from mono-lingual sentences must be developed before code-switching sentences can be used in NLP systems. This paper considers the segregation essential for two reasons: firstly, current NLP systems deal only with mono-lingual sentences; secondly, current NLP systems treat switched words as meaningless, which leads to inaccurate results. This paper segregates code-switching sentences from mono-lingual sentences using a rule-based technique and dictionaries, with the ratio of word presence as the segregation criterion. The rule-based technique achieved an accuracy of more than 87.00% for Malay-English code-switching (MY-EN-CS) sentences.
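
The abstract does not give the paper's actual rules, dictionaries, or thresholds, so the following is only a minimal sketch of how a dictionary-based word-presence ratio might separate code-switching sentences from mono-lingual ones. The tiny word lists, the function names `word_presence_ratios` and `is_code_switching`, and the `min_ratio` value of 0.2 are all assumptions made for illustration, not the authors' method.

```python
import re

# Toy dictionaries for illustration only (assumed, not from the paper);
# a real system would load full Malay and English lexicons.
MALAY_WORDS = {"saya", "suka", "makan", "nasi", "sangat", "pergi", "ke", "kedai"}
ENGLISH_WORDS = {"i", "like", "to", "eat", "rice", "very", "go", "the", "shop"}


def word_presence_ratios(sentence, malay_words=MALAY_WORDS, english_words=ENGLISH_WORDS):
    """Return (malay_ratio, english_ratio): the share of tokens found in each dictionary."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return 0.0, 0.0
    malay_hits = sum(1 for t in tokens if t in malay_words)
    english_hits = sum(1 for t in tokens if t in english_words)
    return malay_hits / len(tokens), english_hits / len(tokens)


def is_code_switching(sentence, min_ratio=0.2):
    """Hypothetical rule: flag a sentence when both languages reach min_ratio."""
    malay_ratio, english_ratio = word_presence_ratios(sentence)
    return malay_ratio >= min_ratio and english_ratio >= min_ratio


if __name__ == "__main__":
    for text in ["Saya suka makan nasi", "Saya suka eat rice", "I like to eat rice"]:
        print(text, "->", is_code_switching(text))
```

Under these assumptions, a sentence is flagged only when words from both dictionaries make up a meaningful share of its tokens, so a single borrowed word in a long sentence would not trigger the rule. Words that appear in both dictionaries would count toward both ratios, and a practical system would need an explicit rule for them.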