
Article Information

  • Title: Urdu News Classification using Application of Machine Learning Algorithms on News Headline
  • Author: Muhammad Badruddin Khan
  • Journal: International Journal of Computer Science and Network Security
  • Print ISSN: 1738-7906
  • Year: 2021
  • Volume: 21
  • Issue: 2
  • Pages: 229-237
  • DOI: 10.22937/IJCSNS.2021.21.2.27
  • Publisher: International Journal of Computer Science and Network Security
  • Abstract: Our modern 'information-hungry' age demands the delivery of information at unprecedented speed. Timely delivery of noteworthy information about recent events can help people from different walks of life in a number of ways. As the world has become a global village, the volume and speed of the news flow demand the involvement of machines to help humans handle the enormous amount of data. News is presented to the public in the form of video, audio, images, and text. News text available on the internet is a source of knowledge for billions of internet users. Urdu is spoken and understood by millions of people from the Indian subcontinent. The availability of online Urdu news enables these readers to improve their understanding of the world and to make decisions. This paper uses available online Urdu news data to train machines to categorize news automatically. Various machine learning algorithms were trained on news headlines, and the results demonstrate that the Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithms outperformed the other algorithms on all performance measures. The highest accuracy achieved on the dataset was 94.278% with the Multinomial NB classifier, followed by the Bernoulli NB classifier at 94.274%, when Urdu stop words were removed from the dataset. The results suggest that the short text of news headlines can be used as input for text categorization.
  • Keywords: Text categorization; Machine learning; Naïve Bayes; Support vector machine; Logistic regression; Word Cloud; Urdu language
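
The abstract describes training a Multinomial Naïve Bayes classifier on headline text after removing stop words. The following is a minimal sketch of that general technique, not the paper's implementation. The English headlines, the stop-word set, the category labels, and all names (`tokenize`, `MultinomialNB`, `TRAIN`, `STOP_WORDS`) are hypothetical placeholders; the paper works with Urdu headlines and an Urdu stop-word list that this record does not reproduce.

```python
import math
from collections import Counter, defaultdict


def tokenize(headline, stop_words=frozenset()):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in headline.lower().split() if t not in stop_words]


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(labels)
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total)
            denom = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            for t in tokens:
                if t in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for the Urdu headline corpus.
STOP_WORDS = frozenset({"the", "a", "in", "of", "to"})
TRAIN = [
    ("team wins the cricket final", "sports"),
    ("captain scores a century in the match", "sports"),
    ("stock market falls in early trading", "business"),
    ("central bank raises the interest rate", "business"),
]

model = MultinomialNB().fit(
    [tokenize(h, STOP_WORDS) for h, _ in TRAIN],
    [label for _, label in TRAIN],
)
prediction = model.predict(tokenize("cricket captain praises the team", STOP_WORDS))
print(prediction)
```

A Bernoulli variant would instead model the presence or absence of each vocabulary word per headline rather than its count; on short headlines, where words rarely repeat, the two models tend to behave similarly, which is consistent with the near-identical accuracies reported in the abstract.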