Basic article information

  • Title: EFFECT OF CLUSTERING DATA IN IMPROVING MACHINE LEARNING MODEL ACCURACY
  • Authors: SAMIH M. MOSTAFA ; HIROFUMI AMANO
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2019
  • Volume: 97
  • Issue: 21
  • Pages: 2973-2981
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Supervised machine learning algorithms consider the relationship between dependent and independent variables rather than the relationship between instances. They try to learn the mapping from input to output from historical data in order to make accurate predictions on unseen data. Conventional prediction algorithms are usually based on a single model trained on the historical data. The instances in that data may vary in their characteristics, and this variation may arise because some cases are more closely related to certain cases than to others. The problem with such algorithms is that they treat the whole dataset uniformly, without considering this variation. This paper presents a novel technique for improving the prediction accuracy of the trained model. The proposed method clusters the data using the K-means clustering algorithm and then applies the prediction algorithm to each cluster separately. The value of K that gives the highest accuracy is selected. The authors performed a comparative study of the proposed technique and popular prediction methods, namely Linear Regression, Ridge, Lasso, and Elastic Net. Across five datasets of different sizes and different numbers of clusters, the proposed technique was observed to be more accurate in terms of Root Mean Square Error (RMSE) and the coefficient of determination (R²).
  • Keywords: Prediction accuracy; K-means; clustering; regression; machine learning algorithms
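
The procedure outlined in the abstract (cluster with K-means, fit one regressor per cluster, keep the K with the best accuracy) can be illustrated with a small sketch. This is not the authors' implementation: it uses only the Python standard library, a single input feature, ordinary least squares in place of Ridge, Lasso, or Elastic Net, and a deterministic centroid initialization. Routing an unseen instance to the model of its nearest centroid, scoring K on a held-out validation set, and falling back to a global fit for an empty cluster are all assumptions, since the abstract does not specify them. Every function name here is hypothetical.

```python
import math

def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))

def kmeans_1d(xs, k, iters=50):
    """Lloyd's algorithm on scalar features; returns the k centroids."""
    srt = sorted(xs)
    # Assumption: deterministic start, centroids spread over the sorted values.
    centroids = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centroids, x)].append(x)
        # An empty cluster keeps its previous centroid.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all x identical: predict the mean
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def fit_clustered(xs, ys, k):
    """Cluster the inputs, then fit one line per cluster."""
    centroids = kmeans_1d(xs, k)
    models = []
    for j in range(k):
        pts = [(x, y) for x, y in zip(xs, ys) if nearest(centroids, x) == j]
        if pts:
            models.append(fit_line([p[0] for p in pts], [p[1] for p in pts]))
        else:
            models.append(fit_line(xs, ys))  # assumption: global fallback
    return centroids, models

def predict(model, x):
    """Assumption: an unseen x uses the model of its nearest centroid."""
    centroids, models = model
    a, b = models[nearest(centroids, x)]
    return a * x + b

def rmse(model, xs, ys):
    errs = [(predict(model, x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errs) / len(errs))

def select_k(train, valid, k_values):
    """Return (k, validation RMSE, model) for the k with the lowest RMSE."""
    best = None
    for k in k_values:
        model = fit_clustered(train[0], train[1], k)
        err = rmse(model, valid[0], valid[1])
        if best is None or err < best[1]:
            best = (k, err, model)
    return best

# Synthetic data with two linear regimes, for illustration only.
def target(x):
    return 2 * x if x <= 5 else 30 - x

train_x = [i * 0.5 for i in range(20)]
valid_x = [i * 0.5 + 0.25 for i in range(20)]
train = (train_x, [target(x) for x in train_x])
valid = (valid_x, [target(x) for x in valid_x])

best_k, best_rmse, best_model = select_k(train, valid, (1, 2, 3))
print(best_k, best_rmse)
```

On this synthetic data a single global line cannot follow both regimes, while a per-cluster fit can, which is the effect the abstract attributes to clustering. With real data, a library implementation of K-means and of the regularized regressors would be the more practical choice.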