
Article Information

  • Title: AI Testing: Ensuring a Good Data Split Between Data Sets (Training and Test) using K-means Clustering and Decision Tree Analysis
  • Authors: Kishore Sugali; Chris Sprunger; Venkata N Inukollu
  • Journal: International Journal on Soft Computing
  • Electronic ISSN: 2229-7103
  • Year: 2021
  • Volume: 12
  • Issue: 1
  • Pages: 1-11
  • DOI: 10.5121/ijsc.2021.12101
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: Artificial Intelligence and Machine Learning have been around for a long time. In recent years, there has been a surge in popularity for applications integrating AI and ML technology. As with traditional development, software testing is a critical component of a successful AI/ML application. The development methodology used in AI/ML contrasts significantly with traditional development. In light of these distinctions, various software testing challenges arise. The emphasis of this paper is on the challenge of effectively splitting the data into training and testing data sets. By applying a k-means clustering strategy to the data set, followed by a decision tree, we can significantly increase the likelihood that the training data set represents the domain of the full data set, and thus avoid training a model that is likely to fail because it has learned only a subset of the full data domain.
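The split strategy the abstract describes could be sketched roughly as follows: cluster the data with k-means, then draw test points from every cluster so that neither split misses a whole region of the data domain. This is a minimal, pure-Python illustration under stated assumptions; the helper names (`kmeans_labels`, `cluster_stratified_split`), the plain Lloyd's-algorithm k-means, and the per-cluster sampling rule are this sketch's own choices, not the paper's implementation (which additionally uses a decision tree to validate the split).

```python
import random
from collections import defaultdict


def _sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, iters=20, seed=0):
    """Assign each point to one of k clusters via plain Lloyd's algorithm."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data
    for _ in range(iters):
        clusters = defaultdict(list)
        for p in points:
            nearest = min(range(k), key=lambda i: _sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in clusters.items():  # recompute each non-empty center
            dims = len(members[0])
            centers[i] = tuple(
                sum(m[d] for m in members) / len(members) for d in range(dims)
            )
    return [min(range(k), key=lambda i: _sq_dist(p, centers[i])) for p in points]


def cluster_stratified_split(points, k=3, test_frac=0.25, seed=0):
    """Split points into train/test sets, sampling the test set from every cluster."""
    labels = kmeans_labels(points, k, seed=seed)
    by_cluster = defaultdict(list)
    for p, lab in zip(points, labels):
        by_cluster[lab].append(p)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_cluster.values():
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_frac))  # every cluster is tested
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Demo on three well-separated blobs: each blob contributes points to both
# splits, so the training set covers the full data domain.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
       (10.0, 0.0), (9.9, 0.2), (10.1, 0.1)]
train, test = cluster_stratified_split(pts, k=3, test_frac=0.3)
print(len(train), len(test))
```

A plain random split, by contrast, can leave an entire cluster out of the training data; sampling per cluster is what guards against the failure mode the abstract describes, where a model only ever sees a subset of the domain.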