Basic Article Information

Title: A Set of Comprehensive Evaluation System for Different Data Augmentation Methods
Authors: Can Zhang; Xu Zhang; Dawei Tu et al.
Journal: Mobile Information Systems
Print ISSN: 1574-017X
Publication year: 2022
Volume: 2022
DOI: 10.1155/2022/8572852
Language: English
Publisher: Hindawi Publishing Corporation
Abstract: Data augmentation is an effective way to prevent model overfitting in deep learning, especially in medical image classification, where data samples are few and difficult to obtain. In recent years, many data augmentation methods have been proposed, including those based on single-data transformation, multiple-data mixing, and learning the data distribution, but there has been no systematic framework for evaluating them. An impartial and comprehensive evaluation system can both assess the benefits and drawbacks of existing augmentation approaches for a specific medical image classification task and point to effective research directions for new medical image augmentation methods, thereby advancing auxiliary diagnosis technology based on medical images. This paper therefore proposes an objective and universal evaluation system for different data augmentation methods. Using existing large public data sets, the system evaluates augmentation methods in terms of classification accuracy and data diversity, and it is easy to operate. To imitate the small data sets prevalent in deep learning, an equal-interval sampling technique based on similarity ranking is presented that selects samples from a large public data set to construct a subset that fully reflects the original set. Augmented data sets are then created from the small data sets using the various augmentation approaches. Finally, the augmentation strategies are evaluated by comprehensive scores that combine classification accuracy and data diversity after augmentation. Experimental results on multiple data sets demonstrate the validity and feasibility of the proposed sampling method and evaluation system.
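The equal-interval sampling step named in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or ranking reference the authors use, so the code below makes its own assumptions: each sample is a feature vector, similarity is the cosine similarity to the mean feature vector, and the subset is taken at equally spaced positions along the resulting ranking. The names `cosine` and `equal_interval_sample` are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def equal_interval_sample(features, k):
    """Return k indices into `features`, spread evenly over a similarity ranking.

    Assumption (not from the paper): samples are ranked by cosine similarity
    to the mean feature vector, most similar first.
    """
    n = len(features)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")

    # Reference point for the ranking: the mean feature vector.
    dim = len(features[0])
    centroid = [sum(f[d] for f in features) / n for d in range(dim)]

    # Rank sample indices from most to least similar to the centroid.
    sims = [cosine(f, centroid) for f in features]
    ranking = sorted(range(n), key=lambda i: sims[i], reverse=True)

    if k == 1:
        return [ranking[0]]

    # Take equally spaced positions along the ranking, first to last,
    # so the subset covers both typical and atypical samples.
    step = (n - 1) / (k - 1)
    return [ranking[round(j * step)] for j in range(k)]
```

In this sketch, calling `equal_interval_sample(features, k)` once per class would produce a small subset that spans the whole similarity range of that class instead of clustering around its most typical samples. Whether the paper samples per class or over the whole set is not stated in the abstract.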