Abstract: Fault prediction in manufacturing systems has consistently been an important theme in engineering research. Data-driven methods for this task are gaining momentum owing to advances in information and communication technologies. In particular, fault prediction may be interpreted as a supervised classification problem, in which algorithms trained on operational data gathered from the shop floor inform managers whether a machine is likely to enter a failure state. Despite the relevance of this approach, implementations are hindered by several challenges. In this work, we review approaches aimed at dealing with four of these challenges, namely: a limited amount of training data, unbalanced training data sets, uncertainty regarding which variables should be monitored, and uncertainty regarding how exactly historical data should be employed in the algorithm’s training. To deal with training sets of limited size, learning procedures observed to perform well with a lower volume of training data can be used, such as the Random Forests technique. Alternatively, transfer learning techniques can be used to adapt models trained in a virtual domain with abundant synthetic data to the real manufacturing system domain. To deal with imbalance among classes, cost-sensitive learning methods can be employed to alter the penalties incurred when misclassifications occur in the minority class. Alternatively, resampling methods can be applied before learning occurs. Lastly, both the decision regarding which variables to track and the decision regarding the extent to which historical data should be included in the training process can be addressed through the use of specific feature selection methods.
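
As an illustration of the two remedies for class imbalance mentioned above, the following is a minimal sketch and not the method of any reviewed work. It assumes a toy data set of ten hypothetical sensor readings with two failures; the function names `balanced_class_weights` and `random_oversample` are invented for this example. The first computes inverse-frequency class weights, one common heuristic for setting misclassification penalties in cost-sensitive learning; the second performs simple random oversampling before training.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, c in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - c):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 healthy readings (label 0), 2 failures (label 1).
rows = [[0.1 * i, 50 + i] for i in range(10)]
labels = [0] * 8 + [1] * 2

# Cost-sensitive route: penalize errors on the rare failure class more.
weights = balanced_class_weights(labels)

# Resampling route: balance the training set before learning.
res_rows, res_labels = random_oversample(rows, labels)
```

Either output would then be passed to a classifier of choice: the weights as per-class misclassification penalties, or the resampled rows as a balanced training set.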