Abstract: In the era of “big data”, it is becoming more of a challenge not only to build state-of-the-art predictive models, but also to gain an understanding of what’s really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms, like random forests (RFs) and gradient boosted decision trees (GBMs), have a natural way of quantifying the importance or relative influence of each feature. Other algorithms, like naive Bayes classifiers and support vector machines, are not capable of doing so, and model-agnostic approaches are generally used to measure each predictor’s importance. Enter vip, an R package for constructing variable importance scores/plots for many types of supervised learning algorithms using model-specific and novel model-agnostic approaches. We’ll also discuss a novel way to display both feature importance and feature effects together using sparklines, very small line charts that convey the general shape or variation in some feature and can be directly embedded in text or tables.