Basic Article Information

Title: Variable Importance Plots—An Introduction to the vip Package
Authors: Brandon M. Greenwell; Bradley C. Boehmke
Journal: R News
Print ISSN: 1609-3631
Publication year: 2020
Volume: 12
Issue: 1
Pages: 343-366
Language: English
Publisher: The R Foundation for Statistical Computing
Abstract: In the era of “big data”, it is becoming more of a challenge to not only build state-of-the-art predictive models, but also gain an understanding of what’s really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms—like random forests (RFs) and gradient boosted decision trees (GBMs)—have a natural way of quantifying the importance or relative influence of each feature. Other algorithms—like naive Bayes classifiers and support vector machines—are not capable of doing so and model-agnostic approaches are generally used to measure each predictor’s importance. Enter vip, an R package for constructing variable importance scores/plots for many types of supervised learning algorithms using model-specific and novel model-agnostic approaches. We’ll also discuss a novel way to display both feature importance and feature effects together using sparklines, a very small line chart conveying the general shape or variation in some feature that can be directly embedded in text or tables.