Abstract: Selecting a useful list of variables for consideration in a predictive model is a critical step in the modeling process and can result in better models. Sifting through and selecting from a long list of candidate variables can be onerous and ineffective, particularly with the increasingly wide variety of external factors now available from third-party providers. This paper explores a variety of variable selection techniques, applied to frequency and severity models of homeowner insurance claims, developed on a dataset with over 350 initial candidate variables. The techniques are evaluated using multiple criteria, including the predictive power of the resulting model (measured using out-of-sample data) and ease of use. A method based on Elastic Net performs well. For sufficiently long shortlists, random selection performs as well as some more sophisticated methods.
Keywords: variable selection; frequency and severity models; homeowners; Elastic Net regularization