Basic article information

Title: Extreme Multiclass Classification Criteria
Authors: Anna Choromanska; Ish Kumar Jain
Journal: Computation
Electronic ISSN: 2079-3197
Publication year: 2019
Volume: 7
Issue: 1
Pages: 16-34
DOI: 10.3390/computation7010016
Publisher: MDPI Publishing
Abstract: We analyze the theoretical properties of the recently proposed objective function for efficient online construction and training of multiclass classification trees in the settings where the label space is very large. We show the important properties of this objective and provide a complete proof that maximizing it simultaneously encourages balanced trees and improves the purity of the class distributions at subsequent levels in the tree. We further explore its connection to the three well-known entropy-based decision tree criteria, i.e., Shannon entropy, Gini-entropy and its modified variant, for which efficient optimization strategies are largely unknown in the extreme multiclass setting. We show theoretically that this objective can be viewed as a surrogate function for all of these entropy criteria and that maximizing it indirectly optimizes them as well. We derive boosting guarantees and obtain a closed-form expression for the number of iterations needed to reduce the considered entropy criteria below an arbitrary threshold. The obtained theorem relies on a weak hypothesis assumption that directly depends on the considered objective function. Finally, we prove that optimizing the objective directly reduces the multi-class classification error of the decision tree.
Keywords: multiclass classification; decision trees; boosting