Journal title: Proceedings of the National Academy of Sciences
Print ISSN: 0027-8424
Electronic ISSN: 1091-6490
Publication year: 2021
Volume: 118
Issue: 43
DOI: 10.1073/pnas.2103091118
Language: English
Publisher: The National Academy of Sciences of the United States of America
Abstract: Significance
The remarkable development of deep learning over the past decade relies heavily on sophisticated heuristics and tricks. To better exploit its potential in the coming decade, perhaps a rigorous framework for reasoning about deep learning is needed, which, however, is not easy to build due to the intricate details of neural networks. For near-term purposes, a practical alternative is to develop a mathematically tractable surrogate model, yet maintaining many characteristics of neural networks. This paper proposes a model of this kind that we term the Layer-Peeled Model. The effectiveness of this model is evidenced by, among others, its ability to reproduce a known empirical pattern and to predict a hitherto-unknown phenomenon when training deep-learning models on imbalanced datasets.
In this paper, we introduce the Layer-Peeled Model, a nonconvex, yet analytically tractable, optimization program, in a quest to better understand deep neural networks that are trained for a sufficiently long time. As the name suggests, this model is derived by isolating the topmost layer from the remainder of the neural network, followed by imposing certain constraints separately on the two parts of the network. We demonstrate that the Layer-Peeled Model, albeit simple, inherits many characteristics of well-trained neural networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep-learning training. First, when working on class-balanced datasets, we prove that any solution to this model forms a simplex equiangular tight frame, which, in part, explains the recently discovered phenomenon of neural collapse [V. Papyan, X. Y. Han, D. L. Donoho, Proc. Natl. Acad. Sci. U.S.A. 117, 24652–24663 (2020)
Keywords: deep learning; neural collapse; class imbalance