Journal name: Proceedings of the National Academy of Sciences
Print ISSN: 0027-8424
Electronic ISSN: 1091-6490
Publication year: 2022
Volume: 119
Issue: 27
DOI: 10.1073/pnas.2115229119
Language: English
Publisher: The National Academy of Sciences of the United States of America
Abstract: Significance
Unlike humans, artificial neural networks rapidly forget previously learned information when learning something new and must be retrained by interleaving the new and old items; however, interleaving all old items is time-consuming and might be unnecessary. It might be sufficient to interleave only old items having substantial similarity to new ones. We show that training with similarity-weighted interleaving of old items with new ones allows deep networks to learn new items rapidly without forgetting, while using substantially less data. We hypothesize how similarity-weighted interleaving might be implemented in the brain using persistent excitability traces on recently active neurons and attractor dynamics. These findings may advance both neuroscience and machine learning.
Understanding how the brain learns throughout a lifetime remains a long-standing challenge. In artificial neural networks (ANNs), incorporating novel information too rapidly results in catastrophic interference, i.e., abrupt loss of previously acquired knowledge. Complementary Learning Systems Theory (CLST) suggests that new memories can be gradually integrated into the neocortex by interleaving new memories with existing knowledge. This approach, however, has been assumed to require interleaving all existing knowledge every time something new is learned, which is implausible because it is time-consuming and requires a large amount of data. We show that deep, nonlinear ANNs can learn new information by interleaving only a subset of old items that share substantial representational similarity with the new information. By using such similarity-weighted interleaved learning (SWIL), ANNs can learn new information rapidly with a similar accuracy level and minimal interference, while using a much smaller number of old items presented per epoch (fast and data-efficient). SWIL is shown to work with various standard classification datasets (Fashion-MNIST, CIFAR10, and CIFAR100), deep neural network architectures, and in sequential learning frameworks. We show that data efficiency and speedup in learning new items are increased roughly proportionally to the number of nonoverlapping classes stored in the network, which implies an enormous possible speedup in human brains, which encode a high number of separate categories. Finally, we propose a theoretical model of how SWIL might be implemented in the brain.
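The core idea of similarity-weighted interleaving can be illustrated with a minimal sketch. The abstract does not state how similarity is measured or how it is turned into interleaving weights, so everything below is an assumption made for illustration: the function name `swil_sample_counts` is hypothetical, cosine similarity between class-mean representations stands in for "representational similarity", and a fixed per-epoch replay budget is split across old classes in proportion to the non-negative similarities. It is not the authors' implementation.

```python
import numpy as np

def swil_sample_counts(old_class_means, new_class_mean, budget, rng=None):
    """Split a per-epoch replay budget across old classes by similarity.

    Hypothetical helper: cosine similarity and proportional weighting are
    illustrative assumptions, not details taken from the abstract.
    """
    rng = np.random.default_rng(rng)
    old = np.asarray(old_class_means, dtype=float)   # shape (n_old, d)
    new = np.asarray(new_class_mean, dtype=float)    # shape (d,)

    # Cosine similarity between each old class mean and the new class mean.
    norms = np.linalg.norm(old, axis=1) * np.linalg.norm(new) + 1e-12
    sims = (old @ new) / norms

    # Keep only non-negative similarity; dissimilar classes get zero weight.
    weights = np.clip(sims, 0.0, None)
    if weights.sum() == 0:
        # No old class resembles the new one: fall back to uniform replay.
        weights = np.ones_like(weights)
    probs = weights / weights.sum()

    # Draw how many old items of each class to interleave this epoch.
    counts = rng.multinomial(budget, probs)
    return probs, counts


# Three old classes in a toy 2-D representation space, one new class.
probs, counts = swil_sample_counts(
    old_class_means=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    new_class_mean=[1.0, 0.0],
    budget=100,
    rng=0,
)
```

In this toy setup the old class orthogonal to the new one receives no replay at all, while the most similar old class receives the largest share of the budget. Uniform interleaving would instead spread the same budget evenly over all old classes.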