A Linear Network Theory of Iterated Learning

Abstract

Language provides one of the primary examples of humans’ ability to systematically generalize: reasoning about new situations by combining aspects of previous experiences. Consequently, modern machine learning has drawn much inspiration from linguistics. A recent example is iterated learning (IL), a procedure in which generations of networks learn from the output of earlier learners. The result is a refinement of the network’s “language”, or output labels for given inputs, towards compositional structure. Yet studies of iterated learning and its application to machine learning have remained empirical. Here we theoretically study the emergence of compositional language and the ability of simple neural networks to leverage this compositionality to systematically generalize. We build on prior theoretical work on linear networks, which mathematically defines systematic generalization, and extend the analysis of shallow and deep linear network learning dynamics to the iterated learning procedure by deriving exact learning dynamics over generations. Our results confirm a long-standing conjecture: that multiple generations of iterated learning are required for compositional structure to emerge, and that iterated learning can then outperform a single-generation network trained with optimal early stopping. Finally, we show that IL requires depth in the network architecture to be effective and that IL is able to extract modules which systematically generalize.
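The generational loop described in the abstract can be illustrated with a toy sketch: each generation trains a fresh two-layer linear network on the labels produced by the previous one. This is only an illustration under stated assumptions, not the setup analysed in the paper. The function names (`train`, `iterated_learning`), the hyperparameters, the plain gradient-descent training, and the use of a fixed step budget as the only transmission bottleneck are all hypothetical choices made here.

```python
import random

def matmul(a, b):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def init(rows, cols, scale, rng):
    # Small random initial weights (assumed; the abstract does not specify an init).
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def train(X, Y, hidden, steps, lr, rng):
    """Fit Y ~ W2 W1 X by full-batch gradient descent on squared error.

    X has shape (n_in, n_examples) and Y has shape (n_out, n_examples).
    """
    W1 = init(hidden, len(X), 0.1, rng)
    W2 = init(len(Y), hidden, 0.1, rng)
    for _ in range(steps):
        H = matmul(W1, X)                       # hidden activations
        P = matmul(W2, H)                       # network outputs
        E = [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(P, Y)]
        gW2 = matmul(E, transpose(H))           # dL/dW2 = E H^T
        gW1 = matmul(transpose(W2), matmul(E, transpose(X)))  # dL/dW1 = W2^T E X^T
        W2 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W2, gW2)]
        W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, gW1)]
    return W1, W2

def predict(net, X):
    W1, W2 = net
    return matmul(W2, matmul(W1, X))

def iterated_learning(X, Y0, generations, hidden=4, steps=200, lr=0.05, seed=0):
    """Return the labels produced by each generation, starting from labels Y0."""
    rng = random.Random(seed)
    labels = Y0
    history = []
    for _ in range(generations):
        net = train(X, labels, hidden, steps, lr, rng)  # fresh learner each generation
        labels = predict(net, X)                        # its outputs teach the next one
        history.append(labels)
    return history

if __name__ == "__main__":
    # Four one-hot inputs and two output labels (hypothetical toy data).
    X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    Y0 = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    for g, labels in enumerate(iterated_learning(X, Y0, generations=3), start=1):
        print(g, [[round(v, 3) for v in row] for row in labels])
```

Lowering `steps` gives each generation less time to fit its teacher, which is one simple way to mimic training with early stopping; whether this matches the bottleneck used in the paper is not stated in the abstract.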

Publication
Workshop on Compositional Learning at NeurIPS
Devon Jarvis
Lecturer

I am a lecturer at Wits interested in studying systematic generalization and the emergence of modularity in the brain and machines.

Richard Klein
PRIME Lab Director

I am an Associate Professor in the School of Computer Science and Applied Mathematics at the University of the Witwatersrand in Johannesburg, and a co-PI of the PRIME lab.

Benjamin Rosman
Lab Director

I am a Professor in the School of Computer Science and Applied Mathematics at the University of the Witwatersrand in Johannesburg. I work in robotics, artificial intelligence, decision theory and machine learning.