A Linear Network Theory of Iterated Learning

Devon Jarvis, Richard Klein, Benjamin Rosman, Andrew Saxe

December 2024

Abstract

Language provides one of the primary examples of humans' ability to systematically generalize: reasoning about new situations by combining aspects of previous experiences. Consequently, modern machine learning has drawn much inspiration from linguistics. A recent example is iterated learning (IL), a procedure in which generations of networks learn from the output of earlier learners. The result is a refinement of the network's "language", or output labels for given inputs, towards compositional structure. Yet studies of iterated learning and its application to machine learning have remained empirical. Here we theoretically study the emergence of compositional language and the ability of simple neural networks to leverage this compositionality to systematically generalize. We build on prior theoretical work on linear networks, which mathematically defines systematic generalization, extending the analysis of shallow and deep linear network learning dynamics to the iterated learning procedure and deriving exact dynamics of learning over generations. Our results confirm a long-standing conjecture: that multiple generations of iterated learning are required for compositional structure to emerge, and that the procedure can outperform a single-generation network trained with optimal early stopping. Finally, we show that IL requires depth in the network architecture to be effective and that IL is able to extract modules which systematically generalize.
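As a rough illustration of the procedure the abstract describes, the sketch below trains successive two-layer linear networks, each on the outputs of the previous generation. It is not the authors' setup: the function names, the toy data, the hidden width, the learning rate, and the fixed number of gradient steps per generation are all assumptions chosen only to make the loop concrete.

```python
import numpy as np


def train_linear_student(X, Y, hidden=8, steps=200, lr=0.05, seed=0):
    """Fit Y_hat = W2 @ W1 @ X by full-batch gradient descent on squared error.

    Hypothetical helper: hyperparameters are illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out, n = X.shape[0], Y.shape[0], X.shape[1]
    # Small random initial weights for a fresh learner.
    W1 = 0.01 * rng.standard_normal((hidden, n_in))
    W2 = 0.01 * rng.standard_normal((n_out, hidden))
    for _ in range(steps):
        err = W2 @ W1 @ X - Y                 # prediction error on all items
        grad_W2 = err @ (W1 @ X).T / n        # gradient w.r.t. output weights
        grad_W1 = W2.T @ err @ X.T / n        # gradient w.r.t. input weights
        W2 -= lr * grad_W2
        W1 -= lr * grad_W1
    return W1, W2


def iterated_learning(X, Y0, generations=3, **train_kwargs):
    """Each generation is a fresh network trained on the previous one's outputs."""
    labels = Y0
    history = []
    for g in range(generations):
        W1, W2 = train_linear_student(X, labels, seed=g, **train_kwargs)
        labels = W2 @ W1 @ X                  # these outputs become the next targets
        history.append(labels)
    return history


# Toy data (assumed): 4 one-hot inputs, 3 output features per input.
X = np.eye(4)
Y0 = np.array([[1.0, 1.0, 1.0, 1.0],
               [1.0, 1.0, 0.0, 0.0],
               [1.0, 0.0, 0.0, 1.0]])
history = iterated_learning(X, Y0, generations=3)
```

Each entry of `history` holds one generation's output labels, so comparing successive entries gives a crude view of how the "language" drifts across generations under these assumed settings.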

Type

Workshop paper

Publication

Workshop on Compositional Learning at NeurIPS

Devon Jarvis

Lecturer

Richard Klein

PRIME Lab Director

Benjamin Rosman

Lab Director