Toy Networks Blog

Recent Posts

Is induction a memorized or generalized capability?
28 May 2026
We probe whether the repetition capability of our toy transformer reflects genuine generalisation or memorisation of the training distribution. A single-token experiment reveals an apparent illusion of generalised induction, a cautionary finding for evaluations of larger LLMs.
toy-models induction
How much data does a transformer need to learn repetition?
27 May 2026
We systematically degrade the repetition signal in the training data, token by token and row by row, and find a critical threshold below which induction heads cease to form. Even 10% of tokens in repeated sequences is enough for induction heads to form.
training-data induction
Token distribution drives repetition learning
26 May 2026
We surgically replace the tokens inside repeated sequences with random tokens while keeping the sequence structure fixed, to investigate the impact on repetition performance.
training-data induction
Is natural language special for learning repetition?
24 May 2026
We reverse all tokens in the Pile dataset and find that a transformer trained on completely unnatural data still learns to repeat sequences, suggesting that linguistic structure is not required for induction head formation.
training-data induction
Repetition is surprisingly ubiquitous in tokenized natural language
23 May 2026
55% of tokens in the tokenized Pile dataset are part of repeated sequences, defined as either A or B in ...AB...AB, and we characterise the structure of those repetitions in detail.
training-data induction
An introduction to our investigation into repetition capability in toy transformer models
22 May 2026
Why we want to study repetition in toy transformer models and what we aim to investigate.
toy-models induction
