Research Paper

The "Spatial Ceiling": Can AI handle primary school visual problems?

10 February 2026

Mohamed Huti, Alasdair Mackintosh, Amy Waldock, Dominic Andrews, Maxime Lelièvre, Moritz Boos, Tobias Murray, Paul Atherton, Robin A. A. Ince, Oliver G. B. Garrod

While AI models now exceed human performance on text-based tests, they frequently stumble when the answer is hidden in a picture. The Visual Reasoning Benchmark is a new dataset of 701 authentic questions sourced from primary school exams in Zambia and India.

Unlike benchmarks that assist models with text descriptions of the image, the Visual Reasoning Benchmark uses unedited, minimal-text images to test whether models can handle the messy, real-world visual problems students face every day.

Our research identifies a "jagged frontier" of AI capability: models are becoming experts at counting and scaling objects, but they hit a "spatial ceiling" when asked to mentally fold, rotate or reflect a shape.

See our benchmarks leaderboard for up-to-date model performance on our AI-for-education benchmarks for low- and middle-income countries (LMICs).