EdTech Quality Resources

Research papers, guidance notes and use cases related to EdTech Quality.

Benchmarks

Benchmark

Visual reasoning ability is crucial in foundational numeracy, where interpreting visual patterns and shapes is a key step in learning.

3 September 2025
Benchmark

This benchmark tests models on a subset of questions from the Pedagogy Benchmark that specifically cover SEND pedagogy.

31 March 2025
Benchmark

While leading AI models are now acing the International Mathematical Olympiad, all models are still struggling with the kind of early grade visual maths taught in low- and middle-income countries.

7 March 2025
Benchmark

The world's first benchmark to test whether LLMs can pass teacher exams, based on a set of questions from the Chilean Ministry of Education.

18 December 2024

Guides

Guide

A brief history of artificial intelligence, covering the developments that have brought us to where we are today.

4 March 2024

Research Papers

Research Paper

We made the Visual Reasoning Benchmark to test whether AI models can help with primary school visual maths. This paper details how we built it.

10 February 2026
Research Paper

The big AI labs claim that their models can handle more and more languages, but how well can they actually support teaching in those languages?

12 January 2026
Research Paper

As AI use for education proliferates, our priority is to ensure that AI tools are high quality. This means thinking about evidence and, in AI, about 'benchmarks'.

21 November 2025
Research Paper

We built the Pedagogy Benchmark to fill a critical gap in assessing models' understanding of pedagogy. This paper details how we built it.

3 July 2025
Research Paper

We mapped out what AI benchmarks currently exist for education and where the gaps are, and used this to inform our work on use case quality.

11 June 2025

Use Cases

Use Case

There is a huge gap in the evaluation of AI-generated lesson plans, so we're now working on a scalable automated method of measurement.

Use Case

We are developing AI benchmark evaluations for testing the quality of AI-generated lesson plans.

6 November 2025