EdTech Quality Resources

Research papers, guidance notes and use cases related to EdTech Quality.

Benchmarks

Benchmark

Visual reasoning ability is crucial in foundational numeracy, where interpreting visual patterns and shapes is a key step in learning.

Benchmark

This benchmark uses a subset of questions from the Pedagogy Benchmark that specifically test SEND pedagogy.

Benchmark

While leading AI models are now acing the International Mathematical Olympiad, all models still struggle with the kind of early-grade visual maths taught in low- and middle-income countries.

Benchmark

The world's first benchmark to test whether LLMs can pass teacher exams, based on a set of questions from the Chilean Ministry of Education.

Guides

Guide

A brief history of artificial intelligence with all of the development steps that have got us to where we are today.

Research Papers

Research Paper

We made the Visual Reasoning Benchmark to test whether AI models can help with primary school visual maths. This paper details how we built it.

Research Paper

The big AI labs claim that their models can handle more and more languages, but how well can they actually support teaching in those languages?

Research Paper

As AI use for education proliferates, our priority is to ensure that AI tools are high quality. This means thinking about evidence and, in AI, about 'benchmarks'.

Research Paper

We built the Pedagogy Benchmark to fill a critical gap in assessing models' understanding of pedagogy. This paper details how we built it.

Research Paper

We mapped out what AI benchmarks currently exist for education and where the gaps are, and used this to inform our work on use case quality.

Use Cases

Use Case

There is a huge gap in the evaluation of AI-generated lesson plans, so we're now working on a scalable automated method of measurement.

Use Case

We are developing AI benchmark evaluations for testing the quality of AI-generated lesson plans.

Benchmarks

Visual Reasoning Benchmark

SEND Pedagogy Benchmark

Visual Maths Benchmark

Pedagogy Benchmark

Guides

Introduction to AI

Research Papers

The "Spatial Ceiling": Can AI handle primary school visual problems?

Talking teachers' language: Testing multilingual pedagogy ability of AI

Context counts: Measuring how AI reflects local realities in education

Benchmarking the Pedagogical Knowledge of Large Language Models

Mapping AI Benchmarks for Education: What exists and where are the gaps for measuring AI output quality?

Use Cases

Task-specific AI Evaluations

Lesson Plans
