EdTech Quality Resources
Research papers, guidance notes and use cases related to EdTech Quality.
Benchmarks
The world's first benchmark to test whether LLMs can pass teacher exams, based on a set of questions from the Chilean Ministry of Education.
This benchmark uses the subset of questions from the pedagogy benchmark that specifically test SEND (special educational needs and disabilities) pedagogy.
While leading AI models are now acing the international maths olympiad, all models are still struggling with the kind of early grade visual maths taught in low- and middle-income countries.
Visual reasoning ability is crucial in foundational numeracy, where interpreting visual patterns and shapes is a key step in learning.
Guides
A brief history of artificial intelligence with all of the development steps that have got us to where we are today.
Research Papers
We made The Visual Reasoning Benchmark to test whether AI models can help with primary school visual maths. This paper details how we built it.
The big AI labs claim that their models can handle more and more languages, but how well can they actually support teaching in those languages?
As AI use for education proliferates, our priority is to ensure that AI tools are high quality. This means thinking about evidence and, in AI, about 'benchmarks'.
We mapped out what AI benchmarks currently exist for education and where the gaps are, and used this to inform our work on use case quality.
Use Cases
There is a huge gap in the evaluation of AI-generated lesson plans, so we're now working on a scalable automated method of measurement.
We are developing AI benchmark evaluations for testing the quality of AI-generated lesson plans.