Use Case
Task-specific AI Evaluations
Task-specific AI evaluations and efficacy studies for key educational use cases such as lesson plans.
Use cases
We're working on evaluations to test several educational use cases. For now, here's one example.

Use Case Quality
Lesson plans
We are developing AI benchmark evaluations for testing the quality of AI-generated lesson plans.
Benchmarks vs evals
AI benchmarks enable us to test the core educational abilities we need from AI models. From there, we must also narrow in on specific education use cases to ensure they deliver for teachers and students.
Use-case-specific AI evals can be thought of as the in-school inspection: they measure how well an AI solution helps teachers or students with specific tasks.
This also goes beyond the AI solution itself: solving the learning crisis won't happen without measuring the impact on students' learning outcomes. Randomised Controlled Trials (RCTs) in education take a long time to carry out, while AI solutions change quickly as new models are released regularly, so we're testing methods of carrying out rapid efficacy studies to support wider evaluation.
AI evaluations
Through our mapping of existing benchmarks and evals, and through ongoing research, we're identifying key gaps for task-specific AI evaluations.
“There are very few fully automated benchmarks aimed at education use cases.”
Related resources
Here are some additional resources you may find useful.

Research Paper
Mapping AI benchmarks for education
We mapped out which AI benchmarks currently exist for education and where the gaps are, and used this to inform our work on use case quality.