
Task-specific AI Evaluations

Task-specific AI evaluations and efficacy studies for key educational use cases such as lesson plans.

Use cases

We're working on evaluations to test several educational use cases. For now, here's one example.



Lesson plans

We are developing AI benchmark evaluations for testing the quality of AI-generated lesson plans.


Benchmarks vs evals

AI benchmarks enable us to test the core educational abilities we need from AI models. From there, we must also narrow in on specific educational use cases to ensure that AI solutions deliver for teachers and students.

Use-case-specific AI evals can be thought of as the equivalent of an in-school inspection: they measure how well an AI solution helps teachers or students with specific tasks.

Evaluation must also go beyond the AI solution itself: solving the learning crisis won't happen without measuring the impact on students' learning outcomes. Randomised Controlled Trials (RCTs) in education take a long time to carry out, while AI solutions change quickly and new models are released regularly. We're therefore testing methods for carrying out rapid efficacy studies to support wider evaluation.

AI Evaluations background

AI evaluations

Through our mapping of existing benchmarks and evals, and through our ongoing research, we're identifying key gaps in task-specific AI evaluations.

“There are very few fully automated benchmarks aimed at education use cases.”
Alasdair Mackintosh
Benchmarks Lead, AI-for-Education.org