Use Case

Task-specific AI Evaluations

Task-specific AI evaluations and efficacy studies for key educational use cases such as lesson plans.

Use cases

We're working on evaluations to test several educational use cases. For now, here's one example.

Use Case Quality

We are developing AI benchmark evaluations for testing the quality of AI-generated lesson plans.

Benchmarks vs evals

AI benchmarks enable us to test the core educational abilities we need from AI models. From there, we must also narrow our focus to specific education use cases to ensure AI solutions deliver for teachers and students.

Use-case-specific AI evals can be thought of as the in-school inspection: they measure how well an AI solution helps teachers or students with specific tasks.

This also goes beyond the AI solution itself: solving the learning crisis won't happen without measuring the impact on students' learning outcomes. Randomised Controlled Trials (RCTs) in education take a long time to carry out, and AI solutions change quickly as new models are released regularly, so we're testing methods for carrying out rapid efficacy studies to support wider evaluation.

AI evaluations

Through our mapping of existing benchmarks and evals and ongoing research, we're identifying key gaps for task-specific AI evaluations.

“There are very few fully automated benchmarks aimed at education use cases.”

Alasdair Mackintosh

Benchmarks Lead, AI-for-Education.org

See our benchmarks

Related resources

Research Paper

Mapping AI benchmarks for education