What is worth measuring? The future of evaluation in the age of AI

Traditional assessments often elicit boilerplate answers, which are exactly the kind of patterns AI tools reproduce well. Assessments should instead prioritize open-ended challenges that require innovation, such as designing novel solutions to local environmental problems or creating stories with unpredictable developments. These tasks challenge students to think beyond the templates and training data that AI relies on, and they foster originality. For best results, AI tools should be used to enhance creativity, always combined with human oversight. Assessment should also cover how carefully students craft prompts to produce useful AI output, how creatively they iterate on and improve AI-generated ideas, and how critically they evaluate the accuracy and relevance of AI-generated information. Without this fundamental rethinking of assessment, over-reliance on AI could undermine students’ foundational skills, such as writing and basic math.

One promising direction is to shift the focus of evaluation from the final product to the learning process. As the final product (an essay, an answer, a creative work) becomes a less reliable indicator of student learning, the process of growth matters more. This approach emphasizes documentation of thinking, iteration, and metacognitive reflection. Students may maintain process journals, submit assignments in multiple stages, take part in structured peer reviews, and record think-aloud protocols. These process-oriented artifacts reveal how students approach problems, reconsider assumptions, integrate feedback, and develop ideas over time, aspects of learning that cannot be easily simulated with AI. The approach is also consistent with real-world problem solving, where work is rarely finished in a single attempt and instead requires exploration, collaboration, feedback, and multiple iterations. These are exactly the kinds of processes that AI cannot easily replicate when deployed as a simple answer-generation tool.

Another promising direction involves evaluation through dialogue and defense. Students demonstrate intellectual ownership that is hard to fake when they have to articulate their understanding, explain their reasoning, answer unexpected questions, or defend their conclusions against challenges in real-time conversation. This approach draws inspiration from thesis defenses, Socratic seminars, and oral examinations. Modern implementations may include (i) structured interviews in which students explain key concepts and their applications, (ii) group dialogues in which students build on and challenge each other’s ideas, (iii) presentation formats in which students respond spontaneously to audience questions, and (iv) conversational assessments in which students explain their thought processes. These types of assessments are inherently difficult to outsource to AI because they require the integration of multiple cognitive and social abilities in real time. More broadly, rather than trying to discourage the use of AI, an increasingly futile effort, educators should design assessments that measure how effectively students collaborate with AI as an intellectual tool. The ability to effectively prompt, evaluate, refine, and synthesize AI-generated content represents a new form of literacy.

Yet another direction involves designing assessments around genuinely complex problems rooted in students’ personal contexts. When assignments require students to connect concepts to their own unique experiences, communities, and observations, a natural barrier to wholesale AI substitution is created. For example, rather than writing a standard analysis of a literary work, students might examine how themes within the text manifest in their own communities. Instead of solving generic physics problems, students can analyze physical phenomena that they have personally observed and documented. Such approaches inherently require students to integrate course concepts with personal knowledge that is not readily available to AI systems. These contextual assessments also tend to increase student engagement. When assessments are connected to students’ lives and interests, they promote intrinsic motivation rather than mere compliance.
