AI is rapidly entering education, but its role in grading remains contested. While automation promises efficiency and consistency, questions persist about whether AI can replicate the accuracy and fairness of human evaluators. This thesis evaluates three grading models: Additive, Deductive and Tournament, tested on Artificial Owl’s platform using responses from the 2024 Portuguese national exam (Prova 85). The analysis focused on two open-ended questions with different structures: Q7, a short cutoff-based task, and Q19, a longer essay with multidimensional criteria. The models were compared with human scores using correlation, mean absolute error and agreement metrics. Results show that absolute models (Additive and Deductive) aligned more closely with human graders than the relative Tournament model. Accuracy was higher for Q19, showing that longer responses give richer evaluation signals, while strict cutoffs expose AI’s limitations. Across models, ranking responses was more reliable than reproducing exact scores, underscoring the importance of rubric anchoring and human-like evaluation logic. The study positions AI as a support tool, not a replacement, especially for ranking and formative feedback, and recommends hybrid human-AI systems for platforms such as Artificial Owl. By combining empirical evidence with practical recommendations, the thesis contributes to both theory and practice: it shows that rubric flexibility shapes AI alignment as much as task complexity, and it outlines pathways for designing grading systems that are accurate, transparent and trustworthy.
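The three comparison metrics named in the abstract can be illustrated with a minimal sketch. It assumes Pearson correlation and a tolerance-based agreement rate, which the abstract does not specify; the function names and the score lists are hypothetical and are not data from the thesis.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_absolute_error(xs, ys):
    """Average absolute gap between paired scores."""
    return mean(abs(x - y) for x, y in zip(xs, ys))


def agreement_rate(xs, ys, tolerance=0):
    """Share of responses where the two scores differ by at most `tolerance`."""
    return mean(1 if abs(x - y) <= tolerance else 0 for x, y in zip(xs, ys))


# Hypothetical scores for five responses (illustrative only).
human = [2, 4, 3, 5, 1]
ai = [2, 3, 3, 4, 2]

r = pearson(human, ai)
mae = mean_absolute_error(human, ai)
exact = agreement_rate(human, ai)
adjacent = agreement_rate(human, ai, tolerance=1)
```

In this toy case the correlation is high while exact agreement is low, which mirrors the abstract's observation that ranking responses can be more reliable than reproducing exact scores.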
| Date of Award | 15 Oct 2025 |
|---|---|
| Original language | English |
| Awarding Institution | Universidade Católica Portuguesa |
| Supervisor | Rute Xavier (Supervisor) |
- Artificial intelligence
- Automated essay scoring (AES)
- Grading accuracy
- Evaluation models
- Hybrid systems
- Portuguese national exam
- Mestrado em Gestão e Administração de Empresas (mestrado internacional)
From potential to practice: evaluating AI grading models against human standards on Artificial Owl’s platform
Vogt, B. H. (Student). 15 Oct 2025
Student thesis: Master's Thesis