From ITS to Agentic Educational AI: A Survey on LLM-Driven Paradigm Shifts
Comprehensive survey tracing the evolution of educational AI from rule-based Intelligent Tutoring Systems through LLM-based adaptive reasoning to fully agentic AI tutors, with evaluation frameworks and future directions.
Overview
Educational AI has undergone three distinct paradigm shifts over the past three decades. The first generation — rule-based Intelligent Tutoring Systems (ITS) — used expert-defined knowledge structures and production rules to automate error diagnosis and procedural feedback. The second emerged with large language models, which replaced brittle rule sets with semantic understanding capable of handling open-ended student responses. The third, now emerging, treats AI tutors as autonomous agents that plan, use tools, and adapt over multi-session learning trajectories.
This survey provides a systematic account of all three stages, analyzing what changed technically, what changed pedagogically, and what remains unsolved.
Structure and Coverage
The survey is organized around three evolutionary stages, examined through both technical and educational theory lenses.
Rule-Based ITS (AutoTutor, OATutor): These systems excelled at structured problem-solving but failed at natural language understanding and metacognitive feedback. Their architecture — domain model, student model, pedagogical module, and interface — remains the foundational vocabulary of the field, even as implementations have changed entirely.
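To make that four-component vocabulary concrete, here is a minimal Python sketch of the classic ITS loop. Everything in it is illustrative: the class names, the single stand-in production rule, and the mastery update are assumptions made for exposition, not the internals of AutoTutor or OATutor.

```python
from dataclasses import dataclass, field

@dataclass
class DomainModel:
    """Expert knowledge: skills plus the rules that detect errors."""
    skills: dict[str, str]  # skill id -> human-readable description

    def diagnose(self, answer: str, expected: str) -> str | None:
        # A real ITS matches answers against expert-authored production
        # rules; a single equality check stands in for the rule base here.
        return None if answer.strip() == expected.strip() else "procedural-error"

@dataclass
class StudentModel:
    """Per-learner mastery estimates, updated after every attempt."""
    mastery: dict[str, float] = field(default_factory=dict)

    def update(self, skill: str, correct: bool) -> None:
        p = self.mastery.get(skill, 0.3)  # prior is an arbitrary assumption
        self.mastery[skill] = min(1.0, p + 0.1) if correct else max(0.0, p - 0.05)

@dataclass
class PedagogicalModule:
    """Maps (diagnosis, mastery) to the next tutorial action."""
    def next_action(self, diagnosis: str | None, mastery: float) -> str:
        if diagnosis is None:
            return "advance"
        return "show-worked-example" if mastery < 0.5 else "give-hint"

class Interface:
    """Presentation layer; a console stands in for the real UI."""
    def show(self, message: str) -> None:
        print(message)
```

The point of the sketch is the separation of concerns: each module can be replaced independently, which is precisely how LLMs later displaced the rule-based domain and pedagogical modules while the overall decomposition survived.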
LLM-Based Adaptive Reasoning: Models such as LearnLM, SocraticLM, and TeachTune demonstrate how LLMs can support explanation generation, Socratic questioning, and learner-adaptive scaffolding. The survey analyzes how constructivism, cognitive apprenticeship, metacognition, and social learning theory map onto the design choices made in these systems, and documents evaluation results from recent benchmark studies.
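As a hedged illustration of this second stage, the sketch below shows one way to steer a generic chat model toward Socratic questioning rather than answer-giving. The `complete` callable and the system prompt are placeholders of our own; they are not the interfaces or prompts used by LearnLM, SocraticLM, or TeachTune.

```python
from typing import Callable

# Placeholder for any chat-completion backend: takes a message list,
# returns the assistant's reply as a string.
Complete = Callable[[list[dict[str, str]]], str]

# Illustrative system prompt; real systems use far richer pedagogy specs.
SOCRATIC_SYSTEM = (
    "You are a tutor. Never state the final answer. Reply with one guiding "
    "question aimed at the student's most recent misconception, calibrated "
    "to the student's apparent level."
)

def socratic_turn(complete: Complete,
                  history: list[dict[str, str]],
                  student_message: str) -> str:
    """One tutoring turn: record the student's message, return one question."""
    history.append({"role": "user", "content": student_message})
    messages = [{"role": "system", "content": SOCRATIC_SYSTEM}, *history]
    reply = complete(messages)
    history.append({"role": "assistant", "content": reply})
    return reply
```

Because the pedagogical constraint lives entirely in the prompt, how reliably the model honors it is exactly what the Socratic-faithfulness benchmarks discussed below try to measure.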
Agentic Educational AI: The frontier involves agents that perceive learner state, select pedagogical strategies, use external tools (search, code execution, assessment rubrics), and maintain coherent behavior across sessions. The survey examines planning-capable architectures (ReAct, tool-augmented LLMs), multi-agent tutoring ecosystems, and the distinct evaluation challenges that arise when the AI is no longer responding to isolated prompts but managing an ongoing educational relationship.
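A condensed ReAct-style control loop illustrates the agentic pattern: the model interleaves free-text reasoning with tool calls until it commits to a tutoring move. The `Thought:`/`Action:`/`Final:` markers, the tool registry, and the step budget below are simplifying assumptions for this sketch, not the protocol of any specific published system.

```python
from typing import Callable

# Hypothetical tool registry: e.g. search, code execution, rubric lookup.
Tool = Callable[[str], str]

def react_tutor(llm: Callable[[str], str], tools: dict[str, Tool],
                task: str, max_steps: int = 8) -> str:
    """ReAct loop: interleave reasoning ('Thought'), tool calls ('Action'),
    and tool results ('Observation') until a final tutoring move is emitted."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")
        transcript += f"Thought:{step}\n"
        if "Final:" in step:  # the agent has decided on a tutoring move
            return step.split("Final:", 1)[1].strip()
        if "Action:" in step:  # e.g. "Action: search[equivalent fractions]"
            name, _, arg = step.split("Action:", 1)[1].strip().partition("[")
            tool = tools.get(name.strip(), lambda a: "unknown tool")
            transcript += f"Observation: {tool(arg.rstrip(']'))}\n"
    return "Plan incomplete; carry remaining steps into the next session."
```

The evaluation difficulty named above falls out of this structure: correctness of any single reply says little, because quality lives in the trajectory of thoughts, tool choices, and moves across sessions.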
Evaluation Framework Shifts
A significant contribution of the survey is documenting the transition from accuracy-based evaluation (is the answer correct?) to process-quality evaluation (is the pedagogical interaction effective?). It reviews new benchmarks that capture reasoning quality, Socratic faithfulness, and pedagogical effectiveness, including criteria for measuring Zone of Proximal Development (ZPD) alignment and metacognitive support.
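The shift is easy to state in code: instead of one accuracy check at the end, every tutoring turn is scored along rubric dimensions and aggregated. The dimensions and weights below are illustrative assumptions, not the definition of any named benchmark.

```python
from dataclasses import dataclass

@dataclass
class TurnScore:
    """Illustrative per-turn rubric; each dimension is scored in [0, 1]."""
    socratic_faithfulness: float  # guided with a question vs. gave the answer
    zpd_alignment: float          # difficulty matched the learner's level
    metacognitive_support: float  # prompted reflection or self-explanation

def process_quality(scores: list[TurnScore],
                    weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Aggregate per-turn rubric scores into one session-level quality score,
    in contrast to a single end-of-session answer-accuracy check."""
    if not scores:
        return 0.0
    per_turn = [
        weights[0] * s.socratic_faithfulness
        + weights[1] * s.zpd_alignment
        + weights[2] * s.metacognitive_support
        for s in scores
    ]
    return sum(per_turn) / len(per_turn)
```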
Future Directions
The survey identifies three high-priority research directions.
Agentic RAG for education: grounding tutoring in curriculum materials through retrieval-augmented generation (see the sketch after this list).
Metacognitive feedback loops: the tutor explicitly models and develops student self-regulation.
Self-improving AI tutors: systems that update their pedagogical models based on interaction outcomes across learner populations.
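For the first direction, a minimal sketch of curriculum-grounded retrieval-augmented generation follows. The `embed` and `generate` callables are placeholders for an embedding model and an LLM, and the brute-force cosine ranking is written for clarity, not efficiency.

```python
from typing import Callable
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rag_tutor_answer(question: str,
                     curriculum: list[str],
                     embed: Callable[[str], list[float]],
                     generate: Callable[[str], str],
                     k: int = 3) -> str:
    """Retrieve the k most relevant curriculum passages, then ground the
    tutoring response in them rather than in the model's parametric memory."""
    q_vec = embed(question)
    # A production system would pre-index the curriculum; re-embedding
    # every passage per query is only acceptable in a sketch.
    ranked = sorted(curriculum, key=lambda p: cosine(q_vec, embed(p)), reverse=True)
    context = "\n".join(ranked[:k])
    prompt = (f"Using only the curriculum excerpts below, tutor the student.\n"
              f"Excerpts:\n{context}\n\nStudent question: {question}")
    return generate(prompt)
```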
Significance
This is among the first surveys to frame educational AI evolution explicitly through the lens of agentic AI, connecting the pedagogical theory literature to cutting-edge LLM architecture research. By mapping where the field has been and where the hard problems remain, it provides a structured roadmap for researchers building the next generation of intelligent learning environments.
Published through the Korean Association of Computer Education (KACE), 2025–2026.