How I built a controllable agent pipeline to reduce hallucinations and improve usability in education

Game-based learning is often presented as an effective way to increase engagement and improve retention. In practice, however, many educators struggle to adopt it because creating high-quality game content is time-consuming and technically demanding.

In this project, I built a bounded multi-agent system (MAS) that supports teachers in generating usable quiz-based learning games from their own material. The system takes a PDF and a topic as input, produces a structured quiz with explanations and difficulty labels, validates the output for consistency, and delivers the final content to a Unity WebGL learning game through an API.

My focus is not to build an autonomous “black box,” but a reliable and controllable pipeline that turns human input into game-ready educational content.

Introduction

Game-based learning is widely regarded as a promising approach to increase learner engagement and support knowledge retention. Despite this potential, its adoption in everyday educational practice remains limited. One of the main reasons is the effort required to create high-quality, didactically meaningful game content. Teachers often face time constraints, lack technical expertise, or struggle to adapt existing learning material into interactive formats.

In this project, I designed and implemented a bounded multi-agent system (MAS) that supports the creation of quiz-based learning games from human-provided input. The system allows educators to upload their own learning material—typically in the form of a PDF—define a topic and basic parameters, and receive a validated, game-ready quiz that can be played in a Unity-based learning environment.

The goal of this system is not full automation of teaching or content design. Instead, the focus lies on building a reliable, controllable, and transparent agent pipeline that assists teachers without replacing their pedagogical judgment. Special attention was given to reducing hallucinations, improving output consistency, and ensuring that all generated content remains grounded in the provided learning material.

What I Built: A Bounded Multi-Agent System for Educational Game Content

What the system does

At its core, the system I built is a bounded multi-agent pipeline that transforms educational source material into playable quiz-based learning content. The input typically consists of a PDF document provided by a teacher, combined with a topic description and a small set of parameters such as the desired number of questions. From this input, the system generates a structured quiz that includes multiple-choice questions, answer options, explanations, hints, and difficulty levels. The final output is delivered through an API and directly consumed by a Unity WebGL learning game.

The system is designed as an end-to-end pipeline rather than a single generative model. Each step—from content analysis to quiz generation and validation—is handled by a dedicated agent with a clearly defined responsibility. This modular structure makes the overall process easier to understand, debug, and extend.

What “bounded” means in this context

A central design decision was to implement the system as a bounded multi-agent system. In this context, bounded means that each agent operates strictly within a predefined scope. Agents do not modify their own behavior, do not learn autonomously beyond their assigned task, and do not act without explicit orchestration by the system.

This boundedness is intentional. In educational applications, unpredictability is a serious risk. Teachers need to understand how content is produced, why a specific question was generated, and on which source material it is based. By limiting agent autonomy and enforcing clear interfaces between agents, the system remains transparent, predictable, and controllable.
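
To make the idea of boundedness concrete, here is a minimal Python sketch, not the actual implementation: `BoundedAgent`, `AgentResult`, and `orchestrate` are illustrative names. The point is that each agent exposes exactly one entry point, returns an immutable structured result, and runs only when the orchestrator invokes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    """Immutable, structured output; nothing else leaves an agent's scope."""
    agent: str
    payload: dict


class BoundedAgent:
    """One fixed responsibility, explicit input, structured output.
    Agents hold no mutable state and never invoke each other directly."""
    name = "base"

    def run(self, payload: dict) -> AgentResult:
        raise NotImplementedError


class EchoAgent(BoundedAgent):
    """Trivial stand-in for a real agent such as the Curriculum Agent."""
    name = "echo"

    def run(self, payload: dict) -> AgentResult:
        return AgentResult(agent=self.name, payload=dict(payload))


def orchestrate(agents: list[BoundedAgent], payload: dict) -> dict:
    """Only the orchestrator decides execution order; each agent sees
    exactly the payload it is handed and returns a structured result."""
    for agent in agents:
        payload = agent.run(payload).payload
    return payload
```

Because agents communicate only through these explicit payloads, every intermediate result can be logged and inspected, which is what makes the pipeline auditable for teachers.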

Why bounded agents matter in education

Many AI-driven educational tools rely on large, monolithic models that generate content in a single step. While this can be powerful, it often leads to problems such as hallucinated facts, inconsistent difficulty levels, or content that is not clearly grounded in the original learning material.

By contrast, the bounded multi-agent approach prioritizes reliability over creativity. Each agent contributes a small, verifiable transformation: extracting relevant concepts, generating questions based on retrieved context, or validating outputs against the source material. This design significantly reduces the risk of hallucinations and makes the system suitable for real educational settings, where correctness and trust are more important than novelty.

System Architecture: Separating Gameplay from AI Logic

High-level architectural design

The system follows a strict client–server architecture with a clear separation of responsibilities between the frontend and the backend. This separation was a deliberate design choice to ensure robustness, maintainability, and practical deployability.

The frontend is implemented as a Unity WebGL application and focuses exclusively on user interaction and gameplay. It handles tasks such as user input, file uploads, and the presentation of quiz questions during the game. Importantly, the frontend contains no AI logic. It does not generate content, make pedagogical decisions, or process learning material beyond displaying results.

All intelligence is located on the backend, which is implemented as a Python-based web API. This backend hosts the complete multi-agent system and is responsible for orchestrating agents, processing documents, generating quizzes, validating outputs, and managing persistent data.

Why this separation matters

This architectural separation offers several practical advantages. First, it allows the AI system to be developed, tested, and debugged independently of the game client. During development, the backend could be fully tested via direct API calls without any frontend connected, which significantly simplified iteration and error tracing.

Second, the frontend remains lightweight and replaceable. Although Unity WebGL is used in the current implementation, the same backend could serve other clients, such as mobile applications or web-based dashboards, without any changes to the multi-agent logic.

Finally, from an educational perspective, this separation increases trust. Teachers interact with a game-like interface, while all complex AI processes remain hidden but controlled on the server side. This reduces cognitive load for users and avoids exposing technical complexity where it is not needed.

Backend responsibilities and external services

The backend acts as the central control layer of the system. It receives user input, validates requests, and coordinates the execution of individual agents in a predefined sequence. It also manages all data storage, including learning materials, generated quizzes, and student responses.

To support these tasks, the backend integrates several external services. A database is used for persistent storage, a vector database enables semantic retrieval of relevant content from uploaded documents, and a large language model is used for natural language processing tasks within the agents. Access to all external services is strictly limited to the backend, ensuring that the frontend never communicates directly with AI models or databases.
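
As a rough illustration of the backend's role as the single gatekeeper, the sketch below shows a hypothetical request handler. The in-memory `QUIZ_STORE` stands in for the database; in the real system the vector store and LLM client would also live behind this layer, so the frontend only ever sees the JSON contract.

```python
import json

# In-memory stand-in for the persistent database; the real backend would
# also hold clients for the vector store and the language model here, so
# the frontend never touches those services directly.
QUIZ_STORE: dict[str, dict] = {}


def handle_generate_request(raw_body: str) -> dict:
    """Backend entry point: validate the request, register a quiz job,
    and return the JSON contract the frontend relies on."""
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return {"status": "error", "detail": "body is not valid JSON"}
    topic = body.get("topic", "")
    num_questions = body.get("num_questions", 5)
    if not isinstance(topic, str) or not topic.strip():
        return {"status": "error", "detail": "missing topic"}
    if not isinstance(num_questions, int) or num_questions < 1:
        return {"status": "error", "detail": "invalid num_questions"}
    quiz_id = f"quiz-{len(QUIZ_STORE) + 1}"
    QUIZ_STORE[quiz_id] = {"topic": topic, "num_questions": num_questions,
                           "questions": [], "validated": False}
    return {"status": "ok", "quiz_id": quiz_id}
```

Whatever web framework sits on top, keeping the handler a pure function of the request body makes it testable via direct calls, which matches how the backend was debugged without a frontend attached.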

The Multi-Agent Pipeline: From PDF to Playable Game

This section describes the core workflow of the system. The pipeline is designed as a sequence of well-defined steps, each executed by a specialized agent. Together, these steps transform raw educational material into a validated quiz that can be played directly in the game.


Step 1: Teacher input and document upload

The process begins with human input. A teacher uploads a PDF document that contains the learning material and specifies a topic and basic parameters, such as the number of questions to generate. This input defines both the content boundaries and the pedagogical focus of the quiz.

At this stage, no interpretation or generation takes place. The system only validates the input and prepares it for processing. This ensures that all subsequent steps operate on clearly defined, teacher-controlled data.
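
A validation step of this kind could look like the following sketch. The function name and the cap of 50 questions are assumptions for illustration; the `%PDF` magic-byte check is a standard way to reject non-PDF uploads early.

```python
def validate_teacher_input(file_bytes: bytes, topic: str,
                           num_questions: int) -> list[str]:
    """Return a list of problems; an empty list means the input may enter
    the pipeline. No interpretation or generation happens at this stage."""
    errors = []
    if not file_bytes.startswith(b"%PDF"):
        errors.append("uploaded file does not look like a PDF")
    if not topic.strip():
        errors.append("topic must not be empty")
    if not 1 <= num_questions <= 50:
        errors.append("number of questions must be between 1 and 50")
    return errors
```

Returning all problems at once, rather than failing on the first, lets the frontend show teachers a complete list of what to fix.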


Step 2: Curriculum extraction and content preparation

Once the document is uploaded, the Curriculum Agent processes the full text of the PDF. The document is split into manageable chunks, which are then analyzed individually. For each chunk, the agent extracts concise summaries, key concepts, and keywords.

These structured representations are stored in a vector database. This step is crucial: instead of passing the entire document to later agents, the system builds a semantic representation that allows precise and context-aware retrieval. This reduces noise and prevents later agents from relying on irrelevant or overly broad information.
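
The chunking half of this step can be sketched with standard-library code alone. The keyword helper below is deliberately a toy, a frequency count, where the actual Curriculum Agent would ask the language model for summaries, key concepts, and keywords; the chunk size of 120 words is an arbitrary illustration.

```python
from collections import Counter


def chunk_text(text: str, max_words: int = 120) -> list[str]:
    """Split extracted PDF text into word-bounded chunks of roughly
    equal size before per-chunk analysis."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def extract_keywords(chunk: str, top_k: int = 5) -> list[str]:
    """Toy frequency-based keyword extraction; the real agent would
    delegate this to the LLM."""
    tokens = [w.strip(".,;:()").lower() for w in chunk.split() if len(w) > 4]
    return [w for w, _ in Counter(tokens).most_common(top_k)]
```

Each chunk, together with its summary and keywords, is then embedded and written to the vector database so later stages can retrieve by meaning rather than by position in the document.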


Step 3: Context retrieval based on the topic

When quiz generation is requested, the system does not use the full document. Instead, it performs a semantic search in the vector database to retrieve only those content chunks that are most relevant to the specified topic.

This retrieval step acts as a hard constraint for generation. All downstream agents are explicitly restricted to the retrieved context. By limiting the information available to the generative agent, the system significantly reduces the risk of hallucinated or unsupported content.
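
The retrieval logic can be sketched as follows. Term overlap stands in here for the embedding similarity the real vector database computes; the function names and the default of three chunks are assumptions for illustration.

```python
def overlap_score(topic_terms: set[str], chunk: str) -> float:
    """Term-overlap stand-in for the embedding similarity that a real
    vector database would compute."""
    chunk_terms = {w.strip(".,").lower() for w in chunk.split()}
    union = topic_terms | chunk_terms
    return len(topic_terms & chunk_terms) / len(union) if union else 0.0


def retrieve_context(topic: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return only the k most relevant chunks; downstream agents are
    never shown anything outside this list."""
    terms = {w.lower() for w in topic.split()}
    ranked = sorted(chunks, key=lambda c: overlap_score(terms, c), reverse=True)
    return ranked[:k]
```

The important property is not the scoring function but the cut-off: whatever falls outside the top-k list simply does not exist for the generator.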


Step 4: Quiz generation with structural constraints

The Quiz Generator Agent uses the retrieved context to generate multiple-choice questions. Each question follows a strict schema and includes:

  • a clearly phrased question,
  • four answer options,
  • a correct answer label,
  • a short hint,
  • an explanation grounded in the source material,
  • and a difficulty classification.

The agent is instructed to use only the provided context and to output structured data in a predefined format. This ensures that the generated quiz can be reliably processed by the frontend and interpreted by both teachers and learners.
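
The schema above might be expressed as a simple data class like the one below. Field names and the three difficulty labels are assumptions, not the system's exact format, but they show the kind of structural contract the frontend can rely on.

```python
from dataclasses import dataclass


@dataclass
class QuizItem:
    """Illustrative quiz-item schema; field names are assumptions."""
    question: str
    options: list[str]   # exactly four answer choices
    correct: str         # must match one of the options
    hint: str
    explanation: str
    difficulty: str      # e.g. "easy", "medium", or "hard"

    def is_well_formed(self) -> bool:
        """Structural invariants the game client depends on."""
        return (len(self.options) == 4
                and self.correct in self.options
                and self.difficulty in {"easy", "medium", "hard"}
                and bool(self.question.strip()))
```

Encoding the schema as a type rather than free-form JSON means malformed generator output fails loudly at the pipeline boundary instead of silently breaking the game.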


Step 5: Validation and correction of generated questions

Before any content is delivered to the game, it is passed to the Validator Agent. This agent checks each question for consistency, structural correctness, and alignment with the source material.

If a question is found to be problematic—for example, because it is not clearly supported by the context—the system attempts to repair it automatically using the same constrained information. Questions that cannot be fixed are excluded. This validation step is essential for ensuring that the final output meets basic educational quality standards.
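
A simplified version of such a validation pass is sketched below. The word-overlap grounding check is a crude stand-in, where the real Validator Agent would ask the LLM to judge support against the context; the field names and the 0.5 threshold are assumptions.

```python
def is_grounded(item_text: str, context: str, threshold: float = 0.5) -> bool:
    """Crude grounding check: most content words of the item must occur
    in the retrieved context. A real validator would delegate this
    judgment to the language model."""
    words = [w.lower().strip(".,?!") for w in item_text.split() if len(w) > 3]
    if not words:
        return False
    hits = sum(1 for w in words if w in context.lower())
    return hits / len(words) >= threshold


def validate_quiz(items: list[dict], context: str) -> list[dict]:
    """Keep only items that are structurally complete and grounded;
    irreparable items are dropped rather than shipped."""
    required = {"question", "options", "correct", "explanation"}
    kept = []
    for item in items:
        if not required <= item.keys():
            continue
        if item["correct"] not in item["options"]:
            continue
        if is_grounded(item["question"], context):
            kept.append(item)
    return kept
```

Even this toy version illustrates the design principle: generation is best-effort, but delivery is gated.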


Step 6: Delivery to the game client

After validation, the finalized quiz is exposed through an API endpoint. The Unity WebGL frontend retrieves the quiz data and presents it as an interactive learning game. From the perspective of the player, the system behaves like a conventional quiz game, even though the content was generated dynamically.

This completes the core pipeline: from teacher-provided material to a playable, validated learning experience.
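
The delivery side reduces to a small handler like the hypothetical one below: the client asks for a quiz by identifier and receives either validated, game-ready JSON or an error, never unvalidated content.

```python
import json

# Hypothetical store of finished quizzes; only validated ones are served.
PUBLISHED = {
    "quiz-1": {"validated": True,
               "questions": [{"question": "What do plants absorb for photosynthesis?"}]},
    "quiz-2": {"validated": False, "questions": []},
}


def get_quiz(quiz_id: str) -> str:
    """Handler for the endpoint the Unity WebGL client calls. Serving is
    all-or-nothing: a quiz that has not passed validation is invisible."""
    quiz = PUBLISHED.get(quiz_id)
    if quiz is None or not quiz["validated"]:
        return json.dumps({"status": "error", "detail": "quiz not available"})
    return json.dumps({"status": "ok", "questions": quiz["questions"]})
```

Because the client consumes plain JSON over HTTP, the same endpoint could serve any replacement frontend without changes.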

The Agents: Roles, Responsibilities, and Boundaries

A key reason the system remains reliable is that it is not a single “all-in-one” model. Instead, it is composed of specialized agents. Each agent has a clear responsibility, a bounded scope, and a well-defined input/output format. This reduces complexity, makes errors easier to trace, and improves maintainability.


1) Curriculum Agent — turning raw material into structured knowledge

Purpose: Prepare the uploaded learning material for reliable retrieval and downstream generation.

What it does:

  • Extracts the text from the uploaded PDF.
  • Splits the content into manageable chunks.
  • For each chunk, produces:
    • a short summary,
    • key concepts,
    • and keywords.
  • Stores chunk representations in a vector database to enable semantic retrieval.

Why it matters:
This agent creates the foundation for grounding. Later agents can operate on relevant fragments instead of the entire document, which reduces noise and improves factual alignment.


2) Retrieval Component — selecting only the relevant context

Although retrieval is not “creative,” it functions as a control mechanism.

Purpose: Ensure downstream agents work only with topic-relevant content.

What it does:

  • Uses the topic to query the vector database.
  • Returns only the most relevant chunks, including metadata (e.g., chunk position).

Why it matters:
Retrieval acts as a hard constraint. If something is not in the retrieved context, it should not appear in the generated quiz. This is one of the strongest safeguards against hallucinations.


3) Quiz Generator Agent — producing structured quiz content

Purpose: Generate multiple-choice questions from the retrieved context.

What it does:

  • Generates a fixed number of MCQs based strictly on the retrieved chunks.
  • Outputs each item in a structured schema with:
    • question text,
    • four answer choices,
    • correct answer,
    • explanation,
    • hint,
    • difficulty label.

Why it matters:
The generator is not evaluated on creativity, but on producing usable, game-ready items. Strict formatting and clear constraints make the output predictable and easy to integrate into the Unity game.


4) Validator Agent — checking correctness and fixing failures

Purpose: Enforce minimum quality and consistency before gameplay.

What it does:

  • Validates each generated question against the retrieved context.
  • Flags problematic items, such as:
    • missing fields,
    • unclear phrasing,
    • or content that appears unsupported by the source material.
  • Optionally attempts to regenerate or repair invalid questions under the same context constraints.

Why it matters:
This agent is the main reliability layer. It converts “best-effort generation” into “quality-controlled output,” which is essential in education, where incorrect items reduce trust immediately.


5) Adaptive Agent — adjusting difficulty based on learner behaviour (optional)

Purpose: Personalize question difficulty using performance signals.

What it does:

  • Uses learner response history (e.g., time spent, hints used) to infer whether difficulty should increase, decrease, or remain stable.
  • Generates new questions under the same retrieval constraints, but with difficulty guidance.

Why it matters:
Adaptivity makes the system more than a static quiz generator. It enables a controlled feedback loop that can support learners at different levels while maintaining transparency and bounded agent behaviour.
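
One possible difficulty policy, purely illustrative and not the system's actual rule, is a bounded step function over the learner's recent history:

```python
def next_difficulty(current: str, recent_correct: list[bool],
                    hints_used: int) -> str:
    """Hypothetical policy: step up after three correct answers without
    hints, step down after two consecutive misses, otherwise stay."""
    levels = ["easy", "medium", "hard"]
    i = levels.index(current)
    if len(recent_correct) >= 3 and all(recent_correct[-3:]) and hints_used == 0:
        i = min(i + 1, len(levels) - 1)          # never beyond "hard"
    elif recent_correct[-2:] == [False, False]:
        i = max(i - 1, 0)                        # never below "easy"
    return levels[i]
```

A deterministic rule like this keeps the adaptive layer as transparent as the rest of the pipeline: the difficulty label it emits is just another constraint handed to the bounded generator.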

Current Development Status: What Works Today

This system was not designed as a conceptual prototype, but as a fully functioning end-to-end implementation. Both the multi-agent backend and the Unity-based frontend are operational and connected through a stable API. The current development status reflects a system that can already be used in realistic testing scenarios.

End-to-end functionality

At present, the complete pipeline is implemented and working:

  • Teachers can upload PDF-based learning material and define a topic and quiz parameters.
  • The backend processes the document, extracts and stores structured curriculum information, and generates quiz questions using the multi-agent pipeline.
  • Generated questions are validated before being made available.
  • The Unity WebGL frontend retrieves the validated quiz and presents it as a playable learning game.
  • Learners can play the quiz without being exposed to any of the underlying AI logic.

This confirms that the system is not limited to isolated agent experiments, but functions as an integrated learning application.

Stability and robustness improvements during development

Several development iterations focused specifically on stability and reliability, rather than adding new features. These included:

  • stricter JSON schemas and output sanitization to prevent parsing errors,
  • improved separation between agents to avoid cross-agent interference,
  • structured logging of agent outputs and processing times,
  • validation layers before and after quiz generation,
  • background processing to prevent UI freezes during long-running tasks such as PDF analysis.

These measures were essential to move from experimental AI behavior to a system that behaves predictably in repeated runs.

Frontend maturity and usability

On the frontend side, the Unity WebGL application evolved from a simple prototype into a usable interface. Key improvements included:

  • a chat-based interaction model for uploading files and receiving feedback,
  • upload progress indicators and file previews,
  • session management and chat history persistence,
  • the ability to interrupt long-running AI responses,
  • a clean separation between teacher and learner workflows.

Although the frontend is intentionally minimal, these features significantly improve usability and reduce friction for non-technical users.

What this stage represents

The current state of the project represents a stable baseline. The core research question—whether a bounded multi-agent system can reliably generate quiz-based learning game content from human input—can already be answered using this implementation.

The focus at this stage is therefore not on adding complexity, but on consolidating what works, understanding limitations, and preparing the system for evaluation and future refinement.

Key Strengths of the Multi-Agent Approach

The current implementation highlights several strengths that result directly from the chosen bounded multi-agent design. These strengths are not theoretical advantages, but practical properties that emerged during development and testing.

Reliability through explicit validation

One of the most important strengths of the system is its focus on reliability. Quiz questions are not delivered directly after generation. Instead, they pass through a validation stage that checks structural correctness and alignment with the source material.

This additional step significantly reduces the likelihood of incorrect, misleading, or poorly formed questions. In an educational context, this is critical: even a small number of faulty items can undermine trust in the entire system. The validator agent therefore acts as a quality gate that transforms probabilistic generation into controlled output.

Reduced hallucinations through retrieval grounding

Another key strength is the strict grounding of all generated content in retrieved document context. Generative agents are not allowed to rely on general knowledge or assumptions. They can only operate on content that has been explicitly extracted from the uploaded learning material.

This design choice proved effective in reducing hallucinations. When errors occurred, they were usually traceable to retrieval quality rather than uncontrolled generation. As a result, improvements could be targeted at specific pipeline stages instead of adjusting the entire system.

Modularity and traceability

Because each agent has a clearly defined role, the system is highly modular. Errors, performance bottlenecks, or quality issues can be traced back to individual agents or processing steps. This makes debugging and refinement substantially easier than in monolithic AI systems.

From a research perspective, this modularity also improves explainability. It is possible to describe not only what the system produces, but how and why a specific output was generated.

Usability for non-technical users

Although the backend system is technically complex, the user-facing workflow is intentionally simple. Teachers interact with the system by uploading material, defining a topic, and receiving a quiz identifier. They do not need to understand agent orchestration, retrieval mechanisms, or model prompting.

This separation between technical complexity and user experience is a major strength. It allows advanced AI techniques to be applied without increasing cognitive or technical load for educators.

Practical deployability

Finally, the system is designed to be deployable under realistic constraints. It does not require specialized infrastructure, autonomous learning loops, or continuous retraining. The backend can operate independently of the frontend, and the frontend can be replaced without modifying the multi-agent logic.

This makes the system suitable not only as a research prototype, but also as a foundation for further applied development.