Hierarchical Multi-Agent Decision System for Wargaming

Production-grade C3 hierarchy multi-agent LLM system functioning as a Battle Management Assistant for real-time wargaming.


Avg turn latency: < 5 seconds (5 LLM calls)

Token efficiency: ~3,000–4,000 tokens/turn

Coordinated agents: 4, role-based

Observability: full Logfire trace per turn

Introduction

Designed and built a production-grade multi-agent LLM system that functions as a Battle Management Assistant (BMA) for real-time wargaming. The architecture implements a true C3 (Command, Control, Communication) hierarchy - a Strategic Commander (C3-Agent) orchestrating role-specialized Tactical Executors that mimic real-world military command structure (global strategic view → per-unit tactical decisions).

Currently extending the agentic architecture to a custom 3D simulation environment with Tacview visualization.

The Problem

Building an AI agent that plays a real-time wargame is fundamentally a system design challenge, rather than a prompting problem:

Spatial grounding: LLMs have no inherent sense of a grid - distance, bearing, and geometry must be pre-computed and encoded in token-efficient formats before any LLM call
Hallucination mitigation: LLMs can suggest illegal moves (collisions, shooting friendlies, double-targeting). The system must guarantee game physics are never violated
Dual-horizon planning: The system must maintain long-horizon strategic cohesion and make precise per-unit decisions at the same time
Strict coordination: No friendly-fire, no target duplication, no collision - enforced via a shared Pythonic coordination state
BMA requirements: Explainable decisions, momentum tracking, memory persistence, graceful degradation

Core Architecture

Layer	Role	Key Design
Perception Layer	Raw game state → LLM-optimized tokens	Pre-computes bearings, threat tiers, hit-probabilities, enemy intent
Strategic Commander	Global battlefield analysis, per-role directives	Chain-of-thought + cloze-style prompting, momentum-aware, memory-augmented
Tactical Executors	Per-role action decisions	4 role agents (AWACS, SAM, Aircraft, Decoy) with sequential execution + skip logic
Coordination State	Multi-unit and role deconfliction	Pythonic shared memory - prevents double-targeting, friendly-fire, collision
Memory Layer	Strategic persistence	Rolling outcomes, narrative history, missing-enemy tracking, momentum scores
Observability	Full production tracing	Logfire spans per turn + Pydantic validation on all LLM outputs
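The coordination state in the table above can be pictured as a small shared object that every tactical executor consults before committing an action. The sketch below is a minimal illustration under assumptions: the class name `CoordinationState`, its fields, and its method names are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class CoordinationState:
    """Hypothetical shared state consulted by every role agent during one turn."""

    friendly_ids: set[str]
    # target id -> id of the friendly unit that already claimed it
    claimed_targets: dict[str, str] = field(default_factory=dict)
    # grid cell -> id of the friendly unit that reserved it for this turn
    reserved_cells: dict[tuple[int, int], str] = field(default_factory=dict)

    def claim_target(self, shooter: str, target: str) -> bool:
        """Reject friendly-fire and double-targeting; otherwise record the claim."""
        if target in self.friendly_ids:
            return False  # friendly-fire
        if target in self.claimed_targets:
            return False  # another unit already engages this target
        self.claimed_targets[target] = shooter
        return True

    def reserve_cell(self, unit: str, cell: tuple[int, int]) -> bool:
        """Reject a move into a cell another friendly unit has already reserved."""
        if cell in self.reserved_cells:
            return False  # collision
        self.reserved_cells[cell] = unit
        return True
```

Because the checks run in plain Python rather than in the prompt, an illegal suggestion from the model is rejected deterministically regardless of what the LLM returns.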

Key Engineering Decisions

Grounding LLMs in Spatial Reasoning

Language models have no inherent sense of a grid. All spatial data - distance, bearing, threat classification, hit probability - is pre-computed in Python before any LLM call and encoded in token-efficient natural-language format.
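As a rough illustration of this pre-computation step, the sketch below derives distance, compass bearing, and a threat tier for one contact and packs them into a single compact line. The `Unit` type, the tier thresholds, and the convention that +y is north are assumptions made for the example, not details from the project.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    x: int
    y: int


def distance(a: Unit, b: Unit) -> float:
    """Euclidean distance in grid cells."""
    return math.hypot(b.x - a.x, b.y - a.y)


def bearing_deg(a: Unit, b: Unit) -> float:
    """Compass bearing from a to b, assuming 0 = north (+y) and 90 = east (+x)."""
    return math.degrees(math.atan2(b.x - a.x, b.y - a.y)) % 360


def threat_tier(dist: float, near: float = 3.0, far: float = 8.0) -> str:
    """Illustrative thresholds only; the real tiers are not given in the text."""
    if dist <= near:
        return "HIGH"
    if dist <= far:
        return "MED"
    return "LOW"


def encode_contact(me: Unit, enemy: Unit) -> str:
    """One short line per contact keeps the token count low."""
    d = distance(me, enemy)
    return (
        f"{enemy.name}: dist={d:.1f} "
        f"brg={bearing_deg(me, enemy):03.0f} threat={threat_tier(d)}"
    )
```

For example, `encode_contact(Unit("F1", 0, 0), Unit("E1", 3, 4))` produces `E1: dist=5.0 brg=037 threat=MED`, so the model reads finished numbers instead of having to infer geometry from raw coordinates.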

Hierarchical > Monolithic Prompting

Splitting into a single strategic call + per-role tactical calls yields better coordination, easier debugging, and clear scalability - each role agent sees only what it needs.
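The call pattern can be sketched as one strategic call followed by one call per role, each given only its own slice of the state. This is a simplified illustration: `run_turn`, the shape of the `state` dictionary, and the stub agents are hypothetical, and plain Python callables stand in for the real LLM-backed agents.

```python
from typing import Callable

ROLES = ("AWACS", "SAM", "Aircraft", "Decoy")


def run_turn(
    state: dict,
    strategist: Callable[[dict], dict],
    executors: dict[str, Callable[[str, dict], dict]],
) -> dict:
    """One strategic call, then one tactical call per role, in sequence."""
    directives = strategist(state)  # sees the global state
    actions = {}
    for role in ROLES:
        # Each role agent sees only its own units, not the whole battlefield.
        role_view = {"turn": state["turn"], "units": state["units"].get(role, [])}
        actions[role] = executors[role](directives[role], role_view)
    return actions


# Stub agents that record each call instead of contacting an LLM.
calls = []


def fake_strategist(state: dict) -> dict:
    calls.append("strategic")
    return {role: f"hold position ({role})" for role in ROLES}


def make_executor(role: str) -> Callable[[str, dict], dict]:
    def executor(directive: str, view: dict) -> dict:
        calls.append(role)
        return {"role": role, "directive": directive, "n_units": len(view["units"])}

    return executor


state = {"turn": 1, "units": {"Aircraft": ["F1", "F2"], "SAM": ["S1"]}}
actions = run_turn(state, fake_strategist, {r: make_executor(r) for r in ROLES})
```

With four roles this amounts to five calls per turn, and because each call is a separate function boundary, a bad tactical decision can be traced to a single role and a single directive.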

Cloze-Style Prompting for Hallucination Control

Prompt templates use a JSON skeleton with <PLACEHOLDER> tokens. The LLM fills in the blanks within a constrained structure - dramatically reducing illegal output formats and out-of-bounds moves.
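A minimal sketch of the idea, under assumptions: the skeleton fields, the action names, and the fallback to `WAIT` are hypothetical, and standard-library checks stand in for the Pydantic validation the project applies to LLM outputs.

```python
import json

# Hypothetical skeleton; the real templates are not shown in the text.
SKELETON = (
    '{"unit_id": "<UNIT_ID>", "action": "<ACTION>", '
    '"target": [<X>, <Y>], "reason": "<ONE_SENTENCE>"}'
)
LEGAL_ACTIONS = {"MOVE", "SHOOT", "WAIT"}  # assumed action set


def build_prompt(unit_id: str, grid_w: int, grid_h: int) -> str:
    """Pre-fill what is already known; leave the rest as blanks for the LLM."""
    skeleton = SKELETON.replace("<UNIT_ID>", unit_id)
    return (
        "Fill in every <PLACEHOLDER> and return only the JSON.\n"
        f"ACTION is one of {sorted(LEGAL_ACTIONS)}; "
        f"0 <= X < {grid_w}; 0 <= Y < {grid_h}.\n"
        f"{skeleton}"
    )


def parse_reply(reply: str, unit_id: str, grid_w: int, grid_h: int) -> dict:
    """Accept the reply only if it is well-formed and in bounds, else fall back."""
    fallback = {"unit_id": unit_id, "action": "WAIT", "target": None,
                "reason": "invalid LLM output"}
    try:
        data = json.loads(reply)
        x, y = data["target"]
        ok = (
            data["unit_id"] == unit_id
            and data["action"] in LEGAL_ACTIONS
            and isinstance(x, int) and isinstance(y, int)
            and 0 <= x < grid_w and 0 <= y < grid_h
        )
    except (ValueError, KeyError, TypeError):
        return fallback  # malformed JSON, missing field, or wrong type
    return data if ok else fallback
```

The skeleton narrows what the model can produce, and the parser covers what still slips through: a reply with an unfilled placeholder or an out-of-grid target degrades to a safe default instead of reaching the game engine.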

Cost Optimization via Skip Logic

SAM agents are skipped entirely if no enemies are in radar range (~25% token reduction per turn). Smart routing means LLM cost scales with battlefield density, not turn count.
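The routing rule can be sketched as a cheap geometric check that runs before any LLM call is scheduled. The function names, the tuple-based positions, and the radar range value below are assumptions for illustration only.

```python
import math

Position = tuple[float, float]


def enemies_in_range(
    sam_pos: Position, enemies: list[Position], radar_range: float
) -> list[Position]:
    """Return the enemy positions that fall inside the SAM radar radius."""
    return [e for e in enemies if math.dist(sam_pos, e) <= radar_range]


def plan_role_calls(
    roles: tuple[str, ...],
    sam_pos: Position,
    enemies: list[Position],
    radar_range: float,
) -> list[str]:
    """List the roles that need an LLM call this turn, skipping an idle SAM."""
    scheduled = []
    for role in roles:
        if role == "SAM" and not enemies_in_range(sam_pos, enemies, radar_range):
            continue  # nothing to engage: no prompt is built, no tokens are spent
        scheduled.append(role)
    return scheduled
```

With four role calls per turn, dropping the SAM call removes one of them, which is roughly in line with the ~25% figure quoted above.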

Adaptive Memory + Momentum

The strategic agent receives momentum score, recent narrative history, and missing-enemy intelligence - enabling consistent multi-turn strategy without a long context window.
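One way to keep this memory bounded is a fixed-size rolling window that is summarized into a single prompt line each turn. The sketch below is an assumption-heavy illustration: `StrategicMemory`, the window size, and the momentum formula (mean of kills minus losses over the window) are hypothetical, and missing-enemy tracking is omitted for brevity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StrategicMemory:
    """Hypothetical rolling memory handed to the strategic agent each turn."""

    window: int = 5
    outcomes: deque = field(init=False)
    narrative: deque = field(init=False)

    def __post_init__(self) -> None:
        # maxlen discards the oldest entry automatically once the window is full.
        self.outcomes = deque(maxlen=self.window)
        self.narrative = deque(maxlen=self.window)

    def record(self, kills: int, losses: int, summary: str) -> None:
        self.outcomes.append(kills - losses)
        self.narrative.append(summary)

    def momentum(self) -> float:
        """Assumed definition: mean net outcome over the window."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def to_prompt(self) -> str:
        """Compact one-line summary instead of the full turn history."""
        return f"momentum={self.momentum():+.2f} | recent: " + "; ".join(self.narrative)
```

Because the window has a fixed length, the memory section of the prompt stays roughly constant in size no matter how many turns have been played.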

Key Learnings

Hybrid hierarchical architecture is the right abstraction: it scales cleanly, and each layer can be debugged independently
LLMs cannot reason about a spatial environment unaided: the perception layer is not optional; it is what gives the model a usable picture of the battlefield geometry
Observability matters: per-turn Logfire tracing turned debugging tactical failures from guesswork into root-cause analysis

Scaling to 3D (In Progress)

Realistic 3D kinematics, terrain occlusion, and multi-domain operations (air + ground)
Exploring recent developments (novel LLM architectures, world models) for integration into the agent workflow
Using LLMs to translate the spatial environment into NATO-standard OPORD directives for 3D mission planning
Custom 3D environment hosted on an internal black network with simulated warfare entities (5th-generation fighter jet, Kamikaze, UAV, SAM)

Tech Stack

Orchestration

PydanticAI, Hierarchical Agents, World Models, Skip Logic

LLMs & Routing

OpenRouter, Llama-3.3-70B, Cloze Prompting

Observability

Logfire, Span Tracing, Momentum Scoring

Infrastructure

FastAPI, React, Pydantic, uv, Python

Deployment & CI/CD

GitHub Actions, Railway, Cloudflare Pages

Interested in Collaborating?

If you are building real-time autonomous decision systems, memory-augmented agents, or complex AI workflows, let's talk about how I can bring this expertise to your project.
