Hierarchical Multi-Agent Decision System for Wargaming

Production-grade C3 hierarchy multi-agent LLM system functioning as a Battle Management Assistant for real-time wargaming.

Avg turn latency: < 5 seconds (5 LLM calls)
Token efficiency: ~3,000–4,000 tokens/turn
Coordinated Agents: 4 Role-Based
Observability: Full Logfire trace per turn

Introduction

Designed and built a production-grade multi-agent LLM system that functions as a Battle Management Assistant (BMA) for real-time wargaming. The architecture implements a C3 (Command, Control, and Communications) hierarchy: a Strategic Commander (C3-Agent) orchestrates role-specialized Tactical Executors, mirroring real-world military command structure (global strategic view → per-unit tactical decisions).

Currently extending the agentic architecture to a custom 3D simulation environment with Tacview visualization.

The Problem

Building an AI agent that plays a real-time wargame is a system design challenge rather than a prompting problem:

  • Spatial grounding: LLMs have no inherent sense of a grid - distance, bearing, and geometry must be pre-computed and encoded in token-efficient formats before any LLM call
  • Hallucination mitigation: LLMs can suggest illegal moves (collisions, shooting friendlies, double-targeting). The system must guarantee game physics are never violated
  • Dual-horizon planning: Long-horizon strategic cohesion and precise per-unit decisions must be produced in the same turn
  • Strict coordination: No friendly fire, no target duplication, no collisions - enforced via a shared coordination state held in plain Python
  • BMA requirements: Explainable decisions, momentum tracking, memory persistence, graceful degradation

Core Architecture

Layer | Role | Key Design
Perception Layer | Raw game state → LLM-optimized tokens | Pre-computes bearings, threat tiers, hit-probabilities, enemy intent
Strategic Commander | Global battlefield analysis, per-role directives | Chain-of-thought + cloze-style prompting, momentum-aware, memory-augmented
Tactical Executors | Per-role action decisions | 4 role agents (AWACS, SAM, Aircraft, Decoy) with sequential execution + skip logic
Coordination State | Multi-unit and role deconfliction | Pythonic shared memory - prevents double-targeting, friendly-fire, collision
Memory Layer | Strategic persistence | Rolling outcomes, narrative history, missing-enemy tracking, momentum scores
Observability | Full production tracing | Logfire spans per turn + Pydantic validation on all LLM outputs
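
To make the Coordination State row concrete, here is a minimal sketch of how a shared deconfliction object could work. The class name `CoordinationState`, its fields, and its two methods are hypothetical and chosen for illustration; the project's actual data structures are not shown in this write-up.

```python
from dataclasses import dataclass, field


@dataclass
class CoordinationState:
    """Hypothetical per-turn shared state that role agents consult before acting."""

    friendly_ids: set
    claimed_targets: dict = field(default_factory=dict)  # target_id -> shooter_id
    reserved_cells: dict = field(default_factory=dict)   # (x, y) -> unit_id

    def claim_target(self, shooter_id, target_id):
        """Return True only if the target is hostile and not already claimed."""
        if target_id in self.friendly_ids:
            return False  # would be friendly fire
        if target_id in self.claimed_targets:
            return False  # would be double-targeting
        self.claimed_targets[target_id] = shooter_id
        return True

    def reserve_cell(self, unit_id, cell):
        """Return True only if no other unit has reserved this grid cell."""
        if cell in self.reserved_cells:
            return False  # would be a collision
        self.reserved_cells[cell] = unit_id
        return True


state = CoordinationState(friendly_ids={"AC1", "AC2"})
first_claim = state.claim_target("AC1", "E1")    # accepted
second_claim = state.claim_target("AC2", "E1")   # rejected: E1 already claimed
friendly_claim = state.claim_target("AC2", "AC1")  # rejected: friendly unit
```

Because the checks run in ordinary Python after each LLM reply, an illegal suggestion can be rejected deterministically instead of relying on the model to police itself.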

Key Engineering Decisions

Grounding LLMs in Spatial Reasoning

Language models have no inherent sense of a grid. All spatial data - distance, bearing, threat classification, hit probability - is pre-computed in Python before any LLM call and encoded in token-efficient natural-language format.
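
As an illustration of this pre-computation step, the sketch below turns raw grid coordinates into one compact line per contact. The function names, the grid convention (+y is north, +x is east), and the threat-tier thresholds are all assumptions made for the example, not the project's actual rules.

```python
import math


def bearing_deg(src, dst):
    """Compass bearing from src to dst, assuming +y is north and +x is east."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def compass_label(bearing):
    """Map a bearing in degrees to one of eight compass points."""
    labels = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return labels[int((bearing + 22.5) // 45) % 8]


def threat_tier(distance, weapon_range):
    """Illustrative tiers: inside weapon range, within twice the range, or beyond."""
    if distance <= weapon_range:
        return "HIGH"
    if distance <= 2 * weapon_range:
        return "MEDIUM"
    return "LOW"


def encode_contact(unit_pos, enemy_id, enemy_pos, weapon_range):
    """Render one enemy contact as a short natural-language line for the prompt."""
    dist = math.dist(unit_pos, enemy_pos)
    brg = bearing_deg(unit_pos, enemy_pos)
    tier = threat_tier(dist, weapon_range)
    return f"{enemy_id}: {dist:.1f} cells {compass_label(brg)} ({brg:.0f} deg), threat {tier}"


line = encode_contact((0, 0), "E1", (3, 4), weapon_range=5.0)
```

The model then reads a line such as `E1: 5.0 cells NE (37 deg), threat HIGH` and never has to do trigonometry on its own.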

Hierarchical > Monolithic Prompting

Splitting each turn into one strategic call plus per-role tactical calls yields better coordination, easier debugging, and a clear path to scaling: each role agent sees only what it needs.
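
A rough sketch of the call pattern follows, with the model abstracted as a plain callable so the flow can be read and tested without any LLM client. The function `run_turn`, the prompt wording, and the `fake_llm` stub are hypothetical; the real system is built on PydanticAI agents, which are not reproduced here.

```python
from typing import Callable, Dict, List

ROLES = ["AWACS", "SAM", "Aircraft", "Decoy"]


def run_turn(
    state_summary: str,
    role_views: Dict[str, str],
    llm: Callable[[str], str],
) -> Dict[str, str]:
    """One strategic call over the global view, then one tactical call per role."""
    directive = llm(
        "ROLE: Strategic Commander\n"
        f"STATE:\n{state_summary}\n"
        "Give one directive per role."
    )
    actions = {}
    for role in ROLES:
        # Each executor receives the directive plus only its own local view.
        prompt = f"ROLE: {role}\nDIRECTIVE:\n{directive}\nLOCAL VIEW:\n{role_views[role]}"
        actions[role] = llm(prompt)
    return actions


# Stub model that records the prompts it receives (stands in for a real LLM).
seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "HOLD"


views = {role: f"{role.lower()}-only contacts" for role in ROLES}
actions = run_turn("2 enemies north-east", views, fake_llm)
```

With four roles this makes five calls per turn, matching the figure quoted at the top of the page, and each tactical prompt can be inspected on its own when a decision goes wrong.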

Cloze-Style Prompting for Hallucination Control

Prompt templates use a JSON skeleton with <PLACEHOLDER> tokens. The LLM fills in the blanks within a constrained structure, which reduces malformed output and out-of-bounds moves.
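
The sketch below shows one way such a skeleton could be built and its reply checked. The field names, the action set, and the helpers `build_prompt` and `parse_reply` are assumptions for illustration; the project validates replies with Pydantic, whereas this example sticks to the standard library so it stays self-contained.

```python
import json

ALLOWED_ACTIONS = {"MOVE", "SHOOT", "WAIT"}

# Hypothetical skeleton: every value is a placeholder the model must replace.
SKELETON = {
    "unit_id": "<UNIT_ID>",
    "action": "<MOVE|SHOOT|WAIT>",
    "target": "<TARGET_ID or null>",
    "reason": "<ONE_SENTENCE>",
}


def build_prompt(unit_id, legal_targets):
    """Embed the JSON skeleton in the prompt so the model only fills blanks."""
    targets = ", ".join(legal_targets) if legal_targets else "none"
    return (
        f"You control {unit_id}. Legal targets: {targets}.\n"
        "Fill in every <PLACEHOLDER> and return only the JSON object:\n"
        + json.dumps(SKELETON, indent=2)
    )


def parse_reply(raw, legal_targets):
    """Return the parsed action, or None if the reply breaks the skeleton or the rules."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(SKELETON):
        return None
    if not isinstance(data["action"], str) or data["action"] not in ALLOWED_ACTIONS:
        return None
    if data["action"] == "SHOOT" and data["target"] not in legal_targets:
        return None
    return data


prompt = build_prompt("AC1", ["E1", "E2"])
reply = '{"unit_id": "AC1", "action": "SHOOT", "target": "E2", "reason": "closest threat"}'
parsed = parse_reply(reply, ["E1", "E2"])
```

A `None` result gives the caller a single place to fall back to a safe default, such as waiting, rather than executing an unvalidated move.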

Cost Optimization via Skip Logic

SAM agents are skipped entirely if no enemies are in radar range (~25% token reduction per turn). Smart routing means LLM cost scales with battlefield density, not turn count.
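
A minimal sketch of this gating, assuming units and enemies are dictionaries with `role` and `pos` keys and that radar coverage is a simple Euclidean radius. The function name `roles_to_call` and the data layout are hypothetical.

```python
import math

ROLE_ORDER = ("AWACS", "SAM", "Aircraft", "Decoy")


def roles_to_call(units, enemies, radar_range):
    """Return the roles that need an LLM call this turn; only SAM is gated here."""
    sam_has_contact = any(
        math.dist(unit["pos"], enemy["pos"]) <= radar_range
        for unit in units
        if unit["role"] == "SAM"
        for enemy in enemies
    )
    return [role for role in ROLE_ORDER if role != "SAM" or sam_has_contact]


units = [{"role": "SAM", "pos": (0, 0)}, {"role": "Aircraft", "pos": (4, 4)}]
enemies = [{"pos": (10, 0)}]

quiet_turn = roles_to_call(units, enemies, radar_range=5.0)    # SAM skipped
contact_turn = roles_to_call(units, enemies, radar_range=10.0)  # SAM included
```

The check is pure geometry, so it costs no tokens, and a skipped role simply produces no prompt for that turn.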

Adaptive Memory + Momentum

The strategic agent receives momentum score, recent narrative history, and missing-enemy intelligence - enabling consistent multi-turn strategy without a long context window.
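
One way to keep this context small is a fixed-size rolling window, sketched below. The `StrategicMemory` class, the momentum formula (mean of kills minus losses over the window), and the prompt layout are assumptions for illustration; missing-enemy tracking is left out of the example.

```python
from collections import deque


class StrategicMemory:
    """Hypothetical rolling memory: keeps only the last `window` turns."""

    def __init__(self, window=5):
        self.outcomes = deque(maxlen=window)   # net result per turn
        self.narrative = deque(maxlen=window)  # one-line summary per turn

    def record_turn(self, kills, losses, summary):
        self.outcomes.append(kills - losses)
        self.narrative.append(summary)

    def momentum(self):
        """Mean net outcome over the window; positive means the side is gaining."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def to_prompt(self):
        """Render the memory as a few short lines for the strategic prompt."""
        lines = [f"MOMENTUM: {self.momentum():+.2f}"]
        lines += [f"- {entry}" for entry in self.narrative]
        return "\n".join(lines)


memory = StrategicMemory(window=2)
memory.record_turn(kills=1, losses=0, summary="T1: destroyed E1")
memory.record_turn(kills=0, losses=1, summary="T2: lost AC2")
memory.record_turn(kills=2, losses=0, summary="T3: destroyed E3 and E4")
block = memory.to_prompt()
```

Because old turns fall out of the window automatically, the prompt size stays constant no matter how long the game runs.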

Key Learnings

  • Hybrid hierarchical architecture is the right abstraction: it scales cleanly, and each layer can be debugged independently
  • LLMs cannot reason about space unaided: the perception layer is not optional; it is what lets the model work with positions, distances, and geometry
  • Cloze-style prompting: noticeably reduces hallucinations compared with open-ended generation in structured-output scenarios
  • Memory module: momentum plus narrative history produces measurably more consistent multi-turn strategy
  • Observability matters: Logfire per-turn tracing turned debugging tactical failures from guesswork into root-cause analysis

Scaling to 3D (In Progress)

  • Realistic 3D kinematics, terrain occlusion, and multi-domain operations (air + ground)
  • Exploring recent innovations (novel LLM architectures, world models) for integration into the agent workflow
  • Using LLMs to translate the spatial environment into NATO-standard OPORD directives for 3D mission planning
  • Custom 3D environment hosted on an internal black network with simulated warfare entities (5th-generation fighter jet, Kamikaze, UAV, SAM)

Tech Stack

Orchestration

PydanticAI, Hierarchical Agents, Skip Logic

LLMs & Routing

OpenRouter, Llama-3.3-70B, Cloze Prompting

Observability

Logfire, Span Tracing, Momentum Scoring

Infrastructure

FastAPI, Pydantic, uv, Python

Interested in Collaborating?

If you are building real-time autonomous decision systems, memory-augmented agents, or complex AI workflows, let's talk about how I can bring this expertise to your project.