Hugin: A State Machine Framework for Agentic Reasoning
Based on a four-part blog series on state machines for multi-agent systems. For more details on Hugin see the Hugin website and the Hugin GitHub repo.
Most agent frameworks treat LLMs as the agent itself. Hugin instead treats them as oracles: one component in a larger reasoning system. The framework is built around a simple idea: if you want agents that reason well over long horizons, you need explicit structure around how they reason, not just what they reason about.
This is achieved through a state machine architecture where every interaction is an immutable entry on a stack, making branching, debugging, replay, and multi-agent coordination natural rather than bolted on.
The problem with current agent frameworks
The dominant approach to building agents is to wrap an LLM in a loop: prompt, get response, execute tools, repeat. This works for short tasks but breaks down for longer-running, creative reasoning:
- No memory of process: the agent has no structured record of what it tried and why (at least, not natively) and no obvious path to self-improvement.
- No backtracking: if a reasoning path fails, the only option is to start over or rely on the LLM to self-correct. It is difficult to roll back to a previous state and restart from there.
- Opaque debugging: without backtracking, and with multi-agent patterns bolted on, debugging becomes difficult at scale, especially for long-running, complex reasoning tasks.
- Brittle orchestration: coordinating multiple agents requires ad-hoc message passing, which often struggles to combine asynchronous and synchronous communication in one framework.
The state machine
Hugin models every agent as a state machine with a well-defined lifecycle. An agent moves through states (receiving a task, consulting the oracle (LLM), executing tools, calling sub-agents, asking a human) and every transition produces an immutable interaction pushed onto a stack.
Each interaction has a type that captures what happened:
- TaskDefinition / TaskResult: the start and end of a task
- AskOracle / OracleResponse: consulting the LLM and receiving its response
- ToolCall / ToolResult: executing a tool and capturing what it returned
- AskHuman / HumanResponse: requesting and receiving human input
- AgentCall / AgentResult: spawning a sub-agent and receiving its output
- TaskChain: transitioning to a new task in a pipeline
The stack is the complete, ordered history of everything the agent has done. Because it's immutable, every state is preserved: you can inspect any point, replay from any point, or branch from any point.
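To make the immutability point concrete, here is a minimal sketch of an append-only stack. The `Interaction` and `Stack` classes and the `at` helper are illustrative stand-ins written for this article, not Hugin's actual classes; only the interaction type names come from the list above.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Interaction:
    """One immutable entry on the stack (hypothetical shape, not Hugin's class)."""
    kind: str           # e.g. "TaskDefinition", "AskOracle", "ToolCall"
    payload: Any = None


@dataclass(frozen=True)
class Stack:
    """Append-only history: push returns a new Stack and leaves the old one untouched."""
    entries: Tuple[Interaction, ...] = ()

    def push(self, interaction: Interaction) -> "Stack":
        return Stack(self.entries + (interaction,))

    def at(self, index: int) -> "Stack":
        """Return the stack as it looked after the first `index` interactions."""
        return Stack(self.entries[:index])


s0 = Stack()
s1 = s0.push(Interaction("TaskDefinition", "Summarise the report"))
s2 = s1.push(Interaction("AskOracle"))
s3 = s2.push(Interaction("OracleResponse", "call extract_text"))

earlier = s3.at(1)                      # the state right after the task was defined
kinds = [i.kind for i in s3.entries]    # full ordered history
```

Because `push` never mutates, `s1` is still a valid state after `s3` exists, which is what makes inspection and replay from an earlier point cheap to reason about.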
Three building blocks
An agent in Hugin is defined by three core elements:
- Configuration: how the agent behaves. Tools, model, system prompt, templates, all dynamically rendered at each step, so the agent's capabilities can change mid-task. Configurations can define state machines with transitions, giving agents different tools and prompts in different phases.
- Task: what the agent does. The initial prompt and parameters. Tasks can chain into pipelines: one task's result becomes the next task's input, with different configurations for each phase.
- Stack: what the agent has done. The immutable history of interactions. The context window is re-rendered from the stack at every LLM call, so the agent never manages its own memory.
Configuration and task are just declarative YAML. The configuration says how the agent behaves; the task says what it should do, and can override pieces of the config, like the model:
# basic_agent.yaml: the Configuration
name: basic_agent
system_template: basic_system
llm_model: haiku-latest
tools:
  - builtins.save_insight:save_insight
  - builtins.finish:finish

# hello_world.yaml: the Task
name: hello_world
prompt: |
  The user asked:
  Answer thoughtfully, save the answer with save_insight,
  then call finish to complete the task.
llm_model: sonnet-latest  # override the config's model for this task
The Stack isn't authored; it's produced as the agent runs.
This separation is key. The LLM never manages its own state; the framework does. The LLM is consulted as an oracle: given the current stack (rendered as context), what should we do next? This makes the system far more adaptable, predictable and debuggable than approaches where the LLM is expected to self-manage.
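The oracle pattern can be sketched in a few lines of plain Python. Everything here (`render_context`, `run`, `scripted_oracle`, the tuple-based entries) is a hypothetical illustration of the control flow, not Hugin's API; the scripted function stands in for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, str]  # (interaction type, content)


def render_context(stack: List[Entry]) -> str:
    """Re-render the whole prompt from the stack; the oracle keeps no state."""
    return "\n".join(f"[{kind}] {content}" for kind, content in stack)


def run(task: str, oracle: Callable[[str], Entry],
        tools: Dict[str, Callable[[str], object]], max_steps: int = 10) -> List[Entry]:
    stack: List[Entry] = [("TaskDefinition", task)]
    for _ in range(max_steps):
        context = render_context(stack)
        stack.append(("AskOracle", f"{len(context)} chars of context"))
        action, argument = oracle(context)           # the oracle only proposes
        stack.append(("OracleResponse", f"{action}({argument})"))
        if action == "finish":
            stack.append(("TaskResult", argument))
            break
        stack.append(("ToolCall", f"{action}({argument})"))
        stack.append(("ToolResult", str(tools[action](argument))))  # the framework executes
    return stack


def scripted_oracle(context: str) -> Entry:
    """Stand-in for an LLM: request one tool call, then finish."""
    if "[ToolResult]" not in context:
        return ("word_count", "state machines all the way down")
    return ("finish", "done")


history = run("Count the words", scripted_oracle,
              {"word_count": lambda text: len(text.split())})
```

The loop, not the oracle, owns the history: the oracle sees a freshly rendered context each time and returns a single proposed action.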
Tools
Agents interact with the world through tools. The state machine treats tool calls as first-class transitions: the agent requests a tool call, the framework executes it, and the result is pushed onto the stack.
Every tool receives the full stack as a parameter, giving it access to the agent's history, environment, shared state, and storage. This means tools can be deeply context-aware without the agent needing to pass information explicitly. A tool is just a Python function that returns a ToolResponse:
from gimle.hugin.interaction.stack import Stack
from gimle.hugin.tools.tool import ToolResponse


def extract_text(document: str, stack: Stack, **kwargs) -> ToolResponse:
    # Split on full stops and drop empty fragments.
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return ToolResponse(
        is_error=False,
        content={"word_count": len(document.split()), "sentences": sentences},
    )
Hugin ships with built-in tools that cover the core capabilities:
- finish: complete the current task with a result
- save_insight / query_artifacts / get_artifact_content: long-term memory (more on this below)
- ask_user: pause and request human input (creates the AskHuman interaction)
- launch_agent / list_agents / list_running_agents: spawn and track sub-agents for specialised subtasks
Custom tools are straightforward: a Python function plus a definition, either via inline decorators or YAML. The function receives the stack and any parameters; the YAML describes the tool's name, description, and parameter schema, and points at the implementation:
name: extract_text
description: Extract and clean text from a document.
implementation_path: tools.document_tools:extract_text
parameters:
  document:
    type: string
    required: true
Tools can also chain deterministically via a next_tool mechanism, which is useful for pipelines where one tool's output always feeds into another.
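How next_tool is declared in Hugin is not shown here, so the following is only a sketch of the general idea under an assumed shape: a registry where each tool optionally names the tool that always runs after it. The `Registry` type, `run_chain`, and the `count_words` tool are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# name -> (function, name of the tool that always runs next, or None).
# Hypothetical registry shape, not Hugin's tool definition format.
Registry = Dict[str, Tuple[Callable[[str], str], Optional[str]]]


def run_chain(registry: Registry, first_tool: str, value: str) -> Tuple[str, List[str]]:
    """Follow next-tool links without consulting the oracle between steps."""
    trace: List[str] = []
    name: Optional[str] = first_tool
    while name is not None:
        func, next_tool = registry[name]
        value = func(value)     # this tool's output is the next tool's input
        trace.append(name)
        name = next_tool
    return value, trace


registry: Registry = {
    "extract_text": (lambda doc: doc.strip(), "count_words"),
    "count_words": (lambda text: str(len(text.split())), None),
}

result, trace = run_chain(registry, "extract_text", "  explicit structure helps  ")
```

The point of a deterministic chain is that no LLM call sits between the two tools, so the second step cannot be skipped or reordered.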
Steering agents
The state machine gives us precise control over how agents reason, without constraining what they reason about.
- Configuration state machines: an agent's configuration can itself be a state machine. A planning phase might use one set of tools and a reasoning-optimised model; execution switches to a different tool set and a faster model. Transitions between states can be triggered by the agent or by the framework.
- Task chaining: tasks compose into pipelines. Extract → analyse → summarise, where each step's result flows to the next via pass_result_as. Each step can use a different configuration, so a research task can chain into a writing task with different tools and prompts.
- Self-reflection: after completing a task, an agent can review its own stack, a complete record of its reasoning, and produce a critique. Sub-agent reflection takes this further: a second agent reviews the first agent's work with fresh context and different instructions.
- Human-in-the-loop: the ask_user tool pauses the agent and requests input, creating an AskHuman interaction. The human response is pushed onto the stack as a HumanResponse interaction and becomes part of the permanent record. This supports approval gates, clarification requests, and collaborative reasoning without breaking the state machine model.
Task chaining (the second item above) is just two fields on a task: next_task names the next phase, and pass_result_as decides which parameter the result lands in:
# extract_text.yaml: first step of an extract → analyse → summarise pipeline
name: extract_text
prompt: "Extract text from: "
next_task: analyze_content      # hand off to the next phase...
pass_result_as: extracted_text  # ...as its `extracted_text` parameter
tools:
  - extract_text
  - builtins.finish:finish
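The hand-off described by those two fields can be sketched as a small loop. This is an illustration of the chaining semantics only: the `TASKS` table, `run_pipeline`, the `summarise` task, and the `analysis` parameter name are assumptions, and the sketch replaces the parameters at each step, which may differ from how Hugin merges them.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory task table mirroring next_task / pass_result_as.
TASKS: Dict[str, Dict[str, Optional[str]]] = {
    "extract_text": {"next_task": "analyze_content", "pass_result_as": "extracted_text"},
    "analyze_content": {"next_task": "summarise", "pass_result_as": "analysis"},
    "summarise": {"next_task": None, "pass_result_as": None},
}


def run_pipeline(first_task: str, params: Dict[str, str],
                 run_task: Callable[[str, Dict[str, str]], str]) -> str:
    """Run tasks in sequence, feeding each result into the next task's parameters."""
    name: Optional[str] = first_task
    result = ""
    while name is not None:
        result = run_task(name, params)
        spec = TASKS[name]
        name = spec["next_task"]
        if name is not None:
            # The previous result lands in the parameter named by pass_result_as.
            params = {spec["pass_result_as"]: result}
    return result


def fake_run_task(name: str, params: Dict[str, str]) -> str:
    """Stand-in for running an agent: report which parameters the task received."""
    return f"{name}({', '.join(sorted(params))})"


final = run_pipeline("extract_text", {"document": "raw text"}, fake_run_task)
```

Each phase only sees what the previous phase chose to hand over, which keeps the phases loosely coupled.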
Multi-agent coordination
A session in Hugin can host multiple agents, each with its own stack, configuration, and task. Agents coordinate through two mechanisms:
Sub-agents. An agent can spawn a sub-agent via call_agent. The parent pauses, the sub-agent runs to completion with its own stack and configuration, and the result is returned as an AgentResult interaction on the parent's stack. The sub-agent is fully isolated: its own context window, its own tools, its own model.
Shared state. Agents in the same session can communicate through namespaces: key-value stores with fine-grained access control. A producer agent writes findings to a namespace; a consumer agent reads them. This decouples agents without requiring direct message passing, and the access control prevents unintended interference.
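As a rough picture of the producer/consumer pattern, here is a minimal namespace with per-agent read and write permissions. The `Namespace` class and its methods are invented for illustration; Hugin's actual namespace API and access-control model may look quite different.

```python
from typing import Any, Dict, Iterable, Set


class Namespace:
    """A key-value store that checks who may read and who may write (hypothetical)."""

    def __init__(self, name: str, writers: Iterable[str], readers: Iterable[str]) -> None:
        self.name = name
        self._writers: Set[str] = set(writers)
        self._readers: Set[str] = set(readers)
        self._data: Dict[str, Any] = {}

    def write(self, agent: str, key: str, value: Any) -> None:
        if agent not in self._writers:
            raise PermissionError(f"{agent} may not write to {self.name}")
        self._data[key] = value

    def read(self, agent: str, key: str) -> Any:
        if agent not in self._readers:
            raise PermissionError(f"{agent} may not read from {self.name}")
        return self._data[key]


ns = Namespace("findings", writers={"producer"}, readers={"producer", "consumer"})
ns.write("producer", "root_cause", "stale cache entry")
value = ns.read("consumer", "root_cause")

try:
    ns.write("consumer", "root_cause", "overwritten")
    blocked = False
except PermissionError:
    blocked = True   # the consumer can read but cannot interfere
```

Neither agent needs a reference to the other; they only need to agree on the namespace and the key.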
Branching: parallel exploration
The most powerful consequence of the stack architecture is branching. At any point in an agent's reasoning, you can create a branch: a copy of the stack that diverges from that point. The branch shares all history up to the fork, then explores independently.
This is not just convenience; it's a fundamentally different approach to reasoning. Instead of committing to a single path and hoping the LLM self-corrects, you can explore multiple hypotheses in parallel, evaluate each, and select the best. The architecture supports this natively because branching is just stack manipulation.
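One way to see why branching falls out of an immutable history is a parent-linked stack, where a branch is simply a new node pointing at the fork. This is a sketch under that assumption; the `Node` class is hypothetical, and whether Hugin copies the stack or shares structure internally is not something this article specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """One interaction plus a link to everything that came before it (hypothetical)."""
    interaction: str
    parent: Optional["Node"] = None

    def push(self, interaction: str) -> "Node":
        # Creating a branch costs one node; the shared history is never copied.
        return Node(interaction, parent=self)

    def history(self) -> List[str]:
        out: List[str] = []
        node: Optional[Node] = self
        while node is not None:
            out.append(node.interaction)
            node = node.parent
        return out[::-1]


root = Node("TaskDefinition: find the bug")
fork = root.push("OracleResponse: two plausible causes")

# Two hypotheses explored independently from the same fork point.
branch_a = fork.push("ToolCall: inspect the cache")
branch_b = fork.push("ToolCall: bisect recent commits")
```

Both branches see identical history up to the fork and diverge only after it, so evaluating them side by side compares like with like.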
Memory
For long-running tasks, the context window is not enough. Hugin provides two memory mechanisms:
Dynamic context: the stack, re-rendered at each LLM call. This is short-term memory: what happened in this task. It's automatic and complete, but bounded by the context window.
Artifacts: a persistent store the agent writes to and queries via three built-in tools: save_insight stores a finding with metadata; query_artifacts retrieves relevant artifacts by semantic search; get_artifact_content fetches a specific artifact in full. Artifacts include quality ratings and feedback, so the agent can assess the reliability of what it recalls.
The separation matters. Dynamic context captures everything but forgets at the context boundary. Artifacts are selective (the agent must decide what's worth remembering) but persist indefinitely.
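The save / query / fetch cycle can be sketched with a toy in-memory store. The method names borrow the tool names above for readability, but the signatures, the `Artifact` fields, and the keyword-overlap ranking are all assumptions; in particular, word overlap is a crude stand-in for the semantic search the real tool performs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Artifact:
    artifact_id: int
    content: str
    rating: float = 0.0   # quality rating the agent can weigh when recalling


class ArtifactStore:
    """Toy persistent memory (hypothetical; not Hugin's storage layer)."""

    def __init__(self) -> None:
        self._items: Dict[int, Artifact] = {}

    def save_insight(self, content: str, rating: float = 0.0) -> int:
        artifact_id = len(self._items) + 1
        self._items[artifact_id] = Artifact(artifact_id, content, rating)
        return artifact_id

    def query_artifacts(self, query: str, limit: int = 3) -> List[int]:
        """Rank by shared words, then by rating; drop artifacts with no overlap."""
        words = set(query.lower().split())
        scored = [
            (len(words & set(a.content.lower().split())), a.rating, a.artifact_id)
            for a in self._items.values()
        ]
        hits = sorted((s for s in scored if s[0] > 0), reverse=True)
        return [artifact_id for _, _, artifact_id in hits[:limit]]

    def get_artifact_content(self, artifact_id: int) -> str:
        return self._items[artifact_id].content


store = ArtifactStore()
first = store.save_insight("The cache invalidation bug appears under load", rating=0.9)
second = store.save_insight("Billing export uses UTC timestamps", rating=0.5)

matches = store.query_artifacts("cache bug")
recalled = store.get_artifact_content(matches[0])
```

Querying returns identifiers rather than full content, so the agent can decide what is worth pulling back into its bounded context window.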
Improving reasoning at test time
Once you have this architecture, how do you make agents reason better?
Local vs global reasoning. Most agent frameworks optimise locally, making each individual step as good as possible. But the quality of the final output depends on the sequence of steps: the global reasoning trajectory. A locally optimal step can lead to a globally poor outcome.
Evaluators. The framework supports pluggable evaluators (heuristic-based, LLM-as-a-judge, or learned) that score agent outputs. These evaluators enable:
- Rejection sampling: generate multiple outputs, keep the best
- Monte Carlo rollouts: branch from the current state, roll out several completions, evaluate each, and pick the most promising path to continue
- Comparative ranking: present pairs of outputs to a judge and build a preference ordering
This is structurally analogous to how AlphaGo, and similar RL systems, combine a policy network (the LLM choosing actions) with a value network (the evaluator scoring positions) and tree search (branching and rollout). The stack architecture makes this natural: each rollout is just a branch, and branches are cheap.
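A toy version of the rollout idea, assuming nothing about Hugin's evaluator interface: the stack is a tuple of numbers, the "policy" proposes a few candidate actions, random rollouts complete each branch, and a heuristic evaluator scores the outcome. `pick_action`, `rollout`, and `evaluate` are hypothetical names for this sketch.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Stack = Tuple[int, ...]


def pick_action(stack: Stack, actions: Sequence[int],
                rollout: Callable[[Stack], Stack],
                evaluate: Callable[[Stack], float],
                n_rollouts: int = 8) -> Tuple[int, Dict[int, float]]:
    """Branch once per candidate action, roll each branch out, keep the best average."""
    scores: Dict[int, float] = {}
    for action in actions:
        branch = stack + (action,)                       # branching is just a new stack
        values = [evaluate(rollout(branch)) for _ in range(n_rollouts)]
        scores[action] = sum(values) / n_rollouts
    best = max(scores, key=lambda a: scores[a])
    return best, scores


rng = random.Random(0)


def rollout(branch: Stack) -> Stack:
    """Complete the branch with two random steps (stand-in for further LLM calls)."""
    return branch + (rng.choice((0, 1)), rng.choice((0, 1)))


def evaluate(final: Stack) -> float:
    """Heuristic evaluator: the closer the total is to 10, the better."""
    return -abs(10 - sum(final))


best, scores = pick_action((), (1, 5, 9), rollout, evaluate)
```

Rejection sampling is the degenerate case of the same loop: generate several complete outputs, score them with the evaluator, and keep the highest-scoring one.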
Debugging and monitoring
Because every interaction is on the stack, debugging is qualitatively different from traditional agent frameworks. Hugin provides three interfaces:
- Web monitor (hugin run --monitor): a dashboard showing the stack in real time, with the ability to inspect any interaction
- Terminal UI (hugin run -i): an interactive TUI for watching and intervening in agent runs
- Replay: rewind to any point in the stack and re-run from there, with different configuration if needed
This means when something goes wrong, you can see exactly what the agent saw, what it decided, and why, then replay from before the mistake with adjusted parameters.
Getting started
Once you've written the YAML and Python above, running it is one command (--monitor opens the live stack dashboard):
pip install gimle-hugin
hugin run --task hello_world --task-path examples/basic_agent --monitor
Or drive it from Python, which is where multi-agent orchestration lives:
from gimle.hugin.agent.environment import Environment
from gimle.hugin.agent.session import Session
from gimle.hugin.storage.local import LocalStorage
env = Environment.load("examples/basic_agent", storage=LocalStorage("./data"))
session = Session(environment=env)
config = env.config_registry.get("basic_agent")
task = env.task_registry.get("hello_world")
session.create_agent_from_task(config, task)
session.run() # runs every agent in the session to completion
Hugin is open source. The best places to dig in:
- Hugin website: documentation, concepts, and a guided walkthrough of the architecture.
- Hugin on GitHub: the source, runnable example apps, and YAML/Python tool definitions to copy from. Clone it, run uv run hugin ..., and watch a stack build in real time.
- State machines for multi-agent systems: the four-part blog series this article is based on, for the deeper rationale behind the design.
eriksfunhouse.com