Erik eriksfunhouse.com


Hugin: A State Machine Framework for Agentic Reasoning

March 21, 2026

Based on a four-part blog series on state machines for multi-agent systems. For more details on Hugin see the Hugin website and the Hugin GitHub repo.

Most agent frameworks treat LLMs as the agent itself. Hugin instead treats them as oracles: one component in a larger reasoning system. The framework is built around a simple idea: if you want agents that reason well over long horizons, you need explicit structure around how they reason, not just what they reason about.

This is achieved through a state machine architecture where every interaction is an immutable entry on a stack, making branching, debugging, replay, and multi-agent coordination natural rather than bolted on.

The problem with current agent frameworks

The dominant approach to building agents is to wrap an LLM in a loop: prompt, get response, execute tools, repeat. This works for short tasks but breaks down for longer-running, creative reasoning:

  • No memory of process: the agent has no native, structured record of what it tried and why, and no obvious path to self-improvement.
  • No backtracking: if a reasoning path fails, the only options are to start over or to rely on the LLM to self-correct. Rolling back to a previous state and restarting from there is difficult.
  • Opaque debugging: the difficulty of backtracking, combined with bolt-on multi-agent patterns, makes debugging hard at scale, especially for long-running, complex reasoning tasks.
  • Brittle orchestration: coordinating multiple agents requires ad-hoc message passing, which often struggles to combine asynchronous and synchronous communication in one framework.

The state machine

Hugin models every agent as a state machine with a well-defined lifecycle. An agent moves through states such as receiving a task, consulting the oracle (the LLM), executing tools, calling sub-agents, and asking a human, and every transition produces an immutable interaction pushed onto a stack.

Each interaction has a type that captures what happened:

  • TaskDefinition / TaskResult: the start and end of a task
  • AskOracle / OracleResponse: consulting the LLM and receiving its response
  • ToolCall / ToolResult: executing a tool and capturing what it returned
  • AskHuman / HumanResponse: requesting and receiving human input
  • AgentCall / AgentResult: spawning a sub-agent and receiving its output
  • TaskChain: transitioning to a new task in a pipeline

The stack is the complete, ordered history of everything the agent has done. Because it’s immutable, every state is preserved: you can inspect any point, replay from any point, or branch from any point.
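To make the idea of an immutable stack concrete, here is a minimal sketch in plain Python. It is not Hugin's implementation: `Interaction`, `ToyStack`, `push`, and `at` are hypothetical names chosen for illustration, and only the interaction type names come from the list above.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Interaction:
    """One immutable entry on the stack (hypothetical shape, not Hugin's class)."""
    kind: str            # e.g. "TaskDefinition", "AskOracle", "ToolCall"
    payload: Any = None


@dataclass(frozen=True)
class ToyStack:
    """Append-only history: push returns a new stack and leaves the old one intact."""
    entries: Tuple[Interaction, ...] = ()

    def push(self, interaction: Interaction) -> "ToyStack":
        return ToyStack(self.entries + (interaction,))

    def at(self, index: int) -> "ToyStack":
        """Return the stack as it looked after the first `index` interactions."""
        return ToyStack(self.entries[:index])


s0 = ToyStack()
s1 = s0.push(Interaction("TaskDefinition", "summarise the report"))
s2 = s1.push(Interaction("AskOracle"))
s3 = s2.push(Interaction("OracleResponse", "call extract_text"))

# Earlier states are untouched, so any of them can be inspected or resumed.
print([i.kind for i in s3.entries])
print(s3.at(2) == s2)
```

Because every push returns a new object, holding on to `s1` or `s2` is all it takes to inspect or resume from an earlier point.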

Three building blocks

An agent in Hugin is defined by three core elements:

  • Configuration: how the agent behaves. Tools, model, system prompt, and templates are all dynamically rendered at each step, so the agent’s capabilities can change mid-task. Configurations can define state machines with transitions, giving agents different tools and prompts in different phases.
  • Task: what the agent does. The initial prompt and parameters. Tasks can chain into pipelines: one task’s result becomes the next task’s input, with different configurations for each phase.
  • Stack: what the agent has done. The immutable history of interactions. The context window is re-rendered from the stack at every LLM call, so the agent never manages its own memory.

Configuration and task are just declarative YAML. The configuration says how the agent behaves; the task says what it should do, and it can override pieces of the config, such as the model:

# basic_agent.yaml: the Configuration
name: basic_agent
system_template: basic_system
llm_model: haiku-latest
tools:
  - builtins.save_insight:save_insight
  - builtins.finish:finish

# hello_world.yaml: the Task
name: hello_world
prompt: |
  The user asked: 
  Answer thoughtfully, save the answer with save_insight,
  then call finish to complete the task.
llm_model: sonnet-latest   # override the config's model for this task

The Stack isn’t authored; it’s produced as the agent runs.

This separation is key. The LLM never manages its own state; the framework does. The LLM is consulted as an oracle: given the current stack (rendered as context), what should we do next? This makes the system far more adaptable, predictable, and debuggable than approaches where the LLM is expected to self-manage.
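The division of labour can be sketched as a small loop: the framework owns the history, renders it as context, asks the oracle what to do next, and records the outcome. This is an illustrative toy, not Hugin's runtime. `render_context`, `scripted_oracle`, and `run` are hypothetical names, the oracle is a scripted stand-in for an LLM, and a plain list is used for brevity where the real stack is immutable.

```python
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, str]  # (interaction type, content)


def render_context(stack: List[Entry]) -> str:
    """Rebuild the prompt from the full history on every call."""
    return "\n".join(f"[{kind}] {content}" for kind, content in stack)


def scripted_oracle(context: str) -> Dict[str, str]:
    """Stand-in for an LLM: picks the next action from what the context shows."""
    if "[ToolResult]" not in context:
        return {"tool": "save_insight", "argument": "42 is the answer"}
    return {"tool": "finish", "argument": "done"}


def run(task: str, oracle: Callable[[str], Dict[str, str]], max_steps: int = 10) -> List[Entry]:
    stack: List[Entry] = [("TaskDefinition", task)]
    for _ in range(max_steps):
        stack.append(("AskOracle", "what next?"))
        decision = oracle(render_context(stack))  # the oracle only sees the rendering
        stack.append(("OracleResponse", decision["tool"]))
        if decision["tool"] == "finish":
            stack.append(("TaskResult", decision["argument"]))
            break
        # The framework, not the oracle, executes the tool and records the result.
        stack.append(("ToolCall", decision["tool"]))
        stack.append(("ToolResult", f"stored: {decision['argument']}"))
    return stack


history = run("answer the user's question", scripted_oracle)
for kind, content in history:
    print(kind, "|", content)
```

The oracle function holds no state of its own between calls; everything it knows arrives through the rendered context.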

Tools

Agents interact with the world through tools. The state machine treats tool calls as first-class transitions: the agent requests a tool call, the framework executes it, and the result is pushed onto the stack.

Every tool receives the full stack as a parameter, giving it access to the agent’s history, environment, shared state, and storage. This means tools can be deeply context-aware without the agent needing to pass information explicitly. A tool is just a Python function that returns a ToolResponse:

from gimle.hugin.interaction.stack import Stack
from gimle.hugin.tools.tool import ToolResponse

def extract_text(document: str, stack: Stack, **kwargs) -> ToolResponse:
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return ToolResponse(
        is_error=False,
        content={"word_count": len(document.split()), "sentences": sentences},
    )

Hugin ships with built-in tools that cover the core capabilities:

  • finish: complete the current task with a result
  • save_insight / query_artifacts / get_artifact_content: long-term memory (more on this below)
  • ask_user: pause and request human input (creates the AskHuman interaction)
  • launch_agent / list_agents / list_running_agents: spawn and track sub-agents for specialised subtasks

Custom tools are straightforward: a Python function plus a definition, either via inline decorators or YAML. The function receives the stack and any parameters; the YAML describes the tool’s name, description, and parameter schema, and points at the implementation:

name: extract_text
description: Extract and clean text from a document.
implementation_path: tools.document_tools:extract_text
parameters:
  document:
    type: string
    required: true

Tools can also chain deterministically via a next_tool mechanism, which is useful for pipelines where one tool’s output always feeds into another.
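The article does not show how next_tool is declared, so the following is only a sketch of the general idea under an assumed convention: each toy tool returns its content together with the name of the tool that must run next, and a small driver follows that chain without consulting the LLM in between. `TOOLS`, `run_chain`, and the tuple return shape are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Assumed convention: a toy tool returns (content, next_tool); None ends the chain.
ToolFn = Callable[[Any], Tuple[Any, Optional[str]]]


def extract(document: str) -> Tuple[List[str], Optional[str]]:
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return sentences, "count"  # always hand the sentences to `count`


def count(sentences: List[str]) -> Tuple[Dict[str, int], Optional[str]]:
    return {"sentence_count": len(sentences)}, None


TOOLS: Dict[str, ToolFn] = {"extract": extract, "count": count}


def run_chain(first_tool: str, argument: Any) -> List[Tuple[str, Any]]:
    """Run tools back to back, feeding each output into the next tool."""
    trace: List[Tuple[str, Any]] = []
    name: Optional[str] = first_tool
    while name is not None:
        argument, following = TOOLS[name](argument)
        trace.append((name, argument))
        name = following
    return trace


trace = run_chain("extract", "First point. Second point.")
print(trace)
```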

Steering agents

The state machine gives us precise control over how agents reason, without constraining what they reason about.

  1. Configuration state machines: An agent’s configuration can itself be a state machine. A planning phase might use one set of tools and a reasoning-optimised model; execution switches to a different tool set and a faster model. Transitions between states can be triggered by the agent or by the framework.
  2. Task chaining: Tasks compose into pipelines such as extract → analyse → summarise, where each step’s result flows to the next via pass_result_as. Each step can use a different configuration, so a research task can chain into a writing task with different tools and prompts.
  3. Self-reflection: After completing a task, an agent can review its own stack, a complete record of its reasoning, and produce a critique. Sub-agent reflection takes this further: a second agent reviews the first agent’s work with fresh context and different instructions.
  4. Human-in-the-loop: The ask_user tool pauses the agent and requests input, creating an AskHuman interaction. The human response is pushed onto the stack as a HumanResponse interaction and becomes part of the permanent record. This supports approval gates, clarification requests, and collaborative reasoning without breaking the state machine model.
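As a rough illustration of point 1, a configuration state machine can be pictured as a table of phases plus a table of transitions. Everything in this sketch is hypothetical (the `Phase` class, the model names, the event names); it only shows the shape of the idea, not how Hugin's configurations express it.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Phase:
    model: str               # hypothetical model label
    tools: Tuple[str, ...]   # tools available while in this phase


PHASES: Dict[str, Phase] = {
    "planning": Phase(model="reasoning-model", tools=("query_artifacts", "ask_user")),
    "execution": Phase(model="fast-model", tools=("extract_text", "save_insight", "finish")),
}

# (current phase, event) -> next phase; the event names are made up for the sketch.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("planning", "plan_ready"): "execution",
    ("execution", "needs_replan"): "planning",
}


def next_phase(current: str, event: str) -> str:
    """Follow a transition if one is defined; otherwise stay in the current phase."""
    return TRANSITIONS.get((current, event), current)


phase = "planning"
phase = next_phase(phase, "plan_ready")
print(phase, PHASES[phase].tools)
```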

Task chaining (point 2) is just two fields on a task: next_task names the next phase, and pass_result_as decides which parameter the result lands in:

# extract_text.yaml: first step of an extract → analyse → summarise pipeline
name: extract_text
prompt: "Extract text from: "
next_task: analyze_content       # hand off to the next phase…
pass_result_as: extracted_text   # …as its `extracted_text` parameter
tools:
  - extract_text
  - builtins.finish:finish
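The hand-off that next_task and pass_result_as describe can be mimicked in a few lines of plain Python. This is a sketch under stated assumptions: the task table is an in-memory stand-in for the YAML files, the worker functions replace real agents, and the `summarize` task name and `analysis` parameter are invented to complete the pipeline.

```python
from typing import Callable, Dict, Optional

# In-memory stand-in for the task YAML; `summarize` and `analysis` are hypothetical.
TASKS: Dict[str, Dict[str, Optional[str]]] = {
    "extract_text": {"next_task": "analyze_content", "pass_result_as": "extracted_text"},
    "analyze_content": {"next_task": "summarize", "pass_result_as": "analysis"},
    "summarize": {"next_task": None, "pass_result_as": None},
}

# Toy workers standing in for agents; each receives the task's parameters.
WORKERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "extract_text": lambda p: p["document"].strip(),
    "analyze_content": lambda p: f"{len(p['extracted_text'].split())} words",
    "summarize": lambda p: f"Summary: {p['analysis']}",
}


def run_pipeline(first_task: str, params: Dict[str, str]) -> str:
    name: Optional[str] = first_task
    result = ""
    while name is not None:
        result = WORKERS[name](params)
        spec = TASKS[name]
        name = spec["next_task"]
        if name is not None:
            # The previous result becomes the named parameter of the next task.
            params = {str(spec["pass_result_as"]): result}
    return result


final = run_pipeline("extract_text", {"document": "  state machines for agents  "})
print(final)
```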

Multi-agent coordination

A session in Hugin can host multiple agents, each with its own stack, configuration, and task. Agents coordinate through two mechanisms:

Sub-agents. An agent can spawn a sub-agent via call_agent. The parent pauses, the sub-agent runs to completion with its own stack and configuration, and the result is returned as an AgentResult interaction on the parent’s stack. The sub-agent is fully isolated: its own context window, its own tools, its own model.

Shared state. Agents in the same session can communicate through namespaces: key-value stores with fine-grained access control. A producer agent writes findings to a namespace; a consumer agent reads them. This decouples agents without requiring direct message passing, and the access control prevents unintended interference.
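A namespace with access control can be pictured roughly as follows. This is a toy model, not Hugin's namespace API: `ToyNamespace` and its `read` and `write` methods are hypothetical, and permissions are reduced to two sets of agent names.

```python
from typing import Any, Dict, FrozenSet


class ToyNamespace:
    """A key-value store that checks who may read and who may write."""

    def __init__(self, name: str, readers: FrozenSet[str], writers: FrozenSet[str]) -> None:
        self.name = name
        self._readers = readers
        self._writers = writers
        self._data: Dict[str, Any] = {}

    def write(self, agent: str, key: str, value: Any) -> None:
        if agent not in self._writers:
            raise PermissionError(f"{agent} may not write to {self.name}")
        self._data[key] = value

    def read(self, agent: str, key: str) -> Any:
        if agent not in self._readers:
            raise PermissionError(f"{agent} may not read from {self.name}")
        return self._data[key]


findings = ToyNamespace(
    "findings",
    readers=frozenset({"producer", "consumer"}),
    writers=frozenset({"producer"}),
)
findings.write("producer", "root_cause", "stale cache entry")
value = findings.read("consumer", "root_cause")
print(value)
```

The consumer never addresses the producer directly; it only needs to know the namespace and the key, and an attempt by the consumer to write raises PermissionError.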

Branching: parallel exploration

The most powerful consequence of the stack architecture is branching. At any point in an agent’s reasoning, you can create a branch: a copy of the stack that diverges from that point. The branch shares all history up to the fork, then explores independently.

This is not just a convenience; it’s a fundamentally different approach to reasoning. Instead of committing to a single path and hoping the LLM self-corrects, you can explore multiple hypotheses in parallel, evaluate each, and select the best. The architecture supports this natively because branching is just stack manipulation.
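Here is a minimal sketch of branching as stack manipulation, using tuples so that the original history can never be modified. The `fork` and `explore` helpers and the keyword-based scoring function are hypothetical; in practice an evaluator would score each branch.

```python
from typing import Callable, List, Tuple

History = Tuple[str, ...]  # immutable, so branches can share a prefix safely


def fork(history: History, at: int) -> History:
    """Branch from an earlier point: keep the first `at` entries, drop the rest."""
    return history[:at]


def explore(base: History, hypotheses: List[str], score: Callable[[History], float]) -> History:
    """Extend the shared base once per hypothesis and keep the best-scoring branch."""
    branches = [base + (f"hypothesis: {h}",) for h in hypotheses]
    return max(branches, key=score)


main: History = ("task: find the bug", "tried: restart", "tried: reinstall")
base = fork(main, 1)  # go back to before the failed attempts

best = explore(
    base,
    ["bad config value", "race condition in cache"],
    score=lambda h: float("cache" in h[-1]),  # toy scoring rule for the sketch
)
print(best)
print(main)  # the original path is still intact
```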

Memory

For long-running tasks, the context window is not enough. Hugin provides two memory mechanisms:

Dynamic context: the stack, re-rendered at each LLM call. This is short-term memory: what happened in this task. It’s automatic and complete, but bounded by the context window.

Artifacts: a persistent store the agent writes to and queries via three built-in tools. save_insight stores a finding with metadata; query_artifacts retrieves relevant artifacts by semantic search; get_artifact_content fetches a specific artifact in full. Artifacts include quality ratings and feedback, so the agent can assess the reliability of what it recalls.

The separation matters. Dynamic context captures everything but forgets at the context boundary. Artifacts are selective (the agent must decide what’s worth remembering) but persist indefinitely.
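The save, query, and fetch cycle for artifacts can be sketched with a toy store. This is not Hugin's artifact system: `ToyArtifactStore` and its methods are hypothetical, the rating is a made-up number between 0 and 1, and word overlap stands in for the semantic search that the real query_artifacts tool is described as using.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Artifact:
    artifact_id: int
    content: str
    rating: float  # hypothetical quality score in [0, 1]


class ToyArtifactStore:
    def __init__(self) -> None:
        self._items: Dict[int, Artifact] = {}

    def save(self, content: str, rating: float = 0.5) -> int:
        artifact_id = len(self._items) + 1
        self._items[artifact_id] = Artifact(artifact_id, content, rating)
        return artifact_id

    def query(self, text: str, limit: int = 3) -> List[Artifact]:
        """Rank by shared words, then rating; a stand-in for semantic search."""
        wanted = set(text.lower().split())
        scored = [
            (len(wanted & set(a.content.lower().split())), a)
            for a in self._items.values()
        ]
        hits = [(overlap, a) for overlap, a in scored if overlap > 0]
        hits.sort(key=lambda pair: (pair[0], pair[1].rating), reverse=True)
        return [a for _, a in hits[:limit]]

    def get(self, artifact_id: int) -> Artifact:
        return self._items[artifact_id]


store = ToyArtifactStore()
store.save("retry with backoff fixed the timeout", rating=0.9)
store.save("the timeout came from the proxy", rating=0.4)
store.save("unrelated note about fonts")

results = store.query("timeout")
print([a.artifact_id for a in results])
```

Carrying the rating alongside the content is what lets a caller prefer the better-rated of two equally relevant memories.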

Improving reasoning at test time

Once you have this architecture, how do you make agents reason better?

Local vs global reasoning. Most agent frameworks optimise locally, making each individual step as good as possible. But the quality of the final output depends on the sequence of steps: the global reasoning trajectory. A locally optimal step can lead to a globally poor outcome.

Evaluators. The framework supports pluggable evaluators (heuristic-based, LLM-as-a-judge, or learned) that score agent outputs. These evaluators enable:

  • Rejection sampling: generate multiple outputs, keep the best
  • Monte Carlo rollouts: branch from the current state, roll out several completions, evaluate each, and pick the most promising path to continue
  • Comparative ranking: present pairs of outputs to a judge and build a preference ordering

This is structurally analogous to how AlphaGo, and similar RL systems, combine a policy network (the LLM choosing actions) with a value network (the evaluator scoring positions) and tree search (branching and rollout). The stack architecture makes this natural: each rollout is just a branch, and branches are cheap.
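Two of the strategies above fit in a short sketch: keeping the best of n sampled outputs, and choosing an action by averaging the scores of random rollouts. The functions `best_of_n` and `choose_action` are hypothetical, and the "agent" is a toy integer state so the example stays self-contained; a real evaluator would score LLM outputs instead.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def best_of_n(sample: Callable[[], T], evaluate: Callable[[T], float], n: int) -> T:
    """Draw n candidates and keep the one the evaluator scores highest."""
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=evaluate)


def choose_action(
    state: int,
    actions: Sequence[int],
    step: Callable[[int, int], int],
    evaluate: Callable[[int], float],
    rollouts: int = 20,
    depth: int = 3,
    seed: int = 0,
) -> int:
    """Score each action by the mean evaluation of random rollouts that start with it."""
    rng = random.Random(seed)

    def mean_value(action: int) -> float:
        total = 0.0
        for _ in range(rollouts):
            s = step(state, action)
            for _ in range(depth):  # continue with random moves
                s = step(s, rng.choice(actions))
            total += evaluate(s)
        return total / rollouts

    return max(actions, key=mean_value)


def step(state: int, action: int) -> int:
    return state + action


def closeness(state: int) -> float:
    return -abs(3 - state)  # toy evaluator: best when the state equals 3


answers = iter(["ok", "a longer answer", "mid"])
best = best_of_n(lambda: next(answers), evaluate=len, n=3)
greedy = choose_action(0, [1, 2, 3], step, closeness, rollouts=1, depth=0)
looked_ahead = choose_action(0, [1, 2, 3], step, closeness)
print(best, greedy, looked_ahead)
```

With `depth=0` the second function reduces to a purely local, greedy choice; increasing the depth is what lets the score reflect where a step leads rather than how good it looks on its own.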

Debugging and monitoring

Because every interaction is on the stack, debugging is qualitatively different from traditional agent frameworks. Hugin provides three interfaces:

  • Web monitor (hugin run --monitor): a dashboard showing the stack in real time, with the ability to inspect any interaction
  • Terminal UI (hugin run -i): an interactive TUI for watching and intervening in agent runs
  • Replay: rewind to any point in the stack and re-run from there, with different configuration if needed

This means that when something goes wrong, you can see exactly what the agent saw, what it decided, and why, then replay from before the mistake with adjusted parameters.

Getting started

Once you’ve written the YAML and Python above, running it takes one command after installing the package. The --monitor flag opens the live stack dashboard:

pip install gimle-hugin

hugin run --task hello_world --task-path examples/basic_agent --monitor

Or drive it from Python, which is where multi-agent orchestration lives:

from gimle.hugin.agent.environment import Environment
from gimle.hugin.agent.session import Session
from gimle.hugin.storage.local import LocalStorage

env = Environment.load("examples/basic_agent", storage=LocalStorage("./data"))
session = Session(environment=env)

config = env.config_registry.get("basic_agent")
task = env.task_registry.get("hello_world")
session.create_agent_from_task(config, task)

session.run()   # runs every agent in the session to completion

Hugin is open source. The best places to dig in:

  • Hugin website: documentation, concepts, and a guided walkthrough of the architecture.
  • Hugin on GitHub: the source, runnable example apps, and YAML/Python tool definitions to copy from. Clone it, run uv run hugin ..., and watch a stack build in real time.
  • State machines for multi-agent systems: the four-part blog series this article is based on, for the deeper rationale behind the design.