Fugue Labs · Open Source

gollem

The production agent framework for Go. Typed agents, structured output, streaming, guardrails, cost tracking, multi-agent swarms, code mode, graph workflows. Zero core dependencies. Single binary.

$ go get github.com/fugue-labs/gollem
01

What a run looks like: four agents, four transcripts

Every run emits structured trace events. Switch between agent archetypes below to see the same trace system rendering four different workloads.

gollem ./research-agent · claude-sonnet-4-5

    

Each line is a trace event. Export to JSON, OpenTelemetry, or plug in your own TraceExporter.


02

Typed agents, typed results

Agent[T] is the central type. You define the output shape; gollem generates the JSON Schema, validates every model response against it, auto-repairs malformed output with a repair model, and hands you a typed Go struct.

type Analysis struct {
    Sentiment  string   `json:"sentiment"  jsonschema:"enum=positive|negative|neutral"`
    Keywords   []string `json:"keywords"   jsonschema:"description=Key topics"`
    Confidence float64  `json:"confidence" jsonschema:"minimum=0,maximum=1"`
}

agent := gollem.NewAgent[Analysis](model,
    gollem.WithSystemPrompt[Analysis]("You are a sentiment analyst."),
    gollem.WithOutputRepair[Analysis](gollem.ModelRepair[Analysis](repairModel)),
    gollem.WithOutputValidator[Analysis](func(a Analysis) error {
        if a.Confidence < 0 || a.Confidence > 1 {
            return fmt.Errorf("confidence out of range: %f", a.Confidence)
        }
        return nil
    }),
)

result, _ := agent.Run(ctx, "Analyze this earnings call transcript.")
fmt.Println(result.Output.Sentiment)    // string, not map[string]any
fmt.Println(result.Output.Confidence)   // float64, not interface{}

Schema generated from struct tags. No hand-written JSON schemas. No json.Unmarshal at the callsite. No type assertions.


03

Streaming with Go 1.23+ iterators

Four streaming modes share one interface, all exposed as iter.Seq2[T, error]: no channels, no callbacks, no goroutine management.


Same stream, any mode. Switch based on latency budget or transport.

go · raw deltas
stream, _ := agent.RunStream(ctx, "Write a story about a robot.")

// Raw incremental chunks as they arrive from the model.
for delta, err := range gollem.StreamTextDelta(stream) {
    if err != nil { return err }
    fmt.Print(delta)                   // "The " "robot " "powered " ...
}
go · accumulated
// Growing accumulated text at each step. Ideal for React/UI updates.
for text, err := range gollem.StreamTextAccumulated(stream) {
    if err != nil { return err }
    updateUI(text)                    // "The " → "The robot " → "The robot powered "
}
go · debounced
// Grouped delivery every 100ms. Ideal for batching frames to websocket clients.
for text, err := range gollem.StreamTextDebounced(stream, 100*time.Millisecond) {
    if err != nil { return err }
    sendToClient(text)                // fewer frames, still feels live
}
go · unified
// Single function with options. Switch modes without rewriting the loop.
for text, err := range gollem.StreamText(stream, gollem.StreamTextOptions{
    Mode:     gollem.StreamModeDebounced,
    Debounce: 100 * time.Millisecond,
}) {
    if err != nil { return err }
    handle(text)
}

04

Tools from typed functions

FuncTool[P] turns a typed Go function into a tool. Parameter schemas come from struct tags via reflection. Access typed dependencies through the run context: no globals, no singletons, no any.

type SearchParams struct {
    Query string `json:"query" jsonschema:"description=Search query"`
    Limit int    `json:"limit" jsonschema:"description=Max results,default=10"`
}

type AppDeps struct { DB *sql.DB; Cache *redis.Client }

searchTool := gollem.FuncTool[SearchParams](
    "search", "Search the knowledge base",
    func(ctx context.Context, rc *gollem.RunContext, p SearchParams) (string, error) {
        deps := gollem.GetDeps[*AppDeps](rc)    // compile-time type safe
        return doSearch(deps.DB, p.Query, p.Limit)
    },
)

agent := gollem.NewAgent[Report](model,
    gollem.WithTools[Report](searchTool),
    gollem.WithDeps[Report](&AppDeps{DB: db, Cache: cache}),
    gollem.WithDefaultToolTimeout[Report](10*time.Second),
    gollem.WithToolResultValidator[Report](nonEmpty),
)

Tool-choice control: Auto, Required, None, Force("name"), with optional auto-reset to prevent infinite loops.


05

Multi-agent orchestration

Three composition primitives. AgentTool for delegation. Handoff for sequential chains with context filters. Pipeline for parallel fan-out and conditional branching. For durable coordination across restarts, ext/orchestrator owns tasks, leases, schedulers, and artifact history.

go · pipeline
// One agent calls another as a tool.
orchestrator := gollem.NewAgent[FinalReport](model,
    gollem.WithTools[FinalReport](
        orchestration.AgentTool("research", "Delegate research", researcher),
    ),
)

// Pipeline with parallel steps and conditional branching.
pipe := gollem.NewPipeline(
    gollem.AgentStep(researcher),
    gollem.ParallelSteps(
        gollem.AgentStep(factChecker),
        gollem.AgentStep(editor),
    ),
    gollem.ConditionalStep(
        func(s string) bool { return len(s) > 5000 },
        gollem.AgentStep(summarizer),
        gollem.TransformStep(strings.TrimSpace),
    ),
)
go · team swarm
t := team.NewTeam(team.TeamConfig{
    Name:    "code-review",
    Leader:  "lead",
    Model:   model,
    Toolset: codingTools,
    PersonalityGenerator: modelutil.CachedPersonalityGenerator(
        modelutil.GeneratePersonality(model),
    ),
})

t.SpawnTeammate(ctx, "reviewer", "Review auth module for security vulnerabilities")
t.SpawnTeammate(ctx, "tester",   "Write comprehensive tests for the payment flow")
t.SpawnTeammate(ctx, "docs",     "Update API docs for the new endpoints")

leader := gollem.NewAgent[string](model, gollem.WithTools[string](team.LeaderTools(t)...))
result, _ := leader.Run(ctx, "Coordinate the review across all teammates.")

Each teammate runs as a goroutine with a fresh context window. The LLM itself writes the system prompt for each task; a SHA256-keyed cache prevents redundant generations.


06

Code mode: N tool calls, one round-trip

Traditional tool use is round-trip heavy: model asks, you execute, model waits, model asks again. Code mode instead exposes all of your tools to an LLM-authored Python script that runs in a pure-Go WASM sandbox via monty-go. The model composes them in one shot.

traditional   model ──► tool1 ──► model ──► tool2 ──► model ──► result
              3 model calls · 2 context refills · serial latency

code mode     model ──► python { tool1(); tool2(); } ──► result
              1 model call · 0 refills · parallel execution
import "github.com/fugue-labs/gollem/ext/monty"

agent := gollem.NewAgent[Report](model,
    monty.AgentOptions(
        monty.WithTools(searchTool, fetchTool, citeTool),
    )...,
)

// The model writes a single Python script that calls N tools as functions.
// Runs in a WASM sandbox. No CGO, no containers, no subprocess.
result, _ := agent.Run(ctx, "Research and cite the top 5 papers on memory consolidation.")
python · what the model wrote
# gollem injects typed function stubs; the model chooses how to compose.
results = search(query="memory consolidation LLM", limit=10)
top = sorted(results, key=lambda r: r["score"], reverse=True)[:5]

# Parallel fetches in the sandbox; each call is a typed Go function.
docs = [fetch_url(url=r["url"]) for r in top]

final_result(
    summary="Consolidation requires decay scheduling ...",
    citations=[cite(doc=d) for d in docs],
)

One model round-trip. Up to N× fewer tokens than sequential tool use on branchy workloads. Sandbox timeout, memory cap, and import allowlist configurable.


07

Graph workflows: typed state machines

When control flow outgrows linear pipelines, drop into ext/graph: typed state, conditional branches, fan-out / map-reduce, cycle detection, Mermaid export. Nodes and edges are type-checked at compile time.

  start ──► classify ─┬─► simple  ──► answer ──► end
                      └─► complex ──► plan ──► fanout[3] ─┬─► search ──┐
                                                          ├─► fetch ───┼─► merge ──► answer
                                                          └─► analyze ─┘
g := graph.New[State]()
g.Node("classify", classifyFn).Edge("simple", simplePath).Edge("complex", complexPath)
g.FanOut("plan", searchNode, fetchNode, analyzeNode).Merge("merge", mergeFn)
g.Edge("merge", "answer")

if err := g.Validate(); err != nil {  // cycle detection at build time
    return err
}
fmt.Println(g.Mermaid())                // diagram for PRs
result, _ := g.Run(ctx, initialState)

08

Guardrails, cost, observability

Production concerns are first-class. Guardrails at every lifecycle stage. Cost tracked per run and cumulative. Middleware composes like HTTP middleware: the first registered is outermost.

tracker := gollem.NewCostTracker(map[string]gollem.ModelPricing{
    "claude-sonnet-4-5-20250929": {InputTokenCost: 0.003, OutputTokenCost: 0.015},
})

agent := gollem.NewAgent[Report](model,
    // Safety: validate prompts, turns, outputs.
    gollem.WithInputGuardrail[Report]("length", gollem.MaxPromptLength(10_000)),
    gollem.WithInputGuardrail[Report]("content", gollem.ContentFilter("ignore previous")),
    gollem.WithTurnGuardrail[Report]("turns", gollem.MaxTurns(20)),

    // Cost & usage.
    gollem.WithCostTracker[Report](tracker),
    gollem.WithUsageQuota[Report](gollem.UsageQuota{MaxRequests: 50, MaxTotalTokens: 100_000}),

    // Middleware: outer to inner.
    gollem.WithAgentMiddleware[Report](gollem.TimingMiddleware(metrics.RecordLatency)),
    gollem.WithAgentMiddleware[Report](gollem.LoggingMiddleware(log.Printf)),
    gollem.WithMessageInterceptor[Report](gollem.RedactPII(
        `\b\d{3}-\d{2}-\d{4}\b`, "[SSN REDACTED]",
    )),

    // Observability.
    gollem.WithTracing[Report](),
    gollem.WithTraceExporter[Report](gollem.NewJSONFileExporter("./traces")),
    gollem.WithRunCondition[Report](gollem.Or(
        gollem.MaxRunDuration(2*time.Minute),
        gollem.ToolCallCount(50),
    )),
)
Guardrails: MaxPromptLength, ContentFilter, MaxTurns, plus custom input / turn / output / tool-result validators.
Middleware: TimingMiddleware, LoggingMiddleware, MaxTokensMiddleware, or write your own. Skip the model call entirely if you want.
Interceptors: RedactPII, AuditLog, or custom. Intercept before the message leaves your system; transform responses on the way back.
Tracing: Structured run traces with step-level detail. Exporters: JSON file, console, multi, OpenTelemetry middleware for metrics + distributed tracing.
Hooks: OnRunStart, OnRunEnd, OnModelRequest, OnModelResponse, OnToolStart, OnToolEnd.
Event bus: Typed pub/sub with Subscribe[E], Publish[E]. Built-in RunStartedEvent, ToolCalledEvent, RunCompletedEvent carry run IDs, parent IDs, timestamps.

09

Providers: one interface, swap freely

All providers implement the same Model interface. Wrap any with retry, rate limiting, and caching. Switch the import and the agent code is unchanged.

go · anthropic
import "github.com/fugue-labs/gollem/provider/anthropic"

// Reads ANTHROPIC_API_KEY from env.
claude := anthropic.New()

// Opt-in features.
claude = anthropic.New(
    anthropic.WithModel("claude-sonnet-4-5-20250929"),
    anthropic.WithExtendedThinking(anthropic.Thinking{Budget: 10_000}),
    anthropic.WithPromptCaching(),
)
go · openai
import "github.com/fugue-labs/gollem/provider/openai"

// Reads OPENAI_API_KEY from env.
gpt := openai.New()

// WebSocket continuation for tool-heavy loops (non-streaming).
gpt = openai.New(
    openai.WithModel("gpt-4o"),
    openai.WithTransport("websocket"),     // or OPENAI_TRANSPORT=websocket
    openai.WithJSONMode(),                     // native structured output
)
go · vertex ai
import "github.com/fugue-labs/gollem/provider/vertexai"

// Uses GCP application default credentials.
gemini := vertexai.New("my-project", "us-central1")

gemini = vertexai.New("my-project", "us-central1",
    vertexai.WithModel("gemini-2.0-flash"),
    vertexai.WithJSONMode(),
)
go · vertex · anthropic
import "github.com/fugue-labs/gollem/provider/vertexai_anthropic"

// Claude via Vertex: extended thinking + prompt caching + GCP auth.
vc := vertexai_anthropic.New("my-project", "us-east5",
    vertexai_anthropic.WithModel("claude-sonnet-4-5@20250929"),
    vertexai_anthropic.WithExtendedThinking(vertexai_anthropic.Thinking{Budget: 10_000}),
)
go · resilience wrappers
// Retry wraps rate limiting wraps caching wraps the raw model. Works for any Model.
resilient := gollem.NewRetryModel(
    gollem.NewRateLimitedModel(
        gollem.NewCachedModel(claude, gollem.NewMemoryCacheWithTTL(5*time.Minute)),
        10, 20, // rps, burst
    ),
    gollem.DefaultRetryConfig(),
)

// Or route by capability: same agent code, right model per prompt.
router := gollem.NewCapabilityRouter(
    []gollem.Model{fast, power, vision},
    gollem.ModelProfile{SupportsVision: true, SupportsToolCalls: true},
)
Capability          Anthropic   OpenAI    Vertex AI      Vertex · Anthropic
Structured output   ✓           ✓         ✓              ✓
Streaming           ✓           ✓         ✓              ✓
Tool use            ✓           ✓         ✓              ✓
Extended thinking   ✓           —         —              ✓
Prompt caching      ✓           —         —              ✓
Native JSON mode    —           ✓         ✓              —
Auth                API key     API key   OAuth2 · GCP   OAuth2 · GCP

10

Single binary: ship the compiler's output, not a virtualenv

Gollem compiles to a statically linked binary. Cross-compile to any OS/arch from any OS/arch. No runtime. No interpreter. No shared library resolution at startup.

$ go build -o research-agent ./cmd/research
$ ls -lh research-agent
-rwxr-xr-x  1 user  staff   14M Apr 17 17:45 research-agent
$ file research-agent
research-agent: Mach-O 64-bit executable arm64
$ otool -L research-agent
research-agent:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0)
    /usr/lib/libresolv.9.dylib (compatibility version 1.0.0)
$ GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o research-agent-linux ./cmd/research
$ scp research-agent-linux prod:/usr/local/bin/
research-agent-linux                          100%   14MB   8.2MB/s   00:01
✓ deployed.
Static: Zero core dependencies. Linked against libSystem on macOS, nothing on Linux with CGO_ENABLED=0.
Small: Typical agent binary: ~14 MB with Anthropic + OpenAI + Vertex + monty. Strips to ~10 MB.
Cross-compile: Any OS/arch → any OS/arch. Build on your laptop; deploy to Linux ARM servers, serverless, edge.
Observability: The binary ships its own trace exporter, OTLP middleware, and structured logger. No sidecar required.

11

Testing without ever calling a real model

TestModel is a deterministic mock. Canned responses, call recording, per-invocation assertions. Swap with WithTestModel or Override in tests without touching the production agent definition.

model := gollem.NewTestModel(
    gollem.ToolCallResponse("search", `{"query":"Go generics"}`),
    gollem.ToolCallResponse("final_result", `{"answer":"..."}`),
)

result, err := productionAgent.WithTestModel(model).Run(ctx, "prompt")

// Assert what the model saw.
calls := model.RecordedCalls()
assert.Len(t, calls, 2)
assert.Equal(t, "search", calls[0].ToolName)

12

Build your agent: live configuration

Pick a provider, an output shape, and the features you need; a real, compilable snippet regenerates live. Copy it into a main.go. Nothing else to set up.


The rest lives in the Go reference. Every public type has a docstring; every extension package has an example; every feature is tested.