Fugue Labs · Open Source

gollem

The production agent framework for Go. Typed agents, structured output, streaming, guardrails, cost tracking, multi-agent swarms, code mode, graph workflows. Zero core dependencies. Single binary.

$ go get github.com/fugue-labs/gollem
01

What a run looks like — four agents, four transcripts

Every run emits structured trace events. Switch between agent archetypes below to see the same trace system rendering four different workloads.

gollem ./research-agent · claude-sonnet-4-5

    

Each line is a trace event. Export to JSON, OpenTelemetry, or plug in your own TraceExporter.
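An exporter boils down to serializing those events. A minimal sketch of the newline-delimited JSON a file exporter would write, assuming a simplified event shape (TraceEvent and exportJSONL here are illustrative, not gollem's actual types):

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// TraceEvent is an illustrative event shape.
type TraceEvent struct {
	RunID string         `json:"run_id"`
	Step  string         `json:"step"`
	At    time.Time      `json:"at"`
	Meta  map[string]any `json:"meta,omitempty"`
}

// exportJSONL renders events as one JSON object per line, the
// newline-delimited shape a file exporter typically writes.
func exportJSONL(events []TraceEvent) (string, error) {
	var out string
	for _, e := range events {
		b, err := json.Marshal(e)
		if err != nil {
			return "", err
		}
		out += string(b) + "\n"
	}
	return out, nil
}

func main() {
	s, _ := exportJSONL([]TraceEvent{
		{RunID: "run-1", Step: "model_request", At: time.Now()},
		{RunID: "run-1", Step: "tool_call", Meta: map[string]any{"tool": "search"}},
	})
	fmt.Print(s)
}
```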


02

Typed agents, typed results

Agent[T] is the central type. You define the output shape; gollem generates the JSON Schema, validates every model response against it, auto-repairs malformed output with a repair model, and hands you a typed Go struct.

go
type Analysis struct {
    Sentiment  string   `json:"sentiment"  jsonschema:"enum=positive|negative|neutral"`
    Keywords   []string `json:"keywords"   jsonschema:"description=Key topics"`
    Confidence float64  `json:"confidence" jsonschema:"minimum=0,maximum=1"`
}

agent := gollem.NewAgent[Analysis](model,
    gollem.WithSystemPrompt[Analysis]("You are a sentiment analyst."),
    gollem.WithOutputRepair[Analysis](gollem.ModelRepair[Analysis](repairModel)),
    gollem.WithOutputValidator[Analysis](func(a Analysis) error {
        if a.Confidence < 0 || a.Confidence > 1 {
            return fmt.Errorf("confidence out of range: %f", a.Confidence)
        }
        return nil
    }),
)

result, _ := agent.Run(ctx, "Analyze this earnings call transcript.")
fmt.Println(result.Output.Sentiment)    // string, not map[string]any
fmt.Println(result.Output.Confidence)   // float64, not interface{}

Schema generated from struct tags. No hand-written JSON schemas. No json.Unmarshal at the callsite. No type assertions.


03

Streaming with Go 1.23+ iterators

Four streaming modes share one interface, all exposed as iter.Seq2[T, error] — no channels, no callbacks, no goroutine management.


Same stream, any mode. Switch based on latency budget or transport.

go · raw deltas
stream, _ := agent.RunStream(ctx, "Write a story about a robot.")

// Raw incremental chunks as they arrive from the model.
for delta, err := range gollem.StreamTextDelta(stream) {
    if err != nil { return err }
    fmt.Print(delta)                   // "The " "robot " "powered " ...
}
go · accumulated
// Growing accumulated text at each step — ideal for React/UI updates.
for text, err := range gollem.StreamTextAccumulated(stream) {
    if err != nil { return err }
    updateUI(text)                    // "The " → "The robot " → "The robot powered "
}
go · debounced
// Batched delivery every 100ms — fewer network frames for websocket clients.
for text, err := range gollem.StreamTextDebounced(stream, 100*time.Millisecond) {
    if err != nil { return err }
    sendToClient(text)                // fewer frames, still feels live
}
go · unified
// Single function with options — switch modes without rewriting the loop.
for text, err := range gollem.StreamText(stream, gollem.StreamTextOptions{
    Mode:     gollem.StreamModeDebounced,
    Debounce: 100 * time.Millisecond,
}) {
    if err != nil { return err }
    handle(text)
}

04

Tools from typed functions

FuncTool[P] turns a typed Go function into a tool. Parameter schemas come from struct tags via reflection. Access typed dependencies through the run context — no globals, no singletons, no any.

go
type SearchParams struct {
    Query string `json:"query" jsonschema:"description=Search query"`
    Limit int    `json:"limit" jsonschema:"description=Max results,default=10"`
}

type AppDeps struct { DB *sql.DB; Cache *redis.Client }

searchTool := gollem.FuncTool[SearchParams](
    "search", "Search the knowledge base",
    func(ctx context.Context, rc *gollem.RunContext, p SearchParams) (string, error) {
        deps := gollem.GetDeps[*AppDeps](rc)    // compile-time type safe
        return doSearch(deps.DB, p.Query, p.Limit)
    },
)

agent := gollem.NewAgent[Report](model,
    gollem.WithTools[Report](searchTool),
    gollem.WithDeps[Report](&AppDeps{DB: db, Cache: cache}),
    gollem.WithDefaultToolTimeout[Report](10*time.Second),
    gollem.WithToolResultValidator[Report](nonEmpty),
)

Tool-choice control: Auto, Required, None, Force("name"), with optional auto-reset to prevent infinite loops.
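The GetDeps call above is a generic accessor over the run context. A minimal sketch of the pattern, assuming deps are stored as any and asserted back to T (RunContext and GetDeps here are simplified stand-ins, not gollem's types):

```go
package main

import "fmt"

// RunContext is a simplified stand-in: deps are stored as any and
// recovered with a generic accessor.
type RunContext struct{ deps any }

// GetDeps returns the dependencies as T. The callsite is fully
// typed; the assertion itself still happens at runtime.
func GetDeps[T any](rc *RunContext) T {
	d, ok := rc.deps.(T)
	if !ok {
		panic(fmt.Sprintf("deps are %T, not the requested type", rc.deps))
	}
	return d
}

type AppDeps struct{ DSN string }

func main() {
	rc := &RunContext{deps: &AppDeps{DSN: "postgres://localhost/app"}}
	deps := GetDeps[*AppDeps](rc) // no map lookups, no manual casts
	fmt.Println(deps.DSN)
}
```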


05

Multi-agent orchestration

Three composition primitives. AgentTool for delegation. Handoff for sequential chains with context filters. Pipeline for parallel fan-out and conditional branching. For durable coordination across restarts, ext/orchestrator owns tasks, leases, schedulers, and artifact history.

go · pipeline
// One agent calls another as a tool.
orchestrator := gollem.NewAgent[FinalReport](model,
    gollem.WithTools[FinalReport](
        orchestration.AgentTool("research", "Delegate research", researcher),
    ),
)

// Pipeline with parallel steps and conditional branching.
pipe := gollem.NewPipeline(
    gollem.AgentStep(researcher),
    gollem.ParallelSteps(
        gollem.AgentStep(factChecker),
        gollem.AgentStep(editor),
    ),
    gollem.ConditionalStep(
        func(s string) bool { return len(s) > 5000 },
        gollem.AgentStep(summarizer),
        gollem.TransformStep(strings.TrimSpace),
    ),
)
go · team swarm
t := team.NewTeam(team.TeamConfig{
    Name:    "code-review",
    Leader:  "lead",
    Model:   model,
    Toolset: codingTools,
    PersonalityGenerator: modelutil.CachedPersonalityGenerator(
        modelutil.GeneratePersonality(model),
    ),
})

t.SpawnTeammate(ctx, "reviewer", "Review auth module for security vulnerabilities")
t.SpawnTeammate(ctx, "tester",   "Write comprehensive tests for the payment flow")
t.SpawnTeammate(ctx, "docs",     "Update API docs for the new endpoints")

leader := gollem.NewAgent[string](model, gollem.WithTools[string](team.LeaderTools(t)...))
result, _ := leader.Run(ctx, "Coordinate the review across all teammates.")

Each teammate runs as a goroutine with a fresh context window. The LLM itself writes the system prompt for each task; a SHA256-keyed cache prevents redundant generations.
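The dedup idea behind that cache fits in a few lines: hash the task description and reuse the generated prompt on a hit. promptCache below is illustrative only, not gollem's CachedPersonalityGenerator:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// promptCache memoizes generated prompts by the SHA256 of the task
// description, so identical tasks never trigger a second generation.
type promptCache struct {
	mu    sync.Mutex
	byKey map[string]string
	calls int // times generate actually ran
}

func (c *promptCache) get(task string, generate func(string) string) string {
	sum := sha256.Sum256([]byte(task))
	key := hex.EncodeToString(sum[:])
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.byKey[key]; ok {
		return p // cache hit: no model call
	}
	c.calls++
	p := generate(task)
	c.byKey[key] = p
	return p
}

func main() {
	c := &promptCache{byKey: map[string]string{}}
	gen := func(task string) string { return "You are an agent. Task: " + task }
	fmt.Println(c.get("review auth module", gen))
	c.get("review auth module", gen) // second call is a hit
	fmt.Println("generations:", c.calls)
}
```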


06

Code mode — N tool calls, one round-trip

Traditional tool use is round-trip heavy: the model asks, you execute, the model waits, the model asks again. Code mode exposes all of your tools inside an LLM-authored Python script that runs in a pure-Go WASM sandbox via monty-go, so the model composes everything in one shot.

traditional   model ──► tool1 ──► model ──► tool2 ──► model ──► result
              3 model calls · 2 context refills · serial latency

code mode     model ──► python { tool1(); tool2(); } ──► result
              1 model call · 0 refills · parallel execution
go
import "github.com/fugue-labs/gollem/ext/monty"

agent := gollem.NewAgent[Report](model,
    monty.AgentOptions(
        monty.WithTools(searchTool, fetchTool, citeTool),
    )...,
)

// The model writes a single Python script that calls N tools as functions.
// Runs in a WASM sandbox. No CGO, no containers, no subprocess.
result, _ := agent.Run(ctx, "Research and cite the top 5 papers on memory consolidation.")
python · what the model wrote
# gollem injects typed function stubs; the model chooses how to compose.
results = search(query="memory consolidation LLM", limit=10)
top = sorted(results, key=lambda r: r["score"], reverse=True)[:5]

# Parallel fetches in the sandbox; each call is a typed Go function.
docs = [fetch_url(url=r["url"]) for r in top]

final_result(
    summary="Consolidation requires decay scheduling ...",
    citations=[cite(doc=d) for d in docs],
)

One model round-trip. Up to N× fewer tokens than sequential tool use on branchy workloads. Sandbox timeout, memory cap, and import allowlist configurable.


07

Graph workflows — typed state machines

When control flow outgrows linear pipelines, drop into ext/graph: typed state, conditional branches, fan-out / map-reduce, cycle detection, Mermaid export. Nodes and edges are type-checked at compile time.

  start  ──►  classify  ──►  { simple   ──►  answer  ──►  end
                          { complex  ──►  plan  ──►  fanout[3]
                                                            ├►  search
                                                            ├►  fetch
                                                            └►  analyze  ──►  merge  ──►  answer
go
g := graph.New[State]()
g.Node("classify", classifyFn).Edge("simple", simplePath).Edge("complex", complexPath)
g.FanOut("plan", searchNode, fetchNode, analyzeNode).Merge("merge", mergeFn)
g.Edge("merge", "answer")

if err := g.Validate(); err != nil {  // cycle detection at build time
    return err
}
fmt.Println(g.Mermaid())                // diagram for PRs
result, _ := g.Run(ctx, initialState)
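Cycle detection of the kind a Validate pass performs is a standard three-color depth-first search. A self-contained sketch over a plain adjacency list (hasCycle is a hypothetical stand-in for the internal check):

```go
package main

import "fmt"

// hasCycle runs a three-color DFS over an adjacency list: a back-edge
// to a node still on the current path means the graph has a cycle.
func hasCycle(edges map[string][]string) bool {
	const (
		white = 0 // unvisited
		gray  = 1 // on the current DFS path
		black = 2 // fully explored
	)
	color := map[string]int{}
	var visit func(n string) bool
	visit = func(n string) bool {
		color[n] = gray
		for _, m := range edges[n] {
			if color[m] == gray {
				return true // back-edge: cycle found
			}
			if color[m] == white && visit(m) {
				return true
			}
		}
		color[n] = black
		return false
	}
	for n := range edges {
		if color[n] == white && visit(n) {
			return true
		}
	}
	return false
}

func main() {
	dag := map[string][]string{"classify": {"plan"}, "plan": {"merge"}, "merge": {"answer"}}
	fmt.Println(hasCycle(dag)) // false
	dag["answer"] = []string{"classify"}
	fmt.Println(hasCycle(dag)) // true
}
```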

08

Guardrails, cost, observability

Production concerns are first-class. Guardrails at every lifecycle stage. Cost tracked per run and cumulative. Middleware composes like HTTP middleware — first registered is outermost.

go
tracker := gollem.NewCostTracker(map[string]gollem.ModelPricing{
    "claude-sonnet-4-5-20250929": {InputTokenCost: 0.003, OutputTokenCost: 0.015},
})

agent := gollem.NewAgent[Report](model,
    // Safety — validate prompts, turns, outputs.
    gollem.WithInputGuardrail[Report]("length", gollem.MaxPromptLength(10_000)),
    gollem.WithInputGuardrail[Report]("content", gollem.ContentFilter("ignore previous")),
    gollem.WithTurnGuardrail[Report]("turns", gollem.MaxTurns(20)),

    // Cost & usage.
    gollem.WithCostTracker[Report](tracker),
    gollem.WithUsageQuota[Report](gollem.UsageQuota{MaxRequests: 50, MaxTotalTokens: 100_000}),

    // Middleware — outer to inner.
    gollem.WithAgentMiddleware[Report](gollem.TimingMiddleware(metrics.RecordLatency)),
    gollem.WithAgentMiddleware[Report](gollem.LoggingMiddleware(log.Printf)),
    gollem.WithMessageInterceptor[Report](gollem.RedactPII(
        `\b\d{3}-\d{2}-\d{4}\b`, "[SSN REDACTED]",
    )),

    // Observability.
    gollem.WithTracing[Report](),
    gollem.WithTraceExporter[Report](gollem.NewJSONFileExporter("./traces")),
    gollem.WithRunCondition[Report](gollem.Or(
        gollem.MaxRunDuration(2*time.Minute),
        gollem.ToolCallCount(50),
    )),
)
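The ordering rule (first registered is outermost) falls out of how the chain is built: wrap from last to first. A minimal sketch with plain functions, where Handler, Middleware, chain, and tag are illustrative names, not gollem's API:

```go
package main

import "fmt"

// Handler is the inner model call; Middleware wraps it.
type Handler func(prompt string) string

type Middleware func(Handler) Handler

// chain wraps h from last to first, so mws[0] ends up outermost --
// the same convention as HTTP middleware stacks.
func chain(h Handler, mws ...Middleware) Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag marks the call order by wrapping the result in its name.
func tag(name string) Middleware {
	return func(next Handler) Handler {
		return func(p string) string {
			return name + "(" + next(p) + ")"
		}
	}
}

func main() {
	h := chain(func(p string) string { return p }, tag("timing"), tag("logging"))
	fmt.Println(h("run")) // timing(logging(run))
}
```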
Guardrails: MaxPromptLength, ContentFilter, MaxTurns, plus custom input / turn / output / tool-result validators.
Middleware: TimingMiddleware, LoggingMiddleware, MaxTokensMiddleware, or write your own — skip the model call entirely if you want.
Interceptors: RedactPII, AuditLog, or custom. Intercept before the message leaves your system; transform responses on the way back.
Tracing: Structured run traces with step-level detail. Exporters: JSON file, console, multi, OpenTelemetry middleware for metrics + distributed tracing.
Hooks: OnRunStart, OnRunEnd, OnModelRequest, OnModelResponse, OnToolStart, OnToolEnd.
Event bus: Typed pub/sub with Subscribe[E], Publish[E]. Built-in RunStartedEvent, ToolCalledEvent, RunCompletedEvent carry run IDs, parent IDs, timestamps.
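The Subscribe[E]/Publish[E] shape can be sketched with generics and a type-keyed registry. Bus below is illustrative only, keying handlers by reflect.Type rather than showing gollem's implementation:

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

type RunStartedEvent struct{ RunID string }

// Bus keeps one handler list per concrete event type.
type Bus struct {
	mu   sync.Mutex
	subs map[reflect.Type][]func(any)
}

func NewBus() *Bus { return &Bus{subs: map[reflect.Type][]func(any){}} }

func typeKey[E any]() reflect.Type { return reflect.TypeOf((*E)(nil)).Elem() }

// Subscribe registers a typed handler; the type parameter is the key.
func Subscribe[E any](b *Bus, fn func(E)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	k := typeKey[E]()
	b.subs[k] = append(b.subs[k], func(v any) { fn(v.(E)) })
}

// Publish fans the event out to every handler registered for E.
func Publish[E any](b *Bus, e E) {
	b.mu.Lock()
	handlers := append([]func(any){}, b.subs[typeKey[E]()]...)
	b.mu.Unlock()
	for _, h := range handlers {
		h(e)
	}
}

func main() {
	b := NewBus()
	Subscribe(b, func(e RunStartedEvent) { fmt.Println("started:", e.RunID) })
	Publish(b, RunStartedEvent{RunID: "run-42"})
}
```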

09

Providers — one interface, swap freely

All providers implement the same Model interface. Wrap any with retry, rate limiting, and caching. Switch the import and the agent code is unchanged.

go · anthropic
import "github.com/fugue-labs/gollem/provider/anthropic"

// Reads ANTHROPIC_API_KEY from env.
claude := anthropic.New()

// Opt-in features.
claude = anthropic.New(
    anthropic.WithModel("claude-sonnet-4-5-20250929"),
    anthropic.WithExtendedThinking(anthropic.Thinking{Budget: 10_000}),
    anthropic.WithPromptCaching(),
)
go · openai
import "github.com/fugue-labs/gollem/provider/openai"

// Reads OPENAI_API_KEY from env.
gpt := openai.New()

// WebSocket continuation for tool-heavy loops (non-streaming).
gpt = openai.New(
    openai.WithModel("gpt-4o"),
    openai.WithTransport("websocket"),     // or OPENAI_TRANSPORT=websocket
    openai.WithJSONMode(),                     // native structured output
)
go · vertex ai
import "github.com/fugue-labs/gollem/provider/vertexai"

// Uses GCP application default credentials.
gemini := vertexai.New("my-project", "us-central1")

gemini = vertexai.New("my-project", "us-central1",
    vertexai.WithModel("gemini-2.0-flash"),
    vertexai.WithJSONMode(),
)
go · vertex · anthropic
import "github.com/fugue-labs/gollem/provider/vertexai_anthropic"

// Claude via Vertex — extended thinking + prompt caching + GCP auth.
vc := vertexai_anthropic.New("my-project", "us-east5",
    vertexai_anthropic.WithModel("claude-sonnet-4-5@20250929"),
    vertexai_anthropic.WithExtendedThinking(vertexai_anthropic.Thinking{Budget: 10_000}),
)
go · resilience wrappers
// Retry around rate-limit around cache around raw. Works for any Model.
resilient := gollem.NewRetryModel(
    gollem.NewRateLimitedModel(
        gollem.NewCachedModel(claude, gollem.NewMemoryCacheWithTTL(5*time.Minute)),
        10, 20, // rps, burst
    ),
    gollem.DefaultRetryConfig(),
)

// Or route by capability — same agent code, right model per prompt.
router := gollem.NewCapabilityRouter(
    []gollem.Model{fast, power, vision},
    gollem.ModelProfile{SupportsVision: true, SupportsToolCalls: true},
)
Capability Anthropic OpenAI Vertex AI Vertex · Anthropic
Structured output
Streaming
Tool use
Extended thinking
Prompt caching
Native JSON mode
Auth: API key (Anthropic) · API key (OpenAI) · OAuth2 / GCP (Vertex AI) · OAuth2 / GCP (Vertex · Anthropic)

10

Single binary — ship the compiler's output, not a virtualenv

Gollem compiles to a statically linked binary. Cross-compile to any OS/arch from any OS/arch. No runtime. No interpreter. No shared-library resolution at startup.

$ go build -o research-agent ./cmd/research
$ ls -lh research-agent
-rwxr-xr-x  1 user  staff  14M Apr 17 17:45 research-agent
$ file research-agent
research-agent: Mach-O 64-bit executable arm64
$ otool -L research-agent
research-agent:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0)
    /usr/lib/libresolv.9.dylib (compatibility version 1.0.0)
$ GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o research-agent-linux ./cmd/research
$ scp research-agent-linux prod:/usr/local/bin/
research-agent-linux                          100%   14MB   8.2MB/s   00:01
✓ deployed.
Static: Zero core dependencies. Linked against libSystem on macOS, nothing on Linux with CGO_ENABLED=0.
Small: Typical agent binary: ~14 MB with Anthropic + OpenAI + Vertex + monty. Strips to ~10 MB.
Cross-compile: Any OS/arch → any OS/arch. Build on your laptop; deploy to Linux ARM servers, serverless, edge.
Observability: The binary ships its own trace exporter, OTLP middleware, and structured logger. No sidecar required.

11

Testing — without ever calling a real model

TestModel is a deterministic mock. Canned responses, call recording, per-invocation assertions. Swap with WithTestModel or Override in tests without touching the production agent definition.

go
model := gollem.NewTestModel(
    gollem.ToolCallResponse("search", `{"query":"Go generics"}`),
    gollem.ToolCallResponse("final_result", `{"answer":"..."}`),
)

result, err := productionAgent.WithTestModel(model).Run(ctx, "prompt")

// Assert what the model saw.
calls := model.RecordedCalls()
assert.Len(t, calls, 2)
assert.Equal(t, "search", calls[0].ToolName)

12

Build your agent — live configuration

Pick a provider, an output shape, and the features you need. A real compilable snippet regenerates live. Copy it and paste into a main.go — nothing else to set up.


The rest lives in the Go reference. Every public type has a docstring; every extension package has an example; every feature is tested.