gollem
The production agent framework for Go. Typed agents, structured output, streaming, and runtime traces for agents that edit files, wait on humans, delegate work, resume after crashes, fork from checkpoints, and prove the branch with diff/regression evidence. Zero core dependencies. Single binary.
What a run looks like: four agents, four transcripts
Every run emits structured trace events. Switch between agent archetypes below to see the same trace system rendering four different workloads.
✓ Each line is a trace event. Export to JSON, OpenTelemetry, or plug in your own TraceExporter.
Trace workbench: real agents, real state
Span dashboards tell you what happened. Gollem traces are runtime artifacts: the messages, tools, approvals, snapshots, file diffs, topology, evaluator results, and fork provenance needed to operate on the run after it has already done real work.
same worker, same model boundary, same failing command
causal path still shared
baseline path keeps stale snapshot messages
fork path uses RunContext messages
1 hunk, 1 semantic field restored
no unrelated file churn
eval +0.49 · cost -18% · retries -2
tests pass, branch accepted
What did the worker actually send?
The trace captures the request after middleware, history processors, message interceptors, retries, and delegate routing shaped it. In a Kubernetes or Temporal fleet, this is the exact boundary emitted by the worker that made the call.
Replay applies recorded model/tool boundaries to reconstructed state. It does not pretend model sampling is deterministic.
Which real command broke the run?
Every tool call stays paired with the result, elapsed time, error payload, approval outcome, workspace, and root/delegate lineage. The failure is tied to the terminal action, not reconstructed from logs later.
Where can the real run resume?
Checkpoints make branch points operational. Resume the same run after SIGTERM, or fork from a step, event ID, checkpoint ID, or event kind and continue as a fresh trace segment.
What is waiting on a human?
Approvals, sleeps, deferred work, and Temporal waits are first-class boundaries. A live workflow can export as waiting, not failed, with the unresolved decision still visible.
What did the agent mutate?
Artifact events turn filesystem mutations into evidence: path, operation, tool call, before/after hashes, omission reasons, and bounded unified diff hunks when text content is safe to capture.
```diff
@@ -69,7 +69,8 @@ func Snapshot(rc RunContext) *RunSnapshot {
-	snap := state.snapshot(rc.AgentID, rc.RunID)
+	snap := state.snapshot(rc.AgentID, rc.RunID)
+	snap.Messages = append([]Message(nil), rc.Messages...)
 	return snap
```
Did the branch deserve to live?
Diff and regression reports compare baseline and forked traces by divergence, final output, usage, cost, topology, evaluator score, retries, errors, and artifacts. The output is review evidence, not a vibes-based rerun.
gollem run --trace-out
inspect: gollem trace view
branch: gollem trace fork --continue
compare: gollem trace diff
gate: gollem trace regress
Trace workflow

```sh
$ gollem run --trace-out base.trace.json "fix the failing auth test"
$ gollem trace view base.trace.json
$ gollem trace fork base.trace.json --from-checkpoint snap_13h42m \
    --append-user "preserve RunContext messages in snapshots" \
    --continue --out fork.trace.json
$ gollem trace diff base.trace.json fork.trace.json
$ gollem trace regress base.trace.json fork.trace.json --require-status succeeded
```
gollem.trace.v1 is the single trace format across local CLI, SDK runs, Temporal status export, team/delegate runs, and dashboard artifacts.
Typed agents, typed results
Agent[T] is the central type. You define the output shape; gollem generates the JSON Schema, validates every model response against it, auto-repairs malformed output with a repair model, and hands you a typed Go struct.
```go
type Analysis struct {
	Sentiment  string   `json:"sentiment" jsonschema:"enum=positive|negative|neutral"`
	Keywords   []string `json:"keywords" jsonschema:"description=Key topics"`
	Confidence float64  `json:"confidence" jsonschema:"minimum=0,maximum=1"`
}

agent := gollem.NewAgent[Analysis](model,
	gollem.WithSystemPrompt[Analysis]("You are a sentiment analyst."),
	gollem.WithOutputRepair[Analysis](gollem.ModelRepair[Analysis](repairModel)),
	gollem.WithOutputValidator[Analysis](func(a Analysis) error {
		if a.Confidence < 0 || a.Confidence > 1 {
			return fmt.Errorf("confidence out of range: %f", a.Confidence)
		}
		return nil
	}),
)

result, _ := agent.Run(ctx, "Analyze this earnings call transcript.")
fmt.Println(result.Output.Sentiment)  // string, not map[string]any
fmt.Println(result.Output.Confidence) // float64, not interface{}
```
Schema generated from struct tags. No hand-written JSON schemas. No json.Unmarshal at the callsite. No type assertions.
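To make the decode-then-validate flow concrete, here is a minimal standard-library sketch of turning raw model JSON into a typed value and rejecting it when a validator fails. It is an illustration of the idea only, not gollem's implementation: `decodeTyped` and `checkConfidence` are hypothetical names, and schema generation and model-based repair are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Analysis mirrors the output shape from the example above.
type Analysis struct {
	Sentiment  string   `json:"sentiment"`
	Keywords   []string `json:"keywords"`
	Confidence float64  `json:"confidence"`
}

// decodeTyped is a hypothetical helper (not a gollem API): it unmarshals raw
// model output into T, then applies an optional caller-supplied validator.
func decodeTyped[T any](raw []byte, validate func(T) error) (T, error) {
	var out T
	if err := json.Unmarshal(raw, &out); err != nil {
		return out, fmt.Errorf("decode: %w", err)
	}
	if validate != nil {
		if err := validate(out); err != nil {
			return out, fmt.Errorf("validate: %w", err)
		}
	}
	return out, nil
}

// checkConfidence rejects confidence values outside [0, 1].
func checkConfidence(a Analysis) error {
	if a.Confidence < 0 || a.Confidence > 1 {
		return fmt.Errorf("confidence out of range: %f", a.Confidence)
	}
	return nil
}

func main() {
	raw := []byte(`{"sentiment":"positive","keywords":["margin"],"confidence":0.9}`)
	a, err := decodeTyped(raw, checkConfidence)
	fmt.Println(a.Sentiment, a.Confidence, err)
}
```

In this sketch the caller receives a typed `Analysis` or an error; a framework can hook a repair step in at the point where either error is returned.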
Streaming with Go 1.23+ iterators
Four streaming modes share one interface. All are exposed as iter.Seq2[T, error]: no channels, no callbacks, no goroutine management.
Same stream, any mode. Switch based on latency budget or transport.
Raw deltas

```go
stream, _ := agent.RunStream(ctx, "Write a story about a robot.")

// Raw incremental chunks as they arrive from the model.
for delta, err := range gollem.StreamTextDelta(stream) {
	if err != nil {
		return err
	}
	fmt.Print(delta) // "The " "robot " "powered " ...
}
```

Accumulated

```go
// Growing accumulated text at each step. Ideal for React/UI updates.
for text, err := range gollem.StreamTextAccumulated(stream) {
	if err != nil {
		return err
	}
	updateUI(text) // "The " → "The robot " → "The robot powered "
}
```

Debounced

```go
// Grouped delivery every 100ms. Grouping network frames for websocket clients.
for text, err := range gollem.StreamTextDebounced(stream, 100*time.Millisecond) {
	if err != nil {
		return err
	}
	sendToClient(text) // fewer frames, still feels live
}
```

Unified

```go
// Single function with options. Switch modes without rewriting the loop.
for text, err := range gollem.StreamText(stream, gollem.StreamTextOptions{
	Mode:     gollem.StreamModeDebounced,
	Debounce: 100 * time.Millisecond,
}) {
	if err != nil {
		return err
	}
	handle(text)
}
```
Tools from typed functions
FuncTool[P] turns a typed Go function into a tool. Parameter schemas come from struct tags via reflection. Access typed dependencies through the run context: no globals, no singletons, no any.
```go
type SearchParams struct {
	Query string `json:"query" jsonschema:"description=Search query"`
	Limit int    `json:"limit" jsonschema:"description=Max results,default=10"`
}

type AppDeps struct {
	DB    *sql.DB
	Cache *redis.Client
}

searchTool := gollem.FuncTool[SearchParams](
	"search", "Search the knowledge base",
	func(ctx context.Context, rc *gollem.RunContext, p SearchParams) (string, error) {
		deps := gollem.GetDeps[*AppDeps](rc) // compile-time type safe
		return doSearch(deps.DB, p.Query, p.Limit)
	},
)

agent := gollem.NewAgent[Report](model,
	gollem.WithTools[Report](searchTool),
	gollem.WithDeps[Report](&AppDeps{DB: db, Cache: cache}),
	gollem.WithDefaultToolTimeout[Report](10*time.Second),
	gollem.WithToolResultValidator[Report](nonEmpty),
)
```
Tool-choice control: Auto, Required, None, Force("name"), with optional auto-reset to prevent infinite loops.
Multi-agent orchestration
Three composition primitives. AgentTool for delegation. Handoff for sequential chains with context filters. Pipeline for parallel fan-out and conditional branching. For durable coordination across restarts, ext/orchestrator owns tasks, leases, schedulers, and artifact history.
Pipeline

```go
// One agent calls another as a tool.
orchestrator := gollem.NewAgent[FinalReport](model,
	gollem.WithTools[FinalReport](
		orchestration.AgentTool("research", "Delegate research", researcher),
	),
)

// Pipeline with parallel steps and conditional branching.
pipe := gollem.NewPipeline(
	gollem.AgentStep(researcher),
	gollem.ParallelSteps(
		gollem.AgentStep(factChecker),
		gollem.AgentStep(editor),
	),
	gollem.ConditionalStep(
		func(s string) bool { return len(s) > 5000 },
		gollem.AgentStep(summarizer),
		gollem.TransformStep(strings.TrimSpace),
	),
)
```

Team swarm

```go
t := team.NewTeam(team.TeamConfig{
	Name:    "code-review",
	Leader:  "lead",
	Model:   model,
	Toolset: codingTools,
	PersonalityGenerator: modelutil.CachedPersonalityGenerator(
		modelutil.GeneratePersonality(model),
	),
})

t.SpawnTeammate(ctx, "reviewer", "Review auth module for security vulnerabilities")
t.SpawnTeammate(ctx, "tester", "Write comprehensive tests for the payment flow")
t.SpawnTeammate(ctx, "docs", "Update API docs for the new endpoints")

leader := gollem.NewAgent[string](model, gollem.WithTools[string](team.LeaderTools(t)...))
result, _ := leader.Run(ctx, "Coordinate the review across all teammates.")
```
Each teammate runs as a goroutine with a fresh context window. The LLM itself writes the system prompt for each task; a SHA256-keyed cache prevents redundant generations.
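A content-addressed cache of this kind can be sketched in a few lines of standard-library Go. This is an assumption about the general technique, not gollem's code: `promptCache`, `get`, and the `calls` counter are hypothetical, and the generator is a plain function standing in for a model call.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// promptCache memoizes generated prompts, keyed by the SHA-256 of the task text.
type promptCache struct {
	mu      sync.Mutex
	entries map[string]string
	calls   int // number of times the generator actually ran
}

func newPromptCache() *promptCache {
	return &promptCache{entries: make(map[string]string)}
}

// get returns the cached prompt for task, generating it on first use.
// The lock is held during generation to keep the sketch simple.
func (c *promptCache) get(task string, generate func(string) string) string {
	sum := sha256.Sum256([]byte(task))
	key := hex.EncodeToString(sum[:])

	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.entries[key]; ok {
		return p
	}
	c.calls++
	p := generate(task)
	c.entries[key] = p
	return p
}

func main() {
	cache := newPromptCache()
	gen := func(task string) string { return "You are a specialist. Task: " + task }

	cache.get("Review auth module", gen)
	cache.get("Review auth module", gen) // served from cache
	cache.get("Write payment tests", gen)
	fmt.Println("generator calls:", cache.calls)
}
```

Hashing the task text keeps keys fixed-size and avoids storing long prompts as map keys; identical tasks spawned twice cost one generation.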
Code mode: N tool calls, one round-trip
Traditional tool use is round-trip heavy: model asks, you execute, model waits, model asks again. Code mode ships all of your tools into an LLM-authored Python script that runs in a pure-Go WASM sandbox via monty-go. The model composes in one shot.
```
traditional
  model ──► tool1 ──► model ──► tool2 ──► model ──► result
  3 model calls · 2 context refills · serial latency

code mode
  model ──► python { tool1(); tool2(); } ──► result
  1 model call · 0 refills · parallel execution
```
```go
import "github.com/fugue-labs/gollem/ext/monty"

agent := gollem.NewAgent[Report](model,
	monty.AgentOptions(
		monty.WithTools(searchTool, fetchTool, citeTool),
	)...,
)

// The model writes a single Python script that calls N tools as functions.
// Runs in a WASM sandbox. No CGO, no containers, no subprocess.
result, _ := agent.Run(ctx, "Research and cite the top 5 papers on memory consolidation.")
```
What the model wrote (read-only)

```python
# gollem injects typed function stubs; the model chooses how to compose.
results = search(query="memory consolidation LLM", limit=10)
top = sorted(results, key=lambda r: r["score"], reverse=True)[:5]

# Parallel fetches in the sandbox; each call is a typed Go function.
docs = [fetch_url(url=r["url"]) for r in top]

final_result(
    summary="Consolidation requires decay scheduling ...",
    citations=[cite(doc=d) for d in docs],
)
```
One model round-trip. Up to N× fewer tokens than sequential tool use on branchy workloads. Sandbox timeout, memory cap, and import allowlist configurable.
Graph workflows: typed state machines
When control flow outgrows linear pipelines, drop into ext/graph: typed state, conditional branches, fan-out / map-reduce, cycle detection, Mermaid export. Nodes and edges are type-checked at compile time.
```
start ──► classify ──► { simple  ──► answer ──► end
                       { complex ──► plan ──► fanout[3]
                                               ├► search
                                               ├► fetch
                                               └► analyze ──► merge ──► answer
```
```go
g := graph.New[State]()
g.Node("classify", classifyFn).Edge("simple", simplePath).Edge("complex", complexPath)
g.FanOut("plan", searchNode, fetchNode, analyzeNode).Merge("merge", mergeFn)
g.Edge("merge", "answer")

if err := g.Validate(); err != nil { // cycle detection at build time
	return err
}
fmt.Println(g.Mermaid()) // diagram for PRs
result, _ := g.Run(ctx, initialState)
```
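For readers curious what build-time cycle detection involves, one common approach is a depth-first search that flags any edge pointing back to a node still on the current path. The sketch below shows that general technique on a plain adjacency map; it is not taken from ext/graph, and `hasCycle` is a hypothetical name.

```go
package main

import "fmt"

// hasCycle reports whether the directed graph contains a cycle, using
// depth-first search with three node states.
func hasCycle(edges map[string][]string) bool {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int)

	var visit func(n string) bool
	visit = func(n string) bool {
		switch state[n] {
		case inProgress:
			return true // back edge: n is already on the current path
		case done:
			return false // fully explored earlier, no cycle through n
		}
		state[n] = inProgress
		for _, next := range edges[n] {
			if visit(next) {
				return true
			}
		}
		state[n] = done
		return false
	}

	for n := range edges {
		if visit(n) {
			return true
		}
	}
	return false
}

func main() {
	// Node names follow the diagram above.
	g := map[string][]string{
		"classify": {"answer", "plan"},
		"plan":     {"search", "fetch", "analyze"},
		"search":   {"merge"},
		"fetch":    {"merge"},
		"analyze":  {"merge"},
		"merge":    {"answer"},
	}
	fmt.Println("cyclic:", hasCycle(g))

	g["answer"] = []string{"classify"} // introduce a loop back to the start
	fmt.Println("cyclic:", hasCycle(g))
}
```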
Guardrails, cost, observability
Production concerns are first-class. Guardrails at every lifecycle stage. Cost tracked per run and cumulative. Middleware composes like HTTP middleware. First registered is outermost.
```go
tracker := gollem.NewCostTracker(map[string]gollem.ModelPricing{
	"claude-sonnet-4-5-20250929": {InputTokenCost: 0.003, OutputTokenCost: 0.015},
})

agent := gollem.NewAgent[Report](model,
	// Safety: validate prompts, turns, outputs.
	gollem.WithInputGuardrail[Report]("length", gollem.MaxPromptLength(10_000)),
	gollem.WithInputGuardrail[Report]("content", gollem.ContentFilter("ignore previous")),
	gollem.WithTurnGuardrail[Report]("turns", gollem.MaxTurns(20)),

	// Cost & usage.
	gollem.WithCostTracker[Report](tracker),
	gollem.WithUsageQuota[Report](gollem.UsageQuota{MaxRequests: 50, MaxTotalTokens: 100_000}),

	// Middleware: outer to inner.
	gollem.WithAgentMiddleware[Report](gollem.TimingMiddleware(metrics.RecordLatency)),
	gollem.WithAgentMiddleware[Report](gollem.LoggingMiddleware(log.Printf)),
	gollem.WithMessageInterceptor[Report](gollem.RedactPII(
		`\b\d{3}-\d{2}-\d{4}\b`, "[SSN REDACTED]",
	)),

	// Observability.
	gollem.WithTracing[Report](),
	gollem.WithTraceExporter[Report](gollem.NewJSONFileExporter("./traces")),
	gollem.WithRunCondition[Report](gollem.Or(
		gollem.MaxRunDuration(2*time.Minute),
		gollem.ToolCallCount(50),
	)),
)
```
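The "first registered is outermost" rule is the same ordering used by typical HTTP middleware chains. A minimal sketch of that composition, using hypothetical `Handler`, `Middleware`, `chain`, and `tag` names rather than gollem's own types:

```go
package main

import (
	"fmt"
	"strings"
)

// Handler stands in for "call the model"; Middleware wraps a Handler.
type Handler func(prompt string) string
type Middleware func(next Handler) Handler

// chain wraps core so that the first middleware in the list is outermost.
func chain(core Handler, mws ...Middleware) Handler {
	h := core
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h) // wrap from the inside out
	}
	return h
}

// tag records when a middleware is entered and exited.
func tag(name string, log *[]string) Middleware {
	return func(next Handler) Handler {
		return func(prompt string) string {
			*log = append(*log, name+":before")
			out := next(prompt)
			*log = append(*log, name+":after")
			return out
		}
	}
}

func main() {
	var log []string
	core := func(prompt string) string {
		log = append(log, "model")
		return strings.ToUpper(prompt)
	}

	h := chain(core, tag("timing", &log), tag("logging", &log))
	fmt.Println(h("hi"))
	fmt.Println(strings.Join(log, " > "))
	// Prints:
	// HI
	// timing:before > logging:before > model > logging:after > timing:after
}
```

A middleware that returns without calling `next` short-circuits everything inside it, which is how a wrapper can skip the model call entirely.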
MaxPromptLength, ContentFilter, MaxTurns, plus custom input / turn / output / tool-result validators.
TimingMiddleware, LoggingMiddleware, MaxTokensMiddleware, or write your own. Skip the model call entirely if you want.
RedactPII, AuditLog, or custom. Intercept before the message leaves your system; transform responses on the way back.
OnRunStart, OnRunEnd, OnModelRequest, OnModelResponse, OnToolStart, OnToolEnd.
Subscribe[E], Publish[E]. Built-in RunStartedEvent, ToolCalledEvent, RunCompletedEvent carry run IDs, parent IDs, timestamps.
Providers: one interface, swap freely
All providers implement the same Model interface. Wrap any with retry, rate limiting, and caching. Switch the import and the agent code is unchanged.
Anthropic

```go
import "github.com/fugue-labs/gollem/provider/anthropic"

// Reads ANTHROPIC_API_KEY from env.
claude := anthropic.New()

// Opt-in features.
claude = anthropic.New(
	anthropic.WithModel("claude-sonnet-4-5-20250929"),
	anthropic.WithExtendedThinking(anthropic.Thinking{Budget: 10_000}),
	anthropic.WithPromptCaching(),
)
```

OpenAI

```go
import "github.com/fugue-labs/gollem/provider/openai"

// Reads OPENAI_API_KEY from env.
gpt := openai.New()

// WebSocket continuation for tool-heavy loops (non-streaming).
gpt = openai.New(
	openai.WithModel("gpt-4o"),
	openai.WithTransport("websocket"), // or OPENAI_TRANSPORT=websocket
	openai.WithJSONMode(),             // native structured output
)
```

Vertex AI

```go
import "github.com/fugue-labs/gollem/provider/vertexai"

// Uses GCP application default credentials.
gemini := vertexai.New("my-project", "us-central1")

gemini = vertexai.New("my-project", "us-central1",
	vertexai.WithModel("gemini-2.0-flash"),
	vertexai.WithJSONMode(),
)
```

Vertex · Anthropic

```go
import "github.com/fugue-labs/gollem/provider/vertexai_anthropic"

// Claude via Vertex: extended thinking + prompt caching + GCP auth.
vc := vertexai_anthropic.New("my-project", "us-east5",
	vertexai_anthropic.WithModel("claude-sonnet-4-5@20250929"),
	vertexai_anthropic.WithExtendedThinking(Thinking{Budget: 10_000}),
)
```

Resilience wrappers

```go
// Retry around rate-limit around cache around raw. Works for any Model.
resilient := gollem.NewRetryModel(
	gollem.NewRateLimitedModel(
		gollem.NewCachedModel(claude, gollem.NewMemoryCacheWithTTL(5*time.Minute)),
		10, 20, // rps, burst
	),
	gollem.DefaultRetryConfig(),
)

// Or route by capability: same agent code, right model per prompt.
router := gollem.NewCapabilityRouter(
	[]gollem.Model{fast, power, vision},
	gollem.ModelProfile{SupportsVision: true, SupportsToolCalls: true},
)
```
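Wrappers like these work because each one satisfies the same interface as the thing it wraps. The sketch below shows the decorator pattern with a deliberately tiny, hypothetical `Model` interface and a fixed-delay retry; gollem's actual interface, retry configuration, and backoff policy are not shown here and may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Model is a hypothetical, minimal stand-in for a provider interface.
type Model interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// flakyModel fails a fixed number of times before succeeding.
type flakyModel struct{ failures, calls int }

func (f *flakyModel) Complete(ctx context.Context, prompt string) (string, error) {
	f.calls++
	if f.calls <= f.failures {
		return "", errors.New("transient upstream error")
	}
	return "echo: " + prompt, nil
}

// retryModel decorates any Model with bounded, fixed-delay retries.
type retryModel struct {
	inner    Model
	attempts int
	backoff  time.Duration
}

func (r retryModel) Complete(ctx context.Context, prompt string) (string, error) {
	var lastErr error
	for i := 0; i < r.attempts; i++ {
		out, err := r.inner.Complete(ctx, prompt)
		if err == nil {
			return out, nil
		}
		lastErr = err
		if i == r.attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(r.backoff):
		}
	}
	return "", fmt.Errorf("gave up after %d attempts: %w", r.attempts, lastErr)
}

// ask is a small convenience wrapper that supplies a background context.
func ask(m Model, prompt string) (string, error) {
	return m.Complete(context.Background(), prompt)
}

func main() {
	base := &flakyModel{failures: 2}
	m := retryModel{inner: base, attempts: 3, backoff: 10 * time.Millisecond}
	out, err := ask(m, "ping")
	fmt.Println(out, err, "calls:", base.calls)
}
```

Since `retryModel` is itself a `Model`, it can be wrapped again by a rate limiter or cache without the calling code changing.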
| Capability | Anthropic | OpenAI | Vertex AI | Vertex · Anthropic |
|---|---|---|---|---|
| Structured output | ● | ● | ● | ● |
| Streaming | ● | ● | ● | ● |
| Tool use | ● | ● | ● | ● |
| Extended thinking | ● | ○ | ○ | ● |
| Prompt caching | ● | ○ | ○ | ● |
| Native JSON mode | ○ | ● | ● | ○ |
| Auth | API key | API key | OAuth2 · GCP | OAuth2 · GCP |
Single binary: ship the compiler's output, not a virtualenv
Gollem compiles to a statically-linked binary. Cross-compile to any OS/arch from any OS/arch. No runtime. No interpreter. No shared library resolution at startup.
CGO_ENABLED=0.
Testing without ever calling a real model
TestModel is a deterministic mock. Canned responses, call recording, per-invocation assertions. Swap with WithTestModel or Override in tests without touching the production agent definition.
```go
model := gollem.NewTestModel(
	gollem.ToolCallResponse("search", `{"query":"Go generics"}`),
	gollem.ToolCallResponse("final_result", `{"answer":"..."}`),
)

result, err := productionAgent.WithTestModel(model).Run(ctx, "prompt")

// Assert what the model saw.
calls := model.RecordedCalls()
assert.Len(t, calls, 2)
assert.Equal(t, "search", calls[0].ToolName)
```
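The idea behind a deterministic mock is small enough to sketch with the standard library: replay canned responses in order and record every prompt for later assertions. `scriptedModel`, the one-method `Model` interface, and `summarize` below are hypothetical illustrations, not gollem's TestModel.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Model is a hypothetical one-method stand-in for a provider interface.
type Model interface {
	Complete(prompt string) (string, error)
}

// scriptedModel replays canned responses in order and records each prompt.
type scriptedModel struct {
	responses []string
	prompts   []string
}

func (m *scriptedModel) Complete(prompt string) (string, error) {
	if len(m.prompts) >= len(m.responses) {
		return "", errors.New("scripted model: script exhausted")
	}
	resp := m.responses[len(m.prompts)]
	m.prompts = append(m.prompts, prompt)
	return resp, nil
}

// summarize is the code under test: it depends only on the Model interface.
func summarize(m Model, text string) (string, error) {
	out, err := m.Complete("Summarize: " + text)
	if err != nil {
		return "", fmt.Errorf("summarize: %w", err)
	}
	return strings.TrimSpace(out), nil
}

func main() {
	mock := &scriptedModel{responses: []string{"  Generics landed in Go 1.18.  "}}
	got, err := summarize(mock, "a long article about Go generics")
	fmt.Printf("%q %v\n", got, err)
	fmt.Println("model saw:", mock.prompts)
}
```

Because the production function takes the interface rather than a concrete provider, the test swaps in the mock without touching the code under test.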
Build your agent: live configuration
Pick a provider, an output shape, and the features you need. A real compilable snippet regenerates live. Copy it and paste into a main.go. Nothing else to set up.
The rest lives in the Go reference. Every public type has a docstring; every extension package has an example; every feature is tested.