gollem
The production agent framework for Go. Typed agents, structured output, streaming, guardrails, cost tracking, multi-agent swarms, code mode, graph workflows. Zero core dependencies. Single binary.
What a run looks like — four agents, four transcripts
Every run emits structured trace events, and the same trace system renders every workload, whatever the agent archetype.
Each line of a transcript is one trace event. Export to JSON or OpenTelemetry, or plug in your own TraceExporter.
Typed agents, typed results
Agent[T] is the central type. You define the output shape; gollem generates the JSON Schema, validates every model response against it, auto-repairs malformed output with a repair model, and hands you a typed Go struct.
```go
type Analysis struct {
	Sentiment  string   `json:"sentiment" jsonschema:"enum=positive|negative|neutral"`
	Keywords   []string `json:"keywords" jsonschema:"description=Key topics"`
	Confidence float64  `json:"confidence" jsonschema:"minimum=0,maximum=1"`
}

agent := gollem.NewAgent[Analysis](model,
	gollem.WithSystemPrompt[Analysis]("You are a sentiment analyst."),
	gollem.WithOutputRepair[Analysis](gollem.ModelRepair[Analysis](repairModel)),
	gollem.WithOutputValidator[Analysis](func(a Analysis) error {
		if a.Confidence < 0 || a.Confidence > 1 {
			return fmt.Errorf("confidence out of range: %f", a.Confidence)
		}
		return nil
	}),
)

result, _ := agent.Run(ctx, "Analyze this earnings call transcript.")
fmt.Println(result.Output.Sentiment)  // string, not map[string]any
fmt.Println(result.Output.Confidence) // float64, not interface{}
```
Schema generated from struct tags. No hand-written JSON schemas. No json.Unmarshal at the callsite. No type assertions.
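To make the struct-tag step concrete, here is a minimal sketch of how field metadata can be read by reflection. It is not gollem's schema generator: `describeFields` and `jsonType` are hypothetical helpers, and the output is a flat per-field summary rather than a real JSON Schema document.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Analysis mirrors the struct from the example above.
type Analysis struct {
	Sentiment  string   `json:"sentiment" jsonschema:"enum=positive|negative|neutral"`
	Keywords   []string `json:"keywords" jsonschema:"description=Key topics"`
	Confidence float64  `json:"confidence" jsonschema:"minimum=0,maximum=1"`
}

// jsonType maps a Go kind to a JSON Schema type name (hypothetical helper).
func jsonType(k reflect.Kind) string {
	switch k {
	case reflect.String:
		return "string"
	case reflect.Float32, reflect.Float64:
		return "number"
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return "integer"
	case reflect.Bool:
		return "boolean"
	case reflect.Slice, reflect.Array:
		return "array"
	default:
		return "object"
	}
}

// describeFields walks a struct value and returns one line per field in the
// form "<json name>: <json type> [<jsonschema tag>]".
func describeFields(v any) []string {
	t := reflect.TypeOf(v)
	lines := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		// The json tag may carry options such as ",omitempty"; keep the name only.
		name := strings.Split(f.Tag.Get("json"), ",")[0]
		if name == "" {
			name = f.Name
		}
		lines = append(lines, fmt.Sprintf("%s: %s [%s]",
			name, jsonType(f.Type.Kind()), f.Tag.Get("jsonschema")))
	}
	return lines
}

func main() {
	for _, line := range describeFields(Analysis{}) {
		fmt.Println(line)
	}
}
```

A real generator would also recurse into nested structs and parse each `jsonschema` tag into separate schema keywords; the sketch only shows where the information comes from.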
Streaming with Go 1.23+ iterators
Four streaming modes share one interface. Each is exposed as an iter.Seq2[T, error]: no channels, no callbacks, no goroutine management.
Same stream, any mode. Switch based on latency budget or transport.
Raw deltas:

```go
stream, _ := agent.RunStream(ctx, "Write a story about a robot.")

// Raw incremental chunks as they arrive from the model.
for delta, err := range gollem.StreamTextDelta(stream) {
	if err != nil {
		return err
	}
	fmt.Print(delta) // "The " "robot " "powered " ...
}
```

Accumulated:

```go
// Growing accumulated text at each step: ideal for React/UI updates.
for text, err := range gollem.StreamTextAccumulated(stream) {
	if err != nil {
		return err
	}
	updateUI(text) // "The " → "The robot " → "The robot powered "
}
```

Debounced:

```go
// Grouped delivery every 100ms: grouping network frames for websocket clients.
for text, err := range gollem.StreamTextDebounced(stream, 100*time.Millisecond) {
	if err != nil {
		return err
	}
	sendToClient(text) // fewer frames, still feels live
}
```

Unified:

```go
// Single function with options: switch modes without rewriting the loop.
for text, err := range gollem.StreamText(stream, gollem.StreamTextOptions{
	Mode:     gollem.StreamModeDebounced,
	Debounce: 100 * time.Millisecond,
}) {
	if err != nil {
		return err
	}
	handle(text)
}
```
Tools from typed functions
FuncTool[P] turns a typed Go function into a tool. Parameter schemas come from struct tags via reflection. Access typed dependencies through the run context — no globals, no singletons, no any.
```go
type SearchParams struct {
	Query string `json:"query" jsonschema:"description=Search query"`
	Limit int    `json:"limit" jsonschema:"description=Max results,default=10"`
}

type AppDeps struct {
	DB    *sql.DB
	Cache *redis.Client
}

searchTool := gollem.FuncTool[SearchParams](
	"search", "Search the knowledge base",
	func(ctx context.Context, rc *gollem.RunContext, p SearchParams) (string, error) {
		deps := gollem.GetDeps[*AppDeps](rc) // compile-time type safe
		return doSearch(deps.DB, p.Query, p.Limit)
	},
)

agent := gollem.NewAgent[Report](model,
	gollem.WithTools[Report](searchTool),
	gollem.WithDeps[Report](&AppDeps{DB: db, Cache: cache}),
	gollem.WithDefaultToolTimeout[Report](10*time.Second),
	gollem.WithToolResultValidator[Report](nonEmpty),
)
```
Tool-choice control: Auto, Required, None, Force("name"), with optional auto-reset to prevent infinite loops.
Multi-agent orchestration
Three composition primitives. AgentTool for delegation. Handoff for sequential chains with context filters. Pipeline for parallel fan-out and conditional branching. For durable coordination across restarts, ext/orchestrator owns tasks, leases, schedulers, and artifact history.
Pipeline:

```go
// One agent calls another as a tool.
orchestrator := gollem.NewAgent[FinalReport](model,
	gollem.WithTools[FinalReport](
		orchestration.AgentTool("research", "Delegate research", researcher),
	),
)

// Pipeline with parallel steps and conditional branching.
pipe := gollem.NewPipeline(
	gollem.AgentStep(researcher),
	gollem.ParallelSteps(
		gollem.AgentStep(factChecker),
		gollem.AgentStep(editor),
	),
	gollem.ConditionalStep(
		func(s string) bool { return len(s) > 5000 },
		gollem.AgentStep(summarizer),
		gollem.TransformStep(strings.TrimSpace),
	),
)
```

Team swarm:

```go
t := team.NewTeam(team.TeamConfig{
	Name:    "code-review",
	Leader:  "lead",
	Model:   model,
	Toolset: codingTools,
	PersonalityGenerator: modelutil.CachedPersonalityGenerator(
		modelutil.GeneratePersonality(model),
	),
})

t.SpawnTeammate(ctx, "reviewer", "Review auth module for security vulnerabilities")
t.SpawnTeammate(ctx, "tester", "Write comprehensive tests for the payment flow")
t.SpawnTeammate(ctx, "docs", "Update API docs for the new endpoints")

leader := gollem.NewAgent[string](model,
	gollem.WithTools[string](team.LeaderTools(t)...))
result, _ := leader.Run(ctx, "Coordinate the review across all teammates.")
```
Each teammate runs as a goroutine with a fresh context window. The LLM itself writes the system prompt for each task; a SHA256-keyed cache prevents redundant generations.
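The caching idea can be sketched as a small memoizer keyed by the SHA-256 of the task text. This is an assumption about the general technique, not gollem's code: `promptCache`, `newPromptCache`, and the `generate` callback (standing in for the model call) are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// promptCache memoizes generated prompts by the SHA-256 of the task text.
type promptCache struct {
	mu       sync.Mutex
	entries  map[string]string
	generate func(task string) string // stand-in for the model call
}

func newPromptCache(generate func(string) string) *promptCache {
	return &promptCache{entries: make(map[string]string), generate: generate}
}

// Get returns the cached prompt for task, generating it on first use.
// The lock is held during generation, so concurrent callers are serialized;
// that keeps the sketch simple at the cost of throughput.
func (c *promptCache) Get(task string) string {
	sum := sha256.Sum256([]byte(task))
	key := hex.EncodeToString(sum[:])

	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.entries[key]; ok {
		return p
	}
	p := c.generate(task)
	c.entries[key] = p
	return p
}

func main() {
	generations := 0
	cache := newPromptCache(func(task string) string {
		generations++
		return "You are a specialist. Task: " + task
	})

	cache.Get("Review auth module for security vulnerabilities")
	cache.Get("Review auth module for security vulnerabilities") // cache hit
	cache.Get("Update API docs for the new endpoints")

	fmt.Println("generations:", generations)
}
```

Hashing the task gives fixed-length keys regardless of how long the task description is, and identical tasks map to the same entry.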
Code mode — N tool calls, one round-trip
Traditional tool use is round-trip heavy: model asks, you execute, model waits, model asks again. Code mode exposes all of your tools as functions to an LLM-authored Python script that runs in a pure-Go WASM sandbox via monty-go. The model composes the whole tool sequence in one shot.
```
traditional
  model ──► tool1 ──► model ──► tool2 ──► model ──► result
  3 model calls · 2 context refills · serial latency

code mode
  model ──► python { tool1(); tool2(); } ──► result
  1 model call · 0 refills · parallel execution
```
```go
import "github.com/fugue-labs/gollem/ext/monty"

agent := gollem.NewAgent[Report](model,
	monty.AgentOptions(
		monty.WithTools(searchTool, fetchTool, citeTool),
	)...,
)

// The model writes a single Python script that calls N tools as functions.
// Runs in a WASM sandbox. No CGO, no containers, no subprocess.
result, _ := agent.Run(ctx, "Research and cite the top 5 papers on memory consolidation.")
```

What the model wrote:

```python
# gollem injects typed function stubs; the model chooses how to compose.
results = search(query="memory consolidation LLM", limit=10)
top = sorted(results, key=lambda r: r["score"], reverse=True)[:5]

# Parallel fetches in the sandbox; each call is a typed Go function.
docs = [fetch_url(url=r["url"]) for r in top]

final_result(
    summary="Consolidation requires decay scheduling ...",
    citations=[cite(doc=d) for d in docs],
)
```
One model round-trip. Up to N× fewer tokens than sequential tool use on branchy workloads. Sandbox timeout, memory cap, and import allowlist configurable.
Graph workflows — typed state machines
When control flow outgrows linear pipelines, drop into ext/graph: typed state, conditional branches, fan-out / map-reduce, cycle detection, Mermaid export. Nodes and edges are type-checked at compile time.
```
start ──► classify ──► { simple  ──► answer ──► end
                       { complex ──► plan ──► fanout[3] ├► search
                                                        ├► fetch
                                                        └► analyze ──► merge ──► answer
```
```go
g := graph.New[State]()
g.Node("classify", classifyFn).Edge("simple", simplePath).Edge("complex", complexPath)
g.FanOut("plan", searchNode, fetchNode, analyzeNode).Merge("merge", mergeFn)
g.Edge("merge", "answer")

if err := g.Validate(); err != nil { // cycle detection at build time
	return err
}
fmt.Println(g.Mermaid()) // diagram for PRs

result, _ := g.Run(ctx, initialState)
```
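Build-time cycle detection is commonly done with a depth-first search that marks nodes as unvisited, in progress, or done. The sketch below shows that technique on a plain adjacency map; it is an assumption about the general approach, not a description of what ext/graph does internally, and `hasCycle` is a hypothetical name.

```go
package main

import "fmt"

// hasCycle reports whether the directed graph contains a cycle.
// edges maps each node name to the nodes it points to.
func hasCycle(edges map[string][]string) bool {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int)

	var visit func(n string) bool
	visit = func(n string) bool {
		state[n] = inProgress
		for _, next := range edges[n] {
			switch state[next] {
			case inProgress:
				return true // back edge: next is still on the current DFS path
			case unvisited:
				if visit(next) {
					return true
				}
			}
		}
		state[n] = done
		return false
	}

	for n := range edges {
		if state[n] == unvisited && visit(n) {
			return true
		}
	}
	return false
}

func main() {
	// The workflow from the diagram above, written as an adjacency map.
	workflow := map[string][]string{
		"classify": {"answer", "plan"},
		"plan":     {"search", "fetch", "analyze"},
		"search":   {"merge"},
		"fetch":    {"merge"},
		"analyze":  {"merge"},
		"merge":    {"answer"},
	}
	fmt.Println("cycle:", hasCycle(workflow))

	// Adding an edge from answer back to classify closes a loop.
	workflow["answer"] = []string{"classify"}
	fmt.Println("cycle:", hasCycle(workflow))
}
```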
Guardrails, cost, observability
Production concerns are first-class. Guardrails at every lifecycle stage. Cost tracked per run and cumulative. Middleware composes like HTTP middleware — first registered is outermost.
```go
tracker := gollem.NewCostTracker(map[string]gollem.ModelPricing{
	"claude-sonnet-4-5-20250929": {InputTokenCost: 0.003, OutputTokenCost: 0.015},
})

agent := gollem.NewAgent[Report](model,
	// Safety: validate prompts, turns, outputs.
	gollem.WithInputGuardrail[Report]("length", gollem.MaxPromptLength(10_000)),
	gollem.WithInputGuardrail[Report]("content", gollem.ContentFilter("ignore previous")),
	gollem.WithTurnGuardrail[Report]("turns", gollem.MaxTurns(20)),

	// Cost & usage.
	gollem.WithCostTracker[Report](tracker),
	gollem.WithUsageQuota[Report](gollem.UsageQuota{MaxRequests: 50, MaxTotalTokens: 100_000}),

	// Middleware: outer to inner.
	gollem.WithAgentMiddleware[Report](gollem.TimingMiddleware(metrics.RecordLatency)),
	gollem.WithAgentMiddleware[Report](gollem.LoggingMiddleware(log.Printf)),
	gollem.WithMessageInterceptor[Report](gollem.RedactPII(
		`\b\d{3}-\d{2}-\d{4}\b`, "[SSN REDACTED]",
	)),

	// Observability.
	gollem.WithTracing[Report](),
	gollem.WithTraceExporter[Report](gollem.NewJSONFileExporter("./traces")),
	gollem.WithRunCondition[Report](gollem.Or(
		gollem.MaxRunDuration(2*time.Minute),
		gollem.ToolCallCount(50),
	)),
)
```
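The "first registered is outermost" rule is the same one HTTP middleware chains follow, and it can be shown without any framework. In this sketch `Handler`, `Middleware`, `chain`, `tag`, and `trace` are all hypothetical names used for illustration; none of them are gollem types.

```go
package main

import (
	"fmt"
	"strings"
)

// Handler stands in for "call the model with this prompt".
type Handler func(prompt string) string

// Middleware wraps a Handler with behavior before and after the call.
type Middleware func(next Handler) Handler

// chain wraps h so that the first middleware in the list is outermost.
func chain(h Handler, mws ...Middleware) Handler {
	// Wrap from last to first: the last middleware sits closest to h.
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag returns a middleware that records when it is entered and left.
func tag(name string, log *[]string) Middleware {
	return func(next Handler) Handler {
		return func(prompt string) string {
			*log = append(*log, name+":before")
			out := next(prompt)
			*log = append(*log, name+":after")
			return out
		}
	}
}

// trace runs a two-middleware chain and returns the order of events.
func trace() string {
	var log []string
	model := func(prompt string) string {
		log = append(log, "model")
		return "ok"
	}
	h := chain(model, tag("timing", &log), tag("logging", &log))
	h("prompt")
	return strings.Join(log, " ")
}

func main() {
	fmt.Println(trace())
	// prints: timing:before logging:before model logging:after timing:after
}
```

The outermost middleware sees the request first and the response last, which is why a timing wrapper registered first measures everything beneath it, including the logging layer.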
MaxPromptLength, ContentFilter, MaxTurns, plus custom input / turn / output / tool-result validators.
TimingMiddleware, LoggingMiddleware, MaxTokensMiddleware, or write your own; a middleware can skip the model call entirely if you want.
RedactPII, AuditLog, or custom. Intercept before the message leaves your system; transform responses on the way back.
OnRunStart, OnRunEnd, OnModelRequest, OnModelResponse, OnToolStart, OnToolEnd.
Subscribe[E], Publish[E]. Built-in RunStartedEvent, ToolCalledEvent, RunCompletedEvent carry run IDs, parent IDs, timestamps.
Providers: one interface, swap freely
All providers implement the same Model interface. Wrap any with retry, rate limiting, and caching. Switch the import and the agent code is unchanged.
Anthropic:

```go
import "github.com/fugue-labs/gollem/provider/anthropic"

// Reads ANTHROPIC_API_KEY from env.
claude := anthropic.New()

// Opt-in features.
claude = anthropic.New(
	anthropic.WithModel("claude-sonnet-4-5-20250929"),
	anthropic.WithExtendedThinking(anthropic.Thinking{Budget: 10_000}),
	anthropic.WithPromptCaching(),
)
```

OpenAI:

```go
import "github.com/fugue-labs/gollem/provider/openai"

// Reads OPENAI_API_KEY from env.
gpt := openai.New()

// WebSocket continuation for tool-heavy loops (non-streaming).
gpt = openai.New(
	openai.WithModel("gpt-4o"),
	openai.WithTransport("websocket"), // or OPENAI_TRANSPORT=websocket
	openai.WithJSONMode(),             // native structured output
)
```

Vertex AI:

```go
import "github.com/fugue-labs/gollem/provider/vertexai"

// Uses GCP application default credentials.
gemini := vertexai.New("my-project", "us-central1")

gemini = vertexai.New("my-project", "us-central1",
	vertexai.WithModel("gemini-2.0-flash"),
	vertexai.WithJSONMode(),
)
```

Vertex · Anthropic:

```go
import "github.com/fugue-labs/gollem/provider/vertexai_anthropic"

// Claude via Vertex: extended thinking + prompt caching + GCP auth.
vc := vertexai_anthropic.New("my-project", "us-east5",
	vertexai_anthropic.WithModel("claude-sonnet-4-5@20250929"),
	vertexai_anthropic.WithExtendedThinking(Thinking{Budget: 10_000}),
)
```

Resilience wrappers:

```go
// Retry around rate-limit around cache around raw. Works for any Model.
resilient := gollem.NewRetryModel(
	gollem.NewRateLimitedModel(
		gollem.NewCachedModel(claude, gollem.NewMemoryCacheWithTTL(5*time.Minute)),
		10, 20, // rps, burst
	),
	gollem.DefaultRetryConfig(),
)

// Or route by capability: same agent code, right model per prompt.
router := gollem.NewCapabilityRouter(
	[]gollem.Model{fast, power, vision},
	gollem.ModelProfile{SupportsVision: true, SupportsToolCalls: true},
)
```
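Wrappers like these work because each one implements the same interface it wraps. The sketch below shows the decorator pattern for the retry layer only, using a deliberately tiny, hypothetical `Model` interface; gollem's real interface and retry configuration are not reproduced here, and `flakyModel`, `retryModel`, and `run` are illustrative names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Model is a hypothetical one-method stand-in for a provider interface.
type Model interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// flakyModel fails a fixed number of times before succeeding.
type flakyModel struct {
	failures int
	calls    int
}

func (f *flakyModel) Complete(_ context.Context, prompt string) (string, error) {
	f.calls++
	if f.calls <= f.failures {
		return "", errors.New("rate limited")
	}
	return "echo: " + prompt, nil
}

// retryModel wraps any Model and retries failed calls with exponential backoff.
type retryModel struct {
	inner       Model
	maxAttempts int
	backoff     time.Duration
}

func (r *retryModel) Complete(ctx context.Context, prompt string) (string, error) {
	var lastErr error
	for attempt := 0; attempt < r.maxAttempts; attempt++ {
		if attempt > 0 {
			// Wait backoff, 2*backoff, 4*backoff, ... unless the context ends first.
			select {
			case <-ctx.Done():
				return "", ctx.Err()
			case <-time.After(r.backoff << (attempt - 1)):
			}
		}
		out, err := r.inner.Complete(ctx, prompt)
		if err == nil {
			return out, nil
		}
		lastErr = err
	}
	return "", fmt.Errorf("giving up after %d attempts: %w", r.maxAttempts, lastErr)
}

// run wires a flaky model into a retry wrapper and reports what happened.
func run(failures, maxAttempts int) (out string, calls int, err error) {
	inner := &flakyModel{failures: failures}
	m := &retryModel{inner: inner, maxAttempts: maxAttempts, backoff: time.Millisecond}
	out, err = m.Complete(context.Background(), "hi")
	return out, inner.calls, err
}

func main() {
	out, calls, err := run(2, 3)
	fmt.Println(out, calls, err)
}
```

Since `retryModel` is itself a `Model`, it can be wrapped again by a rate limiter or cache written the same way, which is what makes the nesting order in the snippet above meaningful.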
| Capability | Anthropic | OpenAI | Vertex AI | Vertex · Anthropic |
|---|---|---|---|---|
| Structured output | ● | ● | ● | ● |
| Streaming | ● | ● | ● | ● |
| Tool use | ● | ● | ● | ● |
| Extended thinking | ● | — | — | ● |
| Prompt caching | ● | — | — | ● |
| Native JSON mode | — | ● | ● | — |
| Auth | API key | API key | OAuth2 · GCP | OAuth2 · GCP |
Single binary — ship the compiler's output, not a virtualenv
Gollem compiles to a statically-linked binary. Cross-compile to any OS/arch from any OS/arch. No runtime. No interpreter. No shared library resolution at startup.
CGO_ENABLED=0.
Testing: without ever calling a real model
TestModel is a deterministic mock. Canned responses, call recording, per-invocation assertions. Swap with WithTestModel or Override in tests without touching the production agent definition.
```go
model := gollem.NewTestModel(
	gollem.ToolCallResponse("search", `{"query":"Go generics"}`),
	gollem.ToolCallResponse("final_result", `{"answer":"..."}`),
)

result, err := productionAgent.WithTestModel(model).Run(ctx, "prompt")

// Assert what the model saw.
calls := model.RecordedCalls()
assert.Len(t, calls, 2)
assert.Equal(t, "search", calls[0].ToolName)
```
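The idea behind a deterministic mock is small enough to sketch in full: hand back canned responses in order and remember every prompt received. The type below is a hypothetical illustration of that pattern, not gollem's TestModel; `scriptedModel` and its fields are invented names.

```go
package main

import (
	"errors"
	"fmt"
)

// scriptedModel returns canned responses in order and records every prompt.
type scriptedModel struct {
	responses []string
	prompts   []string
}

// Complete records the prompt, then returns the next canned response.
// It fails loudly once the script runs out, so an unexpected extra model
// call shows up as a test failure instead of a silent default.
func (m *scriptedModel) Complete(prompt string) (string, error) {
	m.prompts = append(m.prompts, prompt)
	if len(m.prompts) > len(m.responses) {
		return "", errors.New("scriptedModel: no canned response left")
	}
	return m.responses[len(m.prompts)-1], nil
}

func main() {
	m := &scriptedModel{responses: []string{
		`{"tool":"search","args":{"query":"Go generics"}}`,
		`{"tool":"final_result","args":{"answer":"..."}}`,
	}}

	first, _ := m.Complete("What changed in Go generics?")
	second, _ := m.Complete("tool result: 3 documents")
	_, err := m.Complete("one call too many")

	fmt.Println(first)
	fmt.Println(second)
	fmt.Println("recorded prompts:", len(m.prompts), "error:", err)
}
```

Because nothing here touches the network, tests built on this pattern are fast and repeatable, and the recorded prompts let you assert on exactly what the agent sent.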
The rest lives in the Go reference. Every public type has a docstring; every extension package has an example; every feature is tested.