# gollem
The production agent framework for Go. Typed agents, structured output, streaming, guardrails, cost tracking, multi-agent swarms, code mode, graph workflows. Zero core dependencies. Single binary.
## What a run looks like: four agents, four transcripts
Every run emits structured trace events. Switch between agent archetypes below to see the same trace system rendering four different workloads.
Each line is a trace event. Export to JSON, OpenTelemetry, or plug in your own `TraceExporter`.
## Typed agents, typed results
`Agent[T]` is the central type. You define the output shape; gollem generates the JSON Schema, validates every model response against it, auto-repairs malformed output with a repair model, and hands you a typed Go struct.
```go
type Analysis struct {
	Sentiment  string   `json:"sentiment" jsonschema:"enum=positive|negative|neutral"`
	Keywords   []string `json:"keywords" jsonschema:"description=Key topics"`
	Confidence float64  `json:"confidence" jsonschema:"minimum=0,maximum=1"`
}

agent := gollem.NewAgent[Analysis](model,
	gollem.WithSystemPrompt[Analysis]("You are a sentiment analyst."),
	gollem.WithOutputRepair[Analysis](gollem.ModelRepair[Analysis](repairModel)),
	gollem.WithOutputValidator[Analysis](func(a Analysis) error {
		if a.Confidence < 0 || a.Confidence > 1 {
			return fmt.Errorf("confidence out of range: %f", a.Confidence)
		}
		return nil
	}),
)

result, _ := agent.Run(ctx, "Analyze this earnings call transcript.")
fmt.Println(result.Output.Sentiment)  // string, not map[string]any
fmt.Println(result.Output.Confidence) // float64, not interface{}
```
Schema generated from struct tags. No hand-written JSON schemas. No json.Unmarshal at the callsite. No type assertions.
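The decode-validate-return loop that gollem automates can be sketched in plain Go. This is a minimal illustration with `encoding/json` and a hand-written range check, not gollem's actual pipeline; the `Analysis` shape mirrors the example above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Analysis struct {
	Sentiment  string  `json:"sentiment"`
	Confidence float64 `json:"confidence"`
}

// decodeAnalysis unmarshals a model response into a typed struct and
// rejects values a schema would forbid. gollem derives such checks from
// struct tags; here they are written by hand for illustration.
func decodeAnalysis(raw []byte) (Analysis, error) {
	var a Analysis
	if err := json.Unmarshal(raw, &a); err != nil {
		return Analysis{}, fmt.Errorf("malformed output: %w", err)
	}
	if a.Confidence < 0 || a.Confidence > 1 {
		return Analysis{}, fmt.Errorf("confidence out of range: %f", a.Confidence)
	}
	return a, nil
}

func main() {
	a, err := decodeAnalysis([]byte(`{"sentiment":"positive","confidence":0.92}`))
	fmt.Println(a.Sentiment, a.Confidence, err) // typed fields, no map[string]any
}
```

The point of the typed boundary is that failures surface as errors at decode time, not as type assertions scattered through the callsites.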
## Streaming with Go 1.23+ iterators
Four streaming modes share one interface, all exposed as `iter.Seq2[T, error]`: no channels, no callbacks, no goroutine management.
Same stream, any mode. Switch based on latency budget or transport.
**go · raw deltas**

```go
stream, _ := agent.RunStream(ctx, "Write a story about a robot.")

// Raw incremental chunks as they arrive from the model.
for delta, err := range gollem.StreamTextDelta(stream) {
	if err != nil {
		return err
	}
	fmt.Print(delta) // "The " "robot " "powered " ...
}
```
**go · accumulated**

```go
// Growing accumulated text at each step. Ideal for React/UI updates.
for text, err := range gollem.StreamTextAccumulated(stream) {
	if err != nil {
		return err
	}
	updateUI(text) // "The " → "The robot " → "The robot powered "
}
```
**go · debounced**

```go
// Grouped delivery every 100ms. Ideal for batching network frames
// to WebSocket clients.
for text, err := range gollem.StreamTextDebounced(stream, 100*time.Millisecond) {
	if err != nil {
		return err
	}
	sendToClient(text) // fewer frames, still feels live
}
```
**go · unified**

```go
// Single function with options. Switch modes without rewriting the loop.
for text, err := range gollem.StreamText(stream, gollem.StreamTextOptions{
	Mode:     gollem.StreamModeDebounced,
	Debounce: 100 * time.Millisecond,
}) {
	if err != nil {
		return err
	}
	handle(text)
}
```
## Tools from typed functions
`FuncTool[P]` turns a typed Go function into a tool. Parameter schemas come from struct tags via reflection. Access typed dependencies through the run context: no globals, no singletons, no `any`.
```go
type SearchParams struct {
	Query string `json:"query" jsonschema:"description=Search query"`
	Limit int    `json:"limit" jsonschema:"description=Max results,default=10"`
}

type AppDeps struct {
	DB    *sql.DB
	Cache *redis.Client
}

searchTool := gollem.FuncTool[SearchParams](
	"search",
	"Search the knowledge base",
	func(ctx context.Context, rc *gollem.RunContext, p SearchParams) (string, error) {
		deps := gollem.GetDeps[*AppDeps](rc) // compile-time type safe
		return doSearch(deps.DB, p.Query, p.Limit)
	},
)

agent := gollem.NewAgent[Report](model,
	gollem.WithTools[Report](searchTool),
	gollem.WithDeps[Report](&AppDeps{DB: db, Cache: cache}),
	gollem.WithDefaultToolTimeout[Report](10*time.Second),
	gollem.WithToolResultValidator[Report](nonEmpty),
)
```
Tool-choice control: `Auto`, `Required`, `None`, `Force("name")`, with optional auto-reset to prevent infinite loops.
## Multi-agent orchestration
Three composition primitives. `AgentTool` for delegation. `Handoff` for sequential chains with context filters. `Pipeline` for parallel fan-out and conditional branching. For durable coordination across restarts, `ext/orchestrator` owns tasks, leases, schedulers, and artifact history.
**go · pipeline**

```go
// One agent calls another as a tool.
orchestrator := gollem.NewAgent[FinalReport](model,
	gollem.WithTools[FinalReport](
		orchestration.AgentTool("research", "Delegate research", researcher),
	),
)

// Pipeline with parallel steps and conditional branching.
pipe := gollem.NewPipeline(
	gollem.AgentStep(researcher),
	gollem.ParallelSteps(
		gollem.AgentStep(factChecker),
		gollem.AgentStep(editor),
	),
	gollem.ConditionalStep(
		func(s string) bool { return len(s) > 5000 },
		gollem.AgentStep(summarizer),
		gollem.TransformStep(strings.TrimSpace),
	),
)
```
**go · team swarm**

```go
t := team.NewTeam(team.TeamConfig{
	Name:    "code-review",
	Leader:  "lead",
	Model:   model,
	Toolset: codingTools,
	PersonalityGenerator: modelutil.CachedPersonalityGenerator(
		modelutil.GeneratePersonality(model),
	),
})

t.SpawnTeammate(ctx, "reviewer", "Review auth module for security vulnerabilities")
t.SpawnTeammate(ctx, "tester", "Write comprehensive tests for the payment flow")
t.SpawnTeammate(ctx, "docs", "Update API docs for the new endpoints")

leader := gollem.NewAgent[string](model, gollem.WithTools[string](team.LeaderTools(t)...))
result, _ := leader.Run(ctx, "Coordinate the review across all teammates.")
```
Each teammate runs as a goroutine with a fresh context window. The LLM itself writes the system prompt for each task; a SHA256-keyed cache prevents redundant generations.
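A SHA256-keyed generation cache of the kind described can be sketched like this. This is illustrative only; the internals of `CachedPersonalityGenerator` may differ:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cachedGen memoizes an expensive generator (a stand-in for an LLM call)
// by the SHA256 of its input, so identical tasks generate only once.
func cachedGen(gen func(string) string) func(string) string {
	cache := map[string]string{}
	return func(task string) string {
		sum := sha256.Sum256([]byte(task))
		key := hex.EncodeToString(sum[:])
		if v, ok := cache[key]; ok {
			return v // cache hit: no redundant generation
		}
		v := gen(task)
		cache[key] = v
		return v
	}
}

func main() {
	calls := 0
	gen := cachedGen(func(task string) string {
		calls++
		return "system prompt for: " + task
	})
	gen("review auth module")
	gen("review auth module") // same key, served from cache
	fmt.Println(calls)        // 1
}
```

Hashing the task rather than using it directly keeps cache keys fixed-size even when task descriptions are long.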
## Code mode: N tool calls, one round-trip
Traditional tool use is round-trip heavy: the model asks, you execute, the model waits, then it asks again. Code mode ships all of your tools into an LLM-authored Python script that runs in a pure-Go WASM sandbox via monty-go; the model composes the whole workflow in one shot.
```
traditional   model ──► tool1 ──► model ──► tool2 ──► model ──► result
              3 model calls · 2 context refills · serial latency

code mode     model ──► python { tool1(); tool2(); } ──► result
              1 model call · 0 refills · parallel execution
```
```go
import "github.com/fugue-labs/gollem/ext/monty"

agent := gollem.NewAgent[Report](model,
	monty.AgentOptions(
		monty.WithTools(searchTool, fetchTool, citeTool),
	)...,
)

// The model writes a single Python script that calls N tools as functions.
// Runs in a WASM sandbox. No CGO, no containers, no subprocess.
result, _ := agent.Run(ctx, "Research and cite the top 5 papers on memory consolidation.")
```
**python · what the model wrote** (read-only)

```python
# gollem injects typed function stubs; the model chooses how to compose.
results = search(query="memory consolidation LLM", limit=10)
top = sorted(results, key=lambda r: r["score"], reverse=True)[:5]

# Parallel fetches in the sandbox; each call is a typed Go function.
docs = [fetch_url(url=r["url"]) for r in top]

final_result(
    summary="Consolidation requires decay scheduling ...",
    citations=[cite(doc=d) for d in docs],
)
```
One model round-trip. Up to N× fewer tokens than sequential tool use on branchy workloads. Sandbox timeout, memory cap, and import allowlist are configurable.
## Graph workflows: typed state machines
When control flow outgrows linear pipelines, drop into `ext/graph`: typed state, conditional branches, fan-out / map-reduce, cycle detection, Mermaid export. Nodes and edges are type-checked at compile time.
```
start ──► classify ─┬► simple ───► answer ──► end
                    └► complex ──► plan ──► fanout[3] ─┬► search ──┐
                                                       ├► fetch ───┼► merge ──► answer
                                                       └► analyze ─┘
```
```go
g := graph.New[State]()

g.Node("classify", classifyFn).
	Edge("simple", simplePath).
	Edge("complex", complexPath)

g.FanOut("plan", searchNode, fetchNode, analyzeNode).Merge("merge", mergeFn)
g.Edge("merge", "answer")

if err := g.Validate(); err != nil { // cycle detection at build time
	return err
}

fmt.Println(g.Mermaid()) // diagram for PRs

result, _ := g.Run(ctx, initialState)
```
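Build-time cycle detection of the kind `Validate` performs can be illustrated with a small DFS over an edge map. This is a generic sketch, not `ext/graph`'s code:

```go
package main

import "fmt"

// hasCycle runs a three-color DFS: finding an edge back to a node that is
// still on the current DFS path means the graph is not a valid DAG.
func hasCycle(edges map[string][]string) bool {
	const (
		white = 0 // unvisited
		gray  = 1 // on the current DFS path
		black = 2 // fully explored
	)
	color := map[string]int{}
	var visit func(n string) bool
	visit = func(n string) bool {
		color[n] = gray
		for _, m := range edges[n] {
			if color[m] == gray {
				return true // back edge: cycle
			}
			if color[m] == white && visit(m) {
				return true
			}
		}
		color[n] = black
		return false
	}
	for n := range edges {
		if color[n] == white && visit(n) {
			return true
		}
	}
	return false
}

func main() {
	dag := map[string][]string{"classify": {"simple", "complex"}, "complex": {"plan"}}
	fmt.Println(hasCycle(dag)) // false

	cyclic := map[string][]string{"a": {"b"}, "b": {"a"}}
	fmt.Println(hasCycle(cyclic)) // true
}
```

Running the check once at build time means a malformed workflow fails before the first model call, not mid-run.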
## Guardrails, cost, observability
Production concerns are first-class. Guardrails at every lifecycle stage. Cost tracked per run and cumulatively. Middleware composes like HTTP middleware: the first registered is outermost.
```go
tracker := gollem.NewCostTracker(map[string]gollem.ModelPricing{
	"claude-sonnet-4-5-20250929": {InputTokenCost: 0.003, OutputTokenCost: 0.015},
})

agent := gollem.NewAgent[Report](model,
	// Safety: validate prompts, turns, outputs.
	gollem.WithInputGuardrail[Report]("length", gollem.MaxPromptLength(10_000)),
	gollem.WithInputGuardrail[Report]("content", gollem.ContentFilter("ignore previous")),
	gollem.WithTurnGuardrail[Report]("turns", gollem.MaxTurns(20)),

	// Cost & usage.
	gollem.WithCostTracker[Report](tracker),
	gollem.WithUsageQuota[Report](gollem.UsageQuota{MaxRequests: 50, MaxTotalTokens: 100_000}),

	// Middleware: outer to inner.
	gollem.WithAgentMiddleware[Report](gollem.TimingMiddleware(metrics.RecordLatency)),
	gollem.WithAgentMiddleware[Report](gollem.LoggingMiddleware(log.Printf)),
	gollem.WithMessageInterceptor[Report](gollem.RedactPII(
		`\b\d{3}-\d{2}-\d{4}\b`, "[SSN REDACTED]",
	)),

	// Observability.
	gollem.WithTracing[Report](),
	gollem.WithTraceExporter[Report](gollem.NewJSONFileExporter("./traces")),
	gollem.WithRunCondition[Report](gollem.Or(
		gollem.MaxRunDuration(2*time.Minute),
		gollem.ToolCallCount(50),
	)),
)
```
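The "first registered is outermost" ordering works the same way as `net/http` middleware chains. A minimal generic sketch (not gollem's types) makes the nesting visible:

```go
package main

import "fmt"

type Handler func(prompt string) string
type Middleware func(Handler) Handler

// chain wraps h so the first middleware in the list ends up outermost,
// mirroring how repeated WithAgentMiddleware calls compose.
func chain(h Handler, mws ...Middleware) Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag records the wrapping order by enclosing the inner result in name(...).
func tag(name string) Middleware {
	return func(next Handler) Handler {
		return func(p string) string {
			return name + "(" + next(p) + ")"
		}
	}
}

func main() {
	run := chain(func(p string) string { return p }, tag("timing"), tag("logging"))
	fmt.Println(run("model")) // timing(logging(model))
}
```

Because "timing" was registered first, it observes the full latency of everything inside it, including the logging layer.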
- **Guardrails:** `MaxPromptLength`, `ContentFilter`, `MaxTurns`, plus custom input / turn / output / tool-result validators.
- **Middleware:** `TimingMiddleware`, `LoggingMiddleware`, `MaxTokensMiddleware`, or write your own. Skip the model call entirely if you want.
- **Interceptors:** `RedactPII`, `AuditLog`, or custom. Intercept before the message leaves your system; transform responses on the way back.
- **Hooks:** `OnRunStart`, `OnRunEnd`, `OnModelRequest`, `OnModelResponse`, `OnToolStart`, `OnToolEnd`.
- **Events:** `Subscribe[E]`, `Publish[E]`. Built-in `RunStartedEvent`, `ToolCalledEvent`, `RunCompletedEvent` carry run IDs, parent IDs, timestamps.

## Providers: one interface, swap freely
All providers implement the same Model interface. Wrap any with retry, rate limiting, and caching. Switch the import and the agent code is unchanged.
**go · anthropic**

```go
import "github.com/fugue-labs/gollem/provider/anthropic"

// Reads ANTHROPIC_API_KEY from env.
claude := anthropic.New()

// Opt-in features.
claude = anthropic.New(
	anthropic.WithModel("claude-sonnet-4-5-20250929"),
	anthropic.WithExtendedThinking(anthropic.Thinking{Budget: 10_000}),
	anthropic.WithPromptCaching(),
)
```
**go · openai**

```go
import "github.com/fugue-labs/gollem/provider/openai"

// Reads OPENAI_API_KEY from env.
gpt := openai.New()

// WebSocket continuation for tool-heavy loops (non-streaming).
gpt = openai.New(
	openai.WithModel("gpt-4o"),
	openai.WithTransport("websocket"), // or OPENAI_TRANSPORT=websocket
	openai.WithJSONMode(),             // native structured output
)
```
**go · vertex ai**

```go
import "github.com/fugue-labs/gollem/provider/vertexai"

// Uses GCP application default credentials.
gemini := vertexai.New("my-project", "us-central1")

gemini = vertexai.New("my-project", "us-central1",
	vertexai.WithModel("gemini-2.0-flash"),
	vertexai.WithJSONMode(),
)
```
**go · vertex · anthropic**

```go
import "github.com/fugue-labs/gollem/provider/vertexai_anthropic"

// Claude via Vertex: extended thinking + prompt caching + GCP auth.
vc := vertexai_anthropic.New("my-project", "us-east5",
	vertexai_anthropic.WithModel("claude-sonnet-4-5@20250929"),
	vertexai_anthropic.WithExtendedThinking(vertexai_anthropic.Thinking{Budget: 10_000}),
)
```
**go · resilience wrappers**

```go
// Retry around rate-limit around cache around raw. Works for any Model.
resilient := gollem.NewRetryModel(
	gollem.NewRateLimitedModel(
		gollem.NewCachedModel(claude, gollem.NewMemoryCacheWithTTL(5*time.Minute)),
		10, 20, // rps, burst
	),
	gollem.DefaultRetryConfig(),
)

// Or route by capability: same agent code, right model per prompt.
router := gollem.NewCapabilityRouter(
	[]gollem.Model{fast, power, vision},
	gollem.ModelProfile{SupportsVision: true, SupportsToolCalls: true},
)
```
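Capability routing boils down to "pick the first model whose profile satisfies the request." A self-contained sketch of that idea (illustrative types only; gollem's `CapabilityRouter` API may differ):

```go
package main

import (
	"errors"
	"fmt"
)

// Profile is a hypothetical stand-in for a model capability profile.
type Profile struct {
	Name           string
	SupportsVision bool
	SupportsTools  bool
}

// pick returns the first model that satisfies every required capability,
// the core idea behind routing by capability.
func pick(models []Profile, needVision, needTools bool) (Profile, error) {
	for _, m := range models {
		if (!needVision || m.SupportsVision) && (!needTools || m.SupportsTools) {
			return m, nil
		}
	}
	return Profile{}, errors.New("no model satisfies the request")
}

func main() {
	models := []Profile{
		{Name: "fast", SupportsTools: true},
		{Name: "vision", SupportsVision: true, SupportsTools: true},
	}
	m, _ := pick(models, true, true) // needs vision + tools
	fmt.Println(m.Name)              // vision
}
```

Listing cheaper models first means the router degrades gracefully: expensive capabilities are paid for only when a prompt actually requires them.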
| Capability | Anthropic | OpenAI | Vertex AI | Vertex · Anthropic |
|---|---|---|---|---|
| Structured output | ● | ● | ● | ● |
| Streaming | ● | ● | ● | ● |
| Tool use | ● | ● | ● | ● |
| Extended thinking | ● | ○ | ○ | ● |
| Prompt caching | ● | ○ | ○ | ● |
| Native JSON mode | ○ | ● | ● | ○ |
| Auth | API key | API key | OAuth2 · GCP | OAuth2 · GCP |
## Single binary: ship the compiler's output, not a virtualenv
Gollem compiles to a statically linked binary with `CGO_ENABLED=0`. Cross-compile to any OS/arch from any OS/arch. No runtime. No interpreter. No shared-library resolution at startup.

## Testing without ever calling a real model
`TestModel` is a deterministic mock: canned responses, call recording, per-invocation assertions. Swap it in with `WithTestModel` or `Override` in tests without touching the production agent definition.
```go
model := gollem.NewTestModel(
	gollem.ToolCallResponse("search", `{"query":"Go generics"}`),
	gollem.ToolCallResponse("final_result", `{"answer":"..."}`),
)

result, err := productionAgent.WithTestModel(model).Run(ctx, "prompt")

// Assert what the model saw.
calls := model.RecordedCalls()
assert.Len(t, calls, 2)
assert.Equal(t, "search", calls[0].ToolName)
```
## Build your agent: live configuration
Pick a provider, an output shape, and the features you need. A real, compilable snippet regenerates live. Copy it and paste it into a `main.go`. Nothing else to set up.
The rest lives in the Go reference. Every public type has a docstring; every extension package has an example; every feature is tested.