Chapter 14: Agents & Model Routing
One of OpenClaw's most powerful features is its support for multiple AI agents running simultaneously. You can connect Claude Opus for your senior developers, Claude Sonnet for your general team, Claude Haiku for your public-facing bot, and even a local Ollama model for offline work — all from a single gateway instance. This chapter explains how to define agents and route traffic to them.
What Is an Agent?
In OpenClaw, an agent is a named configuration block that specifies:
- Which AI provider and model to use
- The API key and endpoint
- Default generation parameters (temperature, max tokens)
- A default system prompt
Agents are defined in the agents block of your config and then referenced by workspaces.
Defining Agents
{
"agents": {
"fast": {
"provider": "anthropic",
"model": "claude-haiku-4-5-20251001",
"apiKey": "${ANTHROPIC_API_KEY}",
"maxTokens": 2048,
"temperature": 0.5,
"systemPrompt": "You are a fast, concise assistant. Keep answers brief."
},
"balanced": {
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"apiKey": "${ANTHROPIC_API_KEY}",
"maxTokens": 8192,
"temperature": 0.3,
"systemPrompt": "You are a helpful AI assistant with strong reasoning abilities."
},
"expert": {
"provider": "anthropic",
"model": "claude-opus-4-7",
"apiKey": "${ANTHROPIC_API_KEY}",
"maxTokens": 16384,
"temperature": 0.2,
"systemPrompt": "You are an expert-level AI assistant. Think deeply before answering."
}
}
}
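The `${ANTHROPIC_API_KEY}` values are placeholders expanded from environment variables when the config is loaded. A minimal sketch of that expansion (the `expand_env` helper is illustrative, not part of OpenClaw itself):

```python
import os
import re

def expand_env(value: str) -> str:
    """Replace each ${NAME} placeholder with the matching environment variable."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["ANTHROPIC_API_KEY"] = "sk-ant-example"
print(expand_env("${ANTHROPIC_API_KEY}"))  # prints: sk-ant-example
```

Unset variables expand to an empty string in this sketch; a real loader would more likely fail fast with a clear error.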
Supported Providers
Anthropic (Claude)
{
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"apiKey": "${ANTHROPIC_API_KEY}"
}
| Model | Best For |
|---|---|
| claude-opus-4-7 | Complex reasoning, long documents, expert tasks |
| claude-sonnet-4-6 | Balanced performance and cost |
| claude-haiku-4-5-20251001 | Fast responses, simple tasks, high volume |
OpenAI
{
"provider": "openai",
"model": "gpt-4o",
"apiKey": "${OPENAI_API_KEY}"
}
Google Gemini
{
"provider": "google",
"model": "gemini-1.5-pro",
"apiKey": "${GOOGLE_AI_API_KEY}"
}
Ollama (Local Models)
Run models locally with no API key required:
{
"provider": "ollama",
"model": "llama3",
"baseUrl": "http://localhost:11434",
"maxTokens": 4096
}
Pull a model first:
ollama pull llama3
Azure OpenAI
{
"provider": "azure-openai",
"model": "gpt-4o",
"apiKey": "${AZURE_OPENAI_API_KEY}",
"baseUrl": "https://your-resource.openai.azure.com",
"azureDeployment": "my-gpt4o-deployment",
"azureApiVersion": "2024-08-01-preview"
}
Cerebras
Cerebras runs inference on wafer-scale hardware that delivers substantially higher token throughput than typical GPU serving, making it a good fit for fast, low-latency responses:
{
"provider": "cerebras",
"model": "llama3.1-70b",
"apiKey": "${CEREBRAS_API_KEY}"
}
NVIDIA NIM
Run NVIDIA-hosted models with enterprise-grade SLAs:
{
"provider": "nvidia",
"model": "meta/llama-3.1-70b-instruct",
"apiKey": "${NVIDIA_API_KEY}",
"baseUrl": "https://integrate.api.nvidia.com/v1"
}
DeepInfra
Cost-efficient inference with image generation and text-to-video support:
{
"provider": "deepinfra",
"model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
"apiKey": "${DEEPINFRA_API_KEY}"
}
Self-Hosted (vLLM / SGLang)
For teams running their own inference servers:
{
"provider": "openai-compatible",
"model": "your-model-name",
"baseUrl": "http://localhost:8000/v1",
"apiKey": "not-required"
}
Both vLLM and SGLang expose an OpenAI-compatible API endpoint, so this config works for both.
Provider Quick Reference
OpenClaw supports 35+ providers in total; the table below lists the most commonly used values.
| Provider | Value | Notes |
|---|---|---|
| Anthropic | anthropic | Claude family |
| OpenAI | openai | GPT family |
| Google Gemini | google | Gemini family |
| Ollama | ollama | Local models |
| Azure OpenAI | azure-openai | Enterprise |
| Cerebras | cerebras | Ultra-fast inference |
| NVIDIA NIM | nvidia | GPU cloud |
| DeepInfra | deepinfra | Cost-efficient |
| Groq | groq | Fast inference |
| Together AI | together | Open model hosting |
| Mistral AI | mistral | Mistral models |
| Cohere | cohere | Command family |
| Perplexity | perplexity | Search-augmented |
| xAI (Grok) | xai | Grok family |
| DeepSeek | deepseek | Chinese frontier model |
| Qwen | qwen | Alibaba models |
| MiniMax | minimax | Chinese provider |
| vLLM / SGLang | openai-compatible | Self-hosted |
Routing: Workspace-to-Agent Assignment
Each workspace is assigned exactly one agent. All messages handled by that workspace go to that agent:
{
"workspaces": [
{
"id": "vip",
"agent": "expert",
"allowlist": ["U01VIP"]
},
{
"id": "dev-team",
"agent": "balanced",
"allowlist": ["U01DEV", "U02DEV"]
},
{
"id": "everyone",
"agent": "fast",
"allowlist": ["*"]
}
]
}
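Conceptually, routing a message is a lookup over the workspace list: find the workspace whose allowlist admits the user (or contains the `*` wildcard) and use its agent. The sketch below assumes first-match-wins ordering, which is an assumption rather than documented OpenClaw behavior:

```python
def route(user_id, workspaces):
    # Return the agent of the first workspace whose allowlist admits the user.
    for ws in workspaces:
        if "*" in ws["allowlist"] or user_id in ws["allowlist"]:
            return ws["agent"]
    return None  # no workspace matched

workspaces = [
    {"id": "vip", "agent": "expert", "allowlist": ["U01VIP"]},
    {"id": "dev-team", "agent": "balanced", "allowlist": ["U01DEV", "U02DEV"]},
    {"id": "everyone", "agent": "fast", "allowlist": ["*"]},
]
print(route("U01VIP", workspaces))   # -> expert
print(route("U99RANDO", workspaces)) # -> fast (falls through to the wildcard)
```

Note that ordering matters with a wildcard workspace in play: placing `everyone` first would shadow the more specific entries.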
Per-Workspace Agent Overrides
You can override agent parameters at the workspace level without defining a separate agent:
{
"workspaces": [
{
"id": "creative-writing",
"agent": "balanced",
"temperature": 0.9,
"maxTokens": 4096,
"systemPrompt": "You are a creative writing assistant. Be imaginative and expressive."
}
]
}
The workspace-level values override the agent defaults for all conversations in that workspace.
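The override behavior amounts to a shallow merge: start from the agent's defaults and layer workspace-level keys on top. A sketch of that merge, using the field names from the examples above (the `effective_settings` helper is illustrative):

```python
OVERRIDABLE_KEYS = ("temperature", "maxTokens", "systemPrompt")

def effective_settings(agent: dict, workspace: dict) -> dict:
    # Agent defaults first, then any workspace-level overrides on top.
    merged = dict(agent)
    for key in OVERRIDABLE_KEYS:
        if key in workspace:
            merged[key] = workspace[key]
    return merged

agent = {"model": "claude-sonnet-4-6", "temperature": 0.3, "maxTokens": 8192}
workspace = {"id": "creative-writing", "temperature": 0.9, "maxTokens": 4096}
print(effective_settings(agent, workspace))
# -> {'model': 'claude-sonnet-4-6', 'temperature': 0.9, 'maxTokens': 4096}
```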
Dynamic Agent Switching
Users can switch the agent for their current session using a chat command:
/agent expert
This temporarily routes the user's session to the expert agent until the session expires or they switch again. Only agents explicitly listed in the workspace's allowedAgents field are available:
{
"workspaces": [
{
"id": "dev-team",
"agent": "balanced",
"allowedAgents": ["fast", "balanced", "expert"]
}
]
}
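Handling `/agent <name>` reduces to a membership check against `allowedAgents`. A hedged sketch of that check (command parsing is simplified and the error wording is illustrative):

```python
def switch_agent(requested: str, workspace: dict) -> str:
    # Only agents listed in allowedAgents may be selected; default to the
    # workspace's own agent when no allowedAgents list is configured.
    allowed = workspace.get("allowedAgents", [workspace["agent"]])
    if requested not in allowed:
        raise ValueError(f"agent '{requested}' is not allowed in this workspace")
    return requested

ws = {"id": "dev-team", "agent": "balanced",
      "allowedAgents": ["fast", "balanced", "expert"]}
print(switch_agent("expert", ws))  # -> expert
```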
Fallback Agents
Configure a fallback agent in case the primary agent's API is unavailable:
{
"agents": {
"claude-main": {
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"apiKey": "${ANTHROPIC_API_KEY}",
"fallback": "local-llama"
},
"local-llama": {
"provider": "ollama",
"model": "llama3",
"baseUrl": "http://localhost:11434"
}
}
}
If the Anthropic API returns an error or times out, OpenClaw automatically retries the request with local-llama.
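The fallback logic boils down to: try the primary agent, and on any error retry the same request on the fallback. A minimal sketch, where `call_agent` stands in for an actual provider call:

```python
def with_fallback(call_agent, primary: str, fallback: str, prompt: str):
    # Try the primary agent; if it raises, retry once on the fallback agent.
    try:
        return call_agent(primary, prompt)
    except Exception:
        return call_agent(fallback, prompt)

def fake_call(agent, prompt):
    # Simulate the primary provider being down.
    if agent == "claude-main":
        raise TimeoutError("upstream timeout")
    return f"{agent}: ok"

print(with_fallback(fake_call, "claude-main", "local-llama", "hi"))
# -> local-llama: ok
```

A production gateway would also distinguish retryable errors (timeouts, 429s, 5xx) from non-retryable ones (auth failures), which this sketch glosses over.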
Multi-Provider Load Balancing
Distribute load across multiple API keys or providers:
{
"agents": {
"claude-balanced": {
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"loadBalance": [
{ "apiKey": "${ANTHROPIC_KEY_1}", "weight": 50 },
{ "apiKey": "${ANTHROPIC_KEY_2}", "weight": 50 }
]
}
}
}
Useful for organizations that need to stay within per-key rate limits.
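Selection among the weighted entries can be sketched as a weighted random choice, assuming the `weight` values are relative (which is how the config above reads):

```python
import random

def pick_key(entries):
    # Choose one API key, with probability proportional to its weight.
    keys = [e["apiKey"] for e in entries]
    weights = [e["weight"] for e in entries]
    return random.choices(keys, weights=weights, k=1)[0]

entries = [
    {"apiKey": "KEY_1", "weight": 50},
    {"apiKey": "KEY_2", "weight": 50},
]
print(pick_key(entries))  # one of KEY_1 / KEY_2, 50/50 here
```

With equal weights this is a coin flip per request; skewing the weights (e.g. 80/20) shifts traffic proportionally, which is one way to drain a key that is nearing its rate limit.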
Monitoring Agent Usage
Check how much each agent is being used:
openclaw stats agents
Output:
Agent Requests Tokens In Tokens Out Avg Latency
------------ ---------- ----------- ------------ -----------
fast 1,842 284,100 512,000 820ms
balanced 423 198,400 1,024,300 2,100ms
expert 47 89,200 430,100 4,800ms
Next: Chapter 15 — Skills: Giving Your Agent Superpowers — How to enable and configure tools that let your AI read files, run code, search the web, and more.