
Chapter 14: Agents & Model Routing

One of OpenClaw's most powerful features is its support for multiple AI agents running simultaneously. You can connect Claude Opus for your senior developers, Claude Sonnet for your general team, Claude Haiku for your public-facing bot, and even a local Ollama model for offline work — all from a single gateway instance. This chapter explains how to define agents and route traffic to them.


What Is an Agent?

In OpenClaw, an agent is a named configuration block that specifies:

  • Which AI provider and model to use
  • The API key and endpoint
  • Default generation parameters (temperature, max tokens)
  • A default system prompt

Agents are defined in the agents block of your config and then referenced by workspaces.


Defining Agents

{
  "agents": {
    "fast": {
      "provider": "anthropic",
      "model": "claude-haiku-4-5-20251001",
      "apiKey": "${ANTHROPIC_API_KEY}",
      "maxTokens": 2048,
      "temperature": 0.5,
      "systemPrompt": "You are a fast, concise assistant. Keep answers brief."
    },
    "balanced": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-6",
      "apiKey": "${ANTHROPIC_API_KEY}",
      "maxTokens": 8192,
      "temperature": 0.3,
      "systemPrompt": "You are a helpful AI assistant with strong reasoning abilities."
    },
    "expert": {
      "provider": "anthropic",
      "model": "claude-opus-4-7",
      "apiKey": "${ANTHROPIC_API_KEY}",
      "maxTokens": 16384,
      "temperature": 0.2,
      "systemPrompt": "You are an expert-level AI assistant. Think deeply before answering."
    }
  }
}
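The ${...} references are resolved from environment variables when OpenClaw loads its configuration, which keeps API keys out of the config file itself. Before starting the gateway, export the variables the config names (the key value below is a placeholder):

export ANTHROPIC_API_KEY="sk-ant-your-key-here"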

Supported Providers

Anthropic (Claude)

{
  "provider": "anthropic",
  "model": "claude-sonnet-4-6",
  "apiKey": "${ANTHROPIC_API_KEY}"
}
Model                      Best For
-------------------------- -----------------------------------------------
claude-opus-4-7            Complex reasoning, long documents, expert tasks
claude-sonnet-4-6          Balanced performance and cost
claude-haiku-4-5-20251001  Fast responses, simple tasks, high volume

OpenAI

{
  "provider": "openai",
  "model": "gpt-4o",
  "apiKey": "${OPENAI_API_KEY}"
}

Google Gemini

{
  "provider": "google",
  "model": "gemini-1.5-pro",
  "apiKey": "${GOOGLE_AI_API_KEY}"
}

Ollama (Local Models)

Run models locally with no API key required:

{
  "provider": "ollama",
  "model": "llama3",
  "baseUrl": "http://localhost:11434",
  "maxTokens": 4096
}

Pull a model first:

ollama pull llama3
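Before pointing OpenClaw at it, you can confirm the model responds:

ollama run llama3 "Reply with one short sentence."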

Azure OpenAI

{
  "provider": "azure-openai",
  "model": "gpt-4o",
  "apiKey": "${AZURE_OPENAI_API_KEY}",
  "baseUrl": "https://your-resource.openai.azure.com",
  "azureDeployment": "my-gpt4o-deployment",
  "azureApiVersion": "2024-08-01-preview"
}

Cerebras

Cerebras runs inference on custom wafer-scale hardware with much higher token throughput than typical GPU serving, making it a good fit for fast, low-latency responses:

{
  "provider": "cerebras",
  "model": "llama3.1-70b",
  "apiKey": "${CEREBRAS_API_KEY}"
}

NVIDIA NIM

Run NVIDIA-hosted models with enterprise-grade SLAs:

{
  "provider": "nvidia",
  "model": "meta/llama-3.1-70b-instruct",
  "apiKey": "${NVIDIA_API_KEY}",
  "baseUrl": "https://integrate.api.nvidia.com/v1"
}

DeepInfra

Cost-efficient inference with image generation and text-to-video support:

{
  "provider": "deepinfra",
  "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
  "apiKey": "${DEEPINFRA_API_KEY}"
}

Self-Hosted (vLLM / SGLang)

For teams running their own inference servers:

{
  "provider": "openai-compatible",
  "model": "your-model-name",
  "baseUrl": "http://localhost:8000/v1",
  "apiKey": "not-required"
}

Both vLLM and SGLang expose an OpenAI-compatible API endpoint, so this config works for both.
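For reference, each server can be launched with a one-liner. The model name and port here are placeholders, and flags change between releases, so check each project's docs:

# vLLM: serves an OpenAI-compatible API under /v1
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# SGLang: also exposes an OpenAI-compatible endpoint
python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 8000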

All 35+ Supported Providers

OpenClaw supports more than 35 providers in total; the most commonly used are listed below.

Provider       Value              Notes
-------------- ------------------ -----------------------
Anthropic      anthropic          Claude family
OpenAI         openai             GPT family
Google         google             Gemini family
Ollama         ollama             Local models
Azure OpenAI   azure-openai       Enterprise
Cerebras       cerebras           Ultra-fast inference
NVIDIA NIM     nvidia             GPU cloud
DeepInfra      deepinfra          Cost-efficient
Groq           groq               Fast inference
Together AI    together           Open model hosting
Mistral AI     mistral            Mistral models
Cohere         cohere             Command family
Perplexity     perplexity         Search-augmented
xAI (Grok)     xai                Grok family
DeepSeek       deepseek           Chinese frontier model
Qwen           qwen               Alibaba models
MiniMax        minimax            Chinese provider
vLLM / SGLang  openai-compatible  Self-hosted

Routing: Workspace-to-Agent Assignment

Each workspace is assigned exactly one agent. All messages handled by that workspace go to that agent:

{
  "workspaces": [
    {
      "id": "vip",
      "agent": "expert",
      "allowlist": ["U01VIP"]
    },
    {
      "id": "dev-team",
      "agent": "balanced",
      "allowlist": ["U01DEV", "U02DEV"]
    },
    {
      "id": "everyone",
      "agent": "fast",
      "allowlist": ["*"]
    }
  ]
}

Per-Workspace Agent Overrides

You can override agent parameters at the workspace level without defining a separate agent:

{
  "workspaces": [
    {
      "id": "creative-writing",
      "agent": "balanced",
      "temperature": 0.9,
      "maxTokens": 4096,
      "systemPrompt": "You are a creative writing assistant. Be imaginative and expressive."
    }
  ]
}

The workspace-level values override the agent defaults for all conversations in that workspace.


Dynamic Agent Switching

Users can switch the agent for their current session using a chat command:

/agent expert

This temporarily routes the user's session to the expert agent until the session expires or they switch again. Only agents explicitly listed in the workspace's allowedAgents field are available:

{
  "workspaces": [
    {
      "id": "dev-team",
      "agent": "balanced",
      "allowedAgents": ["fast", "balanced", "expert"]
    }
  ]
}
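With this config, a dev-team user can hop between any of the three listed agents, for example escalating a hard question and then returning to the default:

/agent expert
/agent balanced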

Fallback Agents

Configure a fallback agent in case the primary agent's API is unavailable:

{
  "agents": {
    "claude-main": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-6",
      "apiKey": "${ANTHROPIC_API_KEY}",
      "fallback": "local-llama"
    },
    "local-llama": {
      "provider": "ollama",
      "model": "llama3",
      "baseUrl": "http://localhost:11434"
    }
  }
}

If the Anthropic API returns an error or times out, OpenClaw automatically retries the request with local-llama.
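If the first fallback might itself be unreachable, a chain could help. This sketch assumes the fallback field composes across agents, which this chapter does not confirm:

{
  "agents": {
    "claude-main": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-6",
      "apiKey": "${ANTHROPIC_API_KEY}",
      "fallback": "gpt-backup"
    },
    "gpt-backup": {
      "provider": "openai",
      "model": "gpt-4o",
      "apiKey": "${OPENAI_API_KEY}",
      "fallback": "local-llama"
    },
    "local-llama": {
      "provider": "ollama",
      "model": "llama3",
      "baseUrl": "http://localhost:11434"
    }
  }
}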


Multi-Provider Load Balancing

Distribute load across multiple API keys or providers:

{
  "agents": {
    "claude-balanced": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-6",
      "loadBalance": [
        { "apiKey": "${ANTHROPIC_KEY_1}", "weight": 50 },
        { "apiKey": "${ANTHROPIC_KEY_2}", "weight": 50 }
      ]
    }
  }
}

Useful for organizations that need to stay within per-key rate limits.
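Per this section's title, entries may also mix providers. The sketch below assumes each loadBalance entry can carry its own provider and model; only apiKey and weight appear in the example above, so treat those extra fields as hypothetical:

{
  "agents": {
    "mixed-pool": {
      "loadBalance": [
        { "provider": "anthropic", "model": "claude-sonnet-4-6", "apiKey": "${ANTHROPIC_API_KEY}", "weight": 70 },
        { "provider": "openai", "model": "gpt-4o", "apiKey": "${OPENAI_API_KEY}", "weight": 30 }
      ]
    }
  }
}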


Monitoring Agent Usage

Check how much each agent is being used:

openclaw stats agents

Output:

Agent        Requests   Tokens In   Tokens Out   Avg Latency
------------ ---------- ----------- ------------ -----------
fast         1,842      284,100     512,000      820ms
balanced     423        198,400     1,024,300    2,100ms
expert       47         89,200      430,100      4,800ms

Next: Chapter 15 — Skills: Giving Your Agent Superpowers — How to enable and configure tools that let your AI read files, run code, search the web, and more.