Skip to content
Filters & more
45 min
Intermediate
openclawlemonadeagentlocal-ailinux
Device Family
Device
OS

Running OpenClaw Locally with Lemonade Server

Install and configure OpenClaw autonomous AI agent with Lemonade Server.

Overview

OpenClaw is an autonomous AI agent that can write and run code, manage files, and work through complex multi-step tasks on your behalf. Unlike a chat assistant that just answers questions, OpenClaw takes real actions on your system, which means it needs a fast, capable AI backend that can keep up with a demanding agent loop.

Lemonade Server is that backend. It is an open-source local inference server that runs GenAI models directly on your hardware and exposes them through the industry-standard OpenAI API.

Together, they form a fully local AI agent stack: Lemonade handles model inference, and OpenClaw provides the agent loop that turns model outputs into real actions.

Before you continue: OpenClaw is a highly autonomous AI agent. Giving any AI agent access to your system may result in unpredictable or unintended outcomes. Proceed only if you understand the risks and are comfortable with autonomous software acting on your behalf.


What You’ll Learn

By the end of this playbook you will be able to:

  • Learn about Lemonade Server
  • Install OpenClaw and point it at Lemonade Server as its AI backend.
  • Start the OpenClaw gateway and confirm your agent is ready to work.
  • Connect a communication channel (Discord or Telegram) so you can chat with your agent from any device.

Setting the Memory Configuration

For the Ryzen AI Halo, the dedicated GPU memory defaults to 64GB, which is sufficient for most workloads. For larger models or longer contexts, increasing this to 96GB may help. To adjust, open AMD Software: Adrenalin Edition™ and navigate to Performance → Tuning → AMD Variable Graphics Memory. Reboot for the changes to take effect.

AMD Software Adrenalin Edition — AMD Variable Graphics Memory panel

To change the dedicated GPU memory value, open AMD Software: Adrenalin Edition™ and navigate to Performance → Tuning → AMD Variable Graphics Memory. Reboot for the changes to take effect.

AMD Software Adrenalin Edition — AMD Variable Graphics Memory panel

On Linux, to run larger models, increase the shared memory pool available to the GPU. This might involve setting the BIOS dedicated GPU memory to the minimum, so that the shared memory pool can be maximized.

For the AMD Ryzen™ AI Halo, the default is 96GB shared. To modify this, open the AMD Ryzen™ AI Developer Center and go to the Settings tab. Under Graphics Performance Settings, increase the Shared Video Memory slider, then click Apply Changes and reboot for the changes to take effect.

AMD Ryzen AI Developer Center — Graphics Performance Settings with Shared Video Memory slider

Increase the shared memory pool by changing the kernel’s Translation Table Manager (TTM) page setting. AMD recommends setting the minimum dedicated VRAM in the BIOS (0.5 GB) so the maximum amount is available as shared memory.

  1. Install the pipx utility and add the path for pipx-installed wheels to the system search path:
Terminal window
sudo apt install pipx
pipx ensurepath
  1. Install the amd-debug-tools wheel from PyPI:
Terminal window
pipx install amd-debug-tools
  1. Query the current shared memory settings:
Terminal window
amd-ttm
  1. Increase the shared memory allocation (units in GB):
Terminal window
amd-ttm --set <NUM>
  1. Reboot for the changes to take effect.

Check for Software Updates

Before starting, ensure your Ryzen AI Halo has the latest software installed. Open the AMD Ryzen™ AI Developer Center and check for available updates, both to the app itself and additional software.

Go to the Updates tab. If updates are available, install them and reboot before continuing.

AMD Ryzen AI Developer Center — Updates tab on Windows

Go to the Manage tab. If updates are available, install them and reboot before continuing.

AMD Ryzen AI Developer Center — Manage tab on Linux

Installing Software Prerequisites

  • A PC running Ubuntu 24.04+ or a compatible Debian-based Linux distribution with apt-get

  • At least 12 GB of RAM (64 GB+ recommended for larger models)

  • Docker Desktop (Optional, for sandboxing OpenClaw)

  • ~10–30 GB of free disk space for model weights

  • A PC running Windows 10/11
  • At least 12 GB of RAM (64 GB+ recommended for larger models)
  • ~10–30 GB of free disk space for model weights
  • Docker Desktop (Optional, for sandboxing OpenClaw)

Lemonade

Installing Lemonade

Download the latest installer from lemonade-server.ai and run the .msi file.

After installation:

  • The lemonade CLI is added to your system PATH automatically
  • Lemonade server is expected to run in the background automatically

You can also install silently from the command line:

Terminal window
msiexec /i lemonade-server-minimal.msi /qn

Ubuntu:

Terminal window
sudo add-apt-repository ppa:lemonade-team/stable
sudo apt install lemonade-server

Arch Linux (AUR):

Terminal window
yay -S lemonade-server

For other distributions or to install from source, see the full installation options.

Verifying Lemonade Installation

Open a terminal and run:

Terminal window
lemonade --version

You should see output like:

lemonade version x.y.z

If you see a version number, Lemonade is installed correctly and ready to go.

For quick reference, here are common Lemonade CLI commands:

CommandWhat it does
lemonade --helpShows all available commands and flags.
lemonade --versionPrints the installed Lemonade version.
lemonade statusConfirms whether the Lemonade server is running and reachable. The default OpenAI-compatible API base URL is http://localhost:13305/api/v1.
lemonade listLists models available to your Lemonade setup.
lemonade pull <MODEL_NAME>Downloads a model without launching it.
lemonade run <MODEL_NAME>Downloads the model if needed, then starts it for inference/chat.
lemonade run <MODEL_NAME> --llamacpp rocmStarts a llama.cpp model with the ROCm backend.
lemonade run <MODEL_NAME> --llamacpp vulkanStarts a llama.cpp model with the Vulkan backend.
lemonade configDisplays the current Lemonade configuration values.
lemonade config set llamacpp.backend=rocmSets the default llama.cpp backend to ROCm.

For the latest Lemonade server options or troubleshooting, please refer to the official Lemonade documentation.


The recommended model for this playbook is Qwen3.6-35B-A3B-GGUF from Unsloth, a strong MoE model with a 263k-token context window that is well-suited to agent workloads. This model uses UD-Q4_K_XL quantization. Pull it now:

Terminal window
lemonade pull Qwen3.6-35B-A3B-GGUF

Then load it with a large context window and save that setting for future runs:

Terminal window
lemonade unload
lemonade load Qwen3.6-35B-A3B-GGUF --ctx-size 262144 --save-options

The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because Qwen3.6 leverages extended context for complex tasks, we advise maintaining a context length of at least 128K tokens to preserve thinking capabilities.

Tip: Disable thinking for faster agent responses: Qwen3.6-35B-A3B runs in thinking mode by default, which adds latency before each response. For agent loops this overhead accumulates quickly. The lemonade-sdk/recipes repo provides a ready-made config that disables thinking. To use it, download the file and import it:

Terminal window
curl -LO https://raw.githubusercontent.com/lemonade-sdk/recipes/main/coding-agents/Qwen3.6-35B-A3B-NoThinking.json
lemonade import Qwen3.6-35B-A3B-NoThinking.json

Set Up WSL

We run OpenClaw inside WSL (Recommended) and connect it to Lemonade running natively on Windows. This gives you a Linux shell environment for OpenClaw while keeping Lemonade’s GPU acceleration on the Windows side.

Install WSL and Ubuntu

Open PowerShell as Administrator and install the WSL kernel:

Terminal window
wsl --install --no-distribution

Then install Ubuntu:

Terminal window
wsl --install -d Ubuntu-24.04

Enable systemd in WSL

Run this inside the Ubuntu terminal:

Terminal window
sudo tee /etc/wsl.conf > /dev/null <<'EOF'
[boot]
systemd=true
EOF

Restart WSL:

Terminal window
wsl --shutdown
wsl

Bridge Lemonade from Windows into WSL

WSL2 runs in a virtual network. Lemonade on Windows binds to 127.0.0.1, which WSL cannot reach directly. A Windows port proxy forwards traffic from the WSL gateway IP to Windows localhost.

Find your WSL gateway IP (run inside WSL):

Terminal window
ip route show default | awk '{print $3}' | head -1

Add the port proxy (run in PowerShell as Administrator, replacing <WSL-Gateway-IP> with your WSL gateway IP):

Terminal window
netsh interface portproxy add v4tov4 listenaddress=<WSL-Gateway-IP> listenport=13305 connectaddress=127.0.0.1 connectport=13305

Add a firewall rule (same elevated PowerShell):

Terminal window
New-NetFirewallRule -DisplayName "Lemonade-WSL" -Direction Inbound -Protocol TCP -LocalPort 13305 -Action Allow

Verify from WSL:

Terminal window
WINDOWS_HOST=$(ip route show default | awk '{print $3}' | head -1)
curl -s "http://$WINDOWS_HOST:13305/api/v1/models"

If you’ve already loaded the Qwen3.6-35B-A3B-GGUF model in the previous step, you should see JSON output like this:

{
"data": [
{
"checkpoint": "unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_XL",
"checkpoints": {
"main": "unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_XL"
},
"mmproj": "unsloth/Qwen3.6-35B-A3B-GGUF:mmproj-F16.gguf",
....
}
],
"object": "list"
}

The netsh portproxy rule survives reboots but the WSL gateway IP can change after wsl --shutdown. If Lemonade becomes unreachable from WSL after a restart, get the updated gateway IP and update the proxy with this new IP.


Install and Configure OpenClaw

Install OpenClaw

Run the commands in this section inside your WSL terminal.

Terminal window
curl -fsSL https://openclaw.ai/install.sh | bash -s -- --no-prompt --no-onboard

The --no-onboard flag skips the interactive setup wizard, you will configure the model backend manually in the next step, which gives you precise control over which model and server are used.

Open a new terminal and confirm the installation:

Terminal window
openclaw --version

Configure OpenClaw to Use Lemonade

Run OpenClaw’s non-interactive onboarding.

Terminal window
openclaw onboard \
--non-interactive \
--mode local \
--auth-choice custom-api-key \
--custom-base-url "http://127.0.0.1:13305/api/v1" \
--custom-model-id "Qwen3.6-35B-A3B-GGUF" \
--custom-provider-id "lemonade" \
--custom-compatibility "openai" \
--custom-api-key "lemonade" \
--secret-input-mode plaintext \
--gateway-port 18789 \
--gateway-bind loopback \
--skip-health \
--accept-risk
Terminal window
WINDOWS_HOST=$(ip route show default | awk '{print $3}' | head -1)
openclaw onboard \
--non-interactive \
--mode local \
--auth-choice custom-api-key \
--custom-base-url "http://$WINDOWS_HOST:13305/api/v1" \
--custom-model-id "Qwen3.6-35B-A3B-GGUF" \
--custom-provider-id "lemonade" \
--custom-compatibility "openai" \
--custom-api-key "lemonade" \
--secret-input-mode plaintext \
--gateway-port 18789 \
--gateway-bind loopback \
--skip-health \
--accept-risk

This command writes OpenClaw’s configuration to ~/.openclaw/openclaw.json.

OpenClaw context window sizing: OpenClaw’s compaction triggers when contextTokens > contextWindow − reserveTokens. The default reserveTokensFloor is 20,000 tokens, a floor that overrides reserveTokens when lower, so any model context below ~37k will trigger an infinite compaction loop. Set a low reserve and disable the floor once in your config and it applies to every model, no per-model tuning needed:

"compaction": {
"reserveTokens": 4096,
"reserveTokensFloor": 0
}

reserveTokensFloor is a floor (minimum guard), not the reserve itself, setting only the floor has no effect. reserveTokensFloor: 0 disables the guard so the lower reserveTokens is accepted.

When to apply this: Use this config if your model’s effective context window is below ~37k, either because the model is small (e.g. 8k, 16k, 32k) or because you’ve intentionally capped it to a lower value (e.g. loading a 128k model but setting context to 16k in Lemonade). Without it, OpenClaw enters an infinite compaction loop on startup.

Large-context models at full context: You can skip this entirely. The defaults work fine, compaction will kick in well before the window fills and the model has ample room to generate long responses. If you do apply it, be aware that reserveTokens: 4096 limits response length to ~4k tokens, which may cut off long file generation or detailed plans.

Where to add this: Place the compaction block inside agents.defaults in your openclaw.json (usually at ~/.openclaw/openclaw.json):

{
"agents": {
"defaults": {
"workspace": "/home/&lt;you&gt;/.openclaw/workspace",
"model": {
"primary": "lemonade/&lt;your-model-id&gt;"
},
"compaction": {
"reserveTokens": 4096,
"reserveTokensFloor": 0
}
}
}
}

The rest of your config (gateway, channels, models, etc.) stays unchanged, only the compaction key needs to be added.

OpenClaw can route all agent file and code operations through an isolated Docker container rather than running them directly on your host. This limits the blast radius of any unintended action to the sandbox, leaving your host filesystem and network untouched.

Build the sandbox image once (Docker must be installed):

Terminal window
docker build -t openclaw-sandbox:bookworm-slim - <<'DOCKERFILE'
FROM debian:bookworm-slim
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
bash ca-certificates curl git jq python3 ripgrep \
&& rm -rf /var/lib/apt/lists/*
RUN useradd --create-home --shell /bin/bash sandbox
USER sandbox
WORKDIR /home/sandbox
CMD ["sleep", "infinity"]
DOCKERFILE

Run this to add the sandbox key inside the existing agents.defaults block in ~/.openclaw/openclaw.json:

Terminal window
cat > sandbox.patch.json5 <<JSON5
{
agents: {
defaults: {
sandbox: {
mode: "non-main",
scope: "session",
workspaceAccess: "none"
}
}
}
}
JSON5
openclaw config patch --file ./sandbox.patch.json5

Sandbox containers have no network access by default. See the sandboxing reference for bind mounts and network overrides.

Troubleshooting: Docker Permission Denied

If you get “permission denied” when running Docker commands:

Step 1: Add your user to the docker group

Terminal window
sudo groupadd docker # Create group if needed
sudo usermod -aG docker $USER # Add yourself to the group
newgrp docker # Activate the change
docker run hello-world # Test it

Step 2: If the error persists, apply the permanent fix

Terminal window
sudo chgrp docker /lib/systemd/system/docker.socket
sudo chmod g+w /lib/systemd/system/docker.socket

Then reboot your system.

Quick temporary fix (resets after reboot):

Terminal window
sudo chmod 666 /var/run/docker.sock

Start the OpenClaw Gateway

The gateway is the OpenClaw process that manages the agent loop and serves the dashboard:

Terminal window
openclaw gateway run --bind loopback --port 18789

To open the dashboard, run this in a second terminal while the gateway is still running:

Terminal window
openclaw dashboard

Because the gateway binds to loopback, the dashboard auto-authenticates when opened from the same machine, no token entry or device approval is needed for local access. You should see the OpenClaw dashboard with your Lemonade model listed as the active backend.

If you’ve enabled sandboxing, you can verify it by asking the agent to run hostname from the dashboard. If you see a short container ID instead of your machine’s hostname, the sandbox is working.

Congratulations, you’ve built a fully local AI agent stack from scratch.

Need the gateway token? Run openclaw dashboard --no-open to print the dashboard URL with the token embedded (it also attempts to copy it to your clipboard). Alternatively, the token is at gateway.auth.token in ~/.openclaw/openclaw.json.

Approving a remote device: When you open the dashboard from a second machine or phone, the browser displays a request ID. Back on the machine running the gateway, run:

Terminal window
openclaw devices approve &lt;requestId&gt;

This is only needed for remote or secondary devices, loopback access from the same machine auto-authenticates.


Optional: Connect a Communication Channel

Once the gateway is running you can reach your local agent from any device. Pick the option that fits your setup. OpenClaw supports Discord, Telegram, and other channels, see the full list at docs.openclaw.ai.


Option A: Discord

Discord requires a server where you have administrator access to add a bot. If you share servers but don’t own one, use Option B (Telegram) instead.

Create a Discord account and server

If you do not have a Discord account, sign up at discord.com. You also need a server where you are administrator, create one by clicking the + icon in the Discord sidebar and selecting Create My Own. A private server is fine.

Create a Discord application and bot

  1. Go to the Discord Developer Portal and click New Application. Give it a name (e.g. “openclaw-bot”).
  2. In the sidebar, click Bot. Set a username for the bot.
  3. Still on the Bot page, scroll to Privileged Gateway Intents and enable:
    • Message Content Intent (required)
    • Server Members Intent (recommended)
  4. Scroll back up and click Reset Token to generate your bot token. Copy it.

Add the bot to your server

  1. In the sidebar, click OAuth2/ URL Generator.
  2. Under Scopes, enable bot and applications.commands.
  3. Under Bot Permissions, enable: View Channels, Send Messages, Read Message History, Embed Links, Attach Files.
  4. Copy the generated URL, paste it in your browser, select your server, and confirm. The bot should now appear in your server’s member list.

Collect your IDs

Enable Developer Mode in Discord (User Settings/ Advanced/ Developer Mode), then:

  • Right-click your server icon: Copy Server ID
  • Right-click your own avatar: Copy User ID

Allow DMs from server members

Right-click your server icon/ Privacy Settings/ toggle on Direct Messages. This allows the bot to DM you, which is required for the pairing step.

Configure OpenClaw for Discord

Store your bot token as an environment variable, then create a single patch file that enables Discord, references the token, and allowlists your server. Replace <server_id> and <user_id> with the IDs collected above.

Terminal window
export DISCORD_BOT_TOKEN="YOUR_BOT_TOKEN"
cat > discord.patch.json5 <<JSON5
{
channels: {
discord: {
enabled: true,
token: { source: "env", provider: "default", id: "DISCORD_BOT_TOKEN" },
dmPolicy: "pairing",
groupPolicy: "allowlist",
guilds: {
"<server_id>": {
requireMention: false,
users: ["<user_id>"],
},
},
},
},
}
JSON5
openclaw config patch --file ./discord.patch.json5

Do not rely on asking the agent to configure this. When sandboxing is enabled, the agent cannot write to ~/.openclaw/openclaw.json from inside the sandbox, use the CLI commands above on the host instead.

Restart the gateway so it picks up the new channel config:

Terminal window
openclaw gateway run --bind loopback --port 18789

You should see logged in to discord as <bot-name> in the gateway output within a few seconds.

Pair your Discord account

DM the bot in Discord. It will reply with a short pairing code.

Approve it on the machine running OpenClaw:

Terminal window
openclaw pairing approve discord <CODE>

Pairing codes expire after one hour.

You can now chat with your agent directly from Discord and offload tasks to your local hardware.

image


Option B: Telegram

Telegram is simpler than Discord for most users, it requires no server and no admin access.

Create a Telegram bot

  1. Open Telegram and message @BotFather.
  2. Send /newbot and follow the prompts. Save the bot token it gives you.

Configure OpenClaw for Telegram

Store the token as an environment variable:

Terminal window
export TELEGRAM_BOT_TOKEN="YOUR_BOT_TOKEN"

Add the channel configuration to ~/.openclaw/openclaw.json (or patch it via the dashboard):

{
"channels": {
"telegram": {
"enabled": true,
"botToken": "YOUR_BOT_TOKEN",
"dmPolicy": "pairing"
}
}
}

Restart the gateway, then send your bot any message in Telegram. Approve the pairing:

Terminal window
openclaw pairing list telegram
openclaw pairing approve telegram <CODE>

Pairing codes expire after one hour. You can now chat with your agent via Telegram DM.


Next Steps

Now that your agent can receive commands from your phone and act on your local machine, here are three directions worth exploring:

  1. Stock market summarizer: Schedule OpenClaw to fetch data from financial APIs on a fixed interval, summarize the day’s movements with your local model, and push a digest to your phone each morning via your chosen channel.

  2. Fine-tuning monitor: Kick off a training job remotely via Telegram or Discord, then have the agent tail the training log and report periodic loss values, GPU utilization, and disk usage back to your phone. If the run stalls or VRAM spikes, you find out immediately without needing to be at the machine.

  3. IOT with a local VLM: Point a camera at your front door, run a vision model on Lemonade, and have OpenClaw analyze frames on demand or on a trigger. Ask “did any packages arrive today?” from your phone and get a straight answer from your own hardware.

Need help with this playbook?

Run into an issue or have a question? Open a GitHub issue and our team will take a look.