Building Your First Agent with GAIA

Build a 100% local AI agent — no cloud APIs needed. Use the GAIA SDK to create a hardware advisor on your AMD Ryzen™ AI

Getting Started Creating Agents with GAIA

GAIA agents are AI assistants that use a local LLM to reason and call tools you define — like chatbots that can take action. They run 100% locally with no cloud APIs, no data leaving your machine, and no API keys required.

In this playbook, you’ll build a Hardware Advisor Agent that detects your system’s RAM, GPU, and NPU, queries the local model catalog, and recommends which LLMs your machine can run. It’s a practical introduction to the GAIA Agent SDK that produces something immediately useful.

What You’ll Learn

How to create a GAIA agent with custom tools
Using the LemonadeClient SDK to query system info and model catalogs
Platform-specific GPU/NPU detection (Windows PowerShell and Linux lspci)
Memory-based model sizing using the 70% rule
Building an interactive CLI for natural language hardware queries

Installing Dependencies

Lemonade

Installing Lemonade

Windows:

Download the latest installer from lemonade-server.ai and run the .msi file.

After installation:

The lemonade CLI is added to your system PATH automatically
Lemonade server is expected to run in the background automatically

You can also install silently from the command line:

msiexec /i lemonade-server-minimal.msi /qn

Ubuntu:

sudo add-apt-repository ppa:lemonade-team/stable
sudo apt install lemonade-server

Arch Linux (AUR):

yay -S lemonade-server

For other distributions or to install from source, see the full installation options.

Verifying Lemonade Installation

Open a terminal and run:

lemonade --version

You should see output like:

lemonade version x.y.z

If you see a version number, Lemonade is installed correctly and ready to go.

For quick reference, here are common Lemonade CLI commands:

Command	What it does
`lemonade --help`	Shows all available commands and flags.
`lemonade --version`	Prints the installed Lemonade version.
`lemonade status`	Confirms whether the Lemonade server is running and reachable. The default OpenAI-compatible API base URL is `http://localhost:13305/api/v1`.
`lemonade list`	Lists models available to your Lemonade setup.
`lemonade pull <MODEL_NAME>`	Downloads a model without launching it.
`lemonade run <MODEL_NAME>`	Downloads the model if needed, then starts it for inference/chat.
`lemonade run <MODEL_NAME> --llamacpp rocm`	Starts a llama.cpp model with the ROCm backend.
`lemonade run <MODEL_NAME> --llamacpp vulkan`	Starts a llama.cpp model with the Vulkan backend.
`lemonade config`	Displays the current Lemonade configuration values.
`lemonade config set llamacpp.backend=rocm`	Sets the default llama.cpp backend to ROCm.

For the latest Lemonade server options or troubleshooting, please refer to the official Lemonade documentation.
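If you would rather check the server from Python than from the CLI, the sketch below queries the base URL listed in the table. It assumes the server exposes the usual OpenAI-style `/models` route under that base URL and that the response carries a `data` list; both are assumptions here, so confirm them against the Lemonade documentation. The helper names `endpoint` and `fetch_model_ids` are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Default base URL taken from the command table above.
BASE_URL = "http://localhost:13305/api/v1"


def endpoint(path, base_url=BASE_URL):
    """Join a route onto the base URL without doubling slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def fetch_model_ids(base_url=BASE_URL, timeout=3.0):
    """Return a list of model ids, or None if the server cannot be reached.

    Assumption: GET <base_url>/models returns JSON shaped like {"data": [{"id": ...}]}.
    """
    try:
        with urllib.request.urlopen(endpoint("models", base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body.
        return None
    if not isinstance(payload, dict):
        return None
    return [model.get("id") for model in payload.get("data", [])]
```

Calling `fetch_model_ids()` returns `None` when nothing is listening on the port, which makes it a cheap pre-flight check before starting an agent.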

GAIA

GAIA is AMD’s open-source framework for building AI agents that run locally on AMD hardware with Ryzen AI acceleration.

Installing GAIA

On Windows, open a terminal in the directory of your choice and run the following commands to create and activate a virtual environment:

python -m venv gaia-env --system-site-packages
gaia-env\Scripts\activate

Then, use pip to install GAIA:

pip install amd-gaia

On Linux, open a terminal in the directory of your choice and run the following commands to create and activate a virtual environment:

sudo apt update
sudo apt install -y python3-venv
python3 -m venv gaia-env --system-site-packages
source gaia-env/bin/activate

Then, use pip to install GAIA:

pip install amd-gaia

The --system-site-packages flag in the commands above lets the virtual environment see packages that are already installed system-wide. For a fully isolated environment, omit the flag.

On Windows:

python -m venv gaia-env
gaia-env\Scripts\activate
pip install amd-gaia

On Linux:

sudo apt update
sudo apt install -y python3-venv
python3 -m venv gaia-env
source gaia-env/bin/activate
pip install amd-gaia

Initializing GAIA

After installation, run gaia init to set up Lemonade Server and download models:

gaia init

This installs Lemonade Server, downloads the default models, and verifies the setup.

Verifying Installation

Verify that GAIA v0.16.2 or later is installed:

gaia --version

For more information, see the GAIA documentation.

Getting Started

Get the finished agent running first so you can see what you’re building. Then, we’ll walk through the code step by step.

Run the Pre-Built Example

This playbook includes the complete hardware_advisor_agent.py script. Download it to a directory of your choice and run it to see the finished agent in action:

python hardware_advisor_agent.py

Try asking: “What size LLM can I run?”

Expected output:

============================================================
Hardware Advisor Agent
============================================================

Hi! I can help you figure out what size LLM your system can run.

Agent ready!

You: What size LLM can I run?

Agent: Great news! With 32 GB RAM and a 24 GB GPU, you can run:
- 30B parameter models (like Qwen3-Coder-30B)
- Most 7B-14B models comfortably
- NPU acceleration available for smaller models

Congratulations, you’ve run your first GAIA agent!

The rest of the playbook explains how each part of the script works, so you can understand it from the ground up.

Understand the Architecture

The Hardware Advisor Agent combines three components:

LemonadeClient SDK — System info and model catalog APIs
Platform-specific detection — Windows PowerShell / Linux lspci for GPU info
Memory calculations — 70% rule for safe model sizing

The data flows through these in sequence: user query → agent selects a tool → tool calls LemonadeClient + OS detection → agent synthesizes the results into a recommendation.

LemonadeClient SDK

The LemonadeClient provides a unified API for system detection, NPU/GPU availability, and model catalog queries.

Import and initialize:

from gaia.llm.lemonade_client import LemonadeClient

client = LemonadeClient(keep_alive=True)

get_system_info() — Returns OS, CPU, RAM, and device availability:

info = client.get_system_info()

# Returns (example from a Windows system):
{
    "OS Version": "Windows 11 Pro",
    "Processor": "AMD Ryzen 9 7950X",
    "Physical Memory": "32.0 GB",
    "devices": {
        "cpu": {"name": "...", "available": True},
        "amd_igpu": {"name": "...", "memory": 8192, "available": True},
        "amd_npu": {"name": "Ryzen AI NPU", "available": True}
    }
}

# Returns (example from a Linux system):
{
    "OS Version": "Ubuntu 24.04 LTS",
    "Processor": "AMD Ryzen 9 7950X",
    "Physical Memory": "32.0 GB",
    "devices": {
        "cpu": {"name": "...", "available": True},
        "amd_igpu": {"name": "...", "memory": 8192, "available": True},
        "amd_npu": {"name": "Not detected", "available": False}
    }
}
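Because "Physical Memory" arrives as a string and device availability is nested, a little parsing is needed before the values can be used in calculations. The sketch below works on a dictionary shaped like the examples above; `parse_ram_gb` and `available_devices` are hypothetical helper names, and the field layout is assumed to match those examples.

```python
from typing import Any, Dict, List


def parse_ram_gb(info: Dict[str, Any]) -> float:
    """Extract RAM in GB from a 'Physical Memory' string such as '32.0 GB'."""
    raw = str(info.get("Physical Memory", "")).strip()
    try:
        return float(raw.split()[0])
    except (IndexError, ValueError):
        # Missing or unexpected format: report 0 rather than crash.
        return 0.0


def available_devices(info: Dict[str, Any]) -> List[str]:
    """Return the sorted keys of devices whose 'available' flag is true."""
    devices = info.get("devices", {})
    return sorted(name for name, dev in devices.items() if dev.get("available"))


# Sample data copied from the Linux example above.
sample_info = {
    "OS Version": "Ubuntu 24.04 LTS",
    "Processor": "AMD Ryzen 9 7950X",
    "Physical Memory": "32.0 GB",
    "devices": {
        "cpu": {"name": "...", "available": True},
        "amd_igpu": {"name": "...", "memory": 8192, "available": True},
        "amd_npu": {"name": "Not detected", "available": False},
    },
}

print(parse_ram_gb(sample_info))       # 32.0
print(available_devices(sample_info))  # ['amd_igpu', 'cpu']
```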

list_models(show_all=True) — Returns the full model catalog:

response = client.list_models(show_all=True)

# Returns:
{
    "data": [
        {
            "id": "Qwen3-0.6B-GGUF",
            "name": "Qwen3 0.6B",
            "downloaded": True,
            "labels": ["hot", "cpu", "small"]
        }
    ]
}
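The catalog response is a plain dictionary, so it can be filtered with ordinary Python. The following sketch assumes the response shape shown above; `filter_models` is a hypothetical helper, and the second catalog entry is made up for illustration.

```python
from typing import Any, Dict, List, Optional


def filter_models(
    response: Dict[str, Any],
    label: Optional[str] = None,
    downloaded_only: bool = False,
) -> List[str]:
    """Return ids of catalog entries that match the given label and download state."""
    ids = []
    for model in response.get("data", []):
        if label is not None and label not in model.get("labels", []):
            continue
        if downloaded_only and not model.get("downloaded", False):
            continue
        ids.append(model.get("id", ""))
    return ids


sample_response = {
    "data": [
        {
            "id": "Qwen3-0.6B-GGUF",
            "name": "Qwen3 0.6B",
            "downloaded": True,
            "labels": ["hot", "cpu", "small"],
        },
        {
            # Hypothetical entry, not taken from the real catalog.
            "id": "Example-Large-GGUF",
            "name": "Example Large",
            "downloaded": False,
            "labels": ["hot"],
        },
    ]
}

print(filter_models(sample_response, label="cpu"))          # ['Qwen3-0.6B-GGUF']
print(filter_models(sample_response, downloaded_only=True))  # ['Qwen3-0.6B-GGUF']
```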

get_model_info(model_id) — Returns size estimates for a specific model:

model_info = client.get_model_info("Qwen3-Coder-30B-A3B-Instruct-GGUF")

# Returns:
{
    "id": "Qwen3-Coder-30B-A3B-Instruct-GGUF",
    "name": "Qwen3 Coder 30B",
    "size_gb": 18.5,
    "downloaded": False
}

Platform-Specific GPU Detection

The agent uses OS-native commands rather than PyTorch for GPU detection. This works without GPU drivers installed, detects all GPUs (not just CUDA-capable ones), and avoids heavy library imports.

On Windows, the agent uses PowerShell to query WMI:

import subprocess

ps_command = (
    "Get-WmiObject Win32_VideoController | "
    "Select-Object Name,AdapterRAM | "
    "ConvertTo-Csv -NoTypeInformation"
)
result = subprocess.run(
    ["powershell", "-Command", ps_command],
    capture_output=True, text=True, timeout=5
)
# Parse CSV output for GPU name and VRAM
# Note: AdapterRAM is a 32-bit value, so adapters with more than 4 GB may be under-reported
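The parsing step can be tried without a Windows machine by feeding sample text to Python's csv module. This is a sketch, not the agent's own parser: `parse_video_controllers` is a hypothetical name, the sample rows are invented, and it assumes the two-column header produced by the Select-Object call above. Using csv instead of a plain split keeps adapter names that contain commas intact.

```python
import csv
import io
from typing import Dict, List, Union


def parse_video_controllers(csv_text: str) -> List[Dict[str, Union[str, int]]]:
    """Turn ConvertTo-Csv output (Name, AdapterRAM) into name/memory_mb records."""
    records = []
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    for row in reader:
        name = (row.get("Name") or "").strip()
        if not name:
            continue
        ram_field = (row.get("AdapterRAM") or "").strip()
        ram_bytes = int(ram_field) if ram_field.isdigit() else 0
        records.append({"name": name, "memory_mb": ram_bytes // (1024 * 1024)})
    return records


# Invented sample output; real adapter names and sizes will differ.
sample_csv = (
    '"Name","AdapterRAM"\n'
    '"AMD Radeon(TM) Graphics","536870912"\n'
    '"Example GPU, Rev 2","4293918720"\n'
    '"Microsoft Basic Display Adapter",""\n'
)

for record in parse_video_controllers(sample_csv):
    print(record)
```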

On Linux, the agent uses lspci:

import subprocess

result = subprocess.run(
    ["lspci"], capture_output=True, text=True, timeout=5
)
# Parse output for "VGA compatible controller" lines
# Note: Memory not available via lspci
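The lspci parsing can likewise be exercised on canned text. In this sketch `parse_lspci_gpus` is a hypothetical helper and the sample lines are illustrative. The `classes` parameter is an addition of this example: some systems may list a GPU under a class other than "VGA compatible controller", so the set of accepted class names is left configurable.

```python
from typing import List, Sequence


def parse_lspci_gpus(
    output: str,
    classes: Sequence[str] = ("VGA compatible controller",),
) -> List[str]:
    """Return device names from lspci lines whose class is in `classes`."""
    names = []
    for line in output.splitlines():
        for device_class in classes:
            # Everything after "<class>:" is the human-readable device name.
            _, sep, name = line.partition(device_class + ":")
            if sep:
                names.append(name.strip())
                break
    return names


# Illustrative lspci output; bus addresses and names are examples only.
sample_lspci = (
    "00:00.0 Host bridge: Example Corp Root Complex\n"
    "c1:00.0 VGA compatible controller: Advanced Micro Devices, Inc. "
    "[AMD/ATI] Phoenix1 (rev c8)\n"
    "c1:00.1 Audio device: Example Corp HD Audio\n"
)

print(parse_lspci_gpus(sample_lspci))
```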

The 70% Memory Rule

Rule: Model size should be at most 70% of system RAM, leaving 30% headroom for inference operations (KV cache, batch processing buffers, runtime memory spikes).

System: 32 GB RAM
Max safe model size: 32 x 0.7 = 22.4 GB
30B model (~18.5 GB): Fits safely
70B model (~42 GB):   Too large
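The arithmetic above can be written as two small functions. This is a minimal sketch of the rule as stated; the function names and the `usable_fraction` parameter are choices made for this example.

```python
def max_safe_model_gb(ram_gb: float, usable_fraction: float = 0.7) -> float:
    """Largest model size (GB) that leaves the remaining RAM as headroom."""
    return round(ram_gb * usable_fraction, 2)


def fits_in_ram(model_gb: float, ram_gb: float) -> bool:
    """True if a model with a known, positive size passes the 70% rule."""
    return 0 < model_gb <= max_safe_model_gb(ram_gb)


print(max_safe_model_gb(32))   # 22.4
print(fits_in_ram(18.5, 32))   # True  (30B model from the example)
print(fits_in_ram(42, 32))     # False (70B model from the example)
```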

Coding the Agent Step by Step (Optional)

You’ll create one file called hardware_advisor_agent.py and progressively add features. Each step builds on the previous one.

Step 1: Agent Skeleton

Start with a minimal agent structure — just the class and a basic system prompt. The agent has no tools yet.

from gaia import Agent
from gaia.llm.lemonade_client import LemonadeClient

class HardwareAdvisorAgent(Agent):
    """Agent that advises on LLM capabilities based on your hardware."""

    def __init__(self, **kwargs):
        self.client = LemonadeClient(keep_alive=True)
        super().__init__(**kwargs)

    def _get_system_prompt(self) -> str:
        return "You are a hardware advisor for running local LLMs on AMD systems."

    def _register_tools(self):
        # Tools will be added in the next steps
        pass

if __name__ == "__main__":
    agent = HardwareAdvisorAgent()
    print("Agent created successfully!")

Run it to verify:

python hardware_advisor_agent.py

Expected output:

Agent created successfully!

Step 2: GPU and Hardware Detection

Add the _get_gpu_info() helper method and the get_hardware_info() tool. This makes the agent interactive — you can now query it about system specs.

Update the imports at the top of the file:

from typing import Any, Dict

from gaia import Agent, tool
from gaia.llm.lemonade_client import LemonadeClient

Add the _get_gpu_info() helper after the _get_system_prompt() method:

def _get_gpu_info(self) -> Dict[str, Any]:
    """Detect GPU using OS-native commands."""
    import platform
    import subprocess

    system = platform.system()

    try:
        if system == "Windows":
            ps_command = (
                "Get-WmiObject Win32_VideoController | "
                "Select-Object Name,AdapterRAM | "
                "ConvertTo-Csv -NoTypeInformation"
            )
            result = subprocess.run(
                ["powershell", "-Command", ps_command],
                capture_output=True,
                text=True,
                timeout=5,
            )
            if result.returncode == 0:
                lines = [
                    line.strip()
                    for line in result.stdout.strip().split("\n")
                    if line.strip()
                ]
                # Skip virtual/remote adapters that aren't real GPUs
                skip_keywords = [
                    "microsoft remote display",
                    "microsoft basic display",
                    "remote desktop",
                ]
                # Collect all valid GPUs and pick the one with the most VRAM
                candidates = []
                for line in lines[1:]:  # Skip header
                    line = line.replace('"', "")
                    parts = line.split(",")
                    if len(parts) >= 2:
                        try:
                            name = parts[0].strip()
                            adapter_ram = (
                                int(parts[1]) if parts[1].strip().isdigit() else 0
                            )
                            if name and len(name) > 0:
                                if any(k in name.lower() for k in skip_keywords):
                                    continue
                                candidates.append({
                                    "name": name,
                                    "memory_mb": (
                                        adapter_ram // (1024 * 1024)
                                        if adapter_ram > 0
                                        else 0
                                    ),
                                })
                        except (ValueError, IndexError):
                            continue
                if candidates:
                    return max(candidates, key=lambda g: g["memory_mb"])

        elif system == "Linux":
            result = subprocess.run(
                ["lspci"], capture_output=True, text=True, timeout=5
            )
            if result.returncode == 0:
                candidates = []
                for line in result.stdout.split("\n"):
                    if "VGA compatible controller" in line:
                        parts = line.split(":", 2)
                        if len(parts) >= 3:
                            candidates.append({
                                "name": parts[2].strip(),
                                "memory_mb": 0,
                            })
                if candidates:
                    # Prefer AMD GPUs if present, otherwise return first
                    amd_gpus = [g for g in candidates if "amd" in g["name"].lower() or "radeon" in g["name"].lower()]
                    return amd_gpus[0] if amd_gpus else candidates[0]

    except Exception as e:
        print(f"GPU detection error: {e}")

    return {"name": "Not detected", "memory_mb": 0}
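To check the Windows selection logic without real hardware, the "skip virtual adapters, then pick the most VRAM" step can be pulled out and run on sample data. `pick_gpu` below is a hypothetical standalone function that mirrors that part of the helper; the candidate list is invented.

```python
from typing import Dict, List, Union

Gpu = Dict[str, Union[str, int]]

# Same keywords the helper uses to ignore virtual or remote adapters.
SKIP_KEYWORDS = (
    "microsoft remote display",
    "microsoft basic display",
    "remote desktop",
)


def pick_gpu(candidates: List[Gpu]) -> Gpu:
    """Drop virtual adapters, then return the candidate with the most VRAM."""
    real = [
        gpu
        for gpu in candidates
        if not any(keyword in str(gpu["name"]).lower() for keyword in SKIP_KEYWORDS)
    ]
    if not real:
        return {"name": "Not detected", "memory_mb": 0}
    return max(real, key=lambda gpu: gpu["memory_mb"])


# Invented candidates for illustration.
sample_candidates = [
    {"name": "Microsoft Basic Display Adapter", "memory_mb": 0},
    {"name": "AMD Radeon(TM) Graphics", "memory_mb": 512},
    {"name": "AMD Radeon RX 7900 XTX", "memory_mb": 4095},
]

print(pick_gpu(sample_candidates)["name"])  # AMD Radeon RX 7900 XTX
```

When two adapters report the same amount of memory, `max` keeps the first one in the list.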

Replace the _register_tools() method with the get_hardware_info tool:

def _register_tools(self):
    client = self.client
    agent = self

    @tool(atomic=True)
    def get_hardware_info() -> Dict[str, Any]:
        """Get detailed system hardware information including RAM, GPU, and NPU."""
        try:
            info = client.get_system_info()

            # Parse RAM (format: "32.0 GB")
            ram_str = info.get("Physical Memory", "0 GB")
            ram_gb = float(ram_str.split()[0]) if ram_str else 0

            # Detect GPU
            gpu_info = agent._get_gpu_info()
            gpu_name = gpu_info.get("name", "Not detected")
            gpu_available = gpu_name != "Not detected"
            gpu_memory_mb = gpu_info.get("memory_mb", 0)
            gpu_memory_gb = (
                round(gpu_memory_mb / 1024, 2) if gpu_memory_mb > 0 else 0
            )

            # Get NPU information from Lemonade
            devices = info.get("devices", {})
            npu_info = devices.get("amd_npu", {})
            npu_available = npu_info.get("available", False)
            npu_name = (
                npu_info.get("name", "Not detected")
                if npu_available
                else "Not detected"
            )

            return {
                "success": True,
                "os": info.get("OS Version", "Unknown"),
                "processor": info.get("Processor", "Unknown"),
                "ram_gb": ram_gb,
                "amd_igpu": {
                    "name": gpu_name,
                    "memory_mb": gpu_memory_mb,
                    "memory_gb": gpu_memory_gb,
                    "available": gpu_available,
                },
                "amd_npu": {"name": npu_name, "available": npu_available},
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "message": "Failed to get hardware information from Lemonade Server",
            }

Update the __main__ block to enable interactive testing:

if __name__ == "__main__":
    agent = HardwareAdvisorAgent()
    print("Hardware Advisor Agent (Ctrl+C to exit)")
    print("Try: 'Show me my system specs'\n")

    while True:
        try:
            query = input("You: ").strip()
            if query:
                agent.process_query(query)
                print()
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break

Run and try asking “Show me my system specs”:

python hardware_advisor_agent.py

Example output:

You: Show me my system specs

Agent: Your system has excellent specs for running LLMs locally!
- 32 GB RAM
- AMD Radeon RX 7900 XTX with 24 GB VRAM
- Ryzen AI NPU for accelerated inference

Step 3: Model Catalog

Add the list_available_models() tool inside _register_tools(), after the get_hardware_info function. Now the agent can tell you what models are available.

    @tool(atomic=True)
    def list_available_models() -> Dict[str, Any]:
        """List all models available in the catalog with their sizes and download status."""
        try:
            response = client.list_models(show_all=True)
            models_data = response.get("data", [])

            enriched_models = []
            for model in models_data:
                model_id = model.get("id", "")
                model_info = client.get_model_info(model_id)
                size_gb = model_info.get("size_gb", 0)

                enriched_models.append(
                    {
                        "id": model_id,
                        "name": model.get("name", model_id),
                        "size_gb": size_gb,
                        "downloaded": model.get("downloaded", False),
                        "labels": model.get("labels", []),
                    }
                )

            enriched_models.sort(key=lambda m: m["size_gb"], reverse=True)

            return {
                "success": True,
                "models": enriched_models,
                "count": len(enriched_models),
                "message": f"Found {len(enriched_models)} models in catalog",
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "message": "Failed to fetch models from Lemonade Server",
            }
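In the tool above, a single failing get_model_info() call raises inside the loop and turns the whole listing into an error result. One possible refinement, sketched here under stated assumptions, is to fall back to a size of 0 for that one model. `FakeClient` is a hypothetical stand-in for LemonadeClient so the example runs offline, and `safe_size_gb` is a made-up helper name.

```python
from typing import Any, Dict


class FakeClient:
    """Hypothetical stand-in for LemonadeClient, for offline testing only."""

    def __init__(self, sizes: Dict[str, float]) -> None:
        self._sizes = sizes

    def get_model_info(self, model_id: str) -> Dict[str, Any]:
        if model_id not in self._sizes:
            raise RuntimeError(f"no size info for {model_id}")
        return {"id": model_id, "size_gb": self._sizes[model_id]}


def safe_size_gb(client: Any, model_id: str) -> float:
    """Return the model size in GB, or 0.0 if the lookup fails for this model."""
    try:
        return float(client.get_model_info(model_id).get("size_gb", 0))
    except Exception:
        # Treat the size as unknown instead of failing the whole catalog listing.
        return 0.0


client = FakeClient({"Qwen3-0.6B-GGUF": 0.4})
print(safe_size_gb(client, "Qwen3-0.6B-GGUF"))  # 0.4
print(safe_size_gb(client, "Missing-Model"))    # 0.0
```

Models with an unknown size of 0 are already excluded by the `size_gb > 0` check in the recommendation step, so they would be listed but never recommended.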

Run and try asking “What models are available?”:

python hardware_advisor_agent.py

Example output:

You: What models are available?

Agent: I found 15 models in the catalog:
- Qwen3-Coder-30B (18.5 GB) [hot, coding] - Not downloaded
- Llama-3.1-8B (4.7 GB) [general] - Downloaded
- Qwen3-0.6B (0.4 GB) [hot, cpu, small] - Downloaded

Step 4: Smart Recommendations

Add the recommend_models() tool inside _register_tools(), after list_available_models. The agent can now calculate which models fit in your system’s memory using the 70% rule.

    @tool(atomic=True)
    def recommend_models(ram_gb: float, gpu_memory_mb: int = 0) -> Dict[str, Any]:
        """Recommend models based on available system memory.

        Args:
            ram_gb: Available system RAM in GB
            gpu_memory_mb: Available GPU memory in MB (0 if no GPU)

        Returns:
            Dictionary with model recommendations that fit in available memory
        """
        try:
            models_result = list_available_models()
            if not models_result.get("success"):
                return models_result

            all_models = models_result.get("models", [])

            # 70% rule: leave 30% overhead for inference
            max_model_size_gb = ram_gb * 0.7

            fitting_models = [
                model
                for model in all_models
                if model["size_gb"] <= max_model_size_gb and model["size_gb"] > 0
            ]

            for model in fitting_models:
                model["estimated_runtime_gb"] = round(model["size_gb"] * 1.3, 2)
                model["fits_in_ram"] = model["estimated_runtime_gb"] <= ram_gb

                if gpu_memory_mb > 0:
                    gpu_memory_gb = gpu_memory_mb / 1024
                    model["fits_in_gpu"] = model["size_gb"] <= (gpu_memory_gb * 0.9)

            fitting_models.sort(key=lambda m: m["size_gb"], reverse=True)

            return {
                "success": True,
                "recommendations": fitting_models,
                "total_fitting_models": len(fitting_models),
                "constraints": {
                    "available_ram_gb": ram_gb,
                    "available_gpu_mb": gpu_memory_mb,
                    "max_model_size_gb": round(max_model_size_gb, 2),
                    "safety_margin_percent": 30,
                },
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "message": "Failed to generate model recommendations",
            }
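The filtering logic can be tested on its own, without a running server, by passing in a catalog directly. The sketch below reuses the thresholds from the tool above (70% of RAM, 1.3x runtime estimate, 90% of VRAM). The `recommend` function name is hypothetical, and the catalog sizes are taken from the example outputs in this playbook.

```python
from typing import Any, Dict, List


def recommend(
    models: List[Dict[str, Any]], ram_gb: float, gpu_memory_mb: int = 0
) -> List[Dict[str, Any]]:
    """Return models that pass the 70% rule, largest first."""
    max_size_gb = ram_gb * 0.7
    picks = []
    for model in models:
        size = model["size_gb"]
        if not (0 < size <= max_size_gb):
            continue
        entry = dict(model)  # copy so the input catalog is not mutated
        entry["estimated_runtime_gb"] = round(size * 1.3, 2)
        if gpu_memory_mb > 0:
            entry["fits_in_gpu"] = size <= (gpu_memory_mb / 1024) * 0.9
        picks.append(entry)
    picks.sort(key=lambda m: m["size_gb"], reverse=True)
    return picks


sample_catalog = [
    {"id": "Qwen3-0.6B", "size_gb": 0.4},
    {"id": "Example-70B", "size_gb": 42.0},  # size from the 70B example above
    {"id": "Qwen3-Coder-30B", "size_gb": 18.5},
    {"id": "Llama-3.1-8B", "size_gb": 4.7},
]

for pick in recommend(sample_catalog, ram_gb=32, gpu_memory_mb=24 * 1024):
    print(pick["id"], pick["size_gb"], pick["fits_in_gpu"])
```

Unlike the tool, this version copies each entry before annotating it, which keeps repeated calls from accumulating fields on the shared catalog.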

Run and try asking “What size LLM can I run?”:

python hardware_advisor_agent.py

Example output:

You: What size LLM can I run?

Agent: With 32 GB RAM and 24 GB GPU, you can safely run models up to 22.4 GB!

Top recommendations:
1. Qwen3-Coder-30B (18.5 GB) - Fits in RAM and GPU
2. Llama-3.1-8B (4.7 GB) - Fits in RAM and GPU

Step 5: Production CLI

Replace the simple __main__ block with a polished interactive CLI. This adds a banner, quit commands, and better error handling.

Replace the entire if __name__ == "__main__": block with:

def main():
    """Run the Hardware Advisor Agent interactively."""
    print("=" * 60)
    print("Hardware Advisor Agent")
    print("=" * 60)
    print("\nHi! I can help you figure out what size LLM your system can run.")
    print("\nTry asking:")
    print("  - 'What size LLM can I run?'")
    print("  - 'Show me my system specs'")
    print("  - 'What models are available?'")
    print("  - 'Can I run a 30B model?'")
    print("\nType 'quit', 'exit', or 'q' to stop.\n")

    try:
        agent = HardwareAdvisorAgent()
        print("Agent ready!\n")
    except Exception as e:
        print(f"Error initializing agent: {e}")
        print("\nMake sure Lemonade Server is running before using GAIA.")
        return

    while True:
        try:
            user_input = input("You: ").strip()

            if not user_input:
                continue

            if user_input.lower() in ("quit", "exit", "q"):
                print("Goodbye!")
                break

            agent.process_query(user_input)
            print()

        except KeyboardInterrupt:
            print("\nGoodbye!")
            break
        except Exception as e:
            print(f"\nError: {e}\n")

if __name__ == "__main__":
    main()
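The input handling in main() (skip blank lines, stop on a quit word, pass everything else to the agent) can be separated from input() so it is easy to test. The sketch below is one way to do that; `run_session` and `QUIT_WORDS` are hypothetical names, and `handle` stands in for agent.process_query.

```python
from typing import Callable, Iterable, List

QUIT_WORDS = {"quit", "exit", "q"}


def run_session(lines: Iterable[str], handle: Callable[[str], object]) -> int:
    """Feed each non-empty line to `handle` until a quit word; return the count handled."""
    handled = 0
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # ignore blank input, as main() does
        if text.lower() in QUIT_WORDS:
            break
        handle(text)
        handled += 1
    return handled


# Scripted session: a list's append method stands in for the agent.
seen: List[str] = []
count = run_session(["", "Show me my system specs", "  Q  ", "ignored"], seen.append)
print(count, seen)  # 1 ['Show me my system specs']
```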

Final Verification

Your hardware_advisor_agent.py should now have all of these components:

Imports: from typing import Any, Dict and from gaia import Agent, tool
HardwareAdvisorAgent class with __init__ and system prompt
_get_gpu_info() helper (Windows PowerShell + Linux lspci)
get_hardware_info() tool with GPU, NPU, and OS fields
list_available_models() tool with labels and size enrichment
recommend_models() tool with 70% rule, fits_in_ram, fits_in_gpu
main() function with interactive CLI

Test these queries to confirm everything works:

“What size LLM can I run?”
“Show me my system specs”
“What models are available?”
“Can I run a 30B model?”

Next Steps

Explore LemonadeClient APIs — Discover more system and model management capabilities in the LemonadeClient SDK documentation
Add voice interaction — Integrate Whisper ASR and Kokoro TTS to let users ask hardware questions by speaking. See the Talk guide
Add MCP support — Expose the hardware advisor as an MCP server so other tools can query it. See the MCP guide
Extend the recommendation engine — Factor in GPU VRAM for offloading layers, or add benchmarking to estimate tokens-per-second
Build a multi-agent system — Combine the hardware advisor with a code agent or chat agent using the Routing Agent

Need help with this playbook?

Run into an issue or have a question? Open a GitHub issue and our team will take a look.