- Generating images with ComfyUI and Z Image Turbo
- Automating Workflows with n8n and Local LLMs
- Building Your First Agent with GAIA
- Fine-tune LLMs with PyTorch and AMD ROCm™ Software
- Getting Started with Ollama
- How to Chat with LLMs in Open WebUI
- LLM Fine-Tuning with LLaMA Factory
- Local LLM Coding with VS Code and Qwen3-Coder
- Optimized LLMs Fine-tuning with Unsloth
- Running and serving LLMs with LM Studio
- Running LLMs with PyTorch and AMD ROCm™ software
- Speech-to-Speech Translation
- Using Lemonade Across CPU, GPU, and NPU
Getting Started with Ollama
Install Ollama and run LLMs locally — chat from the terminal, desktop app, or REST API on your AMD Ryzen™ AI
Overview
Ollama is a popular lightweight tool for running large language models locally. It handles model downloading, quantization, and serving behind a simple command-line interface and desktop app, so you can go from zero to chatting with an LLM in minutes.
This playbook walks you through installing Ollama, pulling the GPT-OSS 20B model, and having a conversation with it, through both the terminal and the desktop app.
What You’ll Learn
- How to install and launch Ollama on your system
- Pull and run the GPT-OSS 20B model locally
- Chat with models using the CLI
- Query models programmatically through the REST API
Installing Dependencies
AMD GPU Driver
Update to the latest AMD GPU driver using AMD Software: Adrenalin Edition™.
- Open
AMD Software: Adrenalin Editionfrom your Start menu or system tray. - Navigate to Driver and Software, click Manage Updates.
- If an update is available, follow the prompts to download and install.
AMD GPU Driver
Download and install the latest AMD GPU driver for Linux:
- Visit the AMD Linux Drivers page.
- Follow the installation instructions provided on the download page.
Installing Ollama
- Download the installer from ollama.com/download.
- Run the
.exeinstaller and follow the prompts. - Once installed, Ollama runs as a background service and is accessible from the terminal, desktop app, and system tray.
Verify the installation by opening a terminal and running:
ollama --versionYou should see the installed version number printed to the console.
Run the official install script:
curl -fsSL https://ollama.com/install.sh | shVerify the installation:
ollama --versionYou should see the installed version number printed to the console.
Pulling Your First Model
Ollama manages models through a registry similar to container images. To download GPT-OSS 20B:
ollama pull gpt-oss:20bThis downloads the model weights to your local machine (approximately 12 GB). The download only happens once, and subsequent runs load the model from disk.
You can confirm the model is available with:
ollama listYou should see gpt-oss:20b in the output along with its size and last-modified date.
Model Naming
Ollama model names follow the format name:tag. The tag usually indicates the parameter count or quantization variant. Some useful commands for managing models:
| Command | Description |
|---|---|
ollama list | Show all downloaded models |
ollama pull <model> | Download a model without running it |
ollama rm <model> | Remove a model to free disk space |
ollama show <model> | Display model metadata and parameters |
Chatting from the Terminal
Launch an interactive chat session directly from the command line:
ollama run gpt-oss:20bOllama loads the model into memory and drops you into a prompt. Try asking it something:
>>> What is the capital of France and why is it historically significant?The model streams its response token-by-token directly in the terminal. Type /bye or press Ctrl+D to exit the session.
Chatting from the Desktop App
Ollama also ships with a desktop application that provides a clean chat interface for interacting with your models.
Open Ollama from the Start menu or click the Ollama icon in the system tray and select Open Ollama.
Once the app is open:
- Click New Chat in the sidebar.
- Select gpt-oss:20b from the model dropdown in the bottom-right corner of the chat input area.
- Type a message and press Enter to start chatting.

The desktop app keeps a history of your conversations in the sidebar, making it easy to revisit previous chats.
Using the REST API
After installation, Ollama runs as a background service and exposes a REST API on http://localhost:11434 that you can use to integrate models into your own applications and scripts.
Generate a Response in Terminal
curl http://localhost:11434/api/generate -d '{"model": "gpt-oss:20b", "prompt": "Explain GPU acceleration in two sentences.", "stream": false}'curl.exe http://localhost:11434/api/generate -d '{"model": "gpt-oss:20b", "prompt": "Explain GPU acceleration in two sentences.", "stream": false}'The response is a JSON object containing the model’s output in the response field.
Python Example
Now that we can hit the Ollama API programmatically, let’s call it from Python.
Create a Virtual Environment in Terminal
sudo apt install -y python3-venvpython3 -m venv ollama-envsource ollama-env/bin/activatepip install requestspython -m venv ollama-envollama-env\Scripts\activatepip install requestsCreate a Python file
In the same directory, use VS Code or another editor to create a .py file and copy the following code into it. Then, run the file in your activated environment with python your_file_name.py
import requests
response = requests.post( "http://localhost:11434/api/generate", json={ "model": "gpt-oss:20b", "prompt": "Write a haiku about local AI inference.", "stream": False, },)
print(response.json()["response"])Key API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
/api/generate | POST | Single-turn text generation |
/api/chat | POST | Multi-turn conversation with message history |
/api/tags | GET | List available models |
/api/show | POST | Show model details |
/api/pull | POST | Pull a model from the registry |
For the full API reference, see the Ollama API documentation.
Next Steps
- Try different models: Browse the Ollama model library to explore hundreds of available models, from small coding assistants to large reasoning models.
- Create custom models: Use a Modelfile to set custom system prompts, temperature, and other parameters for a tailored experience.
- Build with the API: Use the Python or JavaScript client libraries to integrate Ollama into your applications.
- Connect to frontends: Pair Ollama with tools like Open WebUI for a feature-rich chat interface with search, personas, and document upload.
For more information, check out the Ollama documentation.
Need help with this playbook?
Run into an issue or have a question? Open a GitHub issue and our team will take a look.