Device Family

Device

Getting Started with Ollama

Install Ollama and run LLMs locally — chat from the terminal, desktop app, or REST API on your AMD Ryzen™ AI

Overview

Ollama is a popular lightweight tool for running large language models locally. It handles model downloading, quantization, and serving behind a simple command-line interface and desktop app, so you can go from zero to chatting with an LLM in minutes.

This playbook walks you through installing Ollama, pulling the GPT-OSS 20B model, and having a conversation with it, through both the terminal and the desktop app.

What You’ll Learn

How to install and launch Ollama on your system
Pull and run the GPT-OSS 20B model locally
Chat with models using the CLI
Query models programmatically through the REST API

Installing Dependencies

AMD GPU Driver

Update to the latest AMD GPU driver using AMD Software: Adrenalin Edition™.

Open AMD Software: Adrenalin Edition from your Start menu or system tray.
Navigate to Driver and Software, click Manage Updates.
If an update is available, follow the prompts to download and install.

AMD GPU Driver

Download and install the latest AMD GPU driver for Linux:

Visit the AMD Linux Drivers page.
Follow the installation instructions provided on the download page.

Installing Ollama

Download the installer from ollama.com/download.
Run the .exe installer and follow the prompts.
Once installed, Ollama runs as a background service and is accessible from the terminal, desktop app, and system tray.

Verify the installation by opening a terminal and running:

ollama --version

You should see the installed version number printed to the console.

Run the official install script:

curl -fsSL https://ollama.com/install.sh | sh

Verify the installation:

ollama --version

You should see the installed version number printed to the console.

Pulling Your First Model

Ollama manages models through a registry similar to container images. To download GPT-OSS 20B:

ollama pull gpt-oss:20b

This downloads the model weights to your local machine (approximately 12 GB). The download only happens once, and subsequent runs load the model from disk.

You can confirm the model is available with:

ollama list

You should see gpt-oss:20b in the output along with its size and last-modified date.

Model Naming

Ollama model names follow the format name:tag. The tag usually indicates the parameter count or quantization variant. Some useful commands for managing models:

Command	Description
`ollama list`	Show all downloaded models
`ollama pull <model>`	Download a model without running it
`ollama rm <model>`	Remove a model to free disk space
`ollama show <model>`	Display model metadata and parameters

Chatting from the Terminal

Launch an interactive chat session directly from the command line:

ollama run gpt-oss:20b

Ollama loads the model into memory and drops you into a prompt. Try asking it something:

>>> What is the capital of France and why is it historically significant?

The model streams its response token-by-token directly in the terminal. Type /bye or press Ctrl+D to exit the session.

Chatting from the Desktop App

Ollama also ships with a desktop application that provides a clean chat interface for interacting with your models.

Open Ollama from the Start menu or click the Ollama icon in the system tray and select Open Ollama.

Once the app is open:

Click New Chat in the sidebar.
Select gpt-oss:20b from the model dropdown in the bottom-right corner of the chat input area.
Type a message and press Enter to start chatting.

Ollama desktop app chatting with gpt-oss:20b

The desktop app keeps a history of your conversations in the sidebar, making it easy to revisit previous chats.

Using the REST API

After installation, Ollama runs as a background service and exposes a REST API on http://localhost:11434 that you can use to integrate models into your own applications and scripts.

Generate a Response in Terminal

curl http://localhost:11434/api/generate -d '{"model": "gpt-oss:20b", "prompt": "Explain GPU acceleration in two sentences.", "stream": false}'

curl.exe http://localhost:11434/api/generate -d '{"model": "gpt-oss:20b", "prompt": "Explain GPU acceleration in two sentences.", "stream": false}'

The response is a JSON object containing the model’s output in the response field.

Python Example

Now that we can hit the Ollama API programmatically, let’s call it from Python.

Create a Virtual Environment in Terminal

sudo apt install -y python3-venv
python3 -m venv ollama-env
source ollama-env/bin/activate
pip install requests

python -m venv ollama-env
ollama-env\Scripts\activate
pip install requests

Create a Python file

In the same directory, use VS Code or another editor to create a .py file and copy the following code into it. Then, run the file in your activated environment with python your_file_name.py

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gpt-oss:20b",
        "prompt": "Write a haiku about local AI inference.",
        "stream": False,
    },
)

print(response.json()["response"])

Key API Endpoints

Endpoint	Method	Purpose
`/api/generate`	POST	Single-turn text generation
`/api/chat`	POST	Multi-turn conversation with message history
`/api/tags`	GET	List available models
`/api/show`	POST	Show model details
`/api/pull`	POST	Pull a model from the registry

For the full API reference, see the Ollama API documentation.

Next Steps

Try different models: Browse the Ollama model library to explore hundreds of available models, from small coding assistants to large reasoning models.
Create custom models: Use a Modelfile to set custom system prompts, temperature, and other parameters for a tailored experience.
Build with the API: Use the Python or JavaScript client libraries to integrate Ollama into your applications.
Connect to frontends: Pair Ollama with tools like Open WebUI for a feature-rich chat interface with search, personas, and document upload.

For more information, check out the Ollama documentation.

Need help with this playbook?

Run into an issue or have a question? Open a GitHub issue and our team will take a look.

Open an Issue