Running LLMs with PyTorch and AMD ROCm™ software

Learn to run powerful language models on your PC with PyTorch and AMD ROCm™ software to summarize documents quickly and easily.


Overview

Want to run powerful AI language models on your own hardware? This tutorial shows you how, using PyTorch powered by AMD ROCm™ software to run models that can summarize documents, answer questions, generate text, and more, all running locally.

What You’ll Learn

  • Run LLMs like gpt-oss-20b and Mistral-7B-Instruct locally using PyTorch and ROCm
  • Create a document summarization tool using LLMs

Initial Setup

Create a Virtual Environment

On Windows, open a terminal in the directory of your choice and run the following commands to create a venv that reuses a ROCm+PyTorch installation already present in the system Python (the --system-site-packages flag exposes system packages inside the venv).

Terminal window
python -m venv llm-env --system-site-packages
llm-env\Scripts\activate

On Linux, open a terminal in the directory of your choice and run the following commands to create a venv that reuses a ROCm+PyTorch installation already present in the system Python.

Terminal window
sudo apt update
sudo apt install -y python3-venv
python3 -m venv llm-env --system-site-packages
source llm-env/bin/activate

If ROCm and PyTorch are not already installed in the system Python, create a standalone venv instead. On Windows, open a terminal in the directory of your choice and run:

Terminal window
python -m venv llm-env
llm-env\Scripts\activate

On Linux, open a terminal in the directory of your choice and run:

Terminal window
sudo apt update
sudo apt install -y python3-venv
python3 -m venv llm-env
source llm-env/bin/activate
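Either way, you can confirm the virtual environment is active before installing anything. A minimal check using only the standard library:

```python
import sys

def in_venv() -> bool:
    # Inside a virtual environment, sys.prefix points at the venv,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix

print("Python:", sys.executable)
print("venv active:", in_venv())
```

If this prints False, re-run the activate command for your platform before continuing.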

Installing Basic Dependencies

ROCm

1. Install AMD ROCm™ software on Linux (Ubuntu 24.04)

These steps install the system ROCm 7.2.1 runtime on Ubuntu 24.04.

Terminal window
sudo apt update
wget https://repo.radeon.com/amdgpu-install/7.2.1/ubuntu/noble/amdgpu-install_7.2.1.70201-1_all.deb
sudo apt install ./amdgpu-install_7.2.1.70201-1_all.deb
sudo amdgpu-install -y --usecase=rocm --no-dkms

2. Set the correct user permissions

Terminal window
sudo usermod -aG render,video $USER

3. Reboot the system

Terminal window
sudo reboot

A reboot ensures the new group memberships and the ROCm runtime stack take effect.

4. Verify that ROCm is installed correctly and usable

Terminal window
ls -l /opt/rocm
ls -l /opt/rocm/lib/libroctx64.so*
# Check ROCm device files (Device files owned by the render group should be visible)
ls -l /dev/kfd
ls -l /dev/dri/renderD*
# Check user groups ($USER should be listed in both render and video)
id
groups
# Check ROCm with rocminfo ('Permission denied' error should NOT be seen)
rocminfo | sed -n '1,120p'
# Check installed ROCm version
cat /opt/rocm/.info/version

Refer to the official AMD ROCm documentation for more information.


PyTorch

Install PyTorch with AMD ROCm™ software support in the virtual environment you created. Pick the index URL that matches your GPU architecture:

Terminal window
# For gfx1151 GPUs
python -m pip install --upgrade pip
python -m pip install --force-reinstall --no-cache-dir --index-url https://repo.amd.com/rocm/whl/gfx1151/ torch torchvision torchaudio

Terminal window
# For gfx1152 GPUs
python -m pip install --upgrade pip
python -m pip install --force-reinstall --no-cache-dir --index-url https://repo.amd.com/rocm/whl/gfx1152/ torch torchvision torchaudio

Terminal window
# For gfx1150 GPUs
python -m pip install --upgrade pip
python -m pip install --force-reinstall --no-cache-dir --index-url https://repo.amd.com/rocm/whl/gfx1150/ torch torchvision torchaudio

See the official PyTorch on ROCm installation documentation for details.
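After installation, a quick sanity check confirms that PyTorch can see the GPU. ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API, so the standard CUDA availability check applies here too; this sketch assumes the install above succeeded:

```python
import torch

def pick_device() -> str:
    # ROCm builds of PyTorch reuse the torch.cuda API, so
    # torch.cuda.is_available() also reports AMD GPUs.
    return "cuda" if torch.cuda.is_available() else "cpu"

print("PyTorch:", torch.__version__)
print("Device:", pick_device())
if torch.cuda.is_available():
    # Should print your AMD GPU's name on a working ROCm setup.
    print("GPU:", torch.cuda.get_device_name(0))
```

If this reports cpu on a machine with a supported AMD GPU, revisit the driver and permissions steps above.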


AMD GPU Driver

On Windows, update to the latest AMD GPU driver using AMD Software: Adrenalin Edition™.

  1. Open AMD Software: Adrenalin Edition from your Start menu or system tray.
  2. Navigate to Driver and Software, click Manage Updates.
  3. If an update is available, follow the prompts to download and install.

On Linux, download and install the latest AMD GPU driver:

  1. Visit the AMD Linux Drivers page.
  2. Follow the installation instructions provided on the download page.


Installing Additional Dependencies

Terminal window
pip install transformers==4.57.1 safetensors==0.6.2 accelerate sentencepiece protobuf

Pinned versions are recommended for reproducibility. To install the latest releases instead:

Terminal window
pip install transformers safetensors accelerate sentencepiece protobuf

Quick Start with Example Scripts

This playbook includes ready-to-use scripts. Click them to preview and download them to the same directory as the environment you created.

  • run_llm.py: Basic LLM text generation. Usage: python run_llm.py
  • summarizer.py: Document summarizer with Harmony support. Usage: python summarizer.py --file document.txt

Both scripts support:

  • Model selection: openai/gpt-oss-20b (default) or other models such as Mistral
  • Chat template formatting for proper model prompting, especially useful for document summarization
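To illustrate what chat template formatting does, the sketch below mimics the idea with a simplified, hypothetical template. Real templates are model-specific (gpt-oss models use the Harmony format) and should always be applied via the tokenizer's apply_chat_template method rather than hand-rolled like this:

```python
def format_chat(messages):
    # Simplified, hypothetical template for illustration only; real chat
    # templates (e.g. Harmony for gpt-oss) are defined by the tokenizer
    # and applied with tokenizer.apply_chat_template.
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    # Trailing assistant tag cues the model to start its reply.
    return "\n".join(parts) + "\n<|assistant|>\n"

messages = [
    {"role": "system", "content": "You are a helpful technology assistant"},
    {"role": "user", "content": "Summarize this document."},
]
print(format_chat(messages))
```

The point is that the raw prompt is wrapped in role markers the model was trained on; skipping this step usually degrades output quality noticeably.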

Loading and Running Your First LLM

The included run_llm.py script shows how to generate text with LLMs using PyTorch and AMD ROCm.

The snippet below shows how to use the model and customize the questions asked.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Create system and user prompts
prompt = "Explain what a large language model is in 2 brief sentences."
print(f"Prompt: {prompt}\n")
messages = [
    {"role": "system", "content": "You are a helpful technology assistant"},
    {"role": "user", "content": prompt},
]

# Format with the model's chat template and generate a reply
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Try out the downloaded script:

Terminal window
python run_llm.py

Building a Document Summarizer

Now that you’ve generated local LLM output, you can build on that by making a practical document summarizer. In this section, you will use the summarizer.py script to feed in a .txt file and automatically generate a concise summary, all running locally on your GPU.

The script is designed to work out of the box. Open the script in an editor to explore the code, customize prompts, and tweak parameters like length and temperature.
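At its core, such a summarizer is mostly prompt construction. A minimal sketch of how the summarization messages might be built; the exact wording in summarizer.py may differ:

```python
def build_summary_messages(text: str, max_words: int = 150):
    # Hypothetical prompt wording for illustration; summarizer.py's
    # actual system and user prompts may differ.
    return [
        {"role": "system", "content": "You summarize documents concisely."},
        {
            "role": "user",
            "content": f"Summarize the following in at most {max_words} words:\n\n{text}",
        },
    ]

msgs = build_summary_messages("ROCm enables PyTorch on AMD GPUs.", max_words=50)
print(msgs[1]["content"])
```

The resulting messages list is passed through the chat template and into model.generate, exactly as in the run_llm.py snippet above.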

Usage Examples

Terminal window
# Summarize the built-in sample text (defined at Line 152 in summarizer.py)
python summarizer.py
# Summarize a text file
python summarizer.py --file example_document.txt
# Adjust creativity with temperature
python summarizer.py --file document.txt --temperature 0.5
# Longer summaries with more tokens
python summarizer.py --file document.txt --max-length 400

Learn about Generation Parameters

  • max_new_tokens: The maximum length of the LLM’s output. Use 50–500 tokens for summaries (1 token is about 0.75 English words).
  • temperature: Creativity; low values keep output focused, high values make it more unpredictable. 0.1–0.3: focused, deterministic (good for summaries). 0.5–0.7: balanced (general use). 0.8–1.0: creative, varied (brainstorming).
  • top_p: Nucleus sampling; low values limit the model to narrower outputs. 0.1–0.5: strict, predictable. 0.9–0.95: standard, natural, conversational.
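These parameters map directly onto keyword arguments of model.generate. A sketch of a summary-friendly configuration, together with the token-to-word rule of thumb above (the specific values here are illustrative defaults, not requirements):

```python
def approx_words(max_new_tokens: int) -> int:
    # Rule of thumb: 1 token is about 0.75 English words.
    return int(max_new_tokens * 0.75)

summary_kwargs = {
    "max_new_tokens": 200,  # roughly approx_words(200) == 150 words
    "temperature": 0.2,     # focused, deterministic: good for summaries
    "top_p": 0.9,           # standard nucleus sampling
    "do_sample": True,      # temperature/top_p only apply when sampling
}
print(approx_words(summary_kwargs["max_new_tokens"]), "words (approx.)")
# Usage: outputs = model.generate(inputs, **summary_kwargs)
```

Note that do_sample=True is required for temperature and top_p to take effect; with greedy decoding they are ignored.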

Real-World Applications

  • Research Paper Analysis: Extract key findings from complex publications for quick review
  • News Aggregation: Summarize news articles into brief daily digests or highlights
  • Meeting Notes: Condense transcripts into actionable items and concise summaries
  • Legal Document Review: Extract relevant clauses or obligations from long legal texts quickly
  • Code Documentation: Generate concise repository overviews and function explanations

Next Steps

  • Fine-tuning: Adapt models to your specific field or jargon for better accuracy (see Fine-tuning Playbooks)
  • RAG Systems: Combine LLMs with document retrieval for context-aware answers and search
  • Model Exploration: Experiment with new models like Llama 3, Phi-3, or Qwen for better results
  • Production Deployment: Use tools like vLLM for scalable LLM serving in organizations

Your system gives you the power to run sophisticated language models locally. Experiment with different models, prompts, and parameters to discover what works best for your applications.

Need help with this playbook?

Run into an issue or have a question? Open a GitHub issue and our team will take a look.