Running LLMs with PyTorch and AMD ROCm™ software
Learn to run powerful language models on your PC with PyTorch and AMD ROCm™ software to summarize documents quickly and easily.
Overview
Want to run powerful AI language models on your own hardware? This tutorial shows you how: using PyTorch powered by AMD ROCm™ software, you can run models that summarize documents, answer questions, generate text, and more, all running locally.
What You’ll Learn
- Run LLMs like gpt-oss-20b and Mistral-7B-Instruct locally using PyTorch and ROCm
- Create a document summarization tool using LLMs
Initial Setup
Create a Virtual Environment
If your system already ships with ROCm and PyTorch preinstalled, create a venv that inherits the system packages.

On Windows, open a terminal in the directory of your choice and run:

```
python -m venv llm-env --system-site-packages
llm-env\Scripts\activate
```

On Linux, open a terminal in the directory of your choice and run:

```
sudo apt update
sudo apt install -y python3-venv
python3 -m venv llm-env --system-site-packages
source llm-env/bin/activate
```

Otherwise, create a standalone venv.

On Windows:

```
python -m venv llm-env
llm-env\Scripts\activate
```

On Linux:

```
sudo apt update
sudo apt install -y python3-venv
python3 -m venv llm-env
source llm-env/bin/activate
```

Installing Basic Dependencies
ROCm
1. Install AMD ROCm™ software on Linux (Ubuntu 24.04)
These steps install the system ROCm 7.2.1 runtime on Ubuntu 24.04.
```
sudo apt update
wget https://repo.radeon.com/amdgpu-install/7.2.1/ubuntu/noble/amdgpu-install_7.2.1.70201-1_all.deb
sudo apt install ./amdgpu-install_7.2.1.70201-1_all.deb
sudo amdgpu-install -y --usecase=rocm --no-dkms
```

2. Set the correct user permissions

```
sudo usermod -aG render,video $USER
```

3. Reboot the system

```
sudo reboot
```

This is important for the runtime stack and permissions to settle.

4. Verify that ROCm is installed correctly and usable

```
# Check the ROCm installation directories
ls -l /opt/rocm
ls -l /opt/rocm/lib/libroctx64.so*

# Check ROCm device files (device files owned by the render group should be visible)
ls -l /dev/kfd
ls -l /dev/dri/renderD*

# Check user groups ($USER should be listed in both render and video)
id
groups

# Check ROCm with rocminfo (a 'Permission denied' error should NOT appear)
rocminfo | sed -n '1,120p'

# Check the installed ROCm version
cat /opt/rocm/.info/version
```

Refer to this official documentation for more info.
PyTorch
Install PyTorch with AMD ROCm™ software support in the created virtual environment:
Choose the index URL that matches your GPU architecture (gfx1151, gfx1152, or gfx1150).

For gfx1151:

```
python -m pip install --upgrade pip
python -m pip install --force-reinstall --no-cache-dir --index-url https://repo.amd.com/rocm/whl/gfx1151/ torch torchvision torchaudio
```

For gfx1152:

```
python -m pip install --upgrade pip
python -m pip install --force-reinstall --no-cache-dir --index-url https://repo.amd.com/rocm/whl/gfx1152/ torch torchvision torchaudio
```

For gfx1150:

```
python -m pip install --upgrade pip
python -m pip install --force-reinstall --no-cache-dir --index-url https://repo.amd.com/rocm/whl/gfx1150/ torch torchvision torchaudio
```

See this link for details.
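After installing, you can confirm that the build can see your GPU; on ROCm builds, PyTorch exposes the device through the familiar torch.cuda API. A minimal check (it reports rather than fails when torch or a GPU is absent):

```python
# Quick sanity check that the venv has a working PyTorch build
# and that it can see the GPU. On ROCm, PyTorch reuses the torch.cuda API.
try:
    import torch

    gpu_available = torch.cuda.is_available()
    print("PyTorch version:", torch.__version__)
    print("GPU available:", gpu_available)
    if gpu_available:
        print("Device:", torch.cuda.get_device_name(0))
except ImportError:
    gpu_available = None  # torch is not installed in this environment
    print("PyTorch is not installed; activate the venv and install it first.")
```

If `GPU available` prints `False`, re-check the driver install, user groups, and that you picked the wheel index matching your GPU architecture.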
AMD GPU Driver
Update to the latest AMD GPU driver using AMD Software: Adrenalin Edition™.
- Open AMD Software: Adrenalin Edition from your Start menu or system tray.
- Navigate to Driver and Software, then click Manage Updates.
- If an update is available, follow the prompts to download and install.
Download and install the latest AMD GPU driver for Linux:
- Visit the AMD Linux Drivers page.
- Follow the installation instructions provided on the download page.
Installing Additional Dependencies
Install pinned versions for reproducibility:

```
pip install transformers==4.57.1 safetensors==0.6.2 accelerate sentencepiece protobuf
```

Or install the latest versions:

```
pip install transformers safetensors accelerate sentencepiece protobuf
```

Quick Start with Example Scripts
This playbook includes ready-to-use scripts. Click them to preview and download them to the same directory as the environment you created.
| Script | Description | Usage |
|---|---|---|
| run_llm.py | Basic LLM text generation | python run_llm.py |
| summarizer.py | Document summarizer with Harmony support | python summarizer.py --file document.txt |
Both scripts support:
- Model selection: openai/gpt-oss-20b (default) or other models such as Mistral
- Chat template formatting for proper model prompting, especially useful for document summarization
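Chat templates turn a list of role-tagged messages into the exact prompt string a model was trained on; with Hugging Face tokenizers this is handled by `tokenizer.apply_chat_template`. The simplified sketch below (a hypothetical formatter, not the real template of any model) illustrates what that step does:

```python
# Simplified illustration of what a chat template does: it flattens
# role-tagged messages into a single prompt string. Real templates are
# model-specific; use tokenizer.apply_chat_template in practice.
def render_chat(messages, add_generation_prompt=True):
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete
        parts.append("<|assistant|>\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful technology assistant"},
    {"role": "user", "content": "Summarize this document."},
]
prompt = render_chat(messages)
print(prompt)
```

Getting this formatting wrong (or skipping it) is a common cause of rambling or off-format output, which is why both scripts apply the model's own template.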
Loading and Running Your First LLM
The included run_llm.py script shows how to generate text with LLMs using PyTorch and AMD ROCm.
The snippet below shows how to use the model and customize the questions asked.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Create system and user prompts
prompt = "Explain what a large language model is in 2 brief sentences."
print(f"Prompt: {prompt}\n")

messages = [
    {"role": "system", "content": "You are a helpful technology assistant"},
    {"role": "user", "content": prompt},
]
```

Try out the downloaded script:

```
python run_llm.py
```

Building a Document Summarizer
Now that you’ve generated local LLM output, you can build on that by making a practical document summarizer. In this section, you will use the summarizer.py script to feed in a .txt file and automatically generate a concise summary, all running locally on your GPU.
The script is designed to work out of the box. Open the script in an editor to explore the code, customize prompts, and tweak parameters like length and temperature.
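Summarization works best when the whole document fits in the model's context window; for longer inputs, a common approach is to split the text into overlapping chunks, summarize each, then summarize the summaries. A minimal word-based chunker sketch (the sizes are illustrative assumptions, not values taken from summarizer.py):

```python
def chunk_text(text, max_words=800, overlap=100):
    """Split text into overlapping word-based chunks so each chunk,
    plus the prompt, stays within the model's context window."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, step = [], max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# Example: a 2000-word document becomes three overlapping chunks
chunks = chunk_text("word " * 2000, max_words=800, overlap=100)
```

The overlap keeps sentences that straddle a chunk boundary visible to at least one chunk, at the cost of a little duplicated work.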
Usage Examples
```
# Summarize a large piece of text (at Line 152 in summarizer.py)
python summarizer.py

# Summarize a text file
python summarizer.py --file example_document.txt

# Adjust creativity with temperature
python summarizer.py --file document.txt --temperature 0.5

# Generate longer summaries with more tokens
python summarizer.py --file document.txt --max-length 400
```

Learn about Generation Parameters
| Parameter | What It Controls | Typical Values |
|---|---|---|
| max_new_tokens | The maximum length of the LLM’s output | 50–500 tokens for summaries (1 token is about 0.75 English words) |
| temperature | Creativity: low values keep output focused, high values make it more unpredictable | 0.1–0.3: focused, deterministic (good for summaries); 0.5–0.7: balanced (general use); 0.8–1.0: creative, varied (brainstorming) |
| top_p | Nucleus sampling: low values restrict the model to narrower, more predictable outputs | 0.1–0.5: strict, predictable; 0.9–0.95: standard, natural, conversational |
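Under the hood, temperature rescales the model's logits before the softmax, and top_p keeps only the smallest set of tokens whose cumulative probability reaches p. A stdlib-only sketch of both mechanisms (the logits are toy values, not real model output):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize (nucleus sampling)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]  # toy logits for a 4-token vocabulary
cold = softmax_with_temperature(logits, temperature=0.2)  # nearly deterministic
hot = softmax_with_temperature(logits, temperature=1.5)   # flatter, more varied
nucleus = top_p_filter(softmax_with_temperature(logits), top_p=0.9)
```

At temperature 0.2, almost all probability mass lands on the top token; at 1.5 the distribution flattens, which is why higher values read as "more creative". The top_p filter then drops the unlikely tail before sampling.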
Real-World Applications
- Research Paper Analysis: Extract key findings from complex publications for quick review
- News Aggregation: Summarize news articles into brief daily digests or highlights
- Meeting Notes: Condense transcripts into actionable items and concise summaries
- Legal Document Review: Extract relevant clauses or obligations from long legal texts quickly
- Code Documentation: Generate concise repository overviews and function explanations
Next Steps
- Fine-tuning: Adapt models to your specific field or jargon for better accuracy (see Fine-tuning Playbooks)
- RAG Systems: Combine LLMs with document retrieval for context-aware answers and search
- Model Exploration: Experiment with new models like Llama 3, Phi-3, or Qwen for better results
- Production Deployment: Use tools like vLLM for scalable LLM serving in organizations
Your system gives you the power to run sophisticated language models locally. Experiment with different models, prompts, and parameters to discover what works best for your applications.
Need help with this playbook?
Run into an issue or have a question? Open a GitHub issue and our team will take a look.