Device Family

Device

LLM Fine-Tuning with LLaMA Factory

Fine-tune large language models (LLMs) using LLaMA Factory and LoRA techniques.

Overview

Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks. LLaMA Factory is an open-source and user-friendly platform that streamlines the training and fine-tuning of large language models and multimodal models. It allows users to customize hundreds of pre-trained models locally with minimal coding.

This playbook teaches you how to fine-tune LLMs using LLaMA Factory on your local AMD hardware.

What you’ll learn

How to set up LLaMA Factory with AMD ROCm™ software
How to configure LLM fine-tuning parameters (using Qwen/Qwen3-4B-Instruct-2507 as an example)
How to run LLaMA Factory fine-tuning
How to run inference with the fine-tuned model
How to export the fine-tuned model

Estimated Time

Duration: It will take about 60 minutes to run this playbook (depending on your model/dataset size and network speed).
View the LLaMA Factory GitHub for more information.

Setting up the Environment

Create a Virtual Environment

sudo apt update
sudo apt install -y python3-venv
python3 -m venv venv
source venv/bin/activate

python -m venv venv
venv\Scripts\activate

Installing Basic Dependencies

ROCm

Add the current user to the render and video groups.

sudo usermod -a -G render,video $LOGNAME

Restart your system to apply the settings.

sudo reboot

Install ROCm in the created virtual environment.

python -m pip install --upgrade pip
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1151/ "rocm[libraries,devel]"

python -m pip install --upgrade pip
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1152/ "rocm[libraries,devel]"

python -m pip install --upgrade pip
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1150/ "rocm[libraries,devel]"

python -m pip install --upgrade pip
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx120x-all/ "rocm[libraries,devel]"

For further installation help, please see this link.

PyTorch

Install PyTorch with AMD ROCm™ software support in the created virtual environment:

python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1151/ "torch==2.11.0+rocm7.13.0" "torchvision==0.26.0+rocm7.13.0" "torchaudio==2.11.0+rocm7.13.0"

python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1152/ "torch==2.11.0+rocm7.13.0" "torchvision==0.26.0+rocm7.13.0" "torchaudio==2.11.0+rocm7.13.0"

python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1150/ "torch==2.11.0+rocm7.13.0" "torchvision==0.26.0+rocm7.13.0" "torchaudio==2.11.0+rocm7.13.0"

See this link for details.

AMD GPU Driver

Update to the latest AMD GPU driver using AMD Software: Adrenalin Edition™.

Open AMD Software: Adrenalin Edition from your Start menu or system tray.
Navigate to Driver and Software, click Manage Updates.
If an update is available, follow the prompts to download and install.

AMD GPU Driver

Download and install the latest AMD GPU driver for Linux:

Visit the AMD Linux Drivers page.
Follow the installation instructions provided on the download page.

PyTorch

Install PyTorch with AMD ROCm™ software support in the created virtual environment:

python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1151/ "torch==2.11.0+rocm7.13.0" "torchvision==0.26.0+rocm7.13.0" "torchaudio==2.11.0+rocm7.13.0"

python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1152/ "torch==2.11.0+rocm7.13.0" "torchvision==0.26.0+rocm7.13.0" "torchaudio==2.11.0+rocm7.13.0"

python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1150/ "torch==2.11.0+rocm7.13.0" "torchvision==0.26.0+rocm7.13.0" "torchaudio==2.11.0+rocm7.13.0"

See this link for details.

AMD GPU Driver

Update to the latest AMD GPU driver using AMD Software: Adrenalin Edition™.

Open AMD Software: Adrenalin Edition from your Start menu or system tray.
Navigate to Driver and Software, click Manage Updates.
If an update is available, follow the prompts to download and install.

AMD GPU Driver

Download and install the latest AMD GPU driver for Linux:

Visit the AMD Linux Drivers page.
Follow the installation instructions provided on the download page.

Installing Additional Dependencies

Python: ensure minimum version is 3.11

pip install huggingface_hub --break-system-packages

pip install huggingface_hub

Install LLaMA Factory

LLaMA Factory depends on PyTorch. You should already have it installed per the above requirements.

Download the source code from LLaMA Factory official GitHub repository, and install its dependencies.

git clone --depth 1 https://github.com/hiyouga/LlamaFactory.git
cd LlamaFactory
pip install setuptools --break-system-packages
pip install -e . --break-system-packages
pip install -r requirements/metrics.txt --break-system-packages

git clone --depth 1 https://github.com/hiyouga/LlamaFactory.git
cd LlamaFactory
pip install -e .
pip install -r requirements/metrics.txt

Verify if llamafactory-cli is executable.

cd LlamaFactory
llamafactory-cli version || python -m llamafactory.cli version || true
echo "llamafactory-cli is available"
command -v llamafactory-cli

cd LlamaFactory
if (Get-Command llamafactory-cli -ErrorAction SilentlyContinue) {
    llamafactory-cli version
    Write-Host "llamafactory-cli is available"
} else {
    Write-Host "llamafactory-cli is not available"
}

Example output:

LlaMaFactory version

Having successfully installed LLaMA Factory, let’s run fine-tuning on it.

Using LLaMA Factory CLI for Fine Tuning

This section will cover how to prepare fine-tuning datasets, configure LoRA/QLoRA parameters, and run LoRA fine-tuning.

Dataset Preparation

LLaMA Factory supports fine-tuning datasets in the Alpaca format and ShareGPT format. All the available datasets have been defined in the dataset_info.json. If you are using a custom dataset, please make sure to add a dataset description in dataset_info.json and specify the dataset name before training. Details can be found in their docs here.

In this playbook, we will use the identity and alpaca_en_demo datasets as an example, and configure the dataset information in the next step.

Fine-tuning parameter configuration

LLaMA Factory supports multiple fine-tuning schemes.

Fine-Tuning schemes	LLaMA Factory Examples
Full-Parameter	examples/train_full
LoRA fine-tuning	examples/train_lora
QLoRA fine-tuning	examples/train_qlora

These example configuration files have specified model parameters, fine-tuning method parameters, dataset parameters, evaluation parameters, and more. You can configure them according to your own needs. In this playbook, we will use qwen3_lora_sft.yaml.

Key parameters explained:

model_name_or_path - Hugging Face model name or local model file path.
stage - Training stage. Options: rm (reward modeling), pt (pretrain), sft (Supervised Fine-Tuning), PPO, DPO, KTO, ORPO.
do_train - true for training, false for evaluation
finetuning_type - Fine-tuning method. Options: freeze, lora, full
lora_rank - The dimensionality of the low-rank matrix used in LoRA, typical values: 4, 6, 8, 16 (smaller values = fewer parameters = faster fine-tuning; larger values = better task adaptation but higher resource usage).
lora_target - Target modules for LoRA method. Default: all.
dataset - Dataset(s) to use. Use “,” to separate multiple datasets
output_dir - Fine-tuning Output path
logging_steps - Logging interval in steps
save_steps - Model checkpoint saving interval.
overwrite_output_dir - Whether to allow overwriting the output directory.
per_device_train_batch_size - Training batch size per device.
gradient_accumulation_steps - Number of gradient accumulation steps.
learning_rate - Learning rate
num_train_epochs - Number of training epochs
lr_scheduler_type - Learning rate schedule. Options: linear, cosine, polynomial, constant, etc.
warmup_ratio - Learning rate warmup ratio

We will modify the default value of lora_rank to run fine-tuning on AMD Ryzen™ & AMD Radeon™ GPUs.

sed -i.bak 's/lora_rank: 8/lora_rank: 6/g' examples/train_lora/qwen3_lora_sft.yaml

We will update the default LoRA fine-tuning configuration for better compatibility with AMD Ryzen™ and AMD Radeon™ GPUs:

Set lora_rank from 8 to 6 to reduce memory usage during fine-tuning.
Use fp16 instead of bf16 for broader AMD GPU compatibility and lower memory usage.
Set dataloader_num_workers to 0 on Windows to avoid "Can't pickle local object<>" errors caused by multiprocessing data loading.

$filePath = "examples/train_lora/qwen3_lora_sft.yaml"

# Create a backup before modifying the YAML file
Copy-Item -Path $filePath -Destination "$filePath.bak" -Force

# Read the file and update the training settings
$content = Get-Content -Path $filePath -Raw

$newContent = $content `
  -replace 'lora_rank: 8', 'lora_rank: 6' `
  -replace 'bf16: true', 'fp16: true' `
  -replace 'dataloader_num_workers: 4', 'dataloader_num_workers: 0'

Set-Content -Path $filePath -Value $newContent

Run LLaMA Factory Fine-Tuning

llamafactory-cli is the official command-line interface (CLI) tool for LLaMA Factory, developed to simplify end-to-end LLM workflows (data preparation → fine-tuning → evaluation → deployment) without writing complex code.

For training/fine-tuning, llamafactory-cli train is the core subcommand of the LLaMA Factory CLI. It abstracts fine-tuning workflows (data preprocessing, hyperparameter tuning, hardware optimization) into a single CLI command, supporting multiple fine-tuning paradigms (LoRA/QLoRA/Full Fine-Tuning) and is optimized for low-resource GPUs (e.g., QLoRA on 16GB VRAM).

You can run LLaMA Factory fine-tuning using the following command, which is based on the modified configuration file of Qwen3 LoRA fine-tuning.

llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml

After running LLM finetuning, all generated outputs are stored in the “output_dir”, including model checkpoint files, configuration files, and training metrics.

Qwen3 LoRA Fine-tuning

Test the fine-tuned model

llamafactory-cli chat is designed for interactive chat/inference with LLMs (both base models and LoRA-fine-tuned models). LLaMA Factory provides the sample configuration to run inference of fine-tuned models in examples/inference. You can also modify this sample configuration to change the settings, such as the inference backend.

Use the following command to test the Qwen3 fine-tuned model:

llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml

An example chat using the fine-tuned model is shown below:

Test Qwen3 Fine-Tuned model

Export the fine-tuned model

For production use-cases, the pre-trained model and the LoRA adapter need to be merged and exported into a single model. This merged model can be used as a normal Hugging Face model file. LLaMA Factory provides the sample configurations in examples/merge_lora.

Use the following command to export the Qwen3 fine-tuned model:

llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml

The result of exporting the fine-tuned model is shown below.

Export Qwen3 Fine-Tuned model

Using LLaMA Factory GUI

LLaMA-Factory also supports zero-code fine-tuning of LLMs through a web UI in the browser.

Use the following command to open it:

llamafactory-cli webui

The LlamaFactory Web UI offers a streamlined interface for managing machine learning workflows, including training, evaluation, prediction, chatting, and exporting models. Here’s a brief introduction to each tab:

Train: This tab allows you to select a model and dataset, configure training parameters, and initiate the training process. It’s essential to understand the mandatory and optional parameters to optimize the training setup.
Evaluate & Predict: After training, you can evaluate the model’s performance and make predictions using this tab. It provides insights into the model’s accuracy and effectiveness on new data.
Chat: Once training is complete, load the model in the Chat tab to interact with it and see the results of your work. This feature enables real-time communication with the trained model.
Export: This tab facilitates the export of trained models for deployment or further use. You can save your models in various formats suitable for different applications.

For detailed guidance, we encourage you to refer to the official documentation on the LlamaFactory GitHub repository and the LlamaFactory ReadTheDocs. Additionally, the Wiki LLaMA Board Web UI provides valuable insights into the interface and its functionalities.

Next Steps

Try different models such as gpt-oss and other state of the art models.
Experiment with different backends on the fine-tuned model

For more documentation, please visit: https://llamafactory.readthedocs.io/en/latest/

Need help with this playbook?

Run into an issue or have a question? Open a GitHub issue and our team will take a look.

Open an Issue