Building llama.cpp with ROCm on Fedora 41
Introduction
This article explains how to build llama.cpp with ROCm support on Fedora 41.
Dependencies
Start with build deps:
sudo dnf install make gcc cmake lld clang clang-devel compiler-rt
You will then need to create a couple of symlinks:
ln -s /usr/bin/clang-offload-bundler-17 ~/bin/clang-offload-bundler
ln -s /usr/bin/llvm-objcopy-17 ~/bin/llvm-objcopy
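If "${HOME}/bin" does not exist yet or is not on your PATH, set that up first (adjust to taste):
mkdir -p ~/bin
export PATH="${HOME}/bin:${PATH}"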
Now you can install the ROCm packages:
sudo dnf install rocminfo 'rocm-*' 'rocblas-*' 'hipblas' 'hipblas-*'
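To confirm the packages actually landed, you can list the installed RPMs (just a sanity check; the package set on your system may differ slightly):
rpm -qa | grep -E '^(rocm|rocblas|hipblas)'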
Verify rocminfo works:
rocminfo
You should be able to see your GPU:
ROCk module is loaded
...
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-e933ba0005f167b9
Marketing Name: AMD Radeon RX 7900 XTX
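The gfx name shown here (gfx1100 in this case) is the GPU target the build script below detects automatically; you can preview it with the same pipeline the script uses:
rocminfo | grep gfx | head -1 | awk '{print $2}'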
Verify you can see GPU usage info with rocm-smi:
$ rocm-smi
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
====================================================================================================================
0 1 0x744c, 58885 37.0°C 14.0W N/A, N/A, 0 2057Mhz 1249Mhz 0% auto 339.0W 39% 100%
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================
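If you want to keep an eye on utilization while a model is loading or generating, one simple option is to poll rocm-smi (watch ships with procps-ng and should already be installed on Fedora):
watch -n 1 rocm-smi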
Build llama.cpp
Now it’s time to build llama.cpp!
Clone the llama.cpp repo, preferably at:
"${HOME}/git/ggml-org/llama.cpp"
Run the following build script (you can configure the local path of the repo with the LLAMA_CPP_REPO_DIR environment variable):
#!/usr/bin/env bash
# slightly edited and cleaned up from M4TH1EU's bash script
# see: https://github.com/ggml-org/llama.cpp/discussions/9981
function log {
local log_level="$1"
local msg="$2"
local timestamp
timestamp="$(date +'%Y.%m.%d %H:%M:%S')"
>&2 printf '[%s] [%s] %s\n' "${log_level}" "${timestamp}" "${msg}"
}
function log_info {
local msg="$1"
log INFO "${msg}"
}
function log_error {
local msg="$1"
log ERROR "${msg}"
}
function log_warn {
local msg="$1"
log WARN "${msg}"
}
function setup_build_env_vars {
HIPCXX="$(hipconfig -l)/clang"
HIP_PATH="$(hipconfig -R)"
if [[ -z "${HIP_PATH}" ]]; then
log_error "Unable to detect HIP_PATH. Ensure HIP is correctly installed."
return 1
fi
AMDGPU_TARGET="$(rocminfo | grep gfx | head -1 | awk '{print $2}')"
if [[ -z "${AMDGPU_TARGET}" ]]; then
log_error "Unable to detect AMDGPU target using rocminfo."
return 1
fi
HIP_DEVICE_LIB_PATH="$(find "${HIP_PATH}" -name "oclc_abi_version_400.bc" -exec dirname {} \; | head -n 1)"
if [[ -z "${HIP_DEVICE_LIB_PATH}" ]]; then
log_error "Unable to find oclc_abi_version_400.bc under HIP_PATH."
return 1
fi
export LLAMA_HIPBLAS=1
export HIPCXX
export HIP_PATH
export HIP_VISIBLE_DEVICES
export HIP_DEVICE_LIB_PATH
export DEVICE_LIB_PATH=$HIP_DEVICE_LIB_PATH
export ROCM_PATH=/usr/
}
function try_create_symlink {
local src_path="$1"
local dst_path="$2"
if [[ -f "${dst_path}" ]] || [[ -L "${dst_path}" ]]; then
log_warn "${dst_path} already exists; skipping symlink creation"
return
fi
ln -s "${src_path}" "${dst_path}"
}
function main {
set -e
local llama_cpp_repo_dir="${LLAMA_CPP_REPO_DIR:-"${HOME}/git/ggml-org/llama.cpp"}"
local llama_cpp_git_url="https://github.com/ggml-org/llama.cpp.git"
if ! git -C "${llama_cpp_repo_dir}" rev-parse --is-inside-work-tree &>/dev/null; then
if [[ -d "${llama_cpp_repo_dir}" ]]; then
log_error "${llama_cpp_repo_dir} is not a llama.cpp repo. Remove this directory."
return 1
fi
git clone "${llama_cpp_git_url}" "${llama_cpp_repo_dir}"
fi
log_info "Cloned ${llama_cpp_git_url}; proceeding"
pushd "${llama_cpp_repo_dir}"
log_info "Installing build dependencies"
sudo dnf install make gcc cmake lld clang clang-devel compiler-rt
local nonroot_bin="${NONROOT_BIN:-"${HOME}/bin"}"
log_info "Creating symlinks for build dependencies in non-root bin directory"
try_create_symlink /usr/bin/clang-offload-bundler-17 "${nonroot_bin}/clang-offload-bundler"
try_create_symlink /usr/bin/llvm-objcopy-17 "${nonroot_bin}/llvm-objcopy"
log_info "Installing rocm packages"
sudo dnf install rocminfo 'rocm-*' 'rocblas-*' 'hipblas' 'hipblas-*'
log_info "Verifying frequently used rocm CLI commands"
rocminfo
rocm-smi
local max_threads="${MAX_THREADS:-8}"
setup_build_env_vars
rm -rf "${llama_cpp_repo_dir}/build/"*
cmake -S . -B build \
-DGGML_HIP=ON \
-DAMDGPU_TARGETS="${AMDGPU_TARGET}" \
-DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j "${max_threads}"
log_info "Build complete! Enjoy llama.cpp with rocm!"
}
main
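Save the script somewhere convenient (the filename below is just an example) and run it; LLAMA_CPP_REPO_DIR, NONROOT_BIN, and MAX_THREADS can all be overridden on the command line:
LLAMA_CPP_REPO_DIR="${HOME}/git/ggml-org/llama.cpp" MAX_THREADS=16 bash build-llama-cpp-rocm.sh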
Download the GGUF model from huggingface.co
Create an account on huggingface.co.
Install the huggingface-cli (preferably in a virtual environment):
$ python -m venv /tmp/delete-me-venv
$ . /tmp/delete-me-venv/bin/activate
(/tmp/delete-me-venv) $ python -m pip install 'huggingface_hub[cli]'
Create an API token on your huggingface account and login (paste your token when prompted):
huggingface-cli login
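If you would rather not paste the token interactively, huggingface-cli should also accept it as a flag (check huggingface-cli login --help on your version; HF_TOKEN here is just an example variable holding your token):
huggingface-cli login --token "${HF_TOKEN}"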
Download the GGUF model, e.g.:
# you will likely want to run with nohup because it will take a while
nohup huggingface-cli download unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF --local-dir ~/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF > "${HOME}/huggingface-cli-download.out" 2>&1 &
Come back and check if the download is still running:
$ ps -eaf | grep 'huggingface-cli download'
cjvirtu+ 33901 1 11 14:53 ? 00:14:26 /home/cjvirtucio/git/ggml-org/llama.cpp/.venv/bin/python /home/cjvirtucio/git/ggml-org/llama.cpp/.venv/bin/huggingface-cli download unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF --local-dir /home/cjvirtucio/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF
If you no longer see the process, the download has probably finished. You can check its output at "${HOME}/huggingface-cli-download.out".
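You can also tail the log or list the downloaded files to confirm everything arrived:
tail -f "${HOME}/huggingface-cli-download.out"
ls -lh "${HOME}/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF"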
Run llama-cli
You are now ready to test llama.cpp and the model:
./build/bin/llama-cli --model "${HOME}/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF/DeepSeek-R1-Distill-Llama-70B-Q3_K_M.gguf" --main-gpu 0 --gpu-layers 40 --prompt "make me laugh"
Some important information you should see in its output:
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
...
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 40 repeating layers to GPU
load_tensors: offloaded 40/81 layers to GPU
load_tensors: ROCm0 model buffer size = 15640.00 MiB
load_tensors: CPU_Mapped model buffer size = 17032.53 MiB
Once you get to this point, you can hit the ENTER key to submit the prompt:
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
make me laugh
>
It’ll start working, then respond:
</think>
</think>
Sure! Here’s something to make you laugh:
One day, a man walked into a library and asked the librarian, “Do you have any books on Pavlov’s dogs and Schrödinger’s cats?” The librarian replied, “It rings a bell, but I’m not sure if it’s in the cat-alog or the dog-ma.”
😄
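The load_tensors output earlier showed only 40 of 81 layers offloaded to the GPU. If rocm-smi shows VRAM headroom, you can raise --gpu-layers to push more of the model onto the card; the value below is only an example, and the right number depends on your GPU and the quantization you downloaded:
./build/bin/llama-cli --model "${HOME}/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF/DeepSeek-R1-Distill-Llama-70B-Q3_K_M.gguf" --main-gpu 0 --gpu-layers 60 --prompt "make me laugh"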