=================
== CJ Virtucio ==
=================

Building llama.cpp with ROCm on Fedora 41

llama.cpp rocm fedora41

Building llama.cpp with ROCm on Fedora 41

Introduction

This article explains how to build llama.cpp with ROCm support on Fedora 41.

Dependencies

Start with build deps:

sudo dnf install make gcc cmake lld clang clang-devel compiler-rt

You will then need to create a couple of symlinks so the unversioned tool names resolve (adjust the -17 suffix if your installed clang version differs):

ln -s /usr/bin/clang-offload-bundler-17 ~/bin/clang-offload-bundler
ln -s /usr/bin/llvm-objcopy-17 ~/bin/llvm-objcopy
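
These symlinks land in "${HOME}/bin", so that directory needs to exist and be on your PATH. A minimal sketch, assuming you manage PATH in your shell session (adapt to however you handle your shell profile):

# make sure ~/bin exists and is on PATH so the symlinked tools are found
mkdir -p ~/bin
case ":${PATH}:" in
  *":${HOME}/bin:"*) ;;                      # already on PATH
  *) export PATH="${HOME}/bin:${PATH}" ;;    # prepend it for this session
esac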

Now you can install the ROCm packages:

sudo dnf install rocminfo 'rocm-*' 'rocblas-*' 'hipblas' 'hipblas-*'
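
The build script further down leans on hipconfig in addition to rocminfo and rocm-smi, so it's worth a quick sanity check that all three resolve (just an assumption about what your install pulled in; adjust if a tool lives elsewhere):

# sanity check: the build script below calls these, so they need to be on PATH
command -v hipconfig rocminfo rocm-smi
hipconfig --version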

Verify rocminfo works:

rocminfo

You should be able to see your GPU:

ROCk module is loaded

...

*******
Agent 2
*******
  Name:                    gfx1100
  Uuid:                    GPU-e933ba0005f167b9
  Marketing Name:          AMD Radeon RX 7900 XTX
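
Take note of the gfx target (gfx1100 here). The build script below picks it up automatically with the same rocminfo query, reproduced here in case you want to run it by hand:

rocminfo | grep gfx | head -1 | awk '{print $2}'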

Verify you can see GPU usage info with rocm-smi:

$ rocm-smi
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  Node  IDs              Temp    Power  Partitions          SCLK     MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)
====================================================================================================================
0       1     0x744c,   58885  37.0°C  14.0W  N/A, N/A, 0         2057Mhz  1249Mhz  0%   auto  339.0W  39%    100%
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================

Build llama.cpp

Now it’s time to build llama.cpp!

Clone the llama.cpp repo, preferably at:

"${HOME}/git/ggml-org/llama.cpp"

Run the following build script (you can configure the local path of the repo with the LLAMA_CPP_REPO_DIR environment variable):

#!/usr/bin/env bash

# slightly edited and cleaned up from M4TH1EU's bash script
# see: https://github.com/ggml-org/llama.cpp/discussions/9981

function log {
  local log_level="$1"
  local msg="$2"

  local timestamp
  timestamp="$(date +'%Y.%m.%d %H:%M:%S')"

  >&2 printf '[%s] [%s] %s\n' "${log_level}" "${timestamp}" "${msg}"
}

function log_info {
  local msg="$1"

  log INFO "${msg}"
}

function log_error {
  local msg="$1"

  log ERROR "${msg}"
}

function log_warn {
  local msg="$1"

  log WARN "${msg}"
}

function setup_build_env_vars {
  # locate the HIP clang compiler and the ROCm install root via hipconfig
  HIPCXX="$(hipconfig -l)/clang"
  HIP_PATH="$(hipconfig -R)"

  if [[ -z "${HIP_PATH}" ]]; then
    log_error "Unable to detect HIP_PATH. Ensure HIP is correctly installed."
    return 1
  fi

  AMDGPU_TARGET="$(rocminfo | grep gfx | head -1 | awk '{print $2}')"
  if [[ -z "${AMDGPU_TARGET}" ]]; then
    log_error "Unable to detect AMDGPU target using rocminfo."
    return 1
  fi

  HIP_DEVICE_LIB_PATH="$(find "${HIP_PATH}" -name "oclc_abi_version_400.bc" -exec dirname {} \; | head -n 1)"
  if [[ -z "${HIP_DEVICE_LIB_PATH}" ]]; then
    log_error "Unable to find oclc_abi_version_400.bc under HIP_PATH."
    return 1
  fi

  export LLAMA_HIPBLAS=1
  export HIPCXX
  export HIP_PATH
  export HIP_VISIBLE_DEVICES
  export HIP_DEVICE_LIB_PATH
  export DEVICE_LIB_PATH="${HIP_DEVICE_LIB_PATH}"
  export ROCM_PATH=/usr/
}

function try_create_symlink {
  local src_path="$1"
  local dst_path="$2"

  if [[ -f "${dst_path}" ]] || [[ -L "${symlink}" ]]; then
    log_warn "${dst_path} already a symlink"
    return
  fi

  ln -s "${src_path}" "${dst_path}"
}

function main {
  set -e

  local llama_cpp_repo_dir="${LLAMA_CPP_REPO_DIR:-"${HOME}/git/ggml-org/llama.cpp"}"
  local llama_cpp_git_url="https://github.com/ggml-org/llama.cpp.git"

  if ! git -C "${llama_cpp_repo_dir}" rev-parse --is-inside-work-tree &> /dev/null; then
    if [[ -d "${llama_cpp_repo_dir}" ]]; then
      log_error "${llama_cpp_repo_dir} is not a llama.cpp repo. Remove this directory."
      return 1
    fi

    git clone "${llama_cpp_git_url}" "${llama_cpp_repo_dir}"
  fi

  log_info "Cloned ${llama_cpp_git_url}; proceeding"
  pushd "${LLAMA_CPP_REPO_DIR}"

  log_info "Installing build dependencies"
  sudo dnf install make gcc cmake lld clang clang-devel compiler-rt

  local nonroot_bin="${NONROOT_BIN:-"${HOME}/bin"}"
  log_info "Creating symlinks for build dependencies in non-root bin directory"
  try_create_symlink /usr/bin/clang-offload-bundler-17 "${nonroot_bin}/clang-offload-bundler"
  try_create_symlink /usr/bin/llvm-objcopy-17 "${nonroot_bin}/llvm-objcopy"

  log_info "Installing rocm packages"
  sudo dnf install rocminfo 'rocm-*' 'rocblas-*' 'hipblas' 'hipblas-*'

  log_info "Verifying frequently used rocm CLI commands"
  rocminfo
  rocm-smi

  local max_threads="${MAX_THREADS:-8}"

  setup_build_env_vars

  rm -rf "${llama_cpp_repo_dir}/build/"*
  cmake -S . -B build \
    -DGGML_HIP=ON \
    -DAMDGPU_TARGETS="${AMDGPU_TARGET}" \
    -DCMAKE_BUILD_TYPE=Release

  cmake --build build --config Release -- -j "${max_threads}"

  log_info "Build complete! Enjoy llama.cpp with rocm!"
}

main
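
Assuming you save the script as, say, build-llama-rocm.sh (the filename is only an example), an invocation might look like this. It runs sudo dnf, so expect a password prompt:

LLAMA_CPP_REPO_DIR="${HOME}/git/ggml-org/llama.cpp" MAX_THREADS=8 bash build-llama-rocm.sh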

Download the GGUF model from huggingface.co

Create an account on huggingface.

Install the huggingface-cli (preferably in a virtual environment):

$ python -m venv /tmp/delete-me-venv
$ . /tmp/delete-me-venv/bin/activate
(/tmp/delete-me-venv) $ python -m pip install 'huggingface_hub[cli]'

Create an API token on your huggingface account and log in (paste your token when prompted):

huggingface-cli login
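
You can confirm the login worked with:

huggingface-cli whoami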

Download the GGUF model, e.g.:

# you will likely want to run with nohup because it will take a while
nohup huggingface-cli download unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF --local-dir ~/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF > "${HOME}/huggingface-cli-download.out" 2>&1 &

Come back and check if the download is still running:

$ ps -eaf | grep 'huggingface-cli download'
cjvirtu+   33901       1 11 14:53 ?        00:14:26 /home/cjvirtucio/git/ggml-org/llama.cpp/.venv/bin/python /home/cjvirtucio/git/ggml-org/llama.cpp/.venv/bin/huggingface-cli download unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF --local-dir /home/cjvirtucio/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF
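
While it's running, you can also follow the download log:

tail -f "${HOME}/huggingface-cli-download.out"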

If you no longer see the process, the download has probably finished. You can check its output at "${HOME}/huggingface-cli-download.out".

Run llama-cli

You are now ready to test llama.cpp with the model. From the repo root, run:

./build/bin/llama-cli --model "${HOME}/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF/DeepSeek-R1-Distill-Llama-70B-Q3_K_M.gguf" --main-gpu 0 --gpu-layers 40 --prompt "make me laugh"

Some important information you should see in its output:

ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32

...

load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 40 repeating layers to GPU
load_tensors: offloaded 40/81 layers to GPU
load_tensors:        ROCm0 model buffer size = 15640.00 MiB
load_tensors:   CPU_Mapped model buffer size = 17032.53 MiB

Once you get to this point, you can hit the ENTER key to submit the prompt:

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

make me laugh

>

It’ll start working, then respond:

</think>

</think>

Sure! Here’s something to make you laugh:

One day, a man walked into a library and asked the librarian, “Do you have any books on Pavlov’s dogs and Schrödinger’s cats?” The librarian replied, “It rings a bell, but I’m not sure if it’s in the cat-alog or the dog-ma.”

😄

References