Building llama.cpp with ROCm on Fedora 41
Introduction
This article explains how to build llama.cpp with ROCm support on Fedora 41.
Dependencies
Start with build deps:
sudo dnf install make gcc cmake lld clang clang-devel compiler-rt
You will then need to create a couple of symlinks:
ln -s /usr/bin/clang-offload-bundler-17 ~/bin/clang-offload-bundler
ln -s /usr/bin/llvm-objcopy-17 ~/bin/llvm-objcopy
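If "${HOME}/bin" does not exist yet or is not on your PATH, set that up first (adjust to taste):
mkdir -p ~/bin
export PATH="${HOME}/bin:${PATH}"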
Now you can install the ROCm packages:
sudo dnf install rocminfo 'rocm-*' 'rocblas-*' 'hipblas' 'hipblas-*'
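To confirm the packages actually landed, you can list the installed RPMs (just a sanity check; the package set on your system may differ slightly):
rpm -qa | grep -E '^(rocm|rocblas|hipblas)'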
Verify rocminfo works:
rocminfo
You should be able to see your GPU:
ROCk module is loaded
...
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-e933ba0005f167b9
Marketing Name: AMD Radeon RX 7900 XTX
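The gfx name shown here (gfx1100 in this case) is the GPU target the build script below detects automatically; you can preview it with the same pipeline the script uses:
rocminfo | grep gfx | head -1 | awk '{print $2}'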
Verify you can see GPU usage info with rocm-smi:
$ rocm-smi
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
====================================================================================================================
0 1 0x744c, 58885 37.0°C 14.0W N/A, N/A, 0 2057Mhz 1249Mhz 0% auto 339.0W 39% 100%
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================
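If you want to keep an eye on utilization while a model is loading or generating, one simple option is to poll rocm-smi (watch ships with procps-ng and should already be installed on Fedora):
watch -n 1 rocm-smi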
Build llama.cpp
Now it’s time to build llama.cpp!
Clone the llama.cpp repo, preferably at:
"${HOME}/git/ggml-org/llama.cpp"
Run the following build script (you can configure the local path of the repo with the LLAMA_CPP_REPO_DIR environment variable):
#!/usr/bin/env bash
# slightly edited and cleaned up from M4TH1EU's bash script
# see: https://github.com/ggml-org/llama.cpp/discussions/9981
function log {
local log_level="$1"
local msg="$2"
local timestamp
timestamp="$(date +'%Y.%m.%d %H:%M:%S')"
>&2 printf '[%s] [%s] %s\n' "${log_level}" "${timestamp}" "${msg}"
}
function log_info {
local msg="$1"
log INFO "${msg}"
}
function log_error {
local msg="$1"
log ERROR "${msg}"
}
function log_warn {
local msg="$1"
log WARN "${msg}"
}
function setup_build_env_vars {
HIPCXX="$(hipconfig -l)/clang"
HIP_PATH="$(hipconfig -R)"
if [[ -z "${HIP_PATH}" ]]; then
log_error "Unable to detect HIP_PATH. Ensure HIP is correctly installed."
return 1
fi
AMDGPU_TARGET="$(rocminfo | grep gfx | head -1 | awk '{print $2}')"
if [[ -z "${AMDGPU_TARGET}" ]]; then
log_error "Unable to detect AMDGPU target using rocminfo."
return 1
fi
HIP_DEVICE_LIB_PATH="$(find "${HIP_PATH}" -name "oclc_abi_version_400.bc" -exec dirname {} \; | head -n 1)"
if [[ -z "${HIP_DEVICE_LIB_PATH}" ]]; then
log_error "Unable to find oclc_abi_version_400.bc under HIP_PATH."
return 1
fi
export LLAMA_HIPBLAS=1
export HIPCXX
export HIP_PATH
export HIP_VISIBLE_DEVICES
export HIP_DEVICE_LIB_PATH
export DEVICE_LIB_PATH=$HIP_DEVICE_LIB_PATH
export ROCM_PATH=/usr/
}
function try_create_symlink {
local src_path="$1"
local dst_path="$2"
if [[ -f "${dst_path}" ]] || [[ -L "${dst_path}" ]]; then
log_warn "${dst_path} already exists; skipping symlink creation"
return
fi
ln -s "${src_path}" "${dst_path}"
}
function main {
set -e
local llama_cpp_repo_dir="${LLAMA_CPP_REPO_DIR:-"${HOME}/git/ggml-org/llama.cpp"}"
local llama_cpp_git_url="https://github.com/ggml-org/llama.cpp.git"
if ! git -C "${llama_cpp_repo_dir}" rev-parse --is-inside-work-tree &>/dev/null; then
if [[ -d "${llama_cpp_repo_dir}" ]]; then
log_error "${llama_cpp_repo_dir} is not a llama.cpp repo. Remove this directory."
return 1
fi
git clone "${llama_cpp_git_url}" "${llama_cpp_repo_dir}"
fi
log_info "Cloned ${llama_cpp_git_url}; proceeding"
pushd "${llama_cpp_repo_dir}"
log_info "Installing build dependencies"
sudo dnf install make gcc cmake lld clang clang-devel compiler-rt
local nonroot_bin="${NONROOT_BIN:-"${HOME}/bin"}"
log_info "Creating symlinks for build dependencies in non-root bin directory"
try_create_symlink /usr/bin/clang-offload-bundler-17 "${nonroot_bin}/clang-offload-bundler"
try_create_symlink /usr/bin/llvm-objcopy-17 "${nonroot_bin}/llvm-objcopy"
log_info "Installing rocm packages"
sudo dnf install rocminfo 'rocm-*' 'rocblas-*' 'hipblas' 'hipblas-*'
log_info "Verifying frequently used rocm CLI commands"
rocminfo
rocm-smi
local max_threads="${MAX_THREADS:-8}"
setup_build_env_vars
rm -rf "${llama_cpp_repo_dir}/build/"*
cmake -S . -B build \
-DGGML_HIP=ON \
-DAMDGPU_TARGETS="${AMDGPU_TARGET}" \
-DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j "${max_threads}"
log_info "Build complete! Enjoy llama.cpp with rocm!"
}
main
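Save the script somewhere convenient (the filename below is just an example) and run it; LLAMA_CPP_REPO_DIR, NONROOT_BIN, and MAX_THREADS can all be overridden on the command line:
LLAMA_CPP_REPO_DIR="${HOME}/git/ggml-org/llama.cpp" MAX_THREADS=16 bash build-llama-cpp-rocm.sh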
Download the GGUF model from huggingface.co
Create an account on huggingface.co.
Install the huggingface-cli (preferably in a virtual environment):
$ python -m venv /tmp/delete-me-venv
$ . /tmp/delete-me-venv/bin/activate
(/tmp/delete-me-venv) $ python -m pip install 'huggingface_hub[cli]'
Create an API token on your huggingface account and login (paste your token when prompted):
huggingface-cli login
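If you would rather not paste the token interactively, huggingface-cli should also accept it as a flag (check huggingface-cli login --help on your version; HF_TOKEN here is just an example variable holding your token):
huggingface-cli login --token "${HF_TOKEN}"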
Download the GGUF model, e.g.:
# you will likely want to run with nohup because it will take a while
nohup huggingface-cli download unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF --local-dir ~/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF > "${HOME}/huggingface-cli-download.out" 2>&1 &
Come back and check if the download is still running:
$ ps -eaf | grep 'huggingface-cli download'
cjvirtu+ 33901 1 11 14:53 ? 00:14:26 /home/cjvirtucio/git/ggml-org/llama.cpp/.venv/bin/python /home/cjvirtucio/git/ggml-org/llama.cpp/.venv/bin/huggingface-cli download unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF --local-dir /home/cjvirtucio/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF
If you no longer see the process, the download has probably finished. You can check its output at "${HOME}/huggingface-cli-download.out".
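You can also tail the log or list the downloaded files to confirm everything arrived:
tail -f "${HOME}/huggingface-cli-download.out"
ls -lh "${HOME}/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF"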
Run llama-cli
You are now ready to test llama.cpp and the model:
./build/bin/llama-cli --model "${HOME}/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF/DeepSeek-R1-Distill-Llama-70B-Q3_K_M.gguf" --main-gpu 0 --gpu-layers 40 --prompt "make me laugh"
Some important information you should see in its output:
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
...
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 40 repeating layers to GPU
load_tensors: offloaded 40/81 layers to GPU
load_tensors: ROCm0 model buffer size = 15640.00 MiB
load_tensors: CPU_Mapped model buffer size = 17032.53 MiB
Once you get to this point, you can hit the ENTER key to submit the prompt:
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
make me laugh
>
It’ll start working, then respond:
</think>
</think>
Sure! Here’s something to make you laugh:
One day, a man walked into a library and asked the librarian, “Do you have any books on Pavlov’s dogs and Schrödinger’s cats?” The librarian replied, “It rings a bell, but I’m not sure if it’s in the cat-alog or the dog-ma.”
😄
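The load_tensors output earlier showed only 40 of 81 layers offloaded to the GPU. If rocm-smi shows VRAM headroom, you can raise --gpu-layers to push more of the model onto the card; the value below is only an example, and the right number depends on your GPU and the quantization you downloaded:
./build/bin/llama-cli --model "${HOME}/git/hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF/DeepSeek-R1-Distill-Llama-70B-Q3_K_M.gguf" --main-gpu 0 --gpu-layers 60 --prompt "make me laugh"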