33  Containerized services

33.1 Ollama LLMs

In this section we explore the containerized deployment of large language models (LLMs) using Ollama for local serving. For learning more about LLMs, the reader is referred to the following links:

33.1.1 Deploying in RHEL

# https://developer.nvidia.com/cuda-downloads

# Identify NVIDIA driver
nvidia-smi

# Identify RHEL version
cat /etc/redhat-release
# or rpm -q redhat-release

# Add corresponding repo/toolkit
BASE=https://developer.download.nvidia.com/compute/cuda/repos
REPO="${BASE}/rhel9/x86_64/cuda-rhel9.repo"
dnf config-manager --add-repo="${REPO}"
dnf clean all

dnf install -y cuda-toolkit-13-0

# Alternatively, pin an older release (here CUDA 12.8) if the
# installed driver does not support the latest toolkit
dnf install -y cuda-driver-devel-12-8
dnf install -y cuda-toolkit-12-8
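
With the driver and toolkit in place, the Ollama server itself can be run as a GPU container. The commands below are a sketch using Podman (RHEL's default container engine) and the NVIDIA Container Toolkit's CDI support; the image name `docker.io/ollama/ollama`, port 11434, and the model name `llama3` are the upstream defaults and examples, and should be adjusted to your setup.

# Install the NVIDIA Container Toolkit and generate the CDI spec
# that exposes the GPU to containers
dnf install -y nvidia-container-toolkit
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Run the Ollama server, persisting downloaded models in a named volume
podman run -d --device nvidia.com/gpu=all \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama docker.io/ollama/ollama

# Pull a model (llama3 is an example) and test the REST API
podman exec -it ollama ollama pull llama3
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'

The named volume keeps models across container restarts, and `stream: false` makes the API return a single JSON response rather than a stream of chunks.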

33.2 ParticleAnalyzer