Setting Up an Nvidia GPU Node on k3s with Nvidia GPU Operator

This guide walks through configuring an Nvidia GPU node in a k3s cluster with the Nvidia GPU Operator, so that Kubernetes workloads can request GPU resources.

Prerequisites

k3s Cluster: You must have a k3s cluster already running. If not, you can install it following the official k3s installation guide.
Nvidia GPU: Ensure that the node has an Nvidia GPU installed.
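Nvidia Drivers: Nvidia drivers should be installed on the node where the GPU is located.
Helm: Helm must be installed on the machine you run the install commands from, since Step 1 installs the operator with Helm.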
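
The prerequisites above can be spot-checked before installing anything. The sketch below uses a hypothetical helper, `require_tool` (not part of k3s or the operator), that only tests whether a command is on `PATH`. It assumes the driver's `nvidia-smi` utility is how you confirm the driver is working on the GPU node.

```shell
#!/bin/sh
# Hypothetical helper: report whether a required command-line tool is on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Usage (run these yourself; nothing below executes automatically):
#   require_tool nvidia-smi && nvidia-smi    # on the GPU node: driver loaded, GPU listed
#   require_tool kubectl && kubectl get nodes
#   require_tool helm
```

If `nvidia-smi` is missing or reports no devices, fix the driver installation on the node before continuing.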

Step 1: Install the Nvidia GPU Operator

The Nvidia GPU Operator simplifies the management of GPU devices in Kubernetes environments.

Add the Nvidia GPU Operator repository to Helm:

```
helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
```

Install the Nvidia GPU Operator using Helm:

```
helm install --wait --generate-name nvidia/gpu-operator
```
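
On k3s the default install may need two adjustments, sketched below as a Helm values file. The first assumes the drivers are already on the host (as the prerequisites state), so the operator's own driver container is disabled. The second assumes a default k3s layout, where the bundled containerd keeps its config and socket under k3s-specific paths; verify both paths on your node before relying on them. The function name `write_k3s_values` and the file name are illustrative, not part of the chart.

```shell
#!/bin/sh
# Hypothetical helper: write a Helm values file for a k3s node with
# preinstalled drivers. The containerd paths are assumptions based on a
# default k3s install; check them on your node.
write_k3s_values() {
  cat > "$1" <<'EOF'
driver:
  enabled: false   # host already has the Nvidia driver installed
toolkit:
  env:
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
EOF
}

# Usage (run these yourself; nothing below executes automatically):
#   write_k3s_values gpu-operator-values.yaml
#   helm install --wait --generate-name nvidia/gpu-operator -f gpu-operator-values.yaml
```

Keeping the overrides in a values file rather than a long chain of `--set` flags makes the install repeatable and easy to review.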

Step 2: Configure k3s to Use the Nvidia GPU

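Once the operator is installed, only a little extra work is needed on the cluster side:

Label the GPU node (optional): The Nvidia GPU Operator finds GPU nodes on its own through Node Feature Discovery and labels them `nvidia.com/gpu.present=true`, so it does not need a manual label to target the node. A custom label like the one below is still useful as a node selector for your own workloads.
```
kubectl label node <your-gpu-node-name> nvidia.com/gpu=true
```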
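Check the installation: Confirm that the Nvidia GPU Operator has deployed its components. Older operator releases placed them in the `gpu-operator-resources` namespace, while current releases run them in the namespace the chart was installed into, so listing across all namespaces works in both cases. Every pod should reach `Running` or `Completed`.
```
kubectl get pods --all-namespaces | grep -E 'gpu|nvidia'
```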
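
Running pods alone do not prove the node is schedulable for GPU work; the device plugin also has to advertise the `nvidia.com/gpu` resource on the node. A minimal sketch of that check follows. The helper names `gpu_allocatable` and `node_has_gpu` and the overridable `KUBECTL` variable are illustrative assumptions, not part of kubectl or the operator.

```shell
#!/bin/sh
# Overridable so you can use e.g. KUBECTL="k3s kubectl".
KUBECTL="${KUBECTL:-kubectl}"

# Print the number of nvidia.com/gpu resources the node reports as allocatable.
# The dot in the resource name must be escaped inside the JSONPath expression.
gpu_allocatable() {
  $KUBECTL get node "$1" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
}

# Succeed only if the node advertises at least one GPU.
node_has_gpu() {
  count=$(gpu_allocatable "$1")
  [ -n "$count" ] && [ "$count" -gt 0 ]
}

# Usage (run this yourself; nothing below executes automatically):
#   node_has_gpu <your-gpu-node-name> && echo "GPU is schedulable"
```

An empty result usually means the device plugin pod has not started yet or could not see the GPU.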

Step 3: Deploy a Test Application

To verify that everything is set up correctly, deploy a test application that utilizes the GPU.

Create a test pod manifest (cuda-test.yaml):

```
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  # Run the sample once instead of restarting it after it exits
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # Prebuilt vectorAdd sample. The CUDA base image ships neither curl
      # nor nvcc, so downloading and compiling the source there fails.
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1
```

Deploy the application:
```
kubectl apply -f cuda-test.yaml
```
Check the application output: Verify that the application is using the GPU by checking its logs.
```
kubectl logs cuda-vector-add
```
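
The vectorAdd sample prints `Test PASSED` when the computation on the GPU matches the expected result, so the log check can be scripted. A minimal sketch, in which the helper name `vector_add_passed` and the overridable `KUBECTL` variable are illustrative assumptions:

```shell
#!/bin/sh
# Overridable so you can use e.g. KUBECTL="k3s kubectl".
KUBECTL="${KUBECTL:-kubectl}"

# Succeed only if the pod's logs contain the sample's success marker.
vector_add_passed() {
  $KUBECTL logs "$1" | grep -q "Test PASSED"
}

# Usage (run this yourself; nothing below executes automatically):
#   vector_add_passed cuda-vector-add && echo "GPU test passed"
```

If the check fails, `kubectl describe pod cuda-vector-add` shows whether the pod was ever scheduled onto the GPU node.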

Conclusion

You have successfully configured a k3s cluster to utilize an Nvidia GPU using the Nvidia GPU Operator. This setup allows for the efficient deployment of GPU-accelerated applications in a Kubernetes environment.