
Tutorial: Set Up a Cloud Native GPU Testbed With Nvkind Kubernetes


DevOps engineers and developers are familiar with kind, a Kubernetes development environment built on Docker. In kind, each node of the cluster, including the control plane, runs as a Docker container. While kind is easy to use, accessing GPUs from inside the cluster can be challenging.

This tutorial walks you through installing nvkind from Nvidia, a GPU-aware kind cluster for running cloud native AI workloads in a development or test environment.

My environment consists of a host machine powered by a single Nvidia H100 GPU. We aim to deploy a pod within the nvkind cluster with access to the same GPU.

Prerequisites

Ensure that Docker is configured with the Nvidia container runtime as its default runtime, so that containers can access the GPU.
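If the Nvidia Container Toolkit is already installed on the host, the configuration and a quick sanity check look roughly like the following (a sketch; the Ubuntu image tag is illustrative). Nvkind also depends on the runtime honoring GPU requests made via volume mounts, which is governed by the accept-nvidia-visible-devices-as-volume-mounts option in /etc/nvidia-container-runtime/config.toml.

```shell
# Make the Nvidia runtime Docker's default runtime
# (requires the Nvidia Container Toolkit on the host).
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker

# Sanity check: a plain container should be able to see the GPU.
docker run --rm --gpus all ubuntu:22.04 nvidia-smi
```

If the last command prints the nvidia-smi table with the H100 listed, Docker is ready for nvkind.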

Compile and Install the Nvkind Binary

Clone the GitHub repository of nvkind and build the binary.

git clone https://github.com/NVIDIA/nvkind.git
cd nvkind
make
sudo cp ./nvkind /usr/local/bin/


Execute the nvkind binary to confirm that the build completed successfully.
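For example (assuming the CLI's standard help flag):

```shell
# Should print nvkind's usage text and available subcommands.
nvkind --help
```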

Define a Template and Create the Cluster

Nvkind accepts a configuration template that gives fine-grained control over how GPUs are exposed to worker nodes. Since the host has only one GPU, we will expose it to the single worker node.

Create a YAML file called nvkind-cluster.yaml with the following content:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraMounts:
    - hostPath: /dev/null
      containerPath: /var/run/nvidia-container-devices/all


Next, create a cluster based on this template.

nvkind cluster create --config-template=nvkind-cluster.yaml


You can now access the cluster with the kubectl CLI.
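For example, listing the nodes should show one control-plane node and one worker node (the cluster receives a generated name, so the exact node names will vary):

```shell
# Both nodes should report a Ready status.
kubectl get nodes
```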

Install the Nvidia GPU Operator

With the cluster in place, we will install the Nvidia GPU Operator so that pods can access the underlying GPU.

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator --set driver.enabled=false


Ensure all the pods in the gpu-operator namespace are healthy.
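A couple of quick checks (a sketch; note that some operator pods, such as validators, finish in a Completed state rather than Running):

```shell
# Operator pods should reach Running or Completed status.
kubectl get pods -n gpu-operator

# The worker node should now advertise the GPU as an allocatable resource.
kubectl describe node | grep -i nvidia.com/gpu
```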

Run a Workload To Test GPU Access

Let’s create a test pod to verify GPU access.
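A minimal check (a sketch; the pod name and CUDA image tag are illustrative) is a pod that requests one GPU and runs nvidia-smi:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# The logs should show the H100 in the nvidia-smi table.
kubectl logs -f pod/gpu-test
```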




We have successfully installed, configured and tested the nvkind cluster on an H100 GPU.

The post Tutorial: Set Up a Cloud Native GPU Testbed With Nvkind Kubernetes appeared first on The New Stack.

