This document provides a procedure to verify GPU vBIOS versions across all HGX nodes in the cluster using an automated script.
The script connects to each node from the login node over passwordless SSH and reads the vBIOS version of the first GPU reported by `nvidia-smi`.
It then compares that version with the current and target vBIOS versions and categorizes each node accordingly.
This helps identify nodes that:
- Have already been updated
- Still have the older vBIOS
- Report an unexpected vBIOS version
- Are unreachable
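The categorization step can be sketched as a small shell function. This is a minimal sketch, assuming the same version strings and the same if/elif order as `check_vbios_versions.sh`; the function name `classify_vbios` is hypothetical and does not appear in the original script.

```shell
#!/bin/bash
# Hypothetical helper: map one node's reported vBIOS string to a status,
# using the same rules and version values as check_vbios_versions.sh.
CURRENT="96.00.A5.00.01"   # vBIOS before the update
TARGET="96.00.D0.00.02"    # vBIOS after the update

classify_vbios() {
    local vbios="$1"
    if [ -z "$vbios" ]; then
        echo "UNREACHABLE"     # no version came back from the node
    elif [ "$vbios" = "$TARGET" ]; then
        echo "UPDATED"
    elif [ "$vbios" = "$CURRENT" ]; then
        echo "OLD"
    else
        echo "UNKNOWN"         # neither the current nor the target version
    fi
}
```

For example, `classify_vbios 96.00.BC.00.04` prints `UNKNOWN`, matching the last row of the example output.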
This procedure applies to:
- HGX cluster nodes
- Nodes accessible via passwordless SSH
- Systems running NVIDIA GPUs with `nvidia-smi` available
| Parameter | Value |
|---|---|
| Cluster Type | HGX |
| Total Nodes | 55 |
| Node IP Range | 10.152.241.101 – 10.152.241.155 |
| Access Node | Login Node |
| Access Method | Passwordless SSH |
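The node IP range is inclusive, which is why it covers 155 - 101 + 1 = 55 nodes. The sketch below expands it into individual IPs the same way the script's `seq` loop does; `list_nodes` is a hypothetical helper name, and the variable names mirror those in the script.

```shell
#!/bin/bash
# Expand the inclusive node range from the table into individual IPs.
# BASE_IP, START and END mirror the values used by check_vbios_versions.sh.
BASE_IP="10.152.241"
START=101
END=155

list_nodes() {
    local i
    for ((i = START; i <= END; i++)); do
        echo "$BASE_IP.$i"
    done
}

# The range includes both endpoints, so it covers END - START + 1 nodes.
echo "Node count: $((END - START + 1))"   # prints "Node count: 55"
```

Bash brace expansion (`10.152.241.{101..155}`) produces the same list, but it cannot take its bounds from variables because brace expansion happens before variable expansion, which is why a counted loop is used here.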
| Description | Version |
|---|---|
| Current vBIOS (before update) | 96.00.A5.00.01 |
| Target vBIOS (after update) | 96.00.D0.00.02 |
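The script compares these values as exact strings, so it first has to isolate the version field from the `nvidia-smi -q` output. A minimal sketch of that parsing step follows, assuming the line has the form `VBIOS Version <padding>: <version>`; `extract_vbios` is a hypothetical name, and the whitespace trimming is an addition that the original pipeline does not perform.

```shell
#!/bin/bash
# Hypothetical helper: read `nvidia-smi -q` output on stdin and print the
# vBIOS version of the first GPU, using the same grep/head/awk steps as
# check_vbios_versions.sh plus trimming of surrounding whitespace.
extract_vbios() {
    grep -i 'VBIOS Version' \
        | head -n1 \
        | awk -F': ' '{ gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}
```

On a node, `nvidia-smi -q | extract_vbios` would print just the version string. Trimming means a stray trailing space cannot turn an exact match into `UNKNOWN`.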
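Before executing the script, ensure the following on every node:

Passwordless SSH access from the login node is configured. This should log in without prompting for a password:

```bash
ssh <node_ip>
```

NVIDIA drivers are installed. This should print the GPU status table without errors:

```bash
nvidia-smi
```

The vBIOS query used by the script returns a version:

```bash
nvidia-smi -q | grep -i "VBIOS Version"
```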
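The three checks above can be scripted per node before running the main script. This is a sketch, assuming the same SSH timeout as the main script plus `BatchMode=yes`, which makes `ssh` fail instead of waiting at a password prompt; `preflight_node` and `SSH_OPTS` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical pre-flight check for one node, covering the prerequisites:
# passwordless SSH, a working nvidia-smi, and a "VBIOS Version" line.
# -n keeps ssh off stdin; BatchMode=yes makes ssh fail instead of prompting.
SSH_OPTS=(-n -o BatchMode=yes -o ConnectTimeout=3)

preflight_node() {
    local node="$1"
    if ! ssh "${SSH_OPTS[@]}" "$node" true 2>/dev/null; then
        echo "$node: FAIL (passwordless SSH)"
        return 1
    fi
    if ! ssh "${SSH_OPTS[@]}" "$node" "nvidia-smi > /dev/null" 2>/dev/null; then
        echo "$node: FAIL (nvidia-smi)"
        return 1
    fi
    if ! ssh "${SSH_OPTS[@]}" "$node" "nvidia-smi -q | grep -qi 'VBIOS Version'" 2>/dev/null; then
        echo "$node: FAIL (no VBIOS Version line)"
        return 1
    fi
    echo "$node: OK"
}
```

To check the whole range, loop over it as the main script does, for example `for i in $(seq 101 155); do preflight_node "10.152.241.$i"; done`.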
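Create a script named `check_vbios_versions.sh` with the following content:

```bash
#!/bin/bash
# check_vbios_versions.sh: report the GPU vBIOS version of every HGX node.

START=101
END=155
BASE_IP="10.152.241"
CURRENT="96.00.A5.00.01"   # vBIOS before the update
TARGET="96.00.D0.00.02"    # vBIOS after the update

printf "%-18s %-20s %-15s\n" "NODE IP" "VBIOS VERSION" "STATUS"
printf "%-18s %-20s %-15s\n" "-------" "-------------" "------"

for i in $(seq "$START" "$END"); do
    NODE="$BASE_IP.$i"

    # Read the vBIOS version of the first GPU reported by nvidia-smi.
    # -n stops ssh from reading the script's stdin; BatchMode=yes makes ssh
    # fail instead of waiting at a password or host-key prompt.
    VBIOS=$(ssh -n -o BatchMode=yes -o ConnectTimeout=3 "$NODE" \
        "nvidia-smi -q | grep -i 'VBIOS Version' | head -n1 | awk -F': ' '{print \$2}'" 2>/dev/null)

    if [ -z "$VBIOS" ]; then
        # No output: SSH failed or nvidia-smi returned nothing.
        STATUS="UNREACHABLE"
        VBIOS="N/A"
    elif [ "$VBIOS" = "$TARGET" ]; then
        STATUS="UPDATED"
    elif [ "$VBIOS" = "$CURRENT" ]; then
        STATUS="OLD"
    else
        STATUS="UNKNOWN"
    fi

    printf "%-18s %-20s %-15s\n" "$NODE" "$VBIOS" "$STATUS"
done
```

7. Execution Steps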
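Open a new file on the login node:

```bash
vi check_vbios_versions.sh
```

Paste the script content and save the file. Then make the script executable and run it:

```bash
chmod +x check_vbios_versions.sh
./check_vbios_versions.sh
```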
Example output (first five nodes shown):

| NODE IP | VBIOS VERSION | STATUS |
|---|---|---|
| 10.152.241.101 | 96.00.A5.00.01 | OLD |
| 10.152.241.102 | 96.00.D0.00.02 | UPDATED |
| 10.152.241.103 | 96.00.A5.00.01 | OLD |
| 10.152.241.104 | 96.00.D0.00.02 | UPDATED |
| 10.152.241.105 | 96.00.BC.00.04 | UNKNOWN |
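| Status | Description |
|---|---|
| UPDATED | Node has the target vBIOS version |
| OLD | Node is running the previous vBIOS |
| UNKNOWN | Detected vBIOS matches neither the current nor the target version |
| UNREACHABLE | No vBIOS version was returned: SSH failed or `nvidia-smi` produced no output (shown as N/A) |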
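To act on the results, it helps to save the report and summarize it. The sketch below assumes the script's output was saved to a file (the name `vbios_report.txt` is only an example) and that its first two lines are the header the script prints; `summarize_report` and `nodes_needing_action` are hypothetical helpers.

```shell
#!/bin/bash
# Hypothetical helpers for a saved report from check_vbios_versions.sh.
# Both read the report on stdin and skip its two header lines.

# Count nodes per STATUS (third column), sorted by status name.
summarize_report() {
    awk 'NR > 2 { count[$3]++ }
         END { for (s in count) printf "%-12s %d\n", s, count[s] }' | sort
}

# List the IPs of nodes that are not yet UPDATED (OLD, UNKNOWN or UNREACHABLE).
nodes_needing_action() {
    awk 'NR > 2 && $3 != "UPDATED" { print $1 }'
}

# Usage, assuming the report was saved with:
#   ./check_vbios_versions.sh | tee vbios_report.txt
#   summarize_report < vbios_report.txt
#   nodes_needing_action < vbios_report.txt
```

Splitting on whitespace works here because none of the three columns (IP, version, status, or `N/A`) contains spaces.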
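To manually verify a single node, log in to it:

```bash
ssh <node_ip>
```

Then run:

```bash
nvidia-smi -q | grep -i vbios
```

Example output on an updated node (`nvidia-smi -q` prints one such line per GPU):

```
VBIOS Version: 96.00.D0.00.02
```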
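Because the script keeps only the first `VBIOS Version` line (`head -n1`), a node whose GPUs were only partially updated is judged by its first GPU alone. A sketch of a per-node check across all GPUs follows, assuming the versions are obtained with `nvidia-smi --query-gpu=vbios_version --format=csv,noheader` (one line per GPU); the function `node_status_all_gpus` and the `MIXED` status are hypothetical additions, not part of the original procedure.

```shell
#!/bin/bash
# Hypothetical per-node status across ALL GPUs, not just the first one.
# Reads one vBIOS version per line on stdin (one line per GPU).
CURRENT="96.00.A5.00.01"
TARGET="96.00.D0.00.02"

node_status_all_gpus() {
    local versions unique
    # Trim surrounding whitespace and drop empty lines.
    versions=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d')
    if [ -z "$versions" ]; then
        echo "UNREACHABLE"                  # nothing came back from the node
        return
    fi
    unique=$(printf '%s\n' "$versions" | sort -u)
    if [ "$unique" = "$TARGET" ]; then
        echo "UPDATED"                      # every GPU is on the target vBIOS
    elif [ "$unique" = "$CURRENT" ]; then
        echo "OLD"                          # every GPU is still on the old vBIOS
    elif [ "$(printf '%s\n' "$unique" | wc -l)" -gt 1 ]; then
        echo "MIXED"                        # GPUs disagree, e.g. a partial update
    else
        echo "UNKNOWN"                      # one unexpected version on all GPUs
    fi
}

# Example use against one node (needs SSH access, so shown as a comment):
#   ssh -n -o BatchMode=yes -o ConnectTimeout=3 10.152.241.101 \
#       "nvidia-smi --query-gpu=vbios_version --format=csv,noheader" \
#       | node_status_all_gpus
```

Collapsing the per-GPU list with `sort -u` means a node is only reported as `UPDATED` when every GPU reports the target version.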