GPU vBIOS Verification Across HGX Nodes

1. Purpose

This document provides a procedure to verify GPU vBIOS versions across all HGX nodes in the cluster using an automated script.

The script connects to each node from the login node using passwordless SSH and retrieves the GPU vBIOS version using nvidia-smi.
It then compares the retrieved version with the expected target vBIOS and categorizes nodes accordingly.

This helps identify nodes that:

  • Have already been updated

  • Still have the older vBIOS

  • Are unreachable


2. Scope

This procedure applies to:

  • HGX cluster nodes

  • Nodes accessible via passwordless SSH

  • Systems running NVIDIA GPUs with nvidia-smi available


3. Environment Details



Parameter          Value
---------          -----
Cluster Type       HGX
Total Nodes        55
Node IP Range      10.152.241.101 – 10.152.241.155
Access Node        Login Node
Access Method      Passwordless SSH

4. vBIOS Version Reference



Description        Version
-----------        -------
Current vBIOS      96.00.A5.00.01
Target vBIOS       96.00.D0.00.02

5. Prerequisites

Before executing the script, ensure the following:

  1. Passwordless SSH access is configured

    ssh <node_ip>
  2. NVIDIA drivers are installed

    nvidia-smi
  3. The command below works on the nodes:

      nvidia-smi -q | grep -i "VBIOS Version"
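
The grep/awk extraction used later in the verification script can be sanity-checked locally without cluster access. The sample line below is illustrative only; it mimics the shape of a typical `nvidia-smi -q` output line:

```shell
# Illustrative sample line in the shape produced by `nvidia-smi -q`
SAMPLE='    VBIOS Version                         : 96.00.D0.00.02'

# Same extraction pipeline the verification script runs on each node
VBIOS=$(echo "$SAMPLE" | grep -i 'VBIOS Version' | head -n1 | awk -F': ' '{print $2}')
echo "$VBIOS"
```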

6. Script for Cluster-wide vBIOS Verification

Create a script named: check_vbios_versions.sh

check_vbios_versions.sh
#!/bin/bash

# Node range and expected vBIOS versions
START=101
END=155
BASE_IP="10.152.241"

CURRENT="96.00.A5.00.01"   # previous vBIOS
TARGET="96.00.D0.00.02"    # target vBIOS

printf "%-18s %-20s %-15s\n" "NODE IP" "VBIOS VERSION" "STATUS"
printf "%-18s %-20s %-15s\n" "-------" "-------------" "------"

for i in $(seq "$START" "$END")
do
    NODE="$BASE_IP.$i"

    # Query the first GPU's vBIOS version over SSH.
    # BatchMode prevents a password prompt from hanging the loop
    # if passwordless SSH is misconfigured on a node.
    VBIOS=$(ssh -o ConnectTimeout=3 -o BatchMode=yes "$NODE" \
        "nvidia-smi -q | grep -i 'VBIOS Version' | head -n1 | awk -F': ' '{print \$2}'" 2>/dev/null)

    if [ -z "$VBIOS" ]; then
        STATUS="UNREACHABLE"
        VBIOS="N/A"
    elif [ "$VBIOS" = "$TARGET" ]; then
        STATUS="UPDATED"
    elif [ "$VBIOS" = "$CURRENT" ]; then
        STATUS="OLD"
    else
        STATUS="UNKNOWN"
    fi

    printf "%-18s %-20s %-15s\n" "$NODE" "$VBIOS" "$STATUS"
done
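
On a 55-node cluster the serial loop above can take a while if several nodes time out. One option (a sketch, not part of the standard procedure) is to fan the checks out with `xargs -P`. Here `check_one` is a hypothetical wrapper: in practice its body would contain the same `ssh`/`nvidia-smi` pipeline used in the script above, and `echo` stands in so the sketch runs without cluster access:

```shell
#!/bin/bash
# Hypothetical parallel variant of the per-node check.
# check_one is a stand-in: replace the echo with the ssh/nvidia-smi
# pipeline from check_vbios_versions.sh to use it for real.
check_one() {
    NODE="$1"
    echo "$NODE checked"    # stand-in for the real SSH query
}
export -f check_one

# Build the node list and run up to 8 checks in parallel
seq 101 155 | sed 's/^/10.152.241./' | xargs -P 8 -n 1 bash -c 'check_one "$0"'
```

Note that with parallel execution the output lines arrive in completion order, not IP order, so sort the report afterwards if ordering matters.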

7. Execution Steps

Step 1: Create the Script

vi check_vbios_versions.sh

Paste the script content and save the file.


Step 2: Grant Execute Permission

chmod +x check_vbios_versions.sh

Step 3: Execute the Script

./check_vbios_versions.sh
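
Once a run has been captured to a file, a short awk pass gives per-status counts. The snippet below builds a small sample report in a hypothetical file (`/tmp/vbios_report.txt`) for illustration; in practice you would redirect the script's output there instead:

```shell
# Sample report (in practice: ./check_vbios_versions.sh > /tmp/vbios_report.txt)
cat > /tmp/vbios_report.txt <<'EOF'
NODE IP            VBIOS VERSION        STATUS
-------            -------------        ------
10.152.241.101     96.00.A5.00.01       OLD
10.152.241.102     96.00.D0.00.02       UPDATED
10.152.241.103     96.00.A5.00.01       OLD
EOF

# Skip the two header lines, then count each STATUS value
awk 'NR>2 {count[$3]++} END {for (s in count) print s": "count[s]}' /tmp/vbios_report.txt
```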

8. Sample Output

NODE IP            VBIOS VERSION        STATUS
-------            -------------        ------
10.152.241.101     96.00.A5.00.01       OLD
10.152.241.102     96.00.D0.00.02       UPDATED
10.152.241.103     96.00.A5.00.01       OLD
10.152.241.104     96.00.D0.00.02       UPDATED
10.152.241.105     96.00.BC.00.04       UNKNOWN


9. Status Definitions

Status         Description
------         -----------
UPDATED        Node has the target vBIOS version
OLD            Node is running the previous vBIOS
UNKNOWN        Detected vBIOS does not match either expected version
UNREACHABLE    Node did not respond over SSH; vBIOS could not be queried

10. Validation

To manually verify a node:

ssh <node_ip>

Then run:

nvidia-smi -q | grep -i vbios

Expected output example:

VBIOS Version: 96.00.D0.00.02
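
To build a follow-up list of nodes that still need attention, filter the report on the STATUS column. The snippet below uses a small sample report and hypothetical file paths (`/tmp/vbios_report.txt`, `nodes_to_update.txt`) for illustration:

```shell
# Sample report (in practice: ./check_vbios_versions.sh > /tmp/vbios_report.txt)
cat > /tmp/vbios_report.txt <<'EOF'
NODE IP            VBIOS VERSION        STATUS
-------            -------------        ------
10.152.241.101     96.00.A5.00.01       OLD
10.152.241.102     96.00.D0.00.02       UPDATED
10.152.241.105     N/A                  UNREACHABLE
EOF

# Keep every node that is not yet on the target vBIOS
# (OLD, UNKNOWN, and UNREACHABLE all need follow-up)
awk 'NR>2 && $3!="UPDATED" {print $1}' /tmp/vbios_report.txt > nodes_to_update.txt
cat nodes_to_update.txt
```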
