How to Collect NVIDIA Bug Report

Purpose

This article provides step-by-step instructions for collecting an NVIDIA bug report from servers equipped with NVIDIA GPUs. NVIDIA Support commonly requires this report when troubleshooting GPU driver, CUDA, NVLink, PCIe, and hardware-related issues.

Scope

This procedure applies to:

  • Servers with NVIDIA H100 GPUs (PCIe or SXM)

  • Systems with NVIDIA proprietary drivers installed

Prerequisites

  • Root or sudo privileges on the server

  • NVIDIA driver installed and loaded


Procedure

Step 1: Verify NVIDIA Driver and GPU Visibility

Run the following command to confirm the GPUs are detected:

        nvidia-smi

Expected output:

  • H100 GPUs listed

  • Driver version displayed

  • No critical errors reported


If nvidia-smi fails, note the error and proceed with collection anyway.
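The check in this step can also be scripted so that the result is saved next to the bug report. The following is a minimal sketch, not part of any NVIDIA tooling: the function name check_gpu_visibility and the log file name nvidia-smi-check.txt are illustrative assumptions. It relies only on the --query-gpu and --format options of nvidia-smi.

```shell
#!/usr/bin/env bash
# Sketch: record whether nvidia-smi works before collecting the bug report.
# check_gpu_visibility and the default log file name are hypothetical.
check_gpu_visibility() {
    local log_file="${1:-nvidia-smi-check.txt}"

    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found in PATH" | tee "$log_file"
        return 1
    fi

    # One line per GPU: name and driver version. stderr is kept in the log
    # so that any error message is preserved for the support case.
    if nvidia-smi --query-gpu=name,driver_version --format=csv,noheader >"$log_file" 2>&1; then
        echo "nvidia-smi OK: $(grep -c . "$log_file") GPU(s) listed"
    else
        echo "nvidia-smi failed; error saved to $log_file"
        return 1
    fi
}
```

Calling check_gpu_visibility with no arguments writes the result to nvidia-smi-check.txt in the current directory. A non-zero return value means the error was recorded and collection can still proceed.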


Step 2: Locate the NVIDIA Bug Report Script

The bug report script is typically installed with the NVIDIA driver.

Default location:

         /usr/bin/nvidia-bug-report.sh


If the script is not at the default location, search the filesystem for it:

        find / -name nvidia-bug-report.sh 2>/dev/null
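A filesystem-wide find can be slow on large servers. As a quicker first check, the sketch below looks in PATH and then at the default location before you fall back to find. The function name locate_bug_report_script is a hypothetical helper introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: look for the script in PATH first, then at the default location.
# locate_bug_report_script is a hypothetical helper name.
locate_bug_report_script() {
    local default_path="${1:-/usr/bin/nvidia-bug-report.sh}"
    local found

    if found=$(command -v nvidia-bug-report.sh 2>/dev/null); then
        echo "$found"
    elif [ -x "$default_path" ]; then
        echo "$default_path"
    else
        echo "nvidia-bug-report.sh not found; search with find instead" >&2
        return 1
    fi
}
```

The function prints the full path of the script when it finds one and returns a non-zero status otherwise.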

Step 3: Collect the NVIDIA Bug Report

Run the script with elevated privileges:

        sudo nvidia-bug-report.sh


This generates a compressed log file in the current directory, named by default:

        nvidia-bug-report.log.gz
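When reports are collected from several nodes, every file has the same default name, so copying them into one directory overwrites earlier reports. The sketch below renames the report to include the host name and a timestamp. The function name tag_bug_report and the naming pattern are assumptions made for illustration, not an NVIDIA convention.

```shell
#!/usr/bin/env bash
# Sketch: rename the report so files from different nodes do not collide.
# tag_bug_report and the naming pattern are hypothetical.
tag_bug_report() {
    local src="${1:-nvidia-bug-report.log.gz}"
    local host="${2:-$(hostname -s)}"
    local stamp="${3:-$(date +%Y%m%d-%H%M%S)}"
    local dest

    if [ ! -f "$src" ]; then
        echo "missing report: $src" >&2
        return 1
    fi

    # Keep the renamed file in the same directory as the original.
    dest="$(dirname -- "$src")/nvidia-bug-report-${host}-${stamp}.log.gz"
    mv -- "$src" "$dest" && echo "$dest"
}
```

Called with no arguments after the script finishes, tag_bug_report renames the report in the current directory and prints the new name. If you rename the file, use the new name in the commands of the following steps.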

Step 4: Verify the Output File

Confirm the file was created successfully:

        ls -lh nvidia-bug-report.log.gz
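Listing the file only confirms that it exists. As a slightly stronger check, the sketch below also confirms that the file is non-empty and that gzip can read it without errors (gzip -t tests archive integrity). The function name verify_bug_report is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: confirm the report is non-empty and is a readable gzip archive.
# verify_bug_report is a hypothetical helper name.
verify_bug_report() {
    local report="${1:-nvidia-bug-report.log.gz}"

    if [ ! -s "$report" ]; then
        echo "FAIL: $report is missing or empty"
        return 1
    fi
    if ! gzip -t "$report" 2>/dev/null; then
        echo "FAIL: $report is not a valid gzip archive"
        return 1
    fi
    echo "OK: $report"
}
```

A FAIL result usually means the collection was interrupted or the disk filled up. In that case, rerun Step 3 before sending the file to support.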

Step 5: Download the File

On the node where you collected the bug report, run the following command to copy it to the login node (substitute your own username, login node address, and destination directory):

        scp nvidia-bug-report.log.gz mbuzz@10.152.241.241:/clhome/mbuzz/


After copying the file, download it from the login node to your workstation using an SFTP client such as MobaXterm.
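To confirm that the file was not corrupted in transit, you can record a SHA-256 checksum on the GPU node and verify it after each copy. The following is a minimal sketch using sha256sum from GNU coreutils. The function names write_checksum and check_checksum are hypothetical, and the sketch assumes the .sha256 file is copied together with the report.

```shell
#!/usr/bin/env bash
# Sketch: create a checksum next to the report, then verify it after copying.
# write_checksum and check_checksum are hypothetical helper names.

# Run on the GPU node before copying.
write_checksum() {
    local report="${1:-nvidia-bug-report.log.gz}"
    local name
    name=$(basename -- "$report")
    # Work inside the report's directory so the checksum file stores a bare name.
    ( cd "$(dirname -- "$report")" && sha256sum "$name" >"$name.sha256" )
}

# Run on the login node after copying both files into the same directory.
check_checksum() {
    local report="${1:-nvidia-bug-report.log.gz}"
    local name
    name=$(basename -- "$report")
    ( cd "$(dirname -- "$report")" && sha256sum -c "$name.sha256" )
}
```

check_checksum returns a non-zero status if the file content no longer matches the recorded checksum. On a Windows workstation, the checksum of the downloaded file can be compared manually with certutil -hashfile nvidia-bug-report.log.gz SHA256.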


References

  • NVIDIA Enterprise Support Documentation

  • NVIDIA Driver Installation Guide

