This article provides step-by-step instructions for collecting an NVIDIA bug report from servers equipped with NVIDIA GPUs. NVIDIA Support commonly requires this report when troubleshooting GPU driver, CUDA, NVLink, PCIe, and hardware-related issues.
This procedure applies to:
Servers with NVIDIA H100 GPUs (PCIe or SXM)
Systems with NVIDIA proprietary drivers installed
Prerequisites:
Root or sudo privileges on the server
NVIDIA driver installed and loaded
Run the following command to confirm the GPUs are detected:
nvidia-smi
Expected output:
H100 GPUs listed
Driver version displayed
No critical errors reported
If nvidia-smi fails, note the error and proceed with collection anyway.
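The pre-check above can be sketched as a small shell function that saves whatever nvidia-smi prints, including any error, and always returns success so that collection can continue. This is only a sketch: the function name precheck_gpus and the log file name nvidia-smi-precheck.log are illustrative assumptions, not part of any NVIDIA tooling.

```shell
#!/usr/bin/env bash
# Pre-check sketch: record GPU status without ever blocking collection.
# precheck_gpus and the default log name are hypothetical, chosen for illustration.

precheck_gpus() {
    local log_file="${1:-nvidia-smi-precheck.log}"

    # If the tool is missing, note that fact and carry on.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found in PATH" >"$log_file"
        echo "nvidia-smi not found in PATH; noted in $log_file"
        return 0
    fi

    # Capture stdout and stderr together so a failure message is preserved.
    if nvidia-smi >"$log_file" 2>&1; then
        echo "nvidia-smi succeeded; output saved to $log_file"
    else
        echo "nvidia-smi failed (exit $?); error saved to $log_file"
    fi

    # Always succeed: a failing nvidia-smi should not stop the bug report.
    return 0
}

# Example usage (not executed here):
#   precheck_gpus
```

Keeping the saved log next to the bug report gives the support engineer the exact error text seen at collection time.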
The bug report script is typically installed with the NVIDIA driver.
Default location:
/usr/bin/nvidia-bug-report.sh
If the script is not at the default location, search for it:
find / -name nvidia-bug-report.sh 2>/dev/null
Run the script with elevated privileges:
sudo nvidia-bug-report.sh
By default, the script writes nvidia-bug-report.log.gz to the current working directory.
Confirm the file was created successfully:
ls -lh nvidia-bug-report.log.gz
From the node where you collected the bug report, run the following command to copy it to the login node:
scp nvidia-bug-report.log.gz mbuzz@10.152.241.241:/clhome/mbuzz/
References:
NVIDIA Enterprise Support Documentation
NVIDIA Driver Installation Guide
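The locate and confirm steps of this procedure could be wrapped in two helper functions, sketched below. The names find_bug_report_script and verify_bug_report are hypothetical, and the optional search-root argument is an assumption added so that the slow filesystem search can be narrowed. The integrity check uses gzip -t, which goes slightly beyond the ls -lh check in the procedure.

```shell
#!/usr/bin/env bash
# Sketch: locate nvidia-bug-report.sh and sanity-check the resulting archive.
# Both function names are hypothetical helpers, not NVIDIA-provided commands.

# Print the path to the bug report script, or return 1 if it cannot be found.
# $1 (optional): directory to search as a last resort; defaults to /.
find_bug_report_script() {
    local search_root="${1:-/}"
    local name="nvidia-bug-report.sh"
    local path

    # 1. Already on PATH?
    if path="$(command -v "$name" 2>/dev/null)"; then
        echo "$path"
        return 0
    fi

    # 2. Default install location.
    if [ -x "/usr/bin/$name" ]; then
        echo "/usr/bin/$name"
        return 0
    fi

    # 3. Last resort: filesystem search (can be slow on large systems).
    path="$(find "$search_root" -name "$name" -type f 2>/dev/null | head -n 1)"
    if [ -n "$path" ]; then
        echo "$path"
        return 0
    fi
    return 1
}

# Return 0 if the archive exists, is non-empty, and passes gzip's integrity test.
verify_bug_report() {
    local archive="${1:-nvidia-bug-report.log.gz}"
    [ -s "$archive" ] && gzip -t "$archive"
}

# Example usage (not executed here):
#   script="$(find_bug_report_script)" && sudo "$script" && verify_bug_report
```

Checking the archive before running scp avoids copying an empty or truncated file to the login node.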