Run a GPU stress test for RTX 6000 Ada and L40S on Ubuntu Server

You can stress‑test RTX 6000 Ada and L40S GPUs on Ubuntu Server using GPU Burn, a CUDA‑based stress tool commonly used for datacenter validation. The process is the same for both GPUs because they use NVIDIA’s CUDA stack.

Core Steps to Run GPU Burn on Ubuntu Server

The essential workflow is:

  1. Install the NVIDIA driver and CUDA toolkit
  2. Install GPU Burn
  3. Run the test and save the results


Install CUDA and build tools

GPU Burn requires a CUDA‑capable GPU and a working CUDA toolkit. Make sure your system has:

  1. Ubuntu Server
  2. The NVIDIA driver and CUDA toolkit installed
  3. Build tools (gcc, make) and git, if you plan to build from source
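Before building, you can confirm the prerequisites from the shell. The sketch below only checks that each tool is on PATH; the function name check_prereqs is illustrative. It assumes the CUDA toolkit places nvcc in /usr/local/cuda/bin, which may not be on PATH yet (export PATH=/usr/local/cuda/bin:$PATH). On Ubuntu, gcc, make and git can be installed with sudo apt install build-essential git.

```shell
#!/bin/sh
# Sketch: report which tools needed to build and run GPU Burn are on PATH.
# Assumption: nvcc may live in /usr/local/cuda/bin and need adding to PATH.
check_prereqs() {
  missing=0
  for tool in nvidia-smi nvcc gcc make git; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "OK      $tool"
    else
      echo "MISSING $tool"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing tools (0 means ready to build).
  return "$missing"
}
```

Run check_prereqs and install anything reported as MISSING before continuing.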

Install and Run GPU Burn

Install GPU Burn (Snap method)

sudo snap install gpu-burn

Build from Source (if you prefer)

git clone https://github.com/wilicc/gpu-burn
cd gpu-burn
make
./gpu_burn 60   (60 = test duration in seconds)
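The source build can be wrapped in a small function. This is a sketch under stated assumptions: build_gpu_burn is a hypothetical name; COMPUTE and CUDAPATH are make variables described in the upstream gpu-burn README (verify against your checkout); 89 corresponds to compute capability 8.9, which covers Ada-generation GPUs such as RTX 6000 Ada and L40S. If your Makefile ignores these variables, a plain make as shown above still works.

```shell
#!/bin/sh
# Sketch: clone and build GPU Burn from the upstream repository.
# Assumptions: git and the CUDA toolkit are installed; CUDA is under
# /usr/local/cuda unless CUDAPATH is set; COMPUTE=89 targets Ada GPUs.
build_gpu_burn() {
  dest=${1:-gpu-burn}
  cuda=${CUDAPATH:-/usr/local/cuda}
  if [ ! -d "$dest" ]; then
    git clone https://github.com/wilicc/gpu-burn "$dest" || return 1
  fi
  # Build inside a subshell so the caller's working directory is unchanged.
  (cd "$dest" && make COMPUTE=89 CUDAPATH="$cuda") || return 1
  # By default gpu_burn loads compare.ptx from the current directory,
  # so start it from the build directory.
  echo "Built $dest/gpu_burn; run it from $dest"
}
```

Usage: build_gpu_burn, then cd gpu-burn && ./gpu_burn 60.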

To Run GPU Burn

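The examples below use the snap command gpu-burn. If you built from source, run ./gpu_burn from the build directory instead.

Stress test all GPUs for 60 seconds:
  gpu-burn 60
List all GPUs:
  gpu-burn -l
Run on GPU index 2 for 120 seconds:
  gpu-burn -i 2 120
Use 50% of GPU memory for 5 minutes (300 seconds):
  gpu-burn -m 50% 300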
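To see throttling or power-limit behaviour during a run, you can sample nvidia-smi telemetry alongside GPU Burn. A minimal sketch, assuming nvidia-smi is on PATH; burn_with_telemetry and the BURN override are hypothetical names, and the query field names should be checked with nvidia-smi --help-query-gpu because they can change between driver versions (newer drivers also report throttle reasons as clocks_event_reasons).

```shell
#!/bin/sh
# Sketch: sample GPU telemetry into a CSV file while GPU Burn runs.
# Assumptions: field names match `nvidia-smi --help-query-gpu` on your driver;
# BURN is a hypothetical override (set BURN=gpu-burn for the snap install).
FIELDS="timestamp,index,name,temperature.gpu,power.draw,power.limit,clocks.sm,utilization.gpu,clocks_throttle_reasons.active"

burn_with_telemetry() {
  seconds=$1
  csv=${2:-gpu-telemetry.csv}
  burn=${BURN:-./gpu_burn}
  # Reject anything that is not a whole number of seconds.
  case $seconds in
    ''|*[!0-9]*) echo "usage: burn_with_telemetry SECONDS [CSV]" >&2; return 2 ;;
  esac
  # Poll every 5 seconds in the background.
  nvidia-smi --query-gpu="$FIELDS" --format=csv -l 5 >"$csv" &
  smi_pid=$!
  rc=0
  "$burn" "$seconds" || rc=$?
  # Stop the sampler once the stress test ends.
  kill "$smi_pid" 2>/dev/null
  wait "$smi_pid" 2>/dev/null || true
  return "$rc"
}
```

For example, burn_with_telemetry 1800 telemetry.csv runs a 30‑minute test. Rising temperature.gpu together with falling clocks.sm in the CSV is a typical sign of thermal throttling.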


To Save GPU Burn Results to a File

Run GPU Burn normally:
./gpu_burn 300   (300 = test duration in seconds)

To save the output to a log file:
./gpu_burn 300 | tee gpu-burn.log
(tee shows the output on screen and also writes it to gpu-burn.log.)

To append to an existing log instead of overwriting it:
./gpu_burn 300 | tee -a gpu-burn.log

The file is created in the directory where you run the command.
You can move it afterward (writing to /var/log requires root):
sudo mv gpu-burn.log /var/log/
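When several servers or repeated runs are tested, a unique log name per run avoids overwriting earlier results. A minimal sketch; burn_log_name and burn_and_log are hypothetical helpers, and the node-name-plus-timestamp pattern is only a suggestion. Adding 2>&1 also captures error messages, which tee alone does not record.

```shell
#!/bin/sh
# Sketch: write each run to its own log file, including error messages.
# The file-name pattern (node name + timestamp) is only a suggestion.
burn_log_name() {
  printf 'gpu-burn-%s-%s.log\n' "$(uname -n)" "$(date +%Y%m%d-%H%M%S)"
}

burn_and_log() {
  seconds=$1
  log=$(burn_log_name)
  # 2>&1 sends stderr through tee as well, so errors land in the log.
  ./gpu_burn "$seconds" 2>&1 | tee "$log"
  echo "Log saved to $PWD/$log"
}
```

Usage: burn_and_log 300 from the gpu-burn build directory.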

Moving the log from /var/log to the USB drive

Assuming the USB drive is mounted at /mnt/usb and the file is /var/log/gpu-burn.log:
sudo mv /var/log/gpu-burn.log /mnt/usb/

To confirm it arrived:
ls -l /mnt/usb/

Safely unmounting the USB drive
Always unmount before removing the drive:
sudo umount /mnt/usb
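The mount, copy, verify and unmount steps can be combined into one function. A sketch under stated assumptions: copy_log_to_usb is a hypothetical name, the device argument (for example /dev/sdb1) is a placeholder that you should identify with lsblk first, and this version copies rather than moves the file so the server keeps its own copy.

```shell
#!/bin/sh
# Sketch: copy a GPU Burn log to a USB drive and unmount it safely.
# Assumption: DEVICE (e.g. /dev/sdb1) is a placeholder; check it with lsblk.
copy_log_to_usb() {
  dev=$1
  log=${2:-/var/log/gpu-burn.log}
  mnt=/mnt/usb
  if [ -z "$dev" ] || [ ! -f "$log" ]; then
    echo "usage: copy_log_to_usb DEVICE [LOGFILE]  (log must exist)" >&2
    return 1
  fi
  sudo mkdir -p "$mnt" || return 1
  sudo mount "$dev" "$mnt" || return 1
  rc=0
  # Copy rather than move, so the server keeps its own copy of the log.
  sudo cp "$log" "$mnt/" || rc=1
  ls -l "$mnt/"
  sync                      # flush pending writes before unmounting
  sudo umount "$mnt" || rc=1
  return "$rc"
}
```

Usage: copy_log_to_usb /dev/sdb1 /var/log/gpu-burn.log, replacing /dev/sdb1 with the partition lsblk shows for your drive.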

Test Duration and Monitoring

• Run GPU Burn for 10–30 minutes for a quick health check.
• Run it for 1–2 hours for deeper thermal and power stability validation.
• Review the logged output to detect:
    • Thermal throttling
    • ECC errors
    • Power limit issues
    • GPU crashes or Xid errors
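GPU Burn takes its duration in seconds, so 10–30 minutes is 600–1800 seconds and 1–2 hours is 3600–7200 seconds. After a run, a saved log can be checked automatically. This sketch assumes the end-of-run summary contains lines such as "GPU 0: OK" or "GPU 0: FAULTY", as printed by upstream builds; confirm against your own log and adjust the patterns if they differ. summarize_burn_log is a hypothetical name.

```shell
#!/bin/sh
# Sketch: summarise a saved GPU Burn log.
# Assumption: the summary contains lines like "GPU 0: OK" / "GPU 0: FAULTY".
summarize_burn_log() {
  log=$1
  [ -r "$log" ] || { echo "Cannot read $log" >&2; return 2; }
  ok=$(grep -cE 'GPU [0-9]+: OK' "$log" || true)
  faulty=$(grep -cE 'GPU [0-9]+: FAULTY' "$log" || true)
  echo "OK: $ok  FAULTY: $faulty"
  # Print the faulty lines, if any, so the GPU index is visible.
  grep -E 'GPU [0-9]+: FAULTY' "$log" || true
  # Fail if any GPU is faulty, or if no summary was found (run cut short).
  [ "$faulty" -eq 0 ] && [ "$ok" -gt 0 ]
}
```

Usage: summarize_burn_log gpu-burn.log. GPU Burn's log does not show everything in the list above, so also check the kernel log for Xid errors (sudo dmesg | grep -i xid), ECC counters (nvidia-smi -q -d ECC) and throttle reasons (nvidia-smi -q -d PERFORMANCE).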

GPU Burn will fully load the GPU, so ensure:

• Adequate cooling
• Proper power delivery
• No other workloads running

GPU Burn is known to push GPUs to their thermal and power limits, which makes it useful for diagnosing hardware issues.
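To confirm that no other workloads are running, you can check for existing compute processes before starting. A minimal sketch: gpus_idle is a hypothetical helper that reads the output of nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader on stdin (assumed to be empty when no compute processes are running), which also lets it be tested without a GPU.

```shell
#!/bin/sh
# Sketch: succeed only if no compute processes are using the GPUs.
# Reads `nvidia-smi --query-compute-apps=... --format=csv,noheader` on stdin.
gpus_idle() {
  # Count non-blank lines; each one is a running compute process.
  busy=$(grep -c '[^[:space:]]' || true)
  if [ "$busy" -eq 0 ]; then
    echo "No compute processes found; safe to start GPU Burn."
    return 0
  fi
  echo "$busy compute process(es) already running; stop them first." >&2
  return 1
}
```

Usage: nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader | gpus_idle && ./gpu_burn 1800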