You can stress‑test RTX 6000 Ada and L40S GPUs on Ubuntu Server using GPU Burn, a CUDA‑based stress tool commonly used for datacenter validation. The process is the same for both GPUs because they use NVIDIA’s CUDA stack.
Core Steps to Run GPU Burn on Ubuntu Server
The essential workflow is:
- Install the NVIDIA driver and CUDA toolkit.
- Install GPU Burn.
GPU Burn requires a CUDA‑capable GPU and a working CUDA toolkit.
Make sure your system has:
- Ubuntu Server
- NVIDIA driver and CUDA installed
- Build tools (gcc, make)
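The prerequisites above can be checked before installing anything. The following is a minimal preflight sketch, assuming the driver puts nvidia-smi on the PATH and the CUDA toolkit puts nvcc there; missing_tools and preflight are hypothetical helper names, not part of GPU Burn.

```shell
# missing_tools (hypothetical helper): print each named command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# preflight (hypothetical helper): report missing prerequisites for building GPU Burn.
# Assumption: nvidia-smi ships with the driver and nvcc with the CUDA toolkit.
preflight() {
    missing="$(missing_tools nvidia-smi nvcc gcc make)"
    if [ -n "$missing" ]; then
        printf 'Missing prerequisites:\n%s\n' "$missing" >&2
        return 1
    fi
    echo "All prerequisites found."
}
```

Running preflight prints the missing commands, if any, and returns a non-zero status so it can gate the later steps. If nvcc is installed but not on the PATH (a common layout is a bin directory under the CUDA install prefix), it will be reported as missing.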
Download, Install, and Run GPU Burn
Install GPU Burn (Snap method)
sudo snap install gpu-burn
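Build from Source (if you prefer)
Starting from a copy of the GPU Burn source in a directory named gpu-burn:
cd gpu-burn
make
./gpu_burn 60 (60 = number of seconds)
Note that the source build produces a binary named gpu_burn (underscore), run from the build directory, while the snap installs a command named gpu-burn (hyphen).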
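The build steps above start inside an existing gpu-burn directory and do not show how the source gets there. A minimal sketch of the full sequence follows, assuming the source comes from the wilicc/gpu-burn project on GitHub and that git is installed; build_gpu_burn and the GPU_BURN_REPO variable are hypothetical names introduced here for illustration.

```shell
# Assumption: upstream source location. Override GPU_BURN_REPO to use a mirror.
GPU_BURN_REPO="${GPU_BURN_REPO:-https://github.com/wilicc/gpu-burn}"

# build_gpu_burn (hypothetical wrapper): clone the source if needed, then build it.
build_gpu_burn() {
    dir="${1:-gpu-burn}"
    if [ ! -d "$dir" ]; then
        git clone "$GPU_BURN_REPO" "$dir" || return 1
    fi
    # make builds the gpu_burn binary inside the source directory.
    (cd "$dir" && make)
}
```

Calling build_gpu_burn with no arguments clones into ./gpu-burn and builds there; the build runs in a subshell so the caller's working directory is unchanged.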
To Run GPU Burn
Stress test all GPUs for 60 seconds:
- gpu-burn 60
To list all GPUs:
- gpu-burn -l
Run on GPU index 2 for 120 seconds:
- gpu-burn -i 2 120
Use 50% of GPU memory for 5 minutes:
- gpu-burn -m 50% 300
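Every example above gives the duration in seconds, which is easy to get wrong for longer runs. The sketch below wraps the conversion; minutes_to_seconds and burn_minutes are hypothetical helpers, and the wrapper assumes the snap's gpu-burn command is on the PATH.

```shell
# minutes_to_seconds (hypothetical helper): GPU Burn takes its duration in seconds.
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

# burn_minutes (hypothetical wrapper): run gpu-burn for N minutes, passing any
# extra options (such as -i 2 or -m 50%) ahead of the computed duration.
burn_minutes() {
    minutes="$1"
    shift
    gpu-burn "$@" "$(minutes_to_seconds "$minutes")"
}
```

With these definitions, burn_minutes 5 -m 50% is equivalent to gpu-burn -m 50% 300.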
To Save GPU Burn Results to a File
Run GPU Burn normally:
./gpu_burn 300 (300 = number of seconds)
Save the output to a log file:
./gpu_burn 300 | tee gpu-burn.log (tee shows the output on screen and writes it to gpu-burn.log.)
If you want to append to an existing log:
./gpu_burn 300 | tee -a gpu-burn.log
The file will be created in the directory where you run the command.
You can move it afterward (writing to /var/log normally requires root):
sudo mv gpu-burn.log /var/log/
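A plain pipe into tee records only standard output, and a fixed file name is overwritten on the next run unless -a is used. The sketch below is one way to handle both, assuming a timestamped name is acceptable; log_name and run_and_log are hypothetical helpers.

```shell
# log_name (hypothetical helper): build a timestamped log file name so that
# repeated runs do not overwrite each other.
log_name() {
    printf 'gpu-burn-%s.log\n' "$(date +%Y%m%d-%H%M%S)"
}

# run_and_log (hypothetical helper): run a command, show its output, and save
# both stdout and stderr to a timestamped log in the current directory.
run_and_log() {
    log="$(log_name)"
    "$@" 2>&1 | tee "$log"
    echo "Output saved to $log"
}
```

For example, run_and_log ./gpu_burn 300 runs a five-minute test and leaves a new log file next to the binary.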
Moving the log from /var/log to the USB
Assuming the file is /var/log/gpu-burn.log and the USB drive is mounted at /mnt/usb:
sudo mv /var/log/gpu-burn.log /mnt/usb/
To confirm it arrived:
ls -l /mnt/usb/
Safely unmounting the USB
Always unmount before removing the drive:
sudo umount /mnt/usb
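The move, check, and unmount steps above can be made a little safer by copying first and comparing the copy with the original before anything is deleted. This is a minimal sketch under that assumption; copy_verified is a hypothetical helper, and /mnt/usb is the mount point used in the text.

```shell
# copy_verified (hypothetical helper): copy a file into a directory and confirm
# the copy is byte-identical to the original before the original is removed.
copy_verified() {
    src="$1"
    dest_dir="$2"
    dest="$dest_dir/$(basename "$src")"
    cp "$src" "$dest" || return 1
    cmp -s "$src" "$dest" || { echo "Copy differs from original: $dest" >&2; return 1; }
    echo "Verified copy at $dest"
}
```

A typical sequence would be copy_verified /var/log/gpu-burn.log /mnt/usb && sync && sudo umount /mnt/usb, run with enough privileges to write to the drive. Running mountpoint -q /mnt/usb first confirms that the drive is actually mounted, so the log is not copied into an empty directory on the system disk.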
Recommended Burn‑Test Procedure for Servers
• Run GPU Burn for 10–30 minutes for a quick health check.
• Run 1–2 hours for deeper thermal/power stability validation.
• Log the output to detect:
  • Thermal throttling
  • ECC errors
  • Power limit issues
  • GPU crashes or Xid errors
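Some of the problems listed above show up outside GPU Burn's own output, so it can help to record GPU telemetry and check the kernel log alongside the burn log. The sketch below assumes nvidia-smi's --query-gpu interface and assumes the driver reports Xid errors in kernel log lines containing "NVRM: Xid"; start_gpu_monitor and count_xid are hypothetical helpers.

```shell
# start_gpu_monitor (hypothetical helper): sample temperature, power, and SM clock
# every 5 seconds into a CSV file. Run it in a second shell during the burn test
# and stop it with Ctrl-C.
start_gpu_monitor() {
    nvidia-smi \
        --query-gpu=timestamp,index,temperature.gpu,power.draw,power.limit,clocks.sm \
        --format=csv -l 5 > "${1:-gpu-monitor.csv}"
}

# count_xid (hypothetical helper): count kernel log lines on stdin that report an
# NVIDIA Xid error. Assumption: such lines contain the text "NVRM: Xid".
count_xid() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'NVRM: Xid' || true
}
```

After a run, sudo dmesg | count_xid gives a quick indication of whether the driver logged any Xid events. In the CSV, an SM clock that falls while temperature or power sits near its limit suggests throttling.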
GPU Burn will fully load the GPU, so ensure:
• Adequate cooling
• Proper power delivery
• No other workloads running
GPU Burn is known to push GPUs to their thermal and power limits, making it useful for diagnosing hardware issues.
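The "no other workloads running" condition can be checked from the shell before starting a test. This is a minimal sketch, assuming nvidia-smi's --query-compute-apps interface is available with the installed driver; list_compute_apps and is_idle are hypothetical helpers.

```shell
# list_compute_apps (hypothetical helper): list processes currently using any GPU
# for compute, one per line, with no header row.
list_compute_apps() {
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
}

# is_idle (hypothetical helper): succeed only when the process list on stdin
# contains no non-blank lines.
is_idle() {
    ! grep -q '[^[:space:]]'
}
```

For example, list_compute_apps | is_idle && gpu-burn 600 starts a ten-minute test only when no compute process is reported.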