Purpose
In large GPU/AI clusters, each compute node often has multiple InfiniBand (IB) Host Channel Adapters (HCAs). On Linux, these HCAs appear as mlx5_x devices.
To troubleshoot connectivity or performance issues, it is essential to map:
- mlx5_x (Linux device name)
- Port GUID (from ibstat)
- Linux netdev interface (from ip a)
- Link speed (from ibstat or mlxlink)
- Connected switch and port (from ibnetdiscover or subnet manager topology dump)
Step 1 – Get Linux Device Mapping
List the Mellanox devices present on the node (for example, the entries under /sys/class/infiniband).
Each mlx5_x entry typically corresponds to one HCA port.
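One way to produce the device list is to read sysfs, where the kernel exposes each RDMA device as a directory under /sys/class/infiniband. The sketch below is an assumption about how the listing could be done, not necessarily the command this step originally showed. The function name list_ib_devices and its optional directory argument are hypothetical helpers added for illustration.

```shell
# List RDMA device names (e.g. mlx5_0, mlx5_1) from sysfs.
# The optional argument overrides the sysfs directory; it exists only so the
# function can be tried against a scratch directory.
list_ib_devices() {
    ib_base="${1:-/sys/class/infiniband}"
    [ -d "$ib_base" ] || return 0    # no RDMA devices, or driver not loaded
    ls -1 "$ib_base"
}

list_ib_devices
```

On hosts with the libibverbs utilities installed, ibv_devices prints a comparable list.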
Step 2 – Get Port GUIDs (ibstat)
Run ibstat against each HCA (for example, ibstat mlx5_0) and look for the Port GUID field.
This value uniquely identifies the IB port in the fabric.
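To collect the GUIDs for every HCA in one pass, the ibstat output can be filtered with awk. This is a minimal sketch that assumes the usual ibstat layout (a CA 'name' header line, then Port N: blocks containing a Port GUID: field). The function name port_guids is hypothetical.

```shell
# Read ibstat output on stdin and print "<device> <port> <port GUID>"
# for every port found.
port_guids() {
    awk '
        # Header line of each adapter block; strip the quotes around the name.
        /^CA / { gsub(/[^A-Za-z0-9_]/, "", $2); ca = $2 }
        # Start of a port block, e.g. "Port 1:".
        /^[[:space:]]*Port [0-9]+:/ { port = $2; sub(/:/, "", port) }
        # The GUID is the last field on the "Port GUID:" line.
        /Port GUID:/ { print ca, port, $NF }
    '
}

# Run against the live system only when ibstat is available.
if command -v ibstat >/dev/null 2>&1; then
    ibstat | port_guids
fi
```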
Step 3 – Map to Linux netdev (ip a)
Check the ip a output for ibX interfaces.
Each netdev (e.g., ib0) is backed by one mlx5_x device. The two numbering schemes are not guaranteed to line up, so cross-check the pairing under /sys/class/infiniband.
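The sysfs cross-check can be sketched as follows. It relies on the device/net directory that the kernel exposes under each RDMA device, and it is an illustration rather than the exact command this step originally showed. The function name ib_netdevs and its optional directory argument are hypothetical.

```shell
# Print the netdev name(s) backed by one RDMA device.
# $1 = device name (e.g. mlx5_0); $2 = optional sysfs base, for experiments.
ib_netdevs() {
    rdma_dev="$1"
    ib_base="${2:-/sys/class/infiniband}"
    [ -d "$ib_base/$rdma_dev/device/net" ] || return 0
    ls -1 "$ib_base/$rdma_dev/device/net"
}

# Walk every device and print "<device> <netdev>" pairs.
for dev_path in /sys/class/infiniband/*; do
    [ -d "$dev_path" ] || continue
    name=${dev_path##*/}
    for nd in $(ib_netdevs "$name"); do
        echo "$name $nd"
    done
done
```

Where the MLNX_OFED utilities are installed, the ibdev2netdev script reports the same pairing.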
Step 4 – Check Link Speed
Use either ibstat or mlxlink to read the negotiated link speed.
In ibstat output, look for the Rate field of each port.
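Spotting a degraded link is easier when the rate is compared against an expected value. The sketch below filters ibstat output and flags any port whose Rate is below a threshold. The function name check_rates, the BELOW_EXPECTED marker, and the default of 400 Gb/s are assumptions chosen for illustration.

```shell
# Read ibstat output on stdin and print "<device> <port> <rate>" per port,
# appending BELOW_EXPECTED when the rate is under the expected Gb/s value.
check_rates() {
    expected="${1:-400}"    # expected rate in Gb/s; 400 is only an example
    awk -v want="$expected" '
        /^CA / { gsub(/[^A-Za-z0-9_]/, "", $2); ca = $2 }
        /^[[:space:]]*Port [0-9]+:/ { port = $2; sub(/:/, "", port) }
        /^[[:space:]]*Rate:/ {
            flag = ($2 + 0 < want + 0) ? " BELOW_EXPECTED" : ""
            print ca, port, $2 flag
        }
    '
}

# Run against the live system only when ibstat is available.
if command -v ibstat >/dev/null 2>&1; then
    ibstat | check_rates 400
fi
```

mlxlink -d mlx5_0 also reports speed and width, but its output layout differs, so it is not parsed here.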
Step 5 – Find Connected Switch Port
To identify which switch and port the HCA is connected to, run topology discovery on the fabric with ibnetdiscover or ibdiagnet.
The output lists each link between a node port and a switch port.
This gives the node HCA → switch port mapping.
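To pull the link for a single HCA port out of a large topology dump, the Port GUID from Step 2 can be used as a search key. This sketch assumes the default ibnetdiscover text output, where each port GUID appears in parentheses as lowercase hex without the 0x prefix or leading zeros. That layout may differ between versions, so treat the match pattern as an assumption. The function name find_port_guid is hypothetical.

```shell
# Read ibnetdiscover output on stdin and print the lines that mention the
# given port GUID. The GUID may be passed with or without a 0x prefix.
find_port_guid() {
    # Normalise to the assumed ibnetdiscover form: lowercase hex,
    # no 0x prefix, no leading zeros.
    needle=$(printf '%s' "$1" | tr 'A-FX' 'a-fx' | sed -e 's/^0x//' -e 's/^0*//')
    grep -F "(${needle})"
}

# Example invocation on a fabric node (not executed here):
#   ibnetdiscover | find_port_guid 0x248a070300abcd01
```

In the matching lines, the bracketed number next to the switch identifier is the switch port, and the quoted string in the trailing comment is the switch description.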
Example Table
mlx5_x | Port GUID (ibstat) | Linux netdev (ip a) | Speed | Connected Switch:Port |
---|---|---|---|---|
mlx5_0 | 0x248a070300abcd01 | ib0 | 400G | SW1:12 |
mlx5_1 | 0x248a070300abcd02 | ib1 | 400G | SW2:14 |
mlx5_2 | 0x248a070300abcd03 | ib2 | 200G⚠ | SW3:18 |
mlx5_3 | 0x248a070300abcd04 | ib3 | 200G⚠ | SW4:20 |
⚠️ Indicates ports running below expected speed.
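Part of such a table can be generated directly from sysfs. The sketch below prints one row per device with its netdev and the rate string from ports/1/rate. It assumes a single port numbered 1 per device. The Port GUID and switch columns are left out because they come from ibstat and ibnetdiscover. The function name mapping_rows is hypothetical.

```shell
# Print one "device | netdev | rate |" row per RDMA device, read from sysfs.
# The optional argument overrides the sysfs base directory.
mapping_rows() {
    ib_base="${1:-/sys/class/infiniband}"
    for dev_path in "$ib_base"/*; do
        [ -d "$dev_path" ] || continue
        # First netdev backed by this device, if any.
        netdev=$(ls -1 "$dev_path/device/net" 2>/dev/null | head -n 1 || true)
        # Rate string for port 1, e.g. "400 Gb/sec (4X NDR)".
        rate=$(cat "$dev_path/ports/1/rate" 2>/dev/null || true)
        echo "${dev_path##*/} | ${netdev:--} | ${rate:--} |"
    done
}

mapping_rows
```

Missing values are printed as a dash so that incomplete rows stand out.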
Summary
- mlx5_x = Linux HCA device name.
- Use ibstat to fetch Port GUID and speed.
- Use ip a or /sys/class/infiniband to map to Linux netdev (ib0, ib1, etc.).
- Use ibnetdiscover / ibdiagnet to identify the connected switch and port.
- Build a mapping table for each node to simplify troubleshooting.