How to Update Mellanox ConnectX-7 NIC Firmware on OSS Servers

1. Purpose

This article describes the procedure to upgrade the Mellanox ConnectX-7 network adapter firmware on the affected OSS servers to version 28.45.1200 in order to ensure compatibility, stability, and optimal performance.

2. Scope

This procedure applies to all OSS servers equipped with Mellanox ConnectX-7 network interface cards that currently run an earlier firmware version.

3. Prerequisites

Firmware package: Obtain the firmware image file (e.g., fw-ConnectX7-rel-28_45_1200-MCX755106AS-xxx.bin) from the official NVIDIA/Mellanox site or the internal repository.
Backup: Record the current firmware version and configuration.
Maintenance window: Must be approved before starting.
Console access: Ensure server console or iDRAC/iLO access is available.
Network impact: The firmware update requires a NIC reset; plan for downtime.
Privileges: Root or sudo access on the server.

4. Procedure

Step 1: Verify Cluster State

`pcs status`
In a healthy cluster, all nodes are shown as online.
Check the current firmware version with:
`ibstat`
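As a rough aid for this check, the sketch below pulls the firmware version of each adapter out of `ibstat` output and lists the adapters that are not yet on 28.45.1200. The function names `list_fw_versions` and `adapters_needing_update` are hypothetical, and the parsing assumes `ibstat` prints a `CA '<name>'` line followed by a `Firmware version:` line for each adapter; confirm this against the output on your servers.

```shell
#!/usr/bin/env bash
# Target release named in this article.
TARGET_FW="28.45.1200"

# Read ibstat-style text on stdin; print "<adapter> <firmware version>" per adapter.
# Assumes each adapter block starts with: CA '<name>'
list_fw_versions() {
    awk -F"'" '
        /^CA / { ca = $2 }
        /Firmware version:/ { split($0, parts, ": "); print ca, parts[2] }
    '
}

# Read ibstat-style text on stdin; print adapters whose firmware differs from TARGET_FW.
adapters_needing_update() {
    list_fw_versions | awk -v target="$TARGET_FW" '$2 != target { print $1 }'
}

# Usage on a server:
#   ibstat | adapters_needing_update
```

An empty result would suggest that every adapter already reports the target version.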

Step 2: Back Up Existing Firmware

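Save the current firmware details to a file for reference:
`mstflint -d /dev/mst/mt4125_pciconf0 query > /root/mellanox_fw_backup.txt`
Note: `query` records firmware information such as the version and PSID; it does not save a copy of the firmware image. The device name may differ on your server.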

Step 3: Load the Mellanox Firmware Update Tool

Make sure the firmware file is accessible, e.g.:
`ls /root/fw-ConnectX7-rel-28_45_1200-MCX755106AS-xxx.bin`
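A slightly stricter check than `ls` can be sketched as follows. It confirms that the image is readable and non-empty and that the file name mentions the expected release, following the `rel-28_45_1200` naming seen in the example above. The function name `check_fw_image` is hypothetical, and the file-name test is only a sanity check, not a validation of the image contents.

```shell
#!/usr/bin/env bash
# Check that a firmware image is readable, non-empty, and named for the expected release.
# Arguments: <image path> <version, e.g. 28.45.1200>
# Returns 1 if the file is missing/unreadable/empty, 2 if the name does not match.
check_fw_image() {
    local image="$1" version="$2"
    local tag="${version//./_}"   # 28.45.1200 -> 28_45_1200

    if [[ ! -r "$image" || ! -s "$image" ]]; then
        echo "image missing, unreadable, or empty: $image" >&2
        return 1
    fi
    if [[ "$(basename "$image")" != *"rel-${tag}"* ]]; then
        echo "file name does not mention release ${tag}: $image" >&2
        return 2
    fi
    echo "ok: $image"
}

# Usage on a server:
#   check_fw_image </path/to/firmware_file>.bin 28.45.1200
```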

Step 4: Change Cluster status to Standby

Place the node targeted for upgrade into standby mode to drain all BeeGFS services:
`pcs node standby <HOSTNAME>`
E.g.: `pcs node standby T10PHGXSTOSSER01`
Note: This might take a few minutes.

Step 5: Verify Status

Verify that the node’s services have drained by running:
`pcs status`
Once the node is shown as standby, move on to the next step.
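Because draining can take a few minutes, the wait can be scripted instead of re-running `pcs status` by hand. The sketch below is one possible approach, not part of the original procedure: `node_is_drained` and `wait_for_drain` are hypothetical names, and the matching assumes the node list contains a line such as `Node <HOSTNAME>: standby`, with the word "resources" appearing on that line while services are still active. The exact wording varies between Pacemaker versions, so compare it with your own `pcs status` output first.

```shell
#!/usr/bin/env bash
# Read `pcs status` text on stdin; succeed if <node> is in standby
# and its status line does not mention resources still running.
node_is_drained() {
    local node="$1"
    awk -v node="$node" '
        index($0, "Node " node ":") && /standby/ && !/resources/ { drained = 1 }
        END { exit drained ? 0 : 1 }
    '
}

# Poll `pcs status` until the node is drained or the attempts run out.
# Arguments: <node> [attempts, default 30] [delay in seconds, default 10]
wait_for_drain() {
    local node="$1" attempts="${2:-30}" delay="${3:-10}" i
    for ((i = 0; i < attempts; i++)); do
        if pcs status 2>/dev/null | node_is_drained "$node"; then
            return 0
        fi
        sleep "$delay"
    done
    echo "timed out waiting for $node to drain" >&2
    return 1
}

# Usage on a server:
#   wait_for_drain <HOSTNAME>
```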

Step 6: Updating Firmware

Update the firmware using the 'mlxfwmanager' utility:
`mlxfwmanager -i </path/to/firmware_file>.bin -u`
E.g.: `mlxfwmanager -i /root/fw-ConnectX7-rel-28_45_1200-SN37B06010_SN37B06011_AX-UEFI-14.38.16-FlexBoot-3.7.500.signed.bin -u`

Step 7: Disable force_ib_speed

Before rebooting, disable and stop the force_ib_speed service:
`systemctl disable force_ib_speed`
`systemctl stop force_ib_speed`

Step 8: Reboot the Server

`reboot`
Note: This might take some time (approximately 10 to 15 minutes).

Step 9: Verify Storage Connections

Verify that the storage connections have reconnected automatically:
`systemctl status eseries_nvme_ib`
`nvme list-subsys`
Note: If you do not see the number of expected connections for your cluster, restart the 'eseries_nvme_ib' service and wait for all connections to be established.
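Counting the connections by eye is error-prone on larger clusters, so the comparison can be sketched as a small helper. The names `count_live_paths` and `paths_ok` are hypothetical, and the parsing assumes each controller path appears in `nvme list-subsys` as a `+- nvmeN ...` line that contains the state `live`. The layout differs between nvme-cli versions, and the expected count is cluster-specific and must be supplied by you.

```shell
#!/usr/bin/env bash
# Read `nvme list-subsys` text on stdin; print the number of controller
# paths whose line contains the state "live".
count_live_paths() {
    awk '/\+- nvme[0-9]+ / && / live/ { n++ } END { print n + 0 }'
}

# Succeed only when the live path count equals the expected number.
# Arguments: <expected count>
paths_ok() {
    local expected="$1" actual
    actual="$(count_live_paths)"
    echo "live paths: ${actual}/${expected}"
    [[ "$actual" -eq "$expected" ]]
}

# Usage on a server:
#   nvme list-subsys | paths_ok <expected count>
```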

Step 10: Take the Node out of Standby

To bring the node out of standby, run:
`pcs node unstandby <HOSTNAME>`
Check the cluster status:
`pcs status`

Step 11: Verify FW Version

`ibstat`
If it does not work, check whether Pacemaker is running and start it if it is not:
`systemctl status pacemaker`
`systemctl start pacemaker`
Then try Step 10 again.

Step 12: Resource relocation

To relocate all the BeeGFS services back to their preferred nodes, run:
`pcs resource relocate run`
Then check the cluster status:
`pcs status`
Repeat the above steps for each node until all nodes in the cluster are updated.


