Request for Host Cold Power Cycle to Apply BlueField-3 DOCA SNAP Firmware Config
Tagged: Bluefield
- This topic has 3 replies, 2 voices, and was last updated 1 day, 18 hours ago by
yoursunny.
May 21, 2026 at 12:28 am #9796
Hello,
I am trying to enable DOCA SNAP on a BlueField-3 DPU, but I’m blocked on applying a required firmware configuration change.
The specific firmware setting I need is:
PCI_SWITCH_EMULATION_ENABLE=1
Right now it appears to be set to:
PCI_SWITCH_EMULATION_ENABLE=0
According to NVIDIA’s BlueField-3 SNAP firmware configuration documentation, SNAP requires firmware configuration before use. In the same documentation, NVIDIA lists PCI_SWITCH_EMULATION_ENABLE as the parameter that enables the PCI switch for emulated PFs, with valid values 0/1. NVIDIA also states that after the configuration is complete, the host must be power-cycled for the changes to take effect.
This setting is required for my use case because I want to enable DOCA SNAP / SNAP-4 on BlueField-3. NVIDIA’s DOCA SNAP-4 Service Guide describes SNAP-4 as running on BlueField-3 and presenting NVMe or virtio-blk storage to the host as a local PCIe block device, while the actual storage logic is handled by the SNAP framework on the DPU.
ref: https://docs.nvidia.com/doca/sdk/DOCA-SNAP-4-Service-Guide/index.html
My understanding is that if PCI_SWITCH_EMULATION_ENABLE remains at 0, the PCI switch emulation needed for SNAP emulated PCIe functions will not be enabled, so the host will not see the expected SNAP-emulated devices.
What I tried so far:
mlxfwreset -d 03:00.0 -y -l 3 --sync 1 r
ref: https://docs.nvidia.com/doca/sdk/nvidia-bluefield-reset-and-reboot-procedures/index.html
That did not make the firmware configuration change take effect.
A FABRIC operator also performed a DPU power reset, but that also did not appear to apply the change.
So my current request is: Can someone perform a true cold reboot / full host power cycle of the physical host containing this BlueField-3 DPU?
sudo mst start
sudo mlxconfig -d /dev/mst/mt41692_pciconf0 -e query | grep PCI_SWITCH_EMULATION_ENABLE
Expected result:
PCI_SWITCH_EMULATION_ENABLE=1
Slice: 92c182a7-c16c-4331-906f-03a6bbff30c4 (CEPH_DOCA_CLUSTER, node1, HAWI)
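The verification step could be scripted so that it separates the value currently in effect from the value queued for the next boot. The following is a minimal Python sketch, not a tested tool. It assumes that `mlxconfig -e query` prints one row per parameter with Default, Current, and Next Boot columns and values such as `False(0)` or `True(1)`; the sample text and the helper names `parse_param` and `is_enabled` are hypothetical and were not captured from this host.

```python
from typing import Dict, Optional

# ASSUMPTION: illustrative row layout for `mlxconfig -e query`, not real output
# from this host: [*] NAME  Default  Current  Next Boot
SAMPLE = """\
Configurations:                              Default         Current         Next Boot
*        PCI_SWITCH_EMULATION_ENABLE         False(0)        False(0)        True(1)
"""


def parse_param(output: str, name: str) -> Optional[Dict[str, str]]:
    """Return the default/current/next_boot values for `name`, or None if absent."""
    for line in output.splitlines():
        # Drop the leading '*' marker that flags rows with a pending change.
        fields = line.replace("*", " ").split()
        if len(fields) == 4 and fields[0] == name:
            return dict(zip(("default", "current", "next_boot"), fields[1:]))
    return None


def is_enabled(value: str) -> bool:
    """Treat '1', 'True', or 'True(1)' as enabled (hypothetical value spellings)."""
    return value.strip().lower() in {"1", "true", "true(1)"}


if __name__ == "__main__":
    state = parse_param(SAMPLE, "PCI_SWITCH_EMULATION_ENABLE")
    if state is None:
        print("parameter not found")
    elif is_enabled(state["current"]):
        print("setting is active")
    elif is_enabled(state["next_boot"]):
        print("setting is stored but not yet active; a power cycle is still needed")
    else:
        print("setting is not configured")
```

On a real host, the text passed to `parse_param` would come from the `mlxconfig` command above rather than from `SAMPLE`.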
Thank you,
Tanay
May 21, 2026 at 8:36 am #9797
There are several features that require a hard host reboot:
- BlueField PCI switch emulation (mentioned by OP)
- ConnectX-6 GTP-U parser (encountered by myself before I moved to CloudLab)
- FPGA flashing (well documented)
However, an immediate reboot would affect other nodes running on the same host.
I think FABRIC could solve this in several ways:
- The slice configuration declares requested settings, and then the control framework chooses a host that already has a card in that setting.
- If none of the available cards has this setting, but a host is idle, the control framework can apply the setting and reboot the host during the provisioning process.
- If a card exists with this setting but it’s occupied, the experiment is scheduled at a future date on this card, and the current slice on this card is blocked from extending.
- If none of the cards have this setting so that a reboot is unavoidable, the experiment is scheduled at a future date, and all existing slices are blocked from extending, so that the control framework has an opportunity to apply the setting and reboot the host.
This system would be fully automated and would not require recurring staff involvement.
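The four cases above can be sketched as a single decision function. This is only an illustration of the proposal, not FABRIC's control framework: the `Card`, `Host`, `Action`, and `decide` names are hypothetical, and the sketch assumes that an idle host is preferred over waiting for an occupied card, following the order of the list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Action(Enum):
    USE_NOW = "place the slice on a free card that already has the setting"
    CONFIGURE_IDLE_HOST = "apply the setting and reboot an idle host while provisioning"
    WAIT_FOR_CARD = "schedule later on a configured card; block its slice from extending"
    WAIT_FOR_HOST = "schedule later; block all slices on a host, then apply and reboot"


@dataclass
class Card:
    has_setting: bool  # card already carries the requested firmware setting
    in_use: bool       # a slice currently occupies this card


@dataclass
class Host:
    name: str
    active_slices: int                 # slices of any kind running on this host
    cards: List[Card] = field(default_factory=list)


def decide(hosts: List[Host]) -> Action:
    """Pick one of the four cases from the proposal for a requested setting."""
    cards = [card for host in hosts for card in host.cards]
    if not cards:
        raise ValueError("no candidate cards at this site")
    # Case 1: a free card already has the setting.
    if any(c.has_setting and not c.in_use for c in cards):
        return Action.USE_NOW
    # Case 2: no free configured card, but some host is idle and can be rebooted.
    if any(h.active_slices == 0 and h.cards for h in hosts):
        return Action.CONFIGURE_IDLE_HOST
    # Case 3: a configured card exists but is occupied.
    if any(c.has_setting for c in cards):
        return Action.WAIT_FOR_CARD
    # Case 4: no card has the setting and every host is busy.
    return Action.WAIT_FOR_HOST


if __name__ == "__main__":
    busy_host = Host("host-a", active_slices=2, cards=[Card(False, True)])
    print(decide([busy_host]).value)
```

A real scheduler would also need lease end times and the extension-blocking mechanism, which are left out here.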
May 21, 2026 at 2:14 pm #9799
Hey @yoursunny, does CloudLab provision an easier way to perform cold reboots? Just curious.
Also, I completely agree with your idea; that should cover all the cases!
Thanks,
Tanay
May 21, 2026 at 5:09 pm #9800
“does CloudLab provision an easier way to perform cold reboots?”
Both CloudLab and Chameleon Cloud offer physical machines, so that you can initiate the hard reboot from within the operating system. It’s also possible to perform a power cycle, but this isn’t usually necessary to apply firmware parameter changes.
The drawback is tighter experiment durations because there are fewer resources on these platforms.