BlueField-3 host-DPU communication issue on FABRIC
Tagged: BF3, Bluefield-3, DPU
- This topic has 2 replies, 2 voices, and was last updated 7 hours, 20 minutes ago by Mert Cevik.
March 30, 2026 at 1:01 am #9612
We are working on a project that offloads the UPF to a BlueField-3 DPU, and our design requires host-to-DPU communication using the NVIDIA DOCA communication channel API. In the FABRIC environment, this API does not work for us, even with the default DOCA 2.9 SDK provided by FABRIC. We are running the sample application provided by NVIDIA, so the issue is not in our application logic. [https://docs.nvidia.com/doca/sdk/doca-secure-channel-application-guide/index.html]
The core issue is probably not the communication API itself; it is more likely a driver or firmware synchronization problem.
Tested Image: dpu_ubuntu_24
Tested Sites: FIU, DALL
On Host:
ubuntu@node2:~$ lspci | grep mellanox -i
07:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
08:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
09:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
0a:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
0b:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
0c:00.0 DMA controller: Mellanox Technologies MT43244 BlueField-3 SoC Management Interface (rev 01)
ubuntu@node2:~$ sudo /opt/mellanox/doca/tools/doca_caps --list-devs
No DOCA device was found
This should not happen. In a healthy setup, this would list the DOCA devices, and in this case one of them would definitely be 0a:00.0. We tried rebooting the host as well; the issue persists.
In our setup, the PCIe device is visible from the host, but DOCA on the host does not enumerate usable DOCA devices correctly, and the communication-channel application fails to initialize.
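For reference, the functions we would expect DOCA to pick up can be pulled out of that lspci output mechanically. A small illustrative script (sample text copied from the output above; not part of any DOCA tooling):

```python
# Illustrative only: separate the BlueField-3 functions (the candidates for
# DOCA on the host) from the ConnectX-6 VFs in pasted `lspci` output.
import re

LSPCI = """\
07:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
08:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
09:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
0a:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
0b:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
0c:00.0 DMA controller: Mellanox Technologies MT43244 BlueField-3 SoC Management Interface (rev 01)
"""

def bluefield_bdfs(lspci_text: str) -> list[str]:
    """Return the PCI BDFs of lines describing BlueField-3 functions."""
    bdfs = []
    for line in lspci_text.splitlines():
        m = re.match(r"([0-9a-f:.]+)\s", line)
        if m and "BlueField-3" in line:
            bdfs.append(m.group(1))
    return bdfs

print(bluefield_bdfs(LSPCI))  # ['0a:00.0', '0b:00.0', '0c:00.0']
```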
We tried updating the DOCA SDK to 3.3 together with the matching BFB image. After the BFB install and a host reboot, we could see the DOCA devices on the host. However, the moment we try to start the DOCA communication channel over PCIe, it fails. We observed host-side DevX object creation failures and connection-aborted errors while running the DOCA secure channel client. We tried every possible PCIe address combination, so no, a wrong PCIe address is not the reason either.
ubuntu@node2:/tmp/build/secure_channel$ sudo ./doca_secure_channel -s 256 -n 10 -p 0000:0a:00.0
[2026-03-28 23:58:02:734859][3853104960][DOCA][INF][CORE][doca_log.cpp:900] DOCA version 3.3.0109
[2026-03-28 23:58:03:068693][3853104960][DOCA][ERR][CORE][linux_devx_obj.cpp:115] Failed to create devx object with syndrome=0xe5300
[2026-03-28 23:58:03:069245][3853104960][DOCA][ERR][CORE][doca_dev.cpp:2699] Failed to create devx object: failed to allocate devx object wrapper with exception:
[2026-03-28 23:58:03:069322][3853104960][DOCA][ERR][CORE][doca_dev.cpp:2699] DOCA exception [DOCA_ERROR_DRIVER] with message Failed to create devx object
[2026-03-28 23:58:03:069349][3853104960][DOCA][ERR][COMCH][cc_devx_2.cpp:265] Failed to create channel connection object with error DOCA_ERROR_DRIVER
[2026-03-28 23:58:03:069368][3853104960][DOCA][ERR][COMCH][qp_channel_2.cpp:996] client registration failed for send side
[2026-03-28 23:58:03:069392][3853104960][DOCA][ERR][COMCH][doca_comm_channel_2.cpp:853] client registration failed for doca_comm_channel_2_ep_client_connect()
[2026-03-28 23:58:03:069410][3853104960][DOCA][ERR][COMCH][doca_comch_pe.cpp:413] failed to connect on client with error = DOCA_ERROR_CONNECTION_ABORTED
[2026-03-28 23:58:03:074705][3853104960][DOCA][ERR][CORE][doca_pe.cpp:1119] Progress engine 0x60bd204c1380: Failed to start context=0x60bd204c4bc0. err=DOCA_ERROR_CONNECTION_ABORTED
[2026-03-28 23:58:03:074732][3853104960][DOCA][ERR][COMCH_UTILS][comch_utils.c:535][comch_utils_fast_path_init] Failed to start comch client context: Connection aborted
Because the expected host-side DOCA-visible interface/device path is missing or nonfunctional, we suspect there may be a low-level firmware, driver binding, or host-exposure issue in the current FABRIC setup. At this point, it seems possible that recovery may require a hard reboot of the physical node, but we only have access to the VM and not to the underlying physical machine.
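One soft-recovery step we are considering before asking for a physical-node reboot is re-binding the PCI function to its driver through sysfs, which is a generic Linux mechanism we can use from inside the VM. This is only a sketch: we have not verified that it clears this DOCA state, it requires root, and the driver name mlx5_core is an assumption.

```python
# Sketch: unbind and re-bind a PCI function via sysfs. Whether this
# recovers the DOCA device state on FABRIC is untested; the driver
# name "mlx5_core" is an assumption, and the writes require root.
from pathlib import Path

def sysfs_rebind_paths(bdf: str, driver: str = "mlx5_core") -> tuple[str, str]:
    """Return the (unbind, bind) sysfs paths for a PCI function."""
    return (f"/sys/bus/pci/devices/{bdf}/driver/unbind",
            f"/sys/bus/pci/drivers/{driver}/bind")

def rebind(bdf: str, driver: str = "mlx5_core") -> None:
    """Unbind the function from its driver, then bind it back."""
    unbind, bind = sysfs_rebind_paths(bdf, driver)
    Path(unbind).write_text(bdf)
    Path(bind).write_text(bdf)

# rebind("0000:0a:00.0")  # uncomment and run as root, using the BDF from lspci
```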
Could you please let us know:
- Whether there is any way for us to request access to a physical bare-metal node for BlueField/DPU testing.
- Whether the current FABRIC VM setup fully supports host-side BlueField DOCA communication-channel use cases.
Thanks in advance. Any help or guidance on the way forward would be greatly appreciated, as we have not been able to resolve this issue so far.
- This topic was modified 1 day, 6 hours ago by Plabon Dutta.
March 30, 2026 at 11:14 am #9616
Hello Plabon,
Whether there is any way for us to request access to a physical bare-metal node for BlueField/DPU testing.
FABRIC Testbed has only VM resources and does not provide physical bare-metal nodes for BlueField/DPU testing.
Whether the current FABRIC VM setup fully supports host-side BlueField DOCA communication-channel use cases.
We don’t have a specific statement that FABRIC Testbed fully supports host-side BlueField DOCA communication-channel use cases. However, as you already know, on FABRIC Testbed, you can create VMs and the network cards are attached via PCI passthrough.
For the potential “low-level firmware, driver binding, or host-exposure issue in the current FABRIC setup”, we can work with you to identify the problems. We need some information from your “healthy setup” with respect to the DOCA (host and DPU) versions and the firmware version from the DPU, specifically the output from flint -d <MST_DEVICE> q. It will also be helpful if you share the outputs for the following items:
- lspci (both host and DPU)
- doca_caps --list-devs (both host and DPU)
- doca_caps --list-rep-devs (from the DPU)
- mlxconfig -d <MST_DEVICE> q INTERNAL_CPU_OFFLOAD_ENGINE (both host and DPU)
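If it helps, the items above can be collected in one pass with a small script. This is only a sketch: the tool paths and arguments are assumptions based on standard MLNX-OFED/DOCA installs, and <MST_DEVICE> must be replaced with your actual MST device (e.g. from `mst status`) before running.

```python
# Sketch of a one-shot collector for the diagnostic outputs requested above.
# Tool paths/arguments are assumptions from standard installs; replace
# <MST_DEVICE> before use. Missing tools are reported rather than fatal.
import shutil
import subprocess

COMMANDS = [
    ["lspci"],
    ["/opt/mellanox/doca/tools/doca_caps", "--list-devs"],
    ["flint", "-d", "<MST_DEVICE>", "q"],
    ["mlxconfig", "-d", "<MST_DEVICE>", "q", "INTERNAL_CPU_OFFLOAD_ENGINE"],
]

def run(cmd: list[str]) -> str:
    """Run one command, returning combined output or a note if unavailable."""
    path = cmd[0] if cmd[0].startswith("/") else shutil.which(cmd[0])
    if path is None:
        return f"{cmd[0]}: not found"
    try:
        res = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        return res.stdout + res.stderr
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return f"{cmd[0]}: {exc}"

if __name__ == "__main__":
    for cmd in COMMANDS:
        print("$", " ".join(cmd))
        print(run(cmd))
```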
- This reply was modified 20 hours, 7 minutes ago by Mert Cevik.
March 31, 2026 at 12:01 am #9621
Hello Plabon,
If you’re considering sharing the output from your system with us, can you also include the OS version?
On my test setup on the FABRIC Testbed, Ubuntu 24 consistently gave me an error loading the mlx5_ib module; however, on an Ubuntu 22 “host” setup, it worked well and I could see the DOCA devices. It will be helpful for us to get some information from a reference system if you share the information from your side.
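If useful, here is a small sketch for checking the module state on your host, parsed from /proc/modules. The module names mlx5_core and mlx5_ib are the stock inbox/MLNX-OFED ones (an assumption about your setup).

```python
# Sketch: report which of the mlx5 kernel modules are missing on the host,
# based on /proc/modules. Module names are assumed from stock drivers.
def missing_mlx5_modules(proc_modules_text: str) -> list[str]:
    """Return which of mlx5_core/mlx5_ib are absent from /proc/modules text."""
    wanted = {"mlx5_core", "mlx5_ib"}
    loaded = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    return sorted(wanted - loaded)

if __name__ == "__main__":
    with open("/proc/modules") as fh:
        missing = missing_mlx5_modules(fh.read())
    print("missing mlx5 modules:", missing or "none")
```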
- This reply was modified 7 hours, 20 minutes ago by Mert Cevik.