Forum Replies Created
Thank you, Alexander, for sharing this. I have passed the details along to the network team. Will keep you posted.
Thanks,
Komal
Hi Philip,
At the moment, we do not support guaranteed QoS. This feature will be available soon. In the meantime, you can use tools such as tc to manage bandwidth on the VMs.
Thanks,
Komal
Hi Nishant,
Please find my responses inline below:
Once a user has reserved a slice with an FPGA, that resource is locked and cannot be acquired or modified by other users until the slice is released.
You’re correct—if the FPGA has been flashed with a workflow other than the EsNet workflow, it may fail.
However, we cannot guarantee the validity or state of the bitstream that was previously flashed by another user before you acquired the slice. This may leave the FPGA in an inconsistent or unusable state. In our experience, reflashing the FPGA with a known good (golden) image typically restores it to a usable state.
We are planning to share this golden image along with the notebook with users soon, so they can perform the reflash themselves when needed. In the meantime, if you’re currently blocked, please let me know the specific site you’re working with—I’ll check whether we can assist with reflashing the FPGA for you.
Thanks,
Komal
Hi Alex,
The network team reviewed the configuration and found no issues on the switch side. However, they observed that the MAC addresses for these interfaces have not been learned by the switch.
As a next step, they recommend removing the L2Bridge service and connecting both interfaces directly to FabNetV4 to verify if the network connectivity is restored.
Please perform this change using slice modify, so the same VMs and interfaces can be reused for validation. This helps us rule out the possibility that recreating the VMs might inadvertently resolve the issue.
Refer to this notebook for guidance on how to modify the slice.
Thanks,
Komal
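The modify flow recommended above can be sketched as follows. This is only a sketch, assuming fablib is configured; the slice and service names (MySlice, bridge1) are illustrative, and deleting a network service during a modify is assumed to be supported by your fablib version.

```python
# Sketch of the suggested "slice modify" flow. The fablib manager is passed
# in as an argument; slice/service names are illustrative assumptions.

STEPS = [
    "get the existing slice",
    "delete the L2Bridge network service",
    "attach each node to FabNetV4 with node.add_fabnet()",
    "submit the changes as a modify, so the same VMs and interfaces are reused",
]

def replace_bridge_with_fabnet(fablib, slice_name="MySlice", bridge_name="bridge1"):
    slice = fablib.get_slice(slice_name)
    bridge = slice.get_network(name=bridge_name)
    bridge.delete()                # remove the L2Bridge service
    for node in slice.get_nodes():
        node.add_fabnet()          # connect the node directly to FabNetV4
    slice.submit()                 # applies as a modify, not a re-create

if __name__ == "__main__":
    for i, step in enumerate(STEPS, 1):
        print(f"{i}. {step}")
```

Because the slice is modified rather than re-created, the original VMs and interfaces remain available for validation, which is the point of this test.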
Could you please check your VM again?
All PCI devices had been disconnected. I have reconnected them to your VM. Please check it.
Also, could you please share the sequence of operations that led your VM to this state?
It would be helpful to see if there is anything that needs to be fixed on our control software.
Thanks,
Komal
Please share your slice ID and also the output of the command:
ifconfig -a
Thanks,
Komal
Thank you, Alex, for sharing this observation! I temporarily assigned IP addresses to these interfaces on the r3 and r4 nodes, but I do not see ping working between them.
The network service as provisioned looks OK. I am reaching out to the network team and will keep you posted.
Thanks,
Komal
Hi Ajay,
You can use the following code snippet to reboot the node:
slice = fablib.get_slice(slice_name)   # look up your existing slice by name
node = slice.get_node(node_name)       # select the node to reboot
node.os_reboot()                       # reboot the node's OS
Also, please share your slice ID so we can take a look at it.
Thanks,
Komal
Thank you for your question.
What I meant is that once an FPGA is initially flashed with a provided bitstream, users can reflash it with a different bitstream of their choice—as long as the PCIe interface remains unchanged. Because of this flexibility, the actual state of the FPGA at a given site may differ from what’s shown in the shared sheet, depending on whether a user has reprogrammed it.
Best,
Komal
Thank you for your feedback, Philip!
You’re absolutely right: node.add_fabnet() attaches the FabNetV4 service to the node, enabling communication with other nodes over FABRIC’s data plane network via the FabNetV4 interface.
In addition, all VMs provisioned in FABRIC are assigned a Management IP for administrative purposes. This interface allows inbound SSH access and supports outbound connections, including those required for operations like docker pull. However, please note that the management network is actively monitored, and any torrent or insecure traffic may be flagged. Such activity can lead to enforcement actions, including possible slice termination. As a best practice, we recommend not using the management network for experimental traffic.
Best,
Komal
Thank you for your inquiry, Philip.
You are welcome to conduct experiments involving IPFS or BitTorrent on FABRIC, particularly for evaluating peer discovery and data transfer between FABRIC nodes. This type of testing is permissible as long as it is confined to FABnet or a custom Layer 2 network within the FABRIC infrastructure.
We kindly request that your experiment not initiate connections to external BitTorrent or IPFS servers outside the FABRIC environment.
Please feel free to reach out if you need any assistance with the experiment setup or have further questions.
Best regards,
Komal
Hi Nishanth,
Please find enclosed the most recent known status. Kindly note that users have the ability to flash their own binaries, so the actual state of the infrastructure may differ from what is captured in the attached sheet. As a first step toward addressing this, we are working to include notebook and Control Framework support in Release 1.9, enabling users to flash FPGAs within their workflows directly.
Thanks,
Komal
Hi Anthony,
Regarding your slice:
a5d2fff2-84fc-48d9-8d67-5ff96e120273
Start: 2025-04-18 14:53:43 +0000
End: 2025-05-02 14:53:42 +0000
A renew operation was attempted for this slice, but it failed for the VM due to insufficient resources: ['core'].
Please note that we now support advance reservations, which allow users to reserve resources ahead of time. As a result, a renew request may fail if it conflicts with an existing advance reservation, which appears to be the case here.
It’s unclear how the renew was initiated, but if it was done through JupyterHub, the error would have been reported to the user. We suspect there may be a bug on the portal side where this error is not being surfaced correctly, and we will investigate and address that.
Unfortunately, the only available option at this point is to re-create the slice. We apologize for the inconvenience.
Thanks,
Komal
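For reference, a renew is typically issued with a new lease-end timestamp. Below is a minimal sketch of building that timestamp; the format mirrors the "2025-05-02 14:53:42 +0000" style shown above (an assumption), and the slice.renew call in the comment is hypothetical usage, not a guaranteed API.

```python
from datetime import datetime, timedelta, timezone

# Sketch: build a lease-end string for a renew attempt. The timestamp format
# mirrors the "2025-05-02 14:53:42 +0000" style shown above (an assumption).
def renew_end(days: int = 14) -> str:
    end = datetime.now(timezone.utc) + timedelta(days=days)
    return end.strftime("%Y-%m-%d %H:%M:%S %z")

print(renew_end())
# In a notebook (hypothetical usage): slice.renew(renew_end())
# As described above, this can fail if the requested extension conflicts
# with an existing advance reservation.
```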
Hi Nishanth,
Thank you for sharing this.
Please note that the current implementation of execute_thread maintains the process only for the duration of the specified timeout. As you correctly observed, for longer-running processes, directly accessing the switch via SSH allows you to manually launch switchd.
We will work on enhancing execute_thread to better support this use case and will keep you informed once the update is available.
Thanks,
Komal
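Until that enhancement lands, one common workaround is to detach the long-running process from the SSH session so it outlives the timeout. A sketch of the idea follows; the switchd path, the log location, and the node.execute usage in the comment are illustrative assumptions.

```python
# Sketch: wrap a long-running command so it survives the end of the SSH
# session that execute/execute_thread opens. Paths are illustrative.

def detached(cmd: str, log: str = "/tmp/switchd.log") -> str:
    # nohup + output redirection + '&' detaches the process from the SSH
    # session; 'echo $!' reports the background PID so it can be tracked.
    return f"nohup {cmd} > {log} 2>&1 & echo $!"

command = detached("sudo /usr/sbin/switchd")
print(command)
# In a notebook (hypothetical usage): stdout, _ = node.execute(command)
# stdout would then hold the PID of the detached switchd process.
```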