Forum Replies Created
Thank you for sharing your observations, @yoursunny. This was indeed a bug, and it has now been fixed in the Beyond Bleeding Edge container.
I’ll be rolling out the fix to the Bleeding Edge container shortly as well.
Best,
Komal
March 24, 2026 at 9:44 am in reply to: Policy question: external download experiments and management-network usage on F #9606
Hi Rasman,
Great question, and thanks for checking before running your experiments — we appreciate that!
As yoursunny mentioned, you’ll want to use FABNetv4Ext or FABNetv6Ext network services for your experiment rather than the management network. These provide dedicated public Internet connectivity for your slices and are designed for exactly this kind of bulk data transfer work. The management network is shared infrastructure and should not be used for high-volume traffic.
One important thing to note: FABNetv4Ext and FABNetv6Ext require additional project permissions that are not enabled by default. Your Project Lead will need to request the Net.FABNetv4Ext and/or Net.FABNetv6Ext permissions for your project through the FABRIC Portal (use the “Request additional project permissions” option under Experiments -> Projects).
Once you have those permissions, you should be all set to run sustained download experiments against NCBI/ENA without any issues on the FABRIC side.
Also, thanks yoursunny for jumping in with the helpful pointer!
Best,
Komal
Hi Fatih,
I looked into your slice (698e8e21). During the renewal attempt, several VMs failed to renew due to insufficient resources on the target workers. These VMs were closed on the initial end date, 2026-03-16.
– 4 VMs failed due to insufficient RAM (on ncsa-w1 and other workers)
– 2 VMs failed due to insufficient cores (on mich-w2, mich-w3)
These VM failures caused a cascade: their dependent network services (L2Bridge, L2PTP) were also closed on expiry, since they cannot function without the underlying VMs. In total, 85 out of 129 reservations were closed and 3 additional network services were cleaned up.
The slice was stuck in Configuring because some network reservations were waiting indefinitely for their dead predecessor VMs. I have deployed a fix that now properly detects this condition and closes those stuck reservations, which is why the slice has transitioned out of the Configuring state.
Unfortunately, this slice cannot be recovered in its current state — too many VMs and their dependent network services have been closed. I recommend deleting this slice and creating a new one. To avoid resource contention, you may want to check site availability before submitting and consider spreading your VMs across sites with more available capacity, or using smaller VM flavors.
Please let us know if you need any further assistance.
NOTE: With advanced reservations in play, renew/extend is not always guaranteed, as the resources may already have been acquired by someone else.
Best regards,
Komal
This reply was modified 1 week, 2 days ago by Komal Thareja.
Hi Nirmala,
This looks like a bug. I am investigating it and will work to deploy a fix for this soon. Apologies for the inconvenience.
Best,
Komal
March 5, 2026 at 2:55 pm in reply to: Issue with Node-to-Node Communication in Fabric Experiment #9565
Hi Sree,
VMs cannot communicate with each other over the private IPs assigned to interfaces connected to the management network. The interfaces with addresses in the 10.* range belong to this management network. Inter-VM communication should instead occur over the data plane network, which in your case is the L2Bridge network.
I reviewed your slice and noticed that you have three VMs and two L2Bridge networks configured. However, the IP addresses on the VM interfaces are not set up correctly. Each network must use a different subnet, and the corresponding VM interfaces should be assigned IP addresses from those respective subnets.
Please refer to the following example notebook, which demonstrates how to correctly configure the network:
jupyter-examples-*/fabric_examples/fablib_api/create_l2network_basic/create_l2network_basic_auto.ipynb
Make sure to use separate subnets for each network and assign the appropriate IPs to the VM interfaces so that communication works properly.
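As a rough illustration of the "one subnet per network" rule above, the following sketch plans addresses with Python's standard ipaddress module. It does not call FABlib. The network names, node names, attachment layout, and the 192.168.x.0/24 subnets are all hypothetical and would need to be replaced with the values from your own slice.

```python
import ipaddress
from itertools import combinations


def plan_addresses(networks):
    """Pick one host address per attached node, using a distinct subnet per network.

    `networks` maps a network name to (subnet CIDR, list of attached node names).
    Raises ValueError if any two subnets overlap.
    """
    subnets = {name: ipaddress.ip_network(cidr) for name, (cidr, _) in networks.items()}

    # Each network must use a different, non-overlapping subnet.
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"{a} ({net_a}) overlaps {b} ({net_b})")

    plan = {}
    for name, (_, nodes) in networks.items():
        hosts = iter(subnets[name].hosts())  # usable host addresses, in order
        for node in nodes:
            plan[(name, node)] = f"{next(hosts)}/{subnets[name].prefixlen}"
    return plan


# Hypothetical layout: three VMs, two L2Bridge networks, node2 attached to both.
example = {
    "net1": ("192.168.1.0/24", ["node1", "node2"]),
    "net2": ("192.168.2.0/24", ["node2", "node3"]),
}

for (net, node), addr in plan_addresses(example).items():
    print(f"{net} {node} {addr}")
```

The resulting addresses would then be assigned to the matching VM interfaces, for example by following the notebook referenced above. Note that node2 receives one address per network, each taken from that network's own subnet.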
Best,
Komal
March 5, 2026 at 9:04 am in reply to: Issue with Node-to-Node Communication in Fabric Experiment #9558
Hi Sree,
Could you please share your slice ID so we can look at it? In addition, please take a look at the examples available via jupyter-examples-*/start_here.ipynb, which may be useful.
Thanks,
Komal
This reply was modified 3 weeks, 5 days ago by Komal Thareja.
Hi Fatih,
Could you please try changing the following files:
/home/fabric/work/fabric_config/ssh_config
/home/fabric/work/fabric_config/fabric_config
Change bastion.fabric-testbed.net to bastion-ncsa-1.fabric-testbed.net in both files.
Reload the kernel of your notebook and try node.execute again.
Thanks,
Komal
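The hostname change described in the reply above could be scripted along these lines. This is only a sketch: the helper names are hypothetical, it assumes a plain text substitution is sufficient in both files, and it writes a .bak copy next to each file it changes as a precaution.

```python
from pathlib import Path

OLD_HOST = "bastion.fabric-testbed.net"
NEW_HOST = "bastion-ncsa-1.fabric-testbed.net"


def switch_bastion(text, old=OLD_HOST, new=NEW_HOST):
    """Return `text` with every occurrence of the old bastion hostname replaced."""
    return text.replace(old, new)


def update_files(paths):
    """Rewrite each existing file in place; return the paths that actually changed."""
    changed = []
    for path in map(Path, paths):
        if not path.is_file():
            continue  # skip files that are absent in this environment
        original = path.read_text()
        updated = switch_bastion(original)
        if updated != original:
            # Keep a backup of the original contents before overwriting.
            path.with_name(path.name + ".bak").write_text(original)
            path.write_text(updated)
            changed.append(str(path))
    return changed


# The two files named in the reply above.
config_dir = Path("/home/fabric/work/fabric_config")
print(update_files([config_dir / "ssh_config", config_dir / "fabric_config"]))
```

Running it a second time changes nothing, because the old hostname no longer appears in the files. The notebook kernel still needs to be reloaded afterwards, as noted in the reply.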
Hi Fatih,
Could you please check whether you see any errors in /tmp/fablib/fablib.log?
Also, could you please share which Jupyter container you are using?
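One quick way to look for errors in that log from a notebook cell is sketched below. The log's exact line format is not described here, so the marker strings (ERROR, CRITICAL, Traceback) are an assumption and may need adjusting. The helper name is hypothetical.

```python
from pathlib import Path


def error_lines(lines, markers=("ERROR", "CRITICAL", "Traceback")):
    """Return the lines that contain any of the given marker strings."""
    return [line.rstrip("\n") for line in lines if any(m in line for m in markers)]


log_path = Path("/tmp/fablib/fablib.log")
if log_path.exists():
    with log_path.open(errors="replace") as fh:
        # Show only the most recent matches to keep the output short.
        for line in error_lines(fh)[-20:]:
            print(line)
else:
    print(f"{log_path} not found")
```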
Best,
Komal
@Mert / @Khawar,
I attempted to recover the VM last night and shut it down as part of the process. During the investigation, I noticed that the /home/ubuntu/.ssh directory was missing from the VM. I tried to restore the SSH keys to regain access, but subsequently found that the VM was no longer bootable and consistently failed with filesystem errors.
Further inspection showed that /etc/fstab on the VM had been modified:
LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1
LABEL=UEFI /boot/efi vfat umask=0077 0 1
vm0:/myvol /gss glusterfs defaults,_netdev,nofail 0 0
I attempted to revert the /etc/fstab changes, but was unable to recover the VM to a bootable state. It appears these modifications may have been introduced as part of your experiment, possibly unintentionally.
Please be mindful when making system-level changes during experiments. In some cases, recovery is not possible if the VM state has been significantly altered and the changes are not fully known.
Best,
Komal
February 2, 2026 at 8:54 am in reply to: Building slice with large number of nodes and network services #9466
Hi Meshal,
Could you please share your notebook? I was able to successfully create a slice with 100 VMs distributed across 6–8 sites without any issues. If you can share your notebook, I’d be happy to try reproducing the error and work on resolving it.
Best,
Komal
Hi Nishanth,
FABRIC currently has only three IPv4-capable sites: TOKY, BRIST, and FIU. BlueField devices are not available at BRIST or TOKY. I’ll work on reproducing the issue and investigate the connectivity problem on the IPv6 sites, and I’ll share my findings once I have more information. Thanks for your patience!
Best,
Komal
Hi Fatih,
The PCI devices had been disconnected from your VMs, but I’ve now re-attached them. You should be able to see them on your VM.
I’ll review the logs to determine what caused this. In the meantime, if you’re able to share any operations or actions triggered as part of your experiment, that would be very helpful in narrowing down the issue. Thanks so much for your help!
Best,
Komal
January 30, 2026 at 1:15 pm in reply to: Creating a P4 Switch for a research (production-level) #9458
Hi Suhib,
To use P4 Tofino switches, your project lead can request the Switch.P4 permission directly through the FABRIC portal.
FABRIC also offers BlueField-3 DPUs, which support P4, as well as FPGAs. Both of these resources similarly require explicit permission requests. You can find details on project roles and permissions here:
https://learn.fabric-testbed.net/knowledge-base/fabric-user-roles-and-project-permissions/#project-permissions
You may also want to explore several example artifacts available at:
https://artifacts.fabric-testbed.net/artifacts/
Best,
Komal
Hi Tejas,
Are you still observing the SSH issues?
Best,
Komal
Hi Tejas,
Could you please check the logs at /tmp/fablib/fablib.log and also check whether your bastion keys have expired?
Please re-run the jupyter-examples-*/configure_and_validate.ipynb notebook to renew your SSH keys, then try creating the slice again.
Best,
Komal