Komal Thareja

Forum Replies Created

Viewing 15 posts - 16 through 30 (of 558 total)

Author

Posts
May 4, 2026 at 2:06 pm in reply to: node.add_fabnet() raises ResourceNotFoundError #9752
Komal Thareja
Moderator
Hi Arash,

I’m looking into this and will push out a fix soon.

Best,

Komal
April 27, 2026 at 9:51 am in reply to: Cannot allocate GPU + ConnectX-6 on same node #9727
Komal Thareja
Moderator
The Portal view has been fixed too! The Portal now shows the state of resources correctly.

Best,

Komal
April 26, 2026 at 4:30 pm in reply to: Cannot allocate GPU + ConnectX-6 on same node #9726
Komal Thareja
Moderator
Hi Bek,

Just a heads-up — the resource status on the portal isn’t quite matching the actual state of the resources right now. I’m working to get that sorted, but in the meantime you can use the fablib API to check availability and find an open slot for your target slice.

Here’s an artifact that should come in handy: https://artifacts.fabric-testbed.net/artifacts/e777ce3a-5b40-4e58-9666-7f31f655f03c

Best,

Komal
April 22, 2026 at 11:54 am in reply to: Request to Extend Slice Lease – unable to do it from portal #9702
Komal Thareja
Moderator
Hi Sree,

I’m investigating the extend/renew issue with this slice. That said, I’d strongly recommend backing up your data in the meantime so that, if the slice ever needs to be recreated, you’ll have everything you need on hand.

Best,
Komal
April 22, 2026 at 11:37 am in reply to: Request to Extend Slice Lease – unable to do it from portal #9701
Komal Thareja
Moderator
Hi Sree,

Could you please share your slice ID?

Best,

Komal
April 10, 2026 at 6:41 am in reply to: Nodes in the same slice using FABNetv6 cannot reach each other #9681
Komal Thareja
Moderator
Hi Yifan,

When creating a slice through the Portal, the network configuration needs to be set up manually. However, if you create the slice via the JupyterHub interface (Portal → JupyterHub), the network configuration is handled automatically. You can follow the steps outlined here: https://learn.fabric-testbed.net/knowledge-base/creating-your-first-experiment-in-jupyter-hub/

Best,
Komal

1 user thanked author for this post.

Yifan Cai
April 9, 2026 at 11:25 pm in reply to: Nodes in the same slice using FABNetv6 cannot reach each other #9679
Komal Thareja
Moderator
Hi Yifan,

I’m not sure how the VMs were originally provisioned—whether auto configuration or manual setup was used, or which JupyterHub container was involved.

I checked your MASS VMs and noticed that IPv6 addresses were not assigned to the data plane interfaces and the required routes were missing. I manually configured both VMs by assigning IPv6 addresses and adding the appropriate routes:

mass-0:
```
sudo ip -6 addr add 2602:fcfb:7:1::2/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:7:1::1 dev enp7s0
```
mass-1:
```
sudo ip -6 addr add 2602:fcfb:7:1::3/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:7:1::1 dev enp7s0
```
After applying these changes, connectivity between the MASS VMs is working as expected (verified via ping).

I also attempted to access the UTAH and ATLA VMs, but I wasn’t able to SSH using the NOVA keys, so I couldn’t validate their configuration.

Could you please run the following commands on the remaining VMs to configure the data plane interfaces?

UTAH VMs

ut-0:
```
sudo ip -6 addr add 2602:fcfb:8:d1::2/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:8:d1::1 dev enp7s0
```
ut-1:
```
sudo ip -6 addr add 2602:fcfb:8:d1::3/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:8:d1::1 dev enp7s0
```
ATLA VMs

atl-0:
```
sudo ip -6 addr add 2602:fcfb:15:1::2/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:15:1::1 dev enp7s0
```
atl-1:
```
sudo ip -6 addr add 2602:fcfb:15:1::3/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:15:1::1 dev enp7s0
```
GATECH VMs

gatech-0:
```
sudo ip -6 addr add 2602:fcfb:11:2::3/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:11:2::1 dev enp7s0
```
gatech-1:
```
sudo ip -6 addr add 2602:fcfb:11:2::2/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:11:2::1 dev enp7s0
```
WASH VMs

wash-0:
```
sudo ip -6 addr add 2602:fcfb:a:1::3/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:a:1::1 dev enp7s0
```
wash-1:
```
sudo ip -6 addr add 2602:fcfb:a:1::2/64 dev enp7s0
sudo ip link set enp7s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:a:1::1 dev enp7s0
```
LOSA VMs

la-0 (uses enp6s0):
```
sudo ip -6 addr add 2602:fcfb:12:c::3/64 dev enp6s0
sudo ip link set enp6s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:12:c::1 dev enp6s0
```
la-1 (uses enp6s0):
```
sudo ip -6 addr add 2602:fcfb:12:c::2/64 dev enp6s0
sudo ip link set enp6s0 up
sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:12:c::1 dev enp6s0
```
Note: The LOSA VMs use enp6s0 instead of enp7s0 for the data plane interface.
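
If it is easier than copying each block, here is a minimal Python sketch that generates the same three commands for any node. The helper name `dataplane_commands` and its parameters are hypothetical (they are not part of fablib); it only uses the standard-library `ipaddress` module to sanity-check the values, and it assumes the gateway must sit in the same /64 as the interface address, as in the commands above.

```python
import ipaddress

# Route prefix copied from the commands above.
FABNET_V6_PREFIX = "2602:fcfb:00::/40"


def dataplane_commands(address, gateway, dev="enp7s0", prefix_len=64):
    """Return the three `ip` commands for one VM (hypothetical helper)."""
    iface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    gw = ipaddress.ip_address(gateway)
    # Validate the route prefix; raises ValueError if it is malformed.
    ipaddress.ip_network(FABNET_V6_PREFIX)
    # Assumption: the gateway must be in the same subnet as the address.
    if gw not in iface.network:
        raise ValueError(f"gateway {gateway} is not in {iface.network}")
    return [
        f"sudo ip -6 addr add {address}/{prefix_len} dev {dev}",
        f"sudo ip link set {dev} up",
        f"sudo ip -6 route add {FABNET_V6_PREFIX} via {gateway} dev {dev}",
    ]


# Example: la-0 uses enp6s0 rather than the default enp7s0.
for cmd in dataplane_commands("2602:fcfb:12:c::3", "2602:fcfb:12:c::1", dev="enp6s0"):
    print(cmd)
```

The printed lines can be pasted into a shell on the corresponding VM. A mismatched address/gateway pair raises `ValueError` instead of producing a broken route.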

Please let me know if you need any help with this.

Best,
Komal

1 user thanked author for this post.

Yifan Cai
April 9, 2026 at 10:50 pm in reply to: Nodes in the same slice using FABNetv6 cannot reach each other #9676
Komal Thareja
Moderator
Hi Yifan,

Could you please share your slice ID?

Best,

Komal
April 4, 2026 at 9:03 pm in reply to: L2 network created successfully, but interfaces are not getting IPs #9655
Komal Thareja
Moderator
You should be able to re-use the existing slice.

Just run the following in a cell:

slice = fablib.get_slice(slice_name)
slice.post_boot_config()
slice.list_nodes();
slice.list_interfaces();

Thanks,

Komal
April 4, 2026 at 8:39 pm in reply to: L2 network created successfully, but interfaces are not getting IPs #9651
Komal Thareja
Moderator
Hi Rasman,

I tried both your shared NICs example and the iperf3 (CX5) notebook, and I do see IPs being configured on the VMs.

Could you please run the following notebook:
jupyter-examples-*/configure_and_validate/configure_and_validate.ipynb?

It’s possible that your bastion keys have expired, which may be preventing fablib from properly configuring the nodes.

I’ve attached a snapshot of the output from my runs below for reference.

Best,
Komal
April 4, 2026 at 6:01 pm in reply to: L2 network created successfully, but interfaces are not getting IPs #9647
Komal Thareja
Moderator
Hi Rasman,

Which JH container are you using?

Best,

Komal
March 30, 2026 at 11:17 pm in reply to: pin_cpu & poa(operation="cpupin") #9620
Komal Thareja
Moderator
Thank you for sharing your observations, @yoursunny. This was indeed a bug, and it has now been fixed in the Beyond Bleeding Edge container.

I’ll be rolling out the fix to the Bleeding Edge container shortly as well.

Best,
Komal
March 24, 2026 at 9:44 am in reply to: Policy question: external download experiments and management-network usage on F #9606
Komal Thareja
Moderator
Hi Rasman,

Great question, and thanks for checking before running your experiments — we appreciate that!

As yoursunny mentioned, you’ll want to use FABNetv4Ext or FABNetv6Ext network services for your experiment rather than the management network. These provide dedicated public Internet connectivity for your slices and are designed for exactly this kind of bulk data transfer work. The management network is shared infrastructure and should not be used for high-volume traffic.

One important thing to note: FABNetv4Ext and FABNetv6Ext require additional project permissions that are not enabled by default. Your Project Lead will need to request the Net.FABNetv4Ext and/or Net.FABNetv6Ext permissions for your project through the FABRIC Portal (use the “Request additional project permissions” option under Experiments -> Projects).

Once you have those permissions, you should be all set to run sustained download experiments against NCBI/ENA without any issues on the FABRIC side.

Also, thanks yoursunny for jumping in with the helpful pointer!

Best,
Komal
March 22, 2026 at 9:06 am in reply to: Slice Renewal Stuck in Configuring State #9602
Komal Thareja
Moderator
Hi Fatih,

I looked into your slice (698e8e21). During the renewal attempt, several VMs failed to renew due to insufficient resources on the target workers. These VMs were closed on the slice’s initial end date, 2026-03-16.

– 4 VMs failed due to insufficient RAM (on ncsa-w1 and other workers)
– 2 VMs failed due to insufficient cores (on mich-w2, mich-w3)

These VM failures caused a cascade: their dependent network services (L2Bridge, L2PTP) were also closed on expiry, since they cannot function without the underlying VMs. In total, 85 out of 129 reservations were closed and 3 additional network services were cleaned up.

The slice was stuck in Configuring because some network reservations were waiting indefinitely for their dead predecessor VMs. I have deployed a fix that now properly detects this condition and closes those stuck reservations, which is why the slice has transitioned out of the Configuring state.
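
To illustrate the cascade described above, here is a small Python sketch. It is not the orchestrator code: the reservation names and the `depends_on` mapping are made up for illustration, and it assumes a network service is closed as soon as any VM it depends on is closed.

```python
from collections import deque


def cascade_close(depends_on, failed):
    """Return every reservation that ends up closed.

    depends_on maps a reservation to the reservations it needs
    (e.g. a network service to the VMs it attaches to).
    """
    # Invert the mapping: predecessor -> reservations that depend on it.
    dependents = {}
    for res, preds in depends_on.items():
        for pred in preds:
            dependents.setdefault(pred, set()).add(res)

    closed = set(failed)
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, ()):
            if dep not in closed:
                closed.add(dep)
                queue.append(dep)
    return closed


# Hypothetical slice: vm-a failed to renew, vm-b and vm-c survived.
depends_on = {
    "l2ptp-1": ["vm-a", "vm-b"],
    "l2bridge-1": ["vm-c"],
}
print(sorted(cascade_close(depends_on, ["vm-a"])))
```

A reservation that waits on a closed predecessor without ever being closed itself is what left the slice in Configuring.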

Unfortunately, this slice cannot be recovered in its current state — too many VMs and their dependent network services have been closed. I recommend deleting this slice and creating a new one. To avoid resource contention, you may want to check site availability before submitting and consider spreading your VMs across sites with more available capacity, or using smaller VM flavors.
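
As a rough illustration of spreading VMs, the sketch below does a greedy placement over per-site free capacity. The helper `spread_vms` and the availability numbers are hypothetical placeholders; in practice you would fill them in from what fablib reports, and real capacity is tracked per worker rather than per site.

```python
def spread_vms(free, count, cores, ram):
    """Greedily place `count` VMs, always picking the site with most free cores.

    free maps site name -> {"cores": int, "ram": int} (hypothetical data).
    Returns a list of site names, one per VM; raises if capacity runs out.
    """
    # Work on a copy so the caller's availability data is not modified.
    remaining = {site: dict(cap) for site, cap in free.items()}
    placement = []
    for _ in range(count):
        candidates = [s for s, cap in remaining.items()
                      if cap["cores"] >= cores and cap["ram"] >= ram]
        if not candidates:
            raise RuntimeError("not enough free capacity for all VMs")
        # Ties on free cores are broken by site name to keep results stable.
        site = max(candidates, key=lambda s: (remaining[s]["cores"], s))
        remaining[site]["cores"] -= cores
        remaining[site]["ram"] -= ram
        placement.append(site)
    return placement


# Made-up availability, for illustration only.
free = {"UTAH": {"cores": 16, "ram": 64}, "WASH": {"cores": 8, "ram": 32}}
print(spread_vms(free, count=3, cores=4, ram=16))
```

Requesting a smaller flavor (fewer cores, less RAM) widens the set of candidate sites at each step.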

Please let us know if you need any further assistance.

NOTE: With advanced reservations in play, renew/extend is not always guaranteed, as the resources may already have been acquired by someone else.

Best regards,
Komal
- This reply was modified 4 months ago by Komal Thareja.
March 17, 2026 at 10:22 pm in reply to: slice hungup on configuring #9588
Komal Thareja
Moderator
Hi Nirmala,

This looks like a bug. I am investigating it and will work to deploy a fix for this soon. Apologies for the inconvenience.

Best,

Komal