Komal Thareja

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 540 total)
  • Komal Thareja
    Participant

      Hi Sree,

      I’m investigating the extend/renew of this slice. That said, I’d strongly recommend backing up your data in the meantime — that way, if the slice ever needs to be recreated, you’ll have everything you need on hand.

      Best,
      Komal

      Komal Thareja
      Participant

        Hi Sree,

        Could you please share your slice ID?

        Best,

        Komal

        Komal Thareja
        Participant

          Hi Yifan,

          When creating a slice through the Portal, the network configuration needs to be set up manually. However, if you create the slice via the JupyterHub interface (Portal → JupyterHub), the network configuration is handled automatically. You can follow the steps outlined here: https://learn.fabric-testbed.net/knowledge-base/creating-your-first-experiment-in-jupyter-hub/

          Best,
          Komal

          Komal Thareja
          Participant

            Hi Yifan,

            I’m not sure how the VMs were originally provisioned—whether auto configuration or manual setup was used, or which JupyterHub container was involved.

            I checked your MASS VMs and noticed that IPv6 addresses were not assigned to the data plane interfaces and the required routes were missing. I manually configured both VMs by assigning IPv6 addresses and adding the appropriate routes:

            mass-0:

            sudo ip -6 addr add 2602:fcfb:7:1::2/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:7:1::1 dev enp7s0
            

            mass-1:

            sudo ip -6 addr add 2602:fcfb:7:1::3/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:7:1::1 dev enp7s0
            

            After applying these changes, connectivity between the MASS VMs is working as expected (verified via ping).

            I also attempted to access the UTAH and ATLA VMs, but I wasn’t able to SSH using the NOVA keys, so I couldn’t validate their configuration.

            Could you please run the following commands on the remaining VMs to configure the data plane interfaces?

            UTAH VMs

            ut-0:

            sudo ip -6 addr add 2602:fcfb:8:d1::2/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:8:d1::1 dev enp7s0
            

            ut-1:

            sudo ip -6 addr add 2602:fcfb:8:d1::3/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:8:d1::1 dev enp7s0
            

            ATLA VMs

            atl-0:

            sudo ip -6 addr add 2602:fcfb:15:1::2/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:15:1::1 dev enp7s0
            

            atl-1:

            sudo ip -6 addr add 2602:fcfb:15:1::3/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:15:1::1 dev enp7s0
            

            GATECH VMs

            gatech-0:

            sudo ip -6 addr add 2602:fcfb:11:2::3/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:11:2::1 dev enp7s0
            

            gatech-1:

            sudo ip -6 addr add 2602:fcfb:11:2::2/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:11:2::1 dev enp7s0
            

            WASH VMs

            wash-0:

            sudo ip -6 addr add 2602:fcfb:a:1::3/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:a:1::1 dev enp7s0
            

            wash-1:

            sudo ip -6 addr add 2602:fcfb:a:1::2/64 dev enp7s0
            sudo ip link set enp7s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:a:1::1 dev enp7s0
            

            LOSA VMs

            la-0 (uses enp6s0):

            sudo ip -6 addr add 2602:fcfb:12:c::3/64 dev enp6s0
            sudo ip link set enp6s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:12:c::1 dev enp6s0
            

            la-1 (uses enp6s0):

            sudo ip -6 addr add 2602:fcfb:12:c::2/64 dev enp6s0
            sudo ip link set enp6s0 up
            sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:12:c::1 dev enp6s0
            

            Note: The LOSA VMs use enp6s0 instead of enp7s0 for the data plane interface.
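The per-VM commands above all follow the same three-step pattern (assign the address, bring the link up, add the route via the site gateway). As a rough sketch, generating them from a single table makes a typo in any one site easier to catch; the `make_commands` helper below is hypothetical (not part of fablib), and the sample values are copied from the commands in this post:

```python
import ipaddress

# (vm, interface, address/prefix, gateway) -- values from the post above;
# extend the table with the remaining sites as needed
PLAN = [
    ("ut-0", "enp7s0", "2602:fcfb:8:d1::2/64", "2602:fcfb:8:d1::1"),
    ("ut-1", "enp7s0", "2602:fcfb:8:d1::3/64", "2602:fcfb:8:d1::1"),
    ("la-0", "enp6s0", "2602:fcfb:12:c::3/64", "2602:fcfb:12:c::1"),
]

def make_commands(iface, addr, gw):
    """Return the three commands used above for one VM."""
    return [
        f"sudo ip -6 addr add {addr} dev {iface}",
        f"sudo ip link set {iface} up",
        f"sudo ip -6 route add 2602:fcfb:00::/40 via {gw} dev {iface}",
    ]

for vm, iface, addr, gw in PLAN:
    # sanity check: the gateway must sit inside the interface's subnet
    net = ipaddress.ip_interface(addr).network
    assert ipaddress.ip_address(gw) in net, f"{vm}: gateway outside subnet"
    print(f"# {vm}")
    print("\n".join(make_commands(iface, addr, gw)))
```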

            Please let me know if you need any help with this.

            Best,
            Komal

            Komal Thareja
            Participant

              Hi Yifan,

              Could you please share your slice ID?

              Best,

              Komal

              Komal Thareja
              Participant

                You should be able to re-use the existing slice.

                Just run the following in a cell.

                slice = fablib.get_slice(slice_name)
                slice.post_boot_config()
                slice.list_nodes();
                slice.list_interfaces();

                Thanks,

                Komal

                Komal Thareja
                Participant

                  Hi Rasman,

                  I tried both your shared NICs example and the iperf3 (CX5) notebook, and I do see IPs being configured on the VMs.

                  Could you please run the following notebook:
                  jupyter-examples-*/configure_and_validate/configure_and_validate.ipynb?

                  It’s possible that your bastion keys have expired, which may be preventing fablib from properly configuring the nodes.

                  I’ve attached a snapshot of the output from my runs below for reference.

                  Best,
                  Komal

                  Komal Thareja
                  Participant

                    Hi Rasman,

                    Which JupyterHub (JH) container are you using?

                    Best,

                    Komal

                    in reply to: pin_cpu & poa(operation=”cpupin”) #9620
                    Komal Thareja
                    Participant

                      Thank you for sharing your observations, @yoursunny. This was indeed a bug, and it has now been fixed in the Beyond Bleeding Edge container.

                      I’ll be rolling out the fix to the Bleeding Edge container shortly as well.

                      Best,
                      Komal

                      Komal Thareja
                      Participant

                        Hi Rasman,

                        Great question, and thanks for checking before running your experiments — we appreciate that!

                        As yoursunny mentioned, you’ll want to use FABNetv4Ext or FABNetv6Ext network services for your experiment rather than the management network. These provide dedicated public Internet connectivity for your slices and are designed for exactly this kind of bulk data transfer work. The management network is shared infrastructure and should not be used for high-volume traffic.

                        One important thing to note: FABNetv4Ext and FABNetv6Ext require additional project permissions that are not enabled by default. Your Project Lead will need to request the Net.FABNetv4Ext and/or Net.FABNetv6Ext permissions for your project through the FABRIC Portal (use the “Request additional project permissions” option under Experiments -> Projects).

                        Once you have those permissions, you should be all set to run sustained download experiments against NCBI/ENA without any issues on the FABRIC side.

                        Also, thanks yoursunny for jumping in with the helpful pointer!

                        Best,
                        Komal

                        in reply to: Slice Renewal Stuck in Configuring State #9602
                        Komal Thareja
                        Participant

                          Hi Fatih,

                          I looked into your slice (698e8e21). During the renewal attempt, several VMs failed to renew due to insufficient resources on the target workers, and they were closed on 2026-03-16, the slice's initial end date.

                          – 4 VMs failed due to insufficient RAM (on ncsa-w1 and other workers)
                          – 2 VMs failed due to insufficient cores (on mich-w2, mich-w3)

                          These VM failures caused a cascade: their dependent network services (L2Bridge, L2PTP) were also closed on expiry, since they cannot function without the underlying VMs. In total, 85 out of 129 reservations were closed and 3 additional network services were cleaned up.

                          The slice was stuck in Configuring because some network reservations were waiting indefinitely for their dead predecessor VMs. I have deployed a fix that now properly detects this condition and closes those stuck reservations, which is why the slice has transitioned out of the Configuring state.

                          Unfortunately, this slice cannot be recovered in its current state — too many VMs and their dependent network services have been closed. I recommend deleting this slice and creating a new one. To avoid resource contention, you may want to check site availability before submitting and consider spreading your VMs across sites with more available capacity, or using smaller VM flavors.

                          Please let us know if you need any further assistance.

                          NOTE: With advance reservations in play, renewal/extension is not always guaranteed, as the resources may have been acquired by someone else in the meantime.

                          Best regards,
                          Komal

                          in reply to: slice hungup on configuring #9588
                          Komal Thareja
                          Participant

                            Hi Nirmala,

                            This looks like a bug. I am investigating it and will work to deploy a fix for this soon. Apologies for the inconvenience.

                            Best,

                            Komal

                            Komal Thareja
                            Participant

                              Hi Sree,

                              VMs cannot communicate with each other over the private IPs assigned to interfaces connected to the management network. The interfaces with addresses in the 10.* range belong to this management network. Inter-VM communication should instead occur over the data plane network, which in your case is the L2Bridge network.

                              I reviewed your slice and noticed that you have three VMs and two L2Bridge networks configured. However, the IP addresses on the VM interfaces are not set up correctly. Each network must use a different subnet, and the corresponding VM interfaces should be assigned IP addresses from those respective subnets.

                              Please refer to the following example notebook, which demonstrates how to correctly configure the network:
                              jupyter-examples-*/fabric_examples/fablib_api/create_l2network_basic/create_l2network_basic_auto.ipynb

                              Make sure to use separate subnets for each network and assign the appropriate IPs to the VM interfaces so that communication works properly.
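To illustrate the subnet planning only (the addresses below are made up, not taken from your slice), a quick sketch with Python's ipaddress module:

```python
import ipaddress

# Two L2Bridge networks -> two distinct, non-overlapping subnets
# (illustrative values; any private ranges work)
net_a = ipaddress.ip_network("192.168.1.0/24")  # bridge A
net_b = ipaddress.ip_network("192.168.2.0/24")  # bridge B
assert not net_a.overlaps(net_b), "each network needs its own subnet"

# Each VM interface gets an address from the subnet of the network
# that interface is attached to
hosts_a = list(net_a.hosts())
hosts_b = list(net_b.hosts())
vm1_on_a, vm2_on_a = hosts_a[0], hosts_a[1]  # 192.168.1.1, 192.168.1.2
vm2_on_b, vm3_on_b = hosts_b[0], hosts_b[1]  # 192.168.2.1, 192.168.2.2

# vm1 <-> vm2 talk over bridge A; vm2 <-> vm3 over bridge B
assert vm1_on_a in net_a and vm2_on_b in net_b
print(vm1_on_a, vm2_on_a, vm2_on_b, vm3_on_b)
```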

                              Best,
                              Komal

                              Komal Thareja
                              Participant

                                Hi Sree,

                                Could you please share your slice ID so we can take a look? In the meantime, some of the examples available via jupyter-examples-*/start_here.ipynb may be useful.

                                Thanks,

                                Komal

                                in reply to: node.execute() hangs in FABRIC notebook #9519
                                Komal Thareja
                                Participant

                                  Hi Fatih,

                                  Could you please try changing the following files:

                                  /home/fabric/work/fabric_config/ssh_config

                                  /home/fabric/work/fabric_config/fabric_config

                                  In both files, change bastion.fabric-testbed.net to bastion-ncsa-1.fabric-testbed.net.

                                  Then restart your notebook's kernel and retry node.execute().
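If editing by hand is a hassle, the swap can be scripted. This sketch assumes the two paths mentioned above and is safe to re-run (a file already pointing at the new bastion is left unchanged):

```python
from pathlib import Path

OLD = "bastion.fabric-testbed.net"
NEW = "bastion-ncsa-1.fabric-testbed.net"

def swap_bastion(text: str) -> str:
    # NEW does not contain OLD as a substring, so re-running is harmless
    return text.replace(OLD, NEW)

for name in ("ssh_config", "fabric_config"):
    path = Path("/home/fabric/work/fabric_config") / name
    if path.exists():
        path.write_text(swap_bastion(path.read_text()))
        print(f"updated {path}")
```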

                                  Thanks,

                                  Komal
