Komal Thareja

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 545 total)
    in reply to: UDP performance tuning for ubuntu 24.04 #9776
    Komal Thareja
    Participant

      Hi Jacob,

      Take a look at this artifact. While it focuses on TCP performance, it also covers OS tuning and CPU pinning / NUMA tuning, both of which should help with your performance work.
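The artifact's own settings are not reproduced here. As a rough illustration of one check that usually belongs to OS tuning for UDP, the sketch below asks the kernel for a large UDP receive buffer and prints what it reports back. On Linux an unprivileged request is capped by the net.core.rmem_max sysctl, so a reported value far below the request suggests that limit is worth raising. The function name `granted_udp_rcvbuf` and the 32 MiB request are assumptions made for illustration, not values taken from the artifact.

```python
import socket


def granted_udp_rcvbuf(requested_bytes: int) -> int:
    """Request a UDP receive buffer and return the size the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 32 * 1024 * 1024  # illustrative request, not a recommendation
    try:
        granted = granted_udp_rcvbuf(requested)
    except OSError as exc:
        # Some systems reject oversized requests instead of capping them.
        print(f"request for {requested} bytes was rejected: {exc}")
    else:
        # Linux reports double the applied value (bookkeeping overhead),
        # so treat the comparison with the request as approximate.
        print(f"requested {requested} bytes, kernel reports {granted} bytes")
```

Running this before and after any sysctl change gives a quick sanity check that the new limit is actually in effect for your application's sockets.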

      One other thing worth considering is the type of NIC you’re using. Basic (virtual) NICs likely won’t give you peak performance — NIC_ConnectX-6 or NIC_ConnectX-5 would be much better candidates.

      Best,
      Komal

      in reply to: node.add_fabnet() raises ResourceNotFoundError #9753
      Komal Thareja
      Participant

        Hi Arash,

        The fix has been deployed to the Beyond Bleeding Edge container and will be available in the Bleeding Edge container later this evening. Please let me know if you run into any more issues. Apologies for the inconvenience.

        Best,

        Komal

        in reply to: node.add_fabnet() raises ResourceNotFoundError #9752
        Komal Thareja
        Participant

          Hi Arash,

          I’m looking into this and will push out a fix soon.

          Best,

          Komal

          in reply to: Cannot allocate GPU + ConnectX-6 on same node #9727
          Komal Thareja
          Participant

            The Portal view has been fixed too! The Portal now shows the state of resources correctly.

            Best,

            Komal

            in reply to: Cannot allocate GPU + ConnectX-6 on same node #9726
            Komal Thareja
            Participant

              Hi Bek,

              Just a heads-up — the resource status on the portal isn’t quite matching the actual state of the resources right now. I’m working to get that sorted, but in the meantime you can use the fablib API to check availability and find an open slot for your target slice.

              Here’s an artifact that should come in handy: https://artifacts.fabric-testbed.net/artifacts/e777ce3a-5b40-4e58-9666-7f31f655f03c

              Best,

              Komal

              Komal Thareja
              Participant

                Hi Sree,

                I’m investigating the extend/renew of this slice. That said, I’d strongly recommend backing up your data in the meantime — that way, if the slice ever needs to be recreated, you’ll have everything you need on hand.

                Best,
                Komal

                Komal Thareja
                Participant

                  Hi Sree,

                  Could you please share your slice ID?

                  Best,

                  Komal

                  Komal Thareja
                  Participant

                    Hi Yifan,

                    When creating a slice through the Portal, the network configuration needs to be set up manually. However, if you create the slice via the JupyterHub interface (Portal → JupyterHub), the network configuration is handled automatically. You can follow the steps outlined here: https://learn.fabric-testbed.net/knowledge-base/creating-your-first-experiment-in-jupyter-hub/

                    Best,
                    Komal

                    Komal Thareja
                    Participant

                      Hi Yifan,

                      I’m not sure how the VMs were originally provisioned—whether auto configuration or manual setup was used, or which JupyterHub container was involved.

                      I checked your MASS VMs and noticed that IPv6 addresses were not assigned to the data plane interfaces and the required routes were missing. I manually configured both VMs by assigning IPv6 addresses and adding the appropriate routes:

                      mass-0:

                      sudo ip -6 addr add 2602:fcfb:7:1::2/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:7:1::1 dev enp7s0
                      

                      mass-1:

                      sudo ip -6 addr add 2602:fcfb:7:1::3/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:7:1::1 dev enp7s0
                      

                      After applying these changes, connectivity between the MASS VMs is working as expected (verified via ping).

                      I also attempted to access the UTAH and ATLA VMs, but I wasn’t able to SSH using the NOVA keys, so I couldn’t validate their configuration.

                      Could you please run the following commands on the remaining VMs to configure the data plane interfaces?

                      UTAH VMs

                      ut-0:

                      sudo ip -6 addr add 2602:fcfb:8:d1::2/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:8:d1::1 dev enp7s0
                      

                      ut-1:

                      sudo ip -6 addr add 2602:fcfb:8:d1::3/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:8:d1::1 dev enp7s0
                      

                      ATLA VMs

                      atl-0:

                      sudo ip -6 addr add 2602:fcfb:15:1::2/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:15:1::1 dev enp7s0
                      

                      atl-1:

                      sudo ip -6 addr add 2602:fcfb:15:1::3/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:15:1::1 dev enp7s0
                      

                      GATECH VMs

                      gatech-0:

                      sudo ip -6 addr add 2602:fcfb:11:2::3/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:11:2::1 dev enp7s0
                      

                      gatech-1:

                      sudo ip -6 addr add 2602:fcfb:11:2::2/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:11:2::1 dev enp7s0
                      

                      WASH VMs

                      wash-0:

                      sudo ip -6 addr add 2602:fcfb:a:1::3/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:a:1::1 dev enp7s0
                      

                      wash-1:

                      sudo ip -6 addr add 2602:fcfb:a:1::2/64 dev enp7s0
                      sudo ip link set enp7s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:a:1::1 dev enp7s0
                      

                      LOSA VMs

                      la-0 (uses enp6s0):

                      sudo ip -6 addr add 2602:fcfb:12:c::3/64 dev enp6s0
                      sudo ip link set enp6s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:12:c::1 dev enp6s0
                      

                      la-1 (uses enp6s0):

                      sudo ip -6 addr add 2602:fcfb:12:c::2/64 dev enp6s0
                      sudo ip link set enp6s0 up
                      sudo ip -6 route add 2602:fcfb:00::/40 via 2602:fcfb:12:c::1 dev enp6s0
                      

                      Note: The LOSA VMs use enp6s0 instead of enp7s0 for the data plane interface.
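The per-VM commands above all follow one pattern: assign the address, bring the link up, and route the FABNet IPv6 prefix through the subnet's gateway. A minimal sketch of that pattern, using only Python's standard `ipaddress` module, is shown below. The helper name `dataplane_commands` and the `HOSTS` table are hypothetical and exist only for this illustration; the addresses, gateways, and interface names are copied from the commands above. The route target prints in its compressed form, 2602:fcfb::/40, which is the same prefix as 2602:fcfb:00::/40.

```python
import ipaddress

# Same prefix as the 2602:fcfb:00::/40 route target used in the commands above.
FABNET_V6 = ipaddress.ip_network("2602:fcfb::/40")


def dataplane_commands(address: str, gateway: str, dev: str) -> list[str]:
    """Build the three ip commands for one VM, validating the addresses first."""
    iface = ipaddress.ip_interface(address)  # e.g. "2602:fcfb:7:1::2/64"
    gw = ipaddress.ip_address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not inside {iface.network}")
    if not iface.network.subnet_of(FABNET_V6):
        raise ValueError(f"{iface.network} is not inside {FABNET_V6}")
    return [
        f"sudo ip -6 addr add {iface.with_prefixlen} dev {dev}",
        f"sudo ip link set {dev} up",
        f"sudo ip -6 route add {FABNET_V6} via {gw} dev {dev}",
    ]


# Hypothetical table for illustration; values copied from two of the VMs above.
HOSTS = {
    "ut-0": ("2602:fcfb:8:d1::2/64", "2602:fcfb:8:d1::1", "enp7s0"),
    "la-0": ("2602:fcfb:12:c::3/64", "2602:fcfb:12:c::1", "enp6s0"),
}

if __name__ == "__main__":
    for name, (address, gateway, dev) in HOSTS.items():
        print(f"{name}:")
        for command in dataplane_commands(address, gateway, dev):
            print(f"  {command}")
```

The two checks catch the most likely copy-and-paste mistakes, such as pairing a VM with a gateway from another site's subnet, before anything is run on the node.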

                      Please let me know if you need any help with this.

                      Best,
                      Komal

                      Komal Thareja
                      Participant

                        Hi Yifan,

                        Could you please share your slice ID?

                        Best,

                        Komal

                        Komal Thareja
                        Participant

                          You should be able to re-use the existing slice.

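                          Just run the following in a notebook cell:

                          # Look up the existing slice by name
                          slice = fablib.get_slice(slice_name)
                          # Re-run the post-boot configuration on the slice
                          slice.post_boot_config()
                          # List the nodes and interfaces to confirm the result
                          slice.list_nodes();
                          slice.list_interfaces();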

                          Thanks,

                          Komal

                          Komal Thareja
                          Participant

                            Hi Rasman,

                            I tried both your shared NICs example and the iperf3 (CX5) notebook, and I do see IPs being configured on the VMs.

                            Could you please run the following notebook:
                            jupyter-examples-*/configure_and_validate/configure_and_validate.ipynb?

                            It’s possible that your bastion keys have expired, which may be preventing fablib from properly configuring the nodes.

                            I’ve attached a snapshot of the output from my runs below for reference.

                            Best,
                            Komal

                            Komal Thareja
                            Participant

                              Hi Rasman,

                              Which JupyterHub (JH) container are you using?

                              Best,

                              Komal

                              in reply to: pin_cpu & poa(operation="cpupin") #9620
                              Komal Thareja
                              Participant

                                Thank you for sharing your observations, @yoursunny. This was indeed a bug, and it has now been fixed in the Beyond Bleeding Edge container.

                                I’ll be rolling out the fix to the Bleeding Edge container shortly as well.

                                Best,
                                Komal

                                Komal Thareja
                                Participant

                                  Hi Rasman,

                                  Great question, and thanks for checking before running your experiments — we appreciate that!

                                  As yoursunny mentioned, you’ll want to use FABNetv4Ext or FABNetv6Ext network services for your experiment rather than the management network. These provide dedicated public Internet connectivity for your slices and are designed for exactly this kind of bulk data transfer work. The management network is shared infrastructure and should not be used for high-volume traffic.

                                  One important thing to note: FABNetv4Ext and FABNetv6Ext require additional project permissions that are not enabled by default. Your Project Lead will need to request the Net.FABNetv4Ext and/or Net.FABNetv6Ext permissions for your project through the FABRIC Portal (use the “Request additional project permissions” option under Experiments -> Projects).

                                  Once you have those permissions, you should be all set to run sustained download experiments against NCBI/ENA without any issues on the FABRIC side.

                                  Also, thanks yoursunny for jumping in with the helpful pointer!

                                  Best,
                                  Komal
