Komal Thareja

Forum Replies Created

Viewing 15 posts - 46 through 60 (of 526 total)
  • in reply to: Performance Drop on ConnectX-6 After Release 1.9 #9042
    Komal Thareja
    Participant

      Hi Rasman,

      I forgot to mention that the steps for installing iperf3 should be run as the root user. On your VM, I did the following:

      sudo su -
      
      curl -L https://github.com/esnet/iperf/releases/download/3.18/iperf-3.18.tar.gz > iperf-3.18.tar.gz
      tar -zxvf iperf-3.18.tar.gz
      cd iperf-3.18
      
      sudo apt update
      sudo apt install build-essential
      
      sudo ./configure; make; make install
      sudo ldconfig
      

      I also applied the following host tuning (node_tools/host_tune.sh) on both VMs:

      #!/bin/bash
      
      # Linux host tuning from https://fasterdata.es.net/host-tuning/linux/
      cat >> /etc/sysctl.conf <<EOL
      # allow testing with buffers up to 512MB
      net.core.rmem_max = 536870912
      net.core.wmem_max = 536870912
      # increase Linux autotuning TCP buffer limit to 512MB
      net.ipv4.tcp_rmem = 4096 87380 536870912
      net.ipv4.tcp_wmem = 4096 65536 536870912
      # recommended default congestion control is htcp or bbr
      net.ipv4.tcp_congestion_control = bbr
      # recommended for hosts with jumbo frames enabled
      net.ipv4.tcp_mtu_probing = 1
      # recommended to enable 'fair queueing'
      net.core.default_qdisc = fq
      #net.core.default_qdisc = fq_codel
      EOL
      
      sysctl --system
      
      # Turn on jumbo frames
      for dev in $(basename -a /sys/class/net/*); do
          ip link set dev "$dev" mtu 9000
      done
      

      With these changes, I’m now seeing bandwidth close to 10G (see snapshot below).

      Screenshot-2025-09-24-at-5.31.23-PM

      According to fablib.list_links(), links from GATECH are capped at 8G. I’d suggest trying a different site instead of GATECH.

      Screenshot-2025-09-24-at-5.41.06-PM

      Regarding the slice getting stuck at Submit: your keys may have expired. Please try running the notebook jupyter-examples-rel1.9.0/configure_and_validate/configure_and_validate.ipynb, which should automatically renew your keys if needed.

      If it still hangs at submit, please check /tmp/fablib/fablib.log for errors and share them here.
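
For the log check, a small sketch like the one below can pull out the most recent error-looking lines to paste into a reply. It assumes the log is plain text and that problem lines contain ERROR, CRITICAL, or Traceback; the helper name `recent_errors` and those keywords are illustrative assumptions, not part of fablib.

```python
from collections import deque
from pathlib import Path

# Assumed markers for problem lines; adjust to whatever the log actually contains.
KEYWORDS = ("ERROR", "CRITICAL", "Traceback")


def recent_errors(log_path="/tmp/fablib/fablib.log", limit=20, keywords=KEYWORDS):
    """Return up to `limit` of the most recent lines containing any keyword."""
    path = Path(log_path)
    if not path.exists():
        return []
    matches = deque(maxlen=limit)  # keeps only the newest `limit` matches
    with path.open(errors="replace") as handle:
        for line in handle:
            if any(word in line for word in keywords):
                matches.append(line.rstrip("\n"))
    return list(matches)


# Print whatever was found; prints nothing if the log is missing or clean.
for line in recent_errors():
    print(line)
```

Run it in a notebook cell right after the submit hangs so the newest entries are the relevant ones.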

      Best,

      Komal

      in reply to: Performance Drop on ConnectX-6 After Release 1.9 #9032
      Komal Thareja
      Participant

        Hi Rasman,

        By default, the standard iperf3 version does not perform well with multiple streams. ESnet provides a patched version that resolves this issue and delivers significantly better performance. This fixed iperf3 is already packaged inside the container.

        If you would like to run it directly on the host, you can install it with the following steps:

        curl -L https://github.com/esnet/iperf/releases/download/3.18/iperf-3.18.tar.gz > iperf-3.18.tar.gz
        tar -zxvf iperf-3.18.tar.gz
        cd iperf-3.18
        sudo apt update
        sudo apt install build-essential
        sudo ./configure
        make
        sudo make install
        

        Additionally, please make sure that the script node_tools/host_tune.sh (included with the notebook) has been executed on the relevant nodes.

        If you continue to see lower bandwidth, kindly share your slice ID so I can take a closer look.

        Thanks,
        Komal

        in reply to: can’t see nvidia card though VM shows component assigned #9025
        Komal Thareja
        Participant

          Thank you for reporting this, Maureen! We have identified the issue and are working on a solution.

          We will keep you posted about the resolution. Apologies for the inconvenience.

          Best,

          Komal

          in reply to: Availability of DPU-powered SmartNICs #8989
          Komal Thareja
          Participant

            Hi Tanay,

            We’re in the process of deploying them and are targeting DPU availability at KNIT11 around October 13–14.

            Best,

            Komal

            in reply to: Node Naming error #8931
            Komal Thareja
            Participant

              Hi Nishanth,

              Could you please try again? It should work now. A FreeBSD fix introduced this check, and I have disabled it.

              Best,

              Komal

              Komal Thareja
              Participant

                Hi,

                A fix for this issue has been deployed to production. Please try creating a slice and let us know if you run into any issues.

                Best,

                Komal

                in reply to: Establishing connection between different Slices. #8911
                Komal Thareja
                Participant

                  Hi Tejas,

                  You can use FABRIC’s Layer 3 FabNetv4 or FabNetv6 Network Service to establish connectivity between slices.

                  Any VM connected to FabNetv* in one slice can communicate with a VM connected to FabNetv* in another slice, provided the routes are configured correctly. You just need to add the following routes:

                  ip route add 10.128.0.0/10 via <fabnetv4_gateway>
                  ip -6 route add 2602:FCFB:00::/40 via <fabnetv6_gateway>
                  

                  You may also find this example artifact helpful, as it demonstrates inter-slice connectivity using FabNetv4.
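
As a quick sanity check on the two routes above, the following sketch tests whether a peer address in the other slice actually falls inside the prefixes those routes cover. The prefixes are copied from the commands above; the peer addresses and the helper name `covered_by_fabnet_route` are made up for illustration.

```python
import ipaddress

# Prefixes taken from the two route commands above.
FABNET_V4 = ipaddress.ip_network("10.128.0.0/10")
FABNET_V6 = ipaddress.ip_network("2602:fcfb::/40")


def covered_by_fabnet_route(address):
    """Return True if `address` falls inside the FabNet prefix of its IP version."""
    ip = ipaddress.ip_address(address)
    prefix = FABNET_V4 if ip.version == 4 else FABNET_V6
    return ip in prefix


# Hypothetical peer addresses, only to show the call.
for peer in ("10.132.1.2", "192.168.1.10", "2602:fcfb:1::2"):
    print(peer, covered_by_fabnet_route(peer))
```

If a peer address is reported as not covered, traffic to it will not follow the added route, and the address assignment in the other slice is the first thing to check.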

                  Best,

                  Komal

                  Komal Thareja
                  Participant

                    Thank you for sharing this, Nishant and YoursSunny.

                    I was able to reproduce the issue. On IPv4 sites, user SSH keys are not being injected, and on IPv6, SSH connections are failing completely. We’ll work on addressing this and will let you know once the fix has been deployed. Apologies for the inconvenience in the meantime.

                    Best,

                    Komal

                    in reply to: Internal Server Error when running JupyterHub cell #8883
                    Komal Thareja
                    Participant

                      Hi Dagim,

                      Thank you for sharing this observation. Could you update the instantiation of the fablib object in the first cell to the following and then try running the notebook again?

                      fablib = fablib_manager(project_id=project_id, validate_config=False)
                      

                      Thanks,
                      Komal

                       

                      Komal Thareja
                      Participant

                        Hi Zhihe,

                        This was a bug; a fix has been deployed to the default container. Could you please try running this notebook again?

                        Thanks,

                        Komal

                        in reply to: earlier versions of Jupyter examples #8877
                        Komal Thareja
                        Participant

                          Yes please, Nirmala. Just keep the following in that file:

                          {
                            "examples": [
                              {
                                "url": "default",
                                "location": "/home/fabric/work"
                              }
                            ]
                          }

                          This should allow you to delete older examples.

                          Best,

                          Komal

                          Komal Thareja
                          Participant

                            Hi Philip,

                            We discussed this internally, and the minimum supported bandwidth value is 1 Gbps. At this time, we don’t have plans to provide more fine-grained options. If you need to simulate lower bandwidth, we recommend using tools such as tc.
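
One possible way to do this with tc is a token bucket filter (tbf) on the VM's dataplane interface. The sketch below only builds the command string, so it can be reviewed before being run as root on the node. The interface name `eth1`, the default burst and latency values, and the helper name `tc_rate_limit_cmd` are assumptions for illustration and will likely need tuning for your rate.

```python
def tc_rate_limit_cmd(dev, rate_mbit, burst_kb=128, latency_ms=50):
    """Build a `tc` command that caps egress on `dev` using a token bucket filter."""
    if rate_mbit <= 0:
        raise ValueError("rate_mbit must be positive")
    # `replace` installs the qdisc, or swaps out whatever root qdisc is present.
    return (
        f"tc qdisc replace dev {dev} root tbf "
        f"rate {rate_mbit}mbit burst {burst_kb}kb latency {latency_ms}ms"
    )


# Example: cap a hypothetical interface eth1 at 100 Mbit/s.
print(tc_rate_limit_cmd("eth1", 100))
```

Note that this shapes outgoing traffic only and replaces any existing root qdisc on that interface, such as fq if host tuning was applied. `tc qdisc del dev <dev> root` removes the limit again.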

                            Best,

                            Komal

                            in reply to: Energy monitoring of an allocation #8870
                            Komal Thareja
                            Participant

                              Hi Jacob,

                              Thank you for sharing the details. I discussed the energy consumption measurement topic with the team earlier today.

                              As mentioned before, we do not currently support energy consumption measurements on the VMs. The network team also confirmed that such measurement capabilities are not available on the network devices.

                              To identify the geolocation of hops, we recommend using an L2PTP (Layer 2 Point-to-Point) network, where you explicitly define the network path by specifying the site for each hop. You can then use fablib.list_sites() to obtain the geo-coordinates.

                              Please refer to this artifact for guidance on setting up an L2PTP slice:
                              https://artifacts.fabric-testbed.net/artifacts/7e439627-96be-45e0-ab67-50bb607f06e4

                               

                              Also, regarding the renewal request, I see that it was submitted yesterday. Michael will follow up on that through the ticket.

                              Best,

                              Komal

                              in reply to: earlier versions of Jupyter examples #8868
                              Komal Thareja
                              Participant

                                Hi Nirmala,

                                You don’t need to keep the example notebooks, so please go ahead and remove them.

                                Could you also take a look at the contents of /home/fabric/fabric_config/fabric_config.json? I suspect there may be multiple entries in that file. If so, please delete it as well; this should prevent the older examples from being retained when you log in again.

                                Best,

                                Komal

                                in reply to: Energy monitoring of an allocation #8867
                                Komal Thareja
                                Participant

                                  Hi Jacob,

                                  At the moment, energy consumption measurements are not passed into the VMs. I’ll bring this up in our planning meeting so it can be considered for inclusion in a future release.

                                  For location information, we currently expose the geo-coordinates for all FABRIC sites, which you can retrieve using fablib.list_sites(). One possible approach to determine the location of hops is to map IPs → Sites → Locations.
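
The IPs → Sites → Locations mapping could be sketched roughly as follows. The two lookup tables here are hypothetical placeholder data: in practice the subnets would come from your own slice and the coordinates from the output of fablib.list_sites(). The site names, subnets, coordinates, and the helper name `locate_hop` are all made up for illustration.

```python
import ipaddress

# Hypothetical example data; replace with subnets from your slice and
# coordinates obtained from fablib.list_sites().
SITE_SUBNETS = {
    "SITE_A": "10.130.0.0/24",
    "SITE_B": "10.131.0.0/24",
}
SITE_LOCATIONS = {
    "SITE_A": (35.0, -80.0),
    "SITE_B": (41.0, -88.0),
}


def locate_hop(hop_ip, site_subnets=SITE_SUBNETS, site_locations=SITE_LOCATIONS):
    """Map a hop IP to (site, (lat, lon)), or None if no subnet matches."""
    ip = ipaddress.ip_address(hop_ip)
    for site, subnet in site_subnets.items():
        net = ipaddress.ip_network(subnet)
        if ip.version == net.version and ip in net:
            return site, site_locations.get(site)
    return None


# Hypothetical hop addresses, e.g. as collected from a traceroute.
for hop in ("10.130.0.1", "10.131.0.7", "10.200.0.1"):
    print(hop, locate_hop(hop))
```

Hops that return None are addresses outside the subnets you listed, which is itself useful for spotting hops that are not attributable to a known site.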

                                  Could you share your slice ID or specify the type of network service you’re using for your WAN experiment?

                                  Best,

                                  Komal
