Komal Thareja

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 557 total)
  • in reply to: Reserving P4 switch fails #9870
    Komal Thareja
    Moderator

      Hi Garegin,

      There are only 5 P4 switches on the testbed, and right now all of them are in use, which is why you’re seeing the “Insufficient Resources” error.

      To check which sites have a P4 switch available, you can use this snippet:

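      # List all active sites along with their P4 switch availability
      from fabrictestbed_extensions.fablib.fablib import FablibManager

      fablib = FablibManager()
      fields = ['name', 'state', 'p4-switch_available']
      hosts_table = fablib.list_sites(
          output='pandas',
          fields=fields,
          force_refresh=True,
          # Keep only sites that are currently Active
          filter_function=lambda x: x['state'] == 'Active'
      )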
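
To narrow a listing like this to sites that actually have a free switch, the rows can be filtered on the availability field. This is only a minimal sketch: it assumes the site records are plain dictionaries keyed by the field names used in this post, and the helper name `sites_with_free_p4` and the sample records are hypothetical, not real testbed data.

```python
def sites_with_free_p4(sites, field="p4-switch_available"):
    """Return names of active sites whose P4 availability count is above zero."""
    free = []
    for site in sites:
        available = site.get(field) or 0  # treat a missing or None value as zero
        if site.get("state") == "Active" and int(available) > 0:
            free.append(site["name"])
    return free


# Hypothetical sample records, not real testbed data.
sample = [
    {"name": "SITE_A", "state": "Active", "p4-switch_available": 0},
    {"name": "SITE_B", "state": "Active", "p4-switch_available": 1},
    {"name": "SITE_C", "state": "Maint", "p4-switch_available": 1},
]
print(sites_with_free_p4(sample))  # prints ['SITE_B']
```

An empty result would correspond to the "Insufficient Resources" situation described in this reply.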
      

      I checked just now and none are currently available. The UCSD P4 switch frees up on 2026-06-22, so you may want to set up an advanced reservation for it.

      Best,

      Komal

      in reply to: Slice shows Stable Error #9858
      Komal Thareja
      Moderator

        Hi Sourya,

        Unfortunately, if your resources can’t be extended because another user has a reservation ahead of yours, there isn’t much we can do on our end.

        We recommend structuring your experiment notebooks so that the experiment can be easily re-provisioned and spun back up. Additionally, consider using our Distributed Storage Volumes to persist your data and context — that way, even if you need to re-provision your experiment, you can pick up right where you left off.

        For details on Distributed Storage, please see this artifact: https://artifacts.fabric-testbed.net/artifacts/b2f2b34e-333e-4290-9aa7-6427b38bba15
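
One way to make a notebook easy to re-run is to look the experiment up by name first and only build it when the lookup fails. The sketch below shows that pattern with injected callables so it runs without testbed access. The names `get_or_create`, `lookup`, and `build` are hypothetical, and wiring them to actual fablib calls is left as an assumption.

```python
def get_or_create(name, lookup, build):
    """Return (resource, created), trying lookup first and build as a fallback."""
    try:
        return lookup(name), False
    except LookupError:
        # Nothing with this name exists yet, so provision it from scratch.
        return build(name), True


# Stand-in registry playing the role of the testbed (hypothetical).
existing = {}


def lookup(name):
    return existing[name]  # raises KeyError (a LookupError) when absent


def build(name):
    existing[name] = {"name": name, "state": "StableOK"}
    return existing[name]


first, created_first = get_or_create("my-experiment", lookup, build)
second, created_second = get_or_create("my-experiment", lookup, build)
print(created_first, created_second)  # prints True False
```

Running the same cell twice then reuses what already exists instead of failing or provisioning a duplicate.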

        Best,

        Komal

        in reply to: Slice shows Stable Error #9855
        Komal Thareja
        Moderator

          Hi Sourya,

          When you renew or extend a slice, successful extension of all resources isn’t guaranteed. Another user may hold an advance reservation ahead of yours, blocking the renewal. This results in a partial extension, which is what’s happening here: the resources that couldn’t be extended will expire at the original expiry date, while the ones that were successfully extended remain in the Active state. You can also identify the failed slivers on the portal when you view the slice.
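
The partial-extension outcome can be illustrated by comparing each sliver's lease end against the requested end date. This is a sketch over hypothetical records: the field names `name` and `lease_end` and the helper `unextended_slivers` are assumptions, not the portal's or fablib's data model.

```python
from datetime import datetime, timezone


def unextended_slivers(slivers, requested_end):
    """Return names of slivers whose lease ends before the requested end date."""
    return [s["name"] for s in slivers if s["lease_end"] < requested_end]


# Hypothetical dates and sliver names, for illustration only.
requested_end = datetime(2026, 7, 1, tzinfo=timezone.utc)
slivers = [
    {"name": "node1", "lease_end": datetime(2026, 7, 1, tzinfo=timezone.utc)},
    {"name": "p4-switch", "lease_end": datetime(2026, 6, 20, tzinfo=timezone.utc)},
]
print(unextended_slivers(slivers, requested_end))  # prints ['p4-switch']
```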

          Best,

          Komal

          Komal Thareja
          Moderator

            The Service project only gives you access to the Service. Please create your slice with the LAMB project.

            Best,

            Komal

            Komal Thareja
            Moderator

              Hi Vinaya,

              Could you please try your slice again?

              Please post any further questions/concerns here: https://learn.fabric-testbed.net/forums/forum/fabric-general-questions-and-discussion/

              Best,

              Komal

              Komal Thareja
              Moderator

                Hi Seena,

                We recently had a Kafka outage (details here: https://learn.fabric-testbed.net/forums/topic/service-update-kafka-outage/). Unfortunately this caused your Renew to only partially succeed, leaving your slice stuck in the Configuring state. Your VMs are still set to expire on 06/03.

                There are two ways we can resolve this, both done administratively on our end:

                1. Delete the slice and re-provision it, or
                2. Force the state to StableOK.

                One thing to flag: a further slice extension may not be possible in either case.

                Let me know how you’d like to proceed, and apologies for the inconvenience.

                Best,
                Komal

                Komal Thareja
                Moderator

                  Hi Sree,

                  You should be able to pass the flag storage=True in multiple slices and still have the same volume mounted in both slices simultaneously. Please feel free to reach out if you run into issues. Please post any further questions here: https://learn.fabric-testbed.net/forums/forum/fabric-general-questions-and-discussion/

                  Best,

                  Komal

                  in reply to: Operations on Slices taking time #9810
                  Komal Thareja
                  Moderator

                    Thank you for reporting this, Nirmala! The system has been recovered. Please let us know if you continue to run into issues.

                    Best,

                    Komal

                    in reply to: Service Update — Kafka Outage #9809
                    Komal Thareja
                    Moderator

                      Closing the topic!

                      in reply to: Service Update — Kafka Outage #9808
                      Komal Thareja
                      Moderator

                        Dear Users,

                        The issue has been resolved and service has been fully restored. We apologize for the inconvenience caused.

                        Happy Experimenting!

                        Best,

                        Komal

                        in reply to: Operations on Slices taking time #9805
                        Komal Thareja
                        Moderator

                          Hi Nirmala,

                          We’re currently investigating what appears to be an unplanned outage on our Kafka service, which may be causing the slowness you’re experiencing. We are actively working on recovery, and I’ll keep you updated on our progress.

                          Best, Komal

                          in reply to: Operations on Slices taking time #9803
                          Komal Thareja
                          Moderator

                            Hi Nirmala,

                            Could you please check if you see any errors in /tmp/fablib/fablib.log ?
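
A quick way to scan that log from a notebook cell is sketched below. It assumes error entries contain the text "ERROR" or "Traceback", which may not match the actual log format; the helper name `find_error_lines` is hypothetical.

```python
from pathlib import Path


def find_error_lines(log_path, keywords=("ERROR", "Traceback")):
    """Return (line_number, text) pairs for log lines containing any keyword."""
    path = Path(log_path)
    if not path.exists():
        return []  # no log file yet, so nothing to report
    hits = []
    with path.open(errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(word in line for word in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits


# Show the 20 most recent matching lines, if any.
for number, text in find_error_lines("/tmp/fablib/fablib.log")[-20:]:
    print(f"{number}: {text}")
```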

                            Another possible cause of the post-boot config delays is expired bastion keys. Please run the notebook jupyter-examples-*/configure_and_validate/configure_and_validate.ipynb, which renews your bastion keys if they have expired.

                            Please remember to update the Project ID in this notebook.

                            Please let me know in case you continue to run into issues.

                            Best,

                            Komal

                            in reply to: UDP performance tuning for ubuntu 24.04 #9776
                            Komal Thareja
                            Moderator

                              Hi Jacob,

                              Take a look at this artifact. While it focuses on TCP performance, it also covers OS tuning and CPU pinning / NUMA tuning, both of which should help with your performance work.

                              One other thing worth considering is the type of NIC you’re using. Basic (virtual) NICs likely won’t give you peak performance — NIC_ConnectX-6 or NIC_ConnectX-5 would be much better candidates.

                              Best,
                              Komal

                              in reply to: node.add_fabnet() raises ResourceNotFoundError #9753
                              Komal Thareja
                              Moderator

                                Hi Arash,

                                The fix has been deployed on the Beyond Bleeding Edge container and will be available in the Bleeding Edge container later this evening. Please let me know if you run into any more issues. Apologies for the inconvenience.

                                Best,

                                Komal

                                in reply to: node.add_fabnet() raises ResourceNotFoundError #9752
                                Komal Thareja
                                Moderator

                                  Hi Arash,

                                  I’m looking at this and will push out a fix soon.

                                  Best,

                                  Komal
