Komal Thareja

Forum Replies Created

Viewing 15 posts - 106 through 120 (of 554 total)
  • Komal Thareja
    Moderator

      We have scheduled maintenance from July 28 to early August. This feature is planned to be rolled out during that period and should be available afterward.

      Best regards,
      Komal
      in reply to: Error Creating Slices #8697
      Komal Thareja
      Moderator

        Hi Suhib,

        The issue has been resolved, and slice provisioning is now functioning correctly. Could you please try creating your slice again?

        Thanks,

        Komal

        Komal Thareja
        Moderator

          The Network AM has been restored, and slice provisioning is now functioning correctly.

          in reply to: I lost control on my slice. #8690
          Komal Thareja
          Moderator

            Hi Jiri,

            Thank you for bringing this to our attention. We’ve identified an issue with our Network AM and are actively investigating it. Apologies for the inconvenience.

            Your slice is currently in the Dead state, and all associated resources have been released. You can toggle the view on the portal to hide Dead/Closing slices if needed.

            Thanks,

            Komal

            in reply to: Error logging into Nodes #8686
            Komal Thareja
            Moderator

              Posting an update here:

              The component responsible for pushing SSH keys to the bastion host encountered an error due to a network event at RENCI. This component has been restored, and keys should work now.

              Geoff confirmed that his keys are working. @Suhib, please try your slice and keys again and let us know if you are still running into this error.

              Thank you Geoff and Suhib for bringing this to our attention! Appreciate it!

              Best,

              Komal

              in reply to: Error logging into Nodes #8683
              Komal Thareja
              Moderator

                Hi,

                For JH environment:

                Could you please try running the notebook: jupyter-examples-rel1.8.*/configure_and_validate/configure_and_validate.ipynb and share the output observed?

                Also, after this please try running the Hello Fabric notebook and share your observation.

                For your local setup:

                Could you please try setting up the environment as suggested here: https://learn.fabric-testbed.net/knowledge-base/advanced-jupyter-hub/#running-fabric-containers-locally ?

                Best,

                Komal

                in reply to: Interconnection Details Between Hosts at the Same Site #8659
                Komal Thareja
                Moderator

                  Hi Fatih,

                  Could you please share your Slice ID and let us know what kind of NICs you are using in your slice?

                  Thanks,

                  Komal

                  in reply to: Lost network interface after rebooting of vm3 in a cluster #8623
                  Komal Thareja
                  Moderator

                    Hi Ajay,

                    Thanks for reaching out. Could you please share any details about what may have caused the VM to crash? This information will help us better understand the root cause.

                    It appears that the PCI devices were detached from your VM during the crash. I’ve gone ahead and restored the VM — you should now be able to access it and use the GPUs as expected.

                    Please let me know if you continue to face any issues.

                    Best,
                    Komal Thareja

                    in reply to: Isolated Network Environment #8618
                    Komal Thareja
                    Moderator

                      As a quick follow-up:

                      In addition to SmartNIC reservations, FABRIC also supports CPU pinning and NUMA tuning options, which can help further minimize resource contention on shared hosts. While these do not fully isolate you from other users on the physical host, they can significantly reduce interference for CPU-bound and memory-bound workloads.

                      You can find working examples demonstrating how to request CPU pinning and NUMA-optimized resources in the FABRIC Jupyter notebook examples repository:

                      These examples show how you can specify pinned CPUs and memory placement via FABlib when creating your slice.

                      Please feel free to reach out if you’d like any assistance setting this up.

                      Best,

                      Komal

                      in reply to: Isolated Network Environment #8617
                      Komal Thareja
                      Moderator

                        Dear Fatih,

                        Thank you for reaching out.

                        At present, there is no way to prevent other users from having VMs on the same physical host unless you are able to reserve the entire host, which is possible only when the host has no other allocations. However, one option you may consider is requesting SmartNIC-based resources (such as CX6 or CX5). When you reserve a SmartNIC, the NIC is dedicated exclusively to your slice, ensuring that only your experiment’s traffic passes through that NIC. While this does not isolate CPU or memory resources on the host, it can minimize potential interference with data plane network traffic.

                        Please let us know if you would like assistance with requesting such resources or if you have any further questions.

                        Best regards,
                        Komal Thareja

                        in reply to: JupyterHub not starting up #8616
                        Komal Thareja
                        Moderator

                          Jupyter Hub is back up and accessible, and you should be able to use JH containers. The GCP outage has been resolved.

                          Refer to https://learn.fabric-testbed.net/forums/topic/out/ for more details.

                          Thanks,

                          Komal

                          Komal Thareja
                          Moderator

                            Update: JupyterHub Access Restored — GCP Incident Resolved

                            The earlier Google Cloud Platform service disruption that was affecting JupyterHub logins and volume attachments has now been fully resolved. As of now, users should be able to log in and start their JupyterHub environments normally.

                            The root cause of the issue was a Google Cloud Service Control incident that intermittently prevented volume attachments across multiple GCP services. Full details of the incident are available here:
                            🔗 GCP Incident Summary (June 12, 2025)

                            If you continue to encounter any issues starting your environment:

                            • Try restarting your server from the JupyterHub control panel.

                            • If the problem persists, please feel free to reach out to us.

                            Thank you for your patience while this upstream issue was being addressed.

                            Best regards,

                            Komal

                            in reply to: JupyterHub not starting up #8607
                            Komal Thareja
                            Moderator

                              Notice: Kubernetes PVC Attachment Errors Due to GCP Incident (June 12, 2025)

                              We are aware of an ongoing issue where some users may see errors when starting their JupyterHub environments. Affected users may encounter errors similar to:

                              AttachVolume.Attach failed for volume "pvc-..." : rpc error: code = Internal desc = Failed to getDisk: googleapi: Error 503: Policy checks are unavailable., backendError

                              Root cause:
                              This is due to a Google Cloud Platform (GCP) service disruption that is intermittently preventing Kubernetes from attaching persistent volumes. The issue is upstream of our environment and is being actively addressed by Google (see GCP Status).

                              What you should do:

                              • If you encounter this error when launching your JupyterHub environment, no action is needed on your part.

                              • In most cases, the issue is temporary and will resolve automatically as the underlying cloud services recover.

                              • We recommend waiting a few minutes and then retrying.

                              • Please avoid repeated restarts or resubmissions, as Kubernetes will continue to attempt recovery automatically.
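
The guidance above can be sketched as a small helper for anyone scripting their own checks. This is only an illustrative sketch, not part of FABRIC or JupyterHub: the function names, the 3-minute starting wait, and the doubling factor are all assumptions chosen for the example. It recognizes the transient attach error quoted earlier and produces a spaced-out retry schedule rather than immediate repeated restarts.

```python
import re

# Matches the transient failure quoted above: a 503 from the Google API
# surfaced through the Kubernetes AttachVolume step.
TRANSIENT_ATTACH_ERROR = re.compile(
    r"AttachVolume\.Attach failed.*googleapi: Error 503", re.DOTALL
)


def is_transient_attach_error(message):
    """Return True if the message looks like the upstream GCP attach failure."""
    return bool(TRANSIENT_ATTACH_ERROR.search(message))


def retry_delays(first_wait_s=180, factor=2, attempts=4):
    """Return wait times in seconds, growing by `factor` after each attempt.

    The defaults are assumptions for illustration, not FABRIC-recommended values.
    """
    return [first_wait_s * factor ** i for i in range(attempts)]
```

If `is_transient_attach_error` returns True for the message you see, waiting through the delays from `retry_delays()` before each retry follows the "wait a few minutes, then retry" advice; any other error is better reported on the forum.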

                              We will continue to monitor the situation and will update as more information becomes available. Thank you for your patience.

                              Best regards,

                              Komal

                              in reply to: Availability of DPU-powered SmartNICs #8601
                              Komal Thareja
                              Moderator

                                Hi Tanay,

                                We are in the process of procuring them. While they may not be available for the Summer release, we are targeting an incremental release or including them in the Fall 2025 release.

                                Best,

                                Komal

                                 

                                in reply to: Unable to access VMs #8594
                                Komal Thareja
                                Moderator

                                  Hi Rodrigo,

                                  Could you please share your slice ID and let us know how you’re trying to access the VMs—whether through Jupyter Hub or from your local environment?

                                  Thanks,
                                  Komal
