Komal Thareja

Forum Replies Created

Viewing 15 posts - 121 through 135 (of 557 total)
  • Author
    Posts
  • in reply to: JupyterHub not starting up #8607
    Komal Thareja
    Moderator

      Notice: Kubernetes PVC Attachment Errors Due to GCP Incident (June 12, 2025)

      We are aware of an ongoing issue that prevents some JupyterHub environments from starting. Affected users may see errors similar to:

      AttachVolume.Attach failed for volume "pvc-..." : rpc error: code = Internal desc = Failed to getDisk: googleapi: Error 503: Policy checks are unavailable., backendError

      Root cause:
      This is due to a Google Cloud Platform (GCP) service disruption that is intermittently preventing Kubernetes from attaching persistent volumes. The issue is upstream of our environment and is being actively addressed by Google (see GCP Status).

      What should you do:

      • If you encounter this error when launching your JupyterHub environment, no action is needed on your part.

      • In most cases, the issue is temporary and will resolve automatically as the underlying cloud services recover.

      • We recommend waiting a few minutes and then retrying.

      • Please avoid repeated restarts or resubmissions, as Kubernetes will continue to attempt recovery automatically.

      We will continue to monitor the situation and will update as more information becomes available. Thank you for your patience.

      Best regards,

      Komal

      in reply to: Availability of DPU-powered SmartNICs #8601
      Komal Thareja
      Moderator

        Hi Tanay,

        We are in the process of procuring them. While they may not be available for the Summer release, we are targeting an incremental release or including them in the Fall 2025 release.

        Best,

        Komal

         

        in reply to: Unable to access VMs #8594
        Komal Thareja
        Moderator

          Hi Rodrigo,

          Could you please share your slice ID and let us know how you are trying to access the VMs: through JupyterHub or from your local environment?

          Thanks,
          Komal

          Komal Thareja
          Moderator

            Hi Fatih,

            Thank you for your email and detailed questions.

            At this time, FABRIC does not support guaranteed capacity or QoS prioritization on L2P2P links. The service operates as best-effort by default, and DSCP/ToS or VLAN PCP markings are not enforced across the underlying infrastructure.

            That said, we are actively working to support guaranteed QoS using Explicit Route Options (ERO) in the L2P2P service. This capability is planned for inclusion in our upcoming Release 1.9, targeted for deployment in late July/early August. It will provide a way to request L2P2P links with specified bandwidth guarantees and rate-limiting.

            We will share more details and guidance on how to configure these options as part of the release.

            Please feel free to reach out with any further questions in the meantime.

            Best regards,
            Komal Thareja

            in reply to: L2Bridge not forwarding packets in SALT #8575
            Komal Thareja
            Moderator

              Hi Alexander,

              Based on our investigation so far, the VMs from your slice that are not passing traffic were hosted on salt-w3.fabric-testbed.net. We’ve identified that none of the VMs on this host are able to pass traffic. As a result, we have placed this worker into Maintenance mode and are actively investigating the issue.

              You should be able to create a new slice without encountering this problem, as salt-w3 is now in Maintenance and will not be used for any new slices on the SALT site.

              Thanks,

              Komal

              in reply to: Cannot login to MASS #8569
              Komal Thareja
              Moderator

                Hi Sourya,

                MASS is undergoing maintenance from June 2 to June 4, as noted [here].

                Since your slice is set to expire on June 9, it will remain unaffected by the maintenance window. As mentioned in the announcement, your VM will be recovered, and your data will persist.

                Thanks,
                Komal

                in reply to: L2Bridge not forwarding packets in SALT #8541
                Komal Thareja
                Moderator

                  Thank you, Alexander, for sharing this. I have passed the details on to the network team and will keep you posted.

                  Thanks,

                  Komal

                  in reply to: Prometheus/Grafana/Node Exporter example not working? #8539
                  Komal Thareja
                  Moderator

                    Sorry, wrong post!

                    in reply to: Reserve bandwidth for a slice #8538
                    Komal Thareja
                    Moderator

                      Hi Philips,

                      At the moment, we do not support guaranteed QoS. This feature will be available soon. In the meantime, you can use tools such as tc to manage bandwidth on the VMs.
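
                      As a rough illustration of the tc approach, the sketch below builds a token bucket filter (tbf) command that caps egress traffic on one interface. The interface name eth1, the 100mbit rate, and the burst and latency values are assumptions chosen for the example, not recommended settings, and the helper function names are hypothetical. Note that this only shapes traffic leaving the VM; it does not reserve bandwidth in the network.

```python
import shlex


def tbf_rate_limit_cmd(dev: str, rate: str,
                       burst: str = "128kb", latency: str = "50ms") -> str:
    """Build a tc command that caps egress traffic on `dev` using tbf.

    `rate`, `burst`, and `latency` use tc's own unit suffixes
    (for example "100mbit", "128kb", "50ms"). The defaults here are
    illustrative assumptions and will likely need tuning.
    """
    args = [
        "sudo", "tc", "qdisc", "replace", "dev", dev, "root", "tbf",
        "rate", rate, "burst", burst, "latency", latency,
    ]
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


def clear_rate_limit_cmd(dev: str) -> str:
    """Build the tc command that removes the root qdisc from `dev`."""
    return " ".join(shlex.quote(a) for a in
                    ["sudo", "tc", "qdisc", "del", "dev", dev, "root"])


if __name__ == "__main__":
    # "eth1" is a placeholder; check the real interface name on your VM.
    print(tbf_rate_limit_cmd("eth1", "100mbit"))
    print(clear_rate_limit_cmd("eth1"))
```

                      The printed commands can then be run on the VM, for example in an SSH session. Using "replace" rather than "add" lets the same command be rerun with a different rate without first deleting the existing qdisc.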

                      Thanks,
                      Komal

                      in reply to: FPGA valid sites for Esnet toolchain #8522
                      Komal Thareja
                      Moderator

                        Hi Nishant,

                        Please find my responses inline below:

                        Once a user has reserved a slice with an FPGA, that resource is locked and cannot be acquired or modified by other users until the slice is released.

                        You’re correct: if the FPGA has been flashed with a workflow other than the ESnet workflow, it may fail.

                        However, we cannot guarantee the validity or state of the bitstream that was previously flashed by another user before you acquired the slice. This may leave the FPGA in an inconsistent or unusable state. In our experience, reflashing the FPGA with a known good (golden) image typically restores it to a usable state.

                        We are planning to share this golden image along with the notebook with users soon, so they can perform the reflash themselves when needed. In the meantime, if you’re currently blocked, please let me know the specific site you’re working with—I’ll check whether we can assist with reflashing the FPGA for you.

                        Thanks,

                        Komal

                        in reply to: L2Bridge not forwarding packets in SALT #8520
                        Komal Thareja
                        Moderator

                          Hi Alex,

                          The network team reviewed the configuration and found no issues on the switch side. However, they observed that the MAC addresses for these interfaces have not been learned by the switch.

                          As a next step, they recommend removing the L2Bridge service and connecting both interfaces directly to FabNetV4 to verify if the network connectivity is restored.

                          Please perform this change using slice modify, so the same VMs and interfaces can be reused for validation. This helps us rule out the possibility that recreating the VMs might inadvertently resolve the issue.

                          Refer to this notebook for guidance on how to modify the slice.

                          Thanks,

                          Komal

                          in reply to: Unable to SSH into my Nodes #8519
                          Komal Thareja
                          Moderator

                            Could you please check your VM again?

                            All PCI devices had been disconnected. I have reconnected them to your VM. Please check it.

                            Also, could you please share the sequence of operations that led your VM to this state?

                            That would help us see whether anything needs to be fixed in our control software.

                            Thanks,

                            Komal

                            in reply to: Unable to SSH into my Nodes #8517
                            Komal Thareja
                            Moderator

                              Please share your slice ID and also the output of the command: ifconfig -a

                              Thanks,

                              Komal

                              in reply to: L2Bridge not forwarding packets in SALT #8512
                              Komal Thareja
                              Moderator

                                Thank you, Alex, for sharing this observation! I temporarily assigned IP addresses to these interfaces on r3 and 4 nodes, and ping does not work between them.

                                The network service as provisioned looks OK. I am reaching out to the network team and will keep you posted.

                                Thanks,

                                Komal

                                in reply to: Unable to SSH into my Nodes #8511
                                Komal Thareja
                                Moderator

                                  Hi Ajay,

                                  You can use the following code snippet to reboot the node:

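                                  from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
                                  fablib = fablib_manager()
                                  slice = fablib.get_slice(name=slice_name)  # slice_name: the name of your slice
                                  node = slice.get_node(name=node_name)      # node_name: the name of the node to reboot
                                  node.os_reboot()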

                                  Also, please share your slice ID so we can take a look at it.

                                  Thanks,

                                  Komal
