Komal Thareja

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 431 total)
  • Author
    Posts
  • in reply to: L2Bridge not forwarding packets in SALT #8541
    Komal Thareja
    Participant

      Thank you, Alexander, for sharing this. I have shared the details with the network team and will keep you posted.

      Thanks,

      Komal

      in reply to: Prometheus/Grafana/Node Exporter example not working? #8539
      Komal Thareja
      Participant

        Sorry, wrong post!

        in reply to: Reserve bandwidth for a slice #8538
        Komal Thareja
        Participant

          Hi Philips,

          At the moment, we do not support guaranteed QoS. This feature will be available soon. In the meantime, you can use tools such as tc to manage bandwidth on the VMs.
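As a rough illustration of the tc suggestion, the sketch below builds a token bucket filter (tbf) command that caps egress bandwidth on one interface. The helper name, the interface name enp7s0, and the rate, burst, and latency values are illustrative assumptions, not FABRIC recommendations; the resulting command would be run on the VM itself.

```python
import shlex


def tbf_limit_command(iface: str, rate_mbit: int,
                      burst_kbit: int = 32, latency_ms: int = 400) -> str:
    """Build a tc command that rate-limits egress traffic on `iface`.

    Uses `tc qdisc replace` so the command can be re-run with a new rate.
    The default burst and latency values are placeholders to tune.
    """
    if rate_mbit <= 0:
        raise ValueError("rate_mbit must be positive")
    args = [
        "sudo", "tc", "qdisc", "replace", "dev", iface, "root", "tbf",
        "rate", f"{rate_mbit}mbit",
        "burst", f"{burst_kbit}kbit",
        "latency", f"{latency_ms}ms",
    ]
    return shlex.join(args)


# Hypothetical interface name; check the real one with `ip link` on the VM.
cmd = tbf_limit_command("enp7s0", 100)
print(cmd)
# The string can be pasted into an SSH session on the VM, or (assumed usage)
# passed to fablib's node.execute(cmd).
```

Higher rates generally need a larger burst value, so treat the defaults as a starting point and verify the achieved rate with a tool such as iperf3.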

          Thanks,
          Komal

          in reply to: FPGA valid sites for Esnet toolchain #8522
          Komal Thareja
          Participant

            Hi Nishant,

            Please find my responses inline below:

            Once a user has reserved a slice with an FPGA, that resource is locked and cannot be acquired or modified by other users until the slice is released.

            You’re correct: if the FPGA has been flashed with a workflow other than the ESnet workflow, the ESnet workflow may fail.

            However, we cannot guarantee the validity or state of the bitstream that was previously flashed by another user before you acquired the slice. This may leave the FPGA in an inconsistent or unusable state. In our experience, reflashing the FPGA with a known good (golden) image typically restores it to a usable state.

            We are planning to share this golden image along with the notebook with users soon, so they can perform the reflash themselves when needed. In the meantime, if you’re currently blocked, please let me know the specific site you’re working with—I’ll check whether we can assist with reflashing the FPGA for you.

            Thanks,

            Komal

            in reply to: L2Bridge not forwarding packets in SALT #8520
            Komal Thareja
            Participant

              Hi Alex,

              The network team reviewed the configuration and found no issues on the switch side. However, they observed that the MAC addresses for these interfaces have not been learned by the switch.

              As a next step, they recommend removing the L2Bridge service and connecting both interfaces directly to FabNetV4 to verify if the network connectivity is restored.

              Please perform this change using slice modify, so the same VMs and interfaces can be reused for validation. This helps us rule out the possibility that recreating the VMs might inadvertently resolve the issue.

              Refer to this notebook for guidance on how to modify the slice.

              Thanks,

              Komal

              in reply to: Unable to SSH into my Nodes #8519
              Komal Thareja
              Participant

                Could you please check your VM again?

                All PCI devices had been disconnected. I have reconnected them to your VM. Please check it.

                Also, could you please share the sequence of operations that led your VM to this state?

                It would help us see whether anything needs to be fixed in our control software.

                Thanks,

                Komal

                in reply to: Unable to SSH into my Nodes #8517
                Komal Thareja
                Participant

                  Please share your slice ID and also the output of the command: ifconfig -a

                  Thanks,

                  Komal

                  in reply to: L2Bridge not forwarding packets in SALT #8512
                  Komal Thareja
                  Participant

                    Thank you, Alex, for sharing this observation! I temporarily assigned IP addresses to these interfaces on nodes r3 and 4, and ping does not work between them.

                    The network service as provisioned looks OK. I am reaching out to the network team and will keep you posted.

                    Thanks,

                    Komal

                    in reply to: Unable to SSH into my Nodes #8511
                    Komal Thareja
                    Participant

                      Hi Ajay,

                      You can use the following code snippet to reboot the node:

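                      # Standard fablib setup; slice_name and node_name are your own values.
                      from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
                      fablib = fablib_manager()
                      slice = fablib.get_slice(slice_name)
                      node = slice.get_node(node_name)
                      node.os_reboot()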

                      Also, please share your slice ID so we can take a look at it.

                      Thanks,

                      Komal

                      in reply to: FPGA valid sites for Esnet toolchain #8499
                      Komal Thareja
                      Participant

                        Thank you for your question.

                        What I meant is that once an FPGA is initially flashed with a provided bitstream, users can reflash it with a different bitstream of their choice—as long as the PCIe interface remains unchanged. Because of this flexibility, the actual state of the FPGA at a given site may differ from what’s shown in the shared sheet, depending on whether a user has reprogrammed it.

                        Best,

                        Komal

                        in reply to: Testing BitTorrent and IPFS #8498
                        Komal Thareja
                        Participant

                          Thank you for your feedback, Philip!

                          You’re absolutely right—node.add_fabnet() attaches the FabNetV4 service to the node, enabling communication with other nodes over FABRIC’s data plane network via the FabNetV4 interface.

                          In addition, all VMs provisioned in FABRIC are assigned a Management IP for administrative purposes. This interface allows inbound SSH access and supports outbound connections, including those required for operations like docker pull. However, please note that the management network is actively monitored and any torrent or insecure traffic may be flagged. Such activity can lead to enforcement actions, including possible slice termination. As a best practice, we recommend not using the management network for experimental traffic.

                          Best,

                          Komal

                          in reply to: Testing BitTorrent and IPFS #8493
                          Komal Thareja
                          Participant

                            Thank you for your inquiry, Philip.

                            You are welcome to conduct experiments involving IPFS or BitTorrent on FABRIC, particularly for evaluating peer discovery and data transfer between FABRIC nodes. This type of testing is permissible as long as it is confined to FABnet or a custom Layer 2 network within the FABRIC infrastructure.

                            We kindly request that your experiment not initiate connections to external BitTorrent or IPFS servers outside the FABRIC environment.
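One way to keep a peer list inside the testbed is to filter candidate addresses before handing them to the client. The sketch below is only an illustration: it assumes FabNetV4 addresses fall within 10.128.0.0/10, which should be checked against the current FABRIC documentation, and the function name and sample addresses are hypothetical. A custom Layer 2 subnet can be added to the allowed list in the same way.

```python
import ipaddress

# Assumption: FabNetV4 uses this private range. Verify before relying on it,
# and append the subnet of any custom Layer 2 network used by the experiment.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.128.0.0/10")]


def is_internal_peer(addr: str, allowed=ALLOWED_NETWORKS) -> bool:
    """Return True if `addr` belongs to one of the allowed networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in allowed)


# Hypothetical peer addresses gathered by the experiment.
candidates = ["10.130.1.2", "8.8.8.8", "10.192.0.1"]
internal_only = [p for p in candidates if is_internal_peer(p)]
print(internal_only)
```

A filter like this only guards the peer list the experiment supplies; the client's own discovery features (public trackers, DHT bootstrap nodes) would still need to be disabled or pointed at nodes inside the slice.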

                            Please feel free to reach out if you need any assistance with the experiment setup or have further questions.

                            Best regards,

                            Komal

                            in reply to: FPGA valid sites for Esnet toolchain #8478
                            Komal Thareja
                            Participant

                              Hi Nishanth,

                              Please find enclosed the most recent known status. Kindly note that users have the ability to flash their own binaries, so the actual state of the infrastructure may differ from what is captured in the attached sheet. As a first step toward addressing this, we are working to include notebook and Control Framework support in Release 1.9, enabling users to flash FPGAs within their workflows directly.

                              Thanks,

                              Komal

                              in reply to: Slice showing as StableOK but is actually closed #8462
                              Komal Thareja
                              Participant

                                Hi Anthony,

                                Regarding your slice: a5d2fff2-84fc-48d9-8d67-5ff96e120273
                                Start: 2025-04-18 14:53:43 +0000
                                End: 2025-05-02 14:53:42 +0000

                                A renew operation was attempted for this slice, but it failed for the VM due to insufficient resources: ['core'].

                                Please note that we now support advance reservations, which allow users to reserve resources ahead of time. As a result, a renew request may fail if it conflicts with an existing advance reservation — which appears to be the case here.

                                It’s unclear how the renew was initiated, but if it was done through JupyterHub, the error would have been reported to the user. We suspect there may be a bug on the portal side where this error is not being surfaced correctly, and we will investigate and address that.
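To make a failed renew visible from a notebook, the request can be wrapped so that any error is returned rather than lost. The sketch below is a minimal illustration under stated assumptions: the lease timestamps use the same format as the Start and End values above, the slice object exposes a renew(end_date) method that raises on failure (as fablib's Slice is understood to do), and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Same layout as the lease times quoted above, e.g. "2025-05-02 14:53:42 +0000".
LEASE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def extended_lease_end(current_end: str, extra_days: int) -> str:
    """Return the lease end shifted forward by `extra_days`, same format."""
    end = datetime.strptime(current_end, LEASE_FORMAT)
    return (end + timedelta(days=extra_days)).strftime(LEASE_FORMAT)


def try_renew(slice_obj, new_end: str):
    """Attempt a renew and return (ok, message) instead of raising.

    Assumes slice_obj.renew(new_end) raises an exception when the renew is
    rejected, for example because of a conflicting advance reservation.
    """
    try:
        slice_obj.renew(new_end)
    except Exception as exc:
        return False, str(exc)
    return True, "renewed until " + new_end


new_end = extended_lease_end("2025-05-02 14:53:42 +0000", 7)
print(new_end)  # 2025-05-09 14:53:42 +0000
```

With a real slice this could be called as try_renew(fablib.get_slice(slice_name), new_end), and the returned message checked before assuming the lease was extended.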

                                Unfortunately, the only available option at this point is to re-create the slice. We apologize for the inconvenience.

                                Thanks,

                                Komal

                                in reply to: Tofino bf_switchd process gets killed. #8460
                                Komal Thareja
                                Participant

                                  Hi Nishanth,

                                  Thank you for sharing this.

                                  Please note that the current implementation of execute_thread maintains the process only for the duration of the specified timeout. As you correctly observed, for longer-running processes, directly accessing the switch via SSH allows you to manually launch switchd.

                                  We will work on enhancing execute_thread to better support this use case and will keep you informed once the update is available.

                                  Thanks,

                                  Komal

                                   
