Nishanth Shyamkumar

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 32 total)
  • in reply to: Bluefield3 external connectivity issue #9480
    Nishanth Shyamkumar
    Participant

      Hi Komal,

      I tested the linked notebook and was able to resolve DNS and access external websites from the ARM host on SEAT, which had been failing earlier.

      The only change I needed to make was to add:

      stdout, stderr = node1.execute("sudo apt-get update")

      before running:

      if ip.version == 4:
          stdout, stderr = node1.execute("sudo ./node_tools/bf3_rshim.sh --mode ipv4")
      else:
          stdout, stderr = node1.execute("sudo ./node_tools/bf3_rshim.sh --mode ipv6")

      I will integrate this script invocation into my existing notebook and check whether it continues to work as expected. If I run into any related issues, I will post them here. Thanks.
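The IPv4/IPv6 branch above can be sketched as a small, self-contained helper. This is a hypothetical illustration, not part of FABlib or node_tools: `rshim_command` is an assumed name, and it uses Python's standard `ipaddress` module in place of the notebook's `ip` object, which is assumed to expose `.version` the same way.

```python
import ipaddress

# Hypothetical helper (not a FABlib API): builds the bf3_rshim.sh invocation
# for a given address. Script path and mode values follow the snippet above.
SCRIPT = "./node_tools/bf3_rshim.sh"


def rshim_command(address: str) -> str:
    # ip_address() returns an IPv4Address or IPv6Address; both expose .version.
    ip = ipaddress.ip_address(address)
    mode = "ipv4" if ip.version == 4 else "ipv6"
    return f"sudo {SCRIPT} --mode {mode}"


if __name__ == "__main__":
    print(rshim_command("192.0.2.10"))    # sudo ./node_tools/bf3_rshim.sh --mode ipv4
    print(rshim_command("2001:db8::10"))  # sudo ./node_tools/bf3_rshim.sh --mode ipv6
```

In the notebook, the returned string would be passed to `node1.execute(...)` after the `apt-get update` call.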

      in reply to: Bluefield3 external connectivity issue #9459
      Nishanth Shyamkumar
      Participant

        Thanks. Yes, I can see that DALL and SEAT use IPv6 addresses when requesting DNS resolution from the host (x86) system.

        However, the Bluefield ARM system communicates with the host over IPv4. The interface also has an IPv6 link-local address, but it does not appear to be a usable communication channel. As a result, packets from the ARM side traverse only iptables and never ip6tables, so the upstream NAT64 translation never happens either.

        In short, no IPv6 packet ever arrives on the interface between the host and the Bluefield ARM, so nothing can be forwarded to the external, internet-facing interface for DNS resolution.
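For context on why the translation never happens: NAT64 acts only on IPv6 packets whose destination is an IPv4 address embedded in an IPv6 prefix. The sketch below assumes the well-known prefix `64:ff9b::/96` from RFC 6052; the prefix actually used at these sites is not stated here and may differ. `synthesize_nat64` and `is_nat64_destination` are hypothetical names.

```python
import ipaddress

# Assumption: the RFC 6052 well-known NAT64 prefix. A site may use its own
# network-specific prefix instead.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4: str) -> ipaddress.IPv6Address:
    # Embed the 32-bit IPv4 address in the low 32 bits of the /96 prefix.
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))


def is_nat64_destination(addr: str) -> bool:
    # Only IPv6 packets addressed into the prefix are candidates for NAT64;
    # a plain IPv4 packet (as sent from the ARM side) never matches.
    ip = ipaddress.ip_address(addr)
    return ip.version == 6 and ip in NAT64_PREFIX


if __name__ == "__main__":
    print(synthesize_nat64("192.0.2.1"))              # 64:ff9b::c000:201
    print(is_nat64_destination("192.0.2.1"))          # False
    print(is_nat64_destination("64:ff9b::c000:201"))  # True
```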

        Could you list the sites that both have Bluefield3 available and are on an IPv4-only network, such as FIU? Until the IPv6 connectivity issue is debugged, that would allow development to proceed.

        in reply to: Bluefield DPL pull failing due to timeout #9171
        Nishanth Shyamkumar
        Participant

          Thanks Komal, with the changes you mentioned, the nvcr.io pull step now completes successfully.

          in reply to: Bluefield DPL pull failing due to timeout #9168
          Nishanth Shyamkumar
          Participant

            Hi Komal,

            Were you able to determine whether this is a FABRIC problem? I tried another site, SEAT, and it failed at the same step.

            in reply to: Bluefield DPL pull failing due to timeout #9166
            Nishanth Shyamkumar
            Participant

              Hi Komal,

              It is running on DALL.

              in reply to: FPGA valid sites for Esnet toolchain #8524
              Nishanth Shyamkumar
              Participant

                As a follow-up: were any changes made to LOSA that might explain why the bitfile is working now?

                Was it re-flashed in the last 2-3 days?

                in reply to: FPGA valid sites for Esnet toolchain #8523
                Nishanth Shyamkumar
                Participant

                  Hi Komal,

                  As of today, I am able to run the FPGA bitfile on TACC and LOSA without any errors. If I run into issues with these sites, or need another site that turns out to fail, I will let the team know here. Thanks.

                  in reply to: FPGA valid sites for Esnet toolchain #8501
                  Nishanth Shyamkumar
                  Participant

                    Please find attached the logs of the latest error from running it on LOSA.

                    in reply to: FPGA valid sites for Esnet toolchain #8500
                    Nishanth Shyamkumar
                    Participant

                      Hi Komal,

                      Thanks for the information. I understand what you are saying, but I would like some clarification:
                      1) When I acquire the FPGA and flash my own binary onto it, am I right that no other user can flash binaries onto that FPGA as long as my slice is active? In other words, is acquiring the FPGA through a slice effectively a lock?
                      2) The toolchain will stay consistent with what is listed in the attachment. That is, if I have a bitfile generated with the Esnet toolchain and run it on a site listed as supporting Esnet, then, assuming the bitfile is not corrupted, the flash should succeed. Similarly, if I try to flash my Esnet-generated bitfile onto a site that supports the NEU or XDMA toolchains, it should always fail, correct?

                      Right now, my bitfiles work on TACC, so they are valid bitfiles. However, when I try to run them on other sites listed as supporting the Esnet toolchain, the same bitfiles are not flashed correctly and the health checks fail.

                      in reply to: FPGA valid sites for Esnet toolchain #8495
                      Nishanth Shyamkumar
                      Participant

                        Hi Komal,

                        Thanks for sharing the latest list of sites supported by the Esnet toolchain.

                        You mentioned the following,

                        “Kindly note that users have the ability to flash their own binaries, so the actual state of the infrastructure may differ from what is captured in the attached sheet”

                        I didn’t understand what you meant. Could you elaborate further?

                        in reply to: Unable to connect to http://linux.mirrors.es.net/ubuntu #8494
                        Nishanth Shyamkumar
                        Participant

                          Yes, it’s working now for me as well.

                          It seems to have been a transient issue. Thanks.

                          in reply to: Tofino bf_switchd process gets killed. #8463
                          Nishanth Shyamkumar
                          Participant

                            @yoursunny, yes, SIGHUP is sent when the user closes the terminal.
                            I was confused because I wasn't doing anything that would generate it. Now it makes sense: in this interactive mode, node.execute_thread uses an SSHClientInteraction that terminates if it does not see the prompt before the timeout, which closes the session.

                            @Komal, I think it would be enough to add a note to the comment

                            # Keep the session open to prevent exit

                            saying that bf_switchd will exit after 300 seconds (or after the configured timeout). That should be enough guidance for us to increase the timeout as required.
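The failure mode described above, where closing the session delivers SIGHUP and the process exits, can be reproduced locally. This is a POSIX-only sketch: `sleep` is a stand-in for bf_switchd, `exit_status_after_hangup` is a hypothetical name, and it does not use FABlib or SSHClientInteraction.

```python
import signal
import subprocess


def exit_status_after_hangup() -> int:
    # Stand-in for a long-running foreground process such as bf_switchd.
    # preexec_fn resets SIGHUP to its default action in the child, in case
    # the parent was started with SIGHUP ignored (e.g. under nohup).
    child = subprocess.Popen(
        ["sleep", "60"],
        preexec_fn=lambda: signal.signal(signal.SIGHUP, signal.SIG_DFL),
    )
    # Simulate the hangup that follows when the controlling session closes.
    child.send_signal(signal.SIGHUP)
    # Popen.wait() returns -N when the child was terminated by signal N.
    return child.wait()


if __name__ == "__main__":
    status = exit_status_after_hangup()
    print(status == -signal.SIGHUP)  # True: SIGHUP's default action ends the process
```

Because SIGHUP's default action is to terminate, keeping the session alive longer (a larger timeout) is what keeps bf_switchd running.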

                            in reply to: bastion key fails authentication #8252
                            Nishanth Shyamkumar
                            Participant

                              Hi,

                              I generated a new keypair and it works now. Thanks.

                              in reply to: Infrastructure-metrics queries #7888
                              Nishanth Shyamkumar
                              Participant

                                Hi,

                                A follow-up question on this:
                                1) Does this mean the HC counter always holds the correct value?
                                2) What happens to a non-HC counter when its value exceeds 32 bits? Is it clamped to 2^32 - 1, or does it overflow so that the field shows the remainder (true value % 2^32)?
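To make the second question concrete: if non-HC counters wrap rather than saturate (standard SNMP Counter32 objects are defined to wrap at 2^32, per RFC 2578; whether these metrics follow that convention is an assumption), the field holds the true value mod 2^32, and the increase between two polls can still be recovered modulo 2^32 as long as the counter advanced by less than 2^32 between them. A minimal sketch with hypothetical helper names:

```python
COUNTER32_MODULUS = 2 ** 32


def observed_counter32(true_value: int) -> int:
    # What a wrapping 32-bit counter would report for a given true count.
    return true_value % COUNTER32_MODULUS


def counter32_delta(previous: int, current: int) -> int:
    # Increase between two samples of a wrapping 32-bit counter.
    # Correct only if fewer than 2**32 units elapsed between the samples.
    return (current - previous) % COUNTER32_MODULUS


if __name__ == "__main__":
    before = 2 ** 32 - 100  # true count at first poll
    after = 2 ** 32 + 400   # true count at second poll (has wrapped)
    print(observed_counter32(after))  # 400
    print(counter32_delta(observed_counter32(before),
                          observed_counter32(after)))  # 500
```

Under the clamping alternative, the delta would be unrecoverable once the counter saturates, which is why the distinction matters.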

                                in reply to: How to use long-lived tokens in experiments #7283
                                Nishanth Shyamkumar
                                Participant

                                  Thanks Komal, I tested it and it is working without any issues after the update.
