Luca Cetino

Forum Replies Created

  • in reply to: Assistance completing the “fpga_simple_p4” tutorial notebook #7810
    Luca Cetino
    Participant

      Hello all!
      First of all, thank you both for your support and for sharing your tips and knowledge on how best to proceed.
      Thanks to you, I was able to make progress and solve the issues that I originally described in this thread.
      However, I would like to take advantage of your last response, @Mohammad, to better understand what could be going on in the situation you described.
      As of today, I have found at least one case where the solution you proposed (re-flashing the FPGA with a new artifact to bring it out of a possible “stale” state) does not work. You can observe this in the last slice I tested my artifacts on (slice ID: b90e6974-0134-4763-ab07-ab67b9a0ff16): the FPGA at SRI is not working even though I recently programmed it with different artifacts. Again, I observe the same strange behavior from the pktgen application: it sends only 2079 packets and then crashes, without even letting the user close it. In addition, no packets are counted by the FPGA probes, and none are actually forwarded by the switch I programmed it with.

      Thank you for any further leads.

      Best regards,
      Luca

      Luca Cetino
      Participant

        Hello Komal!
        Thank you for your last update. I needed some time to try to figure out what is wrong with the Docker stack and why the packets between containers keep getting lost.
        I wasn’t able to find the reason; however, I found another useful tool that seems to report the traffic statistics on the FPGA correctly: sn-cfg --tls-insecure show switch stats --zeroes, run from inside the smartnic-fw container.
        It helped me trace exactly where the packets were going, and once or twice I was finally able to reconfigure the SmartNIC switch so that traffic was forwarded correctly.
        One last error I’d like to understand before closing this thread: at some point, the FPGA starts dropping every packet that arrives from the network, and these drops are reported in the drops_ovfl_from_cmac_0_pkt_count counter. The CMAC status is UP, since the CMACs are enabled, and the queues have the standard configuration of one queue per physical function (which isn’t related to the CMACs); however, I can’t control the CMAC queues or their depths.
        This can be observed in my current slice, with ID b73c3d3b-86ae-428e-8f4a-40095f8d36ec.
        What could be causing this? Any suggestion would be very helpful!

        Thank you so much for your time and your support so far.
        Best regards,

        Luca.

        Luca Cetino
        Participant

          Hi Komal, thank you for looking into what the error could be.
          Indeed, the issue you pointed out seemed to be related to pktgen not working correctly. I found that it was caused by the stack being launched under the “smartnic-mgr-dpdk-manual” profile, which locks the FPGA and prevents any interaction with it unless the pktgen application is running. I solved this (on the slice at SRI) by launching pktgen and manually restarting those two containers, which now see the FPGA as ready. As I mentioned, pktgen now works thanks to this, but the FPGA behavior is not what I would expect: I’m able to configure and start pktgen, but no packets at all are received on the other node(s).
          I programmed the FPGA internal paths (using the sn-cli tool) as described in the Jupyter notebook, completely bypassing the P4 logic, but the traffic is still not forwarded correctly from host0/host1 to cmac0/cmac1.
          I’d like to follow the paths taken by the traffic inside the SmartNIC, but the command sn-cli probe stats reports all 0s. Working probe counters would be hugely helpful for checking whether and where the packets are dropped by the card, and would give me a better idea of what the solution could be.
          I am currently working on a similarly configured slice at LOSA + SEAT (ID: b73c3d3b-86ae-428e-8f4a-40095f8d36ec); there the packets don’t get lost, but the probe is not working either.

          Thank you very much for your time and your support!
          Luca

          Luca Cetino
          Participant

            Thank you Komal for your quick response and update.
            Would it be possible to get a list of the compatible sites, so that I know where it’s worth trying to lock the resources?
            In any case, is it correct that I can proceed through every step of the setup and configuration even if the site’s FPGA has a different bitfile flashed on it? I can’t determine whether that is the issue, since the slice at IRI doesn’t seem to work for me.
            Thanks,
            Luca

            Luca Cetino
            Participant

              Hi Komal,
              Thank you for your response.
              I just created a new slice at SRI and tested it, and I get the same error; it’s still up and its ID is: 6b02473e-4df5-445b-9dd6-e437a01f78b8.
              Previously, I also encountered the same issue at KANS, GATECH, FIU, LOSA, and a few other sites.
              The only slice that works as expected and lets me see the results of the traffic generation has its FPGA node located at DALL. Its ID is ae3bfdac-dad4-4705-9ad8-fb8e4ab11e30.

              On further inspection, using the sn-cli tool from the esnet-smartnic-fw image, I can see that none of the device’s probe stats counters is being updated; everything shows 0. This happens both when pktgen is running and when it is not (when I’m using the stack under the two profiles smartnic-mgr-vfio-unlock and smartnic-mgr-dpdk-manual).

              I’m happy to provide any other information needed; in the meantime, thanks again for your support.
              Kind regards,
              Luca
