
James McCauley

Forum Replies Created

Viewing 5 posts - 1 through 5 (of 5 total)
  • in reply to: Takes long time for complete the Fablib API #4990
    James McCauley
    Participant

      To me, it looks to be back to its old self! Thank you!

      in reply to: Takes long time for complete the Fablib API #4973
      James McCauley
      Participant

        ... and *now* I see the recent announcement that you’d tracked it down to a network problem. 🙂

        in reply to: Takes long time for complete the Fablib API #4972
        James McCauley
        Participant

          In my experience, the FABRIC API has never been *fast*, but it’s definitely been notably slower recently in JupyterHub.

          I, too, am getting retries doing various operations. Specifically, this seems to be caused by timeouts to both cm.fabric-testbed.net and orchestrator.fabric-testbed.net.

          This has turned a notebook that reliably completed in maybe 17 minutes into one that unreliably completes in upwards of 30 minutes.

          Manually making connections to orchestrator.fabric-testbed.net from the JupyterHub terminal confirms that sometimes these connections just hang (it’s hard to say exactly what’s going on since I can’t run tcpdump in this environment, but it’s certainly the case that the TCP connection isn’t getting established).

          Manually making connections to orchestrator.fabric-testbed.net from elsewhere seems to reliably work just fine.
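To repeat the manual connection test above in a scriptable way, here is a minimal Python sketch that opens a series of fresh TCP connections and records how long each handshake takes, or `None` when it fails or hangs past the timeout. `probe_tcp_connect` is a hypothetical helper (not part of fablib), and the default port (443, assuming the orchestrator is reached over HTTPS), attempt count, and timeout are illustrative assumptions.

```python
import socket
import time
from typing import List, Optional


def probe_tcp_connect(host: str, port: int = 443, attempts: int = 20,
                      timeout: float = 5.0) -> List[Optional[float]]:
    """Hypothetical helper: try `attempts` fresh TCP handshakes to host:port.

    Returns one entry per attempt: the connect time in seconds, or None if
    the connection was not established within `timeout` (a "hang") or
    failed outright. Port 443 is an assumption, not a documented value.
    """
    results: List[Optional[float]] = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection resolves the name and completes the 3-way
            # handshake; the socket is closed again as soon as it succeeds.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:
            # socket.timeout, refused, unreachable, and DNS errors are all
            # OSError subclasses; record them all as a failed attempt.
            results.append(None)
    return results
```

For example, calling `probe_tcp_connect("orchestrator.fabric-testbed.net")` from the JupyterHub terminal and from another host gives two lists to compare. On Linux, where the initial SYN retransmission timeout is typically 1 s, successful connects that take roughly a second longer than usual suggest the first SYN or SYN-ACK was lost and retransmitted.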

          Tracerouting to orchestrator.fabric-testbed.net from the JupyterHub terminal gives paths like this *most* of the time:

           4  ws-gw-to-hntvl-gw.ncren.net (128.109.9.22)  32.705 ms  32.858 ms  32.481 ms  32.101 ms  33.195 ms
           5  renci-to-ws-gw.ncren.net (128.109.70.174)  35.452 ms  35.676 ms  35.631 ms  35.553 ms  35.630 ms
           6  152.54.15.60 (152.54.15.60)  35.950 ms !X  35.669 ms !X  35.478 ms !X  35.570 ms !X  36.120 ms !X
          

          (Line 6 is orchestrator.fabric-testbed.net)

          However, there are sometimes timeouts starting at renci-to-ws-gw.ncren.net. Indeed, that hop usually either responds normally or not at all (all timeouts). 152.54.15.60/orchestrator shows occasional timeouts, which seem to be correlated with the timeouts seen at renci-to-ws-gw.ncren.net.

          When I ran the tests from elsewhere, the path didn’t go through either of these ncren routers. I didn’t see any unusual timeouts via traceroute or ping.

          I don’t know what diagnosis y’all have done so far, but could this be as simple as packet loss between those two ncren routers? I can’t ping from JupyterHub and am sort of shooting in the dark, but I estimate there might be something like 3% or 4% loss there.
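Without ping, one rough way to turn "sometimes the connection just hangs" into a loss figure is to count how many TCP handshakes fail under a short timeout. The sketch below assumes each attempt's timeout is shorter than the initial SYN retransmission timeout (typically 1 s on Linux), so an attempt fails exactly when the SYN or the SYN-ACK is lost, and that packet loss is independent and equal in both directions. Under those assumptions the failure fraction f and per-packet loss p satisfy f = 1 - (1 - p)^2. The function name is hypothetical.

```python
import math


def per_packet_loss_from_handshake_failures(failed: int, attempts: int) -> float:
    """Hypothetical helper: estimate per-packet loss p from handshake failures.

    Model (an assumption, not a measurement method from the thread): an
    attempt fails iff the SYN or the SYN-ACK is lost, each independently
    with probability p, so f = 1 - (1 - p)**2 and p = 1 - sqrt(1 - f).
    """
    if attempts <= 0 or not 0 <= failed <= attempts:
        raise ValueError("need attempts > 0 and 0 <= failed <= attempts")
    f = failed / attempts
    return 1.0 - math.sqrt(1.0 - f)
```

Under this model, 3% to 4% per-packet loss would make roughly 6% to 8% of handshake attempts fail. The estimate is noisy unless many attempts (hundreds rather than tens) are made, and failures with other causes, such as a slow server, would inflate it.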

          in reply to: leaflet broken? #4971
          James McCauley
          Participant

            Yup, it’s working again. Thanks!

            in reply to: Recent regression: no geographical locations? #4922
            James McCauley
            Participant

              Just ran again on the Fall 2023 container (now showing FIM 1.5.4), and it appears to be back to working as expected. Thanks!
