Forum Replies Created
- To me, it looks to be back to its old self! Thank you! … and *now* I see the recent announcement that you’d tracked it down to a network problem. 🙂

- In my experience, the FABRIC API has never been *fast*, but it’s definitely been notably slower recently in JupyterHub. I, too, am getting retries doing various operations. Specifically, this seems to be caused by timeouts to both cm.fabric-testbed.net and orchestrator.fabric-testbed.net. The result is that a notebook which reliably completed in maybe 17 minutes has become one that unreliably completes in upwards of 30.

  Manually making connections to orchestrator.fabric-testbed.net from the JupyterHub terminal confirms that sometimes these connections just hang (it’s hard to say exactly what’s going on since I can’t run tcpdump in this environment, but it’s certainly the case that the TCP connection isn’t getting established). Manually making connections to orchestrator.fabric-testbed.net from elsewhere reliably works just fine.

  Tracerouting to orchestrator.fabric-testbed.net from the JupyterHub terminal gets paths like this *most* of the time:

    4  ws-gw-to-hntvl-gw.ncren.net (128.109.9.22)  32.705 ms  32.858 ms  32.481 ms  32.101 ms  33.195 ms
    5  renci-to-ws-gw.ncren.net (128.109.70.174)  35.452 ms  35.676 ms  35.631 ms  35.553 ms  35.630 ms
    6  152.54.15.60 (152.54.15.60)  35.950 ms !X  35.669 ms !X  35.478 ms !X  35.570 ms !X  36.120 ms !X

  (Line 6 is orchestrator.fabric-testbed.net.)

  However, there are sometimes timeouts starting at renci-to-ws-gw.ncren.net; indeed, that line usually shows up either fine or not at all (all timeouts). 152.54.15.60/orchestrator shows occasional timeouts, which seem to be correlated with the timeouts seen at renci-to-ws-gw.ncren.net.

  When I ran the tests from elsewhere, the path didn’t go through either of these ncren routers, and I didn’t see any unusual timeouts via traceroute or ping.

  I don’t know what diagnosis y’all have done so far, but could this be as simple as packet loss between those two ncren routers? I can’t ping from JupyterHub and am sort of shooting in the dark, but I estimate there might be something like 3% or 4% loss there.

- Yup, it’s working again. Thanks!

- Just ran again on the Fall 2023 container (now showing FIM 1.5.4), and it appears to be back to working as expected. Thanks!
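
For reference, here is a minimal sketch of the kind of manual connection test described in the second reply above, assuming only a plain Python 3 environment in the JupyterHub terminal; the port, trial count, and timeout are illustrative values, not taken from the original posts:

    import socket
    import time

    # Hypothetical probe of TCP connection setup to the orchestrator;
    # the host is from the replies above, everything else is illustrative.
    HOST = "orchestrator.fabric-testbed.net"
    PORT = 443        # assumed HTTPS port for the orchestrator API
    TRIALS = 100      # illustrative sample size
    TIMEOUT_S = 5.0   # treat anything slower than this as a hang

    failures = 0
    for _ in range(TRIALS):
        try:
            # create_connection() returns only after the TCP handshake
            # completes, so a timeout here means the connection was
            # never established (matching the hangs described above).
            with socket.create_connection((HOST, PORT), timeout=TIMEOUT_S):
                pass
        except OSError:
            failures += 1
        time.sleep(0.5)  # pace the probes a little

    print(f"{failures}/{TRIALS} attempts failed or timed out "
          f"(~{100 * failures / TRIALS:.1f}%)")

This counts failed TCP handshakes rather than measuring packet loss directly, but a handful of failures out of a hundred attempts would be roughly consistent with the 3–4% estimate above.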