Forum Replies Created
Right, we need to do a better job explaining this. Max-w1 is a valid worker node name; workers are all named <site>-w<index>.fabric-testbed.net.
That said, you probably do not need this. Experimenters only rarely need to place their VM on a specific worker; most of the time you want the placement algorithm to do it for you, so don’t specify a host.
Can you please post the pointer, we will take a look.
Remove the host parameter. I think you are treating it as the hostname of your VM, but it is actually the name of the worker node you are asking the VM to land on, and the one you specified does not exist. The name of the VM is, I believe, the node name combined with its sliver ID.
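To make the naming convention concrete, here is a small sketch that checks whether a string follows the <site>-w<index>.fabric-testbed.net worker-name pattern described above. The helper name and regex are my own for illustration, not part of fablib.

```python
import re

# Hypothetical helper: checks whether a string follows the FABRIC worker
# naming convention <site>-w<index>.fabric-testbed.net described above.
WORKER_NAME = re.compile(r"^[a-z0-9]+-w\d+\.fabric-testbed\.net$", re.IGNORECASE)

def is_worker_name(name: str) -> bool:
    """Return True if `name` looks like a FABRIC worker node name."""
    return WORKER_NAME.match(name) is not None

print(is_worker_name("max-w1.fabric-testbed.net"))  # True
print(is_worker_name("my-vm-hostname"))             # False
```

A VM hostname like "my-vm-hostname" fails this check, which is the mistake described above: the host parameter expects a worker name, not the name you want your VM to have.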
August 10, 2023 at 10:32 am in reply to: Reachability issues between JH and FABRIC infrastructure #4986
Dear experimenters,
We believe this problem has been addressed. It was traced to an incorrect configuration of UNC routers on the path, which has now been corrected. We may still run some tests, but you should not experience any more persistent problems.
Thank you for your analysis. We are about where you are – there is either route flapping or, perhaps, some kind of non-trivial packet loss specific to the path from JH to RENCI (we have been testing with just curl to various hosts at RENCI and the results are the same). We have notified the MCNC NOC as well as UNC ITS and are waiting to hear back.
The problem does not appear to manifest from the worker nodes hosting JH, only from the Docker containers inside, so we suspect a middlebox somewhere is dropping some connections specific to JH-originating IPs because they have a high rate of transactions to our infrastructure compared to the background of regular IPs. But this is just a theory. We will continue our investigation, and we apologize for the inconvenience.
We do not recommend using the portal for anything but the simplest experiments, and for visualizing topologies. The portal does not support the full workflow of an experiment – it only creates topologies, leaving everything else (i.e. experiment configuration) a manual step.
Regarding IPv6 – this was not a choice for us: the amount of available IPv4 space is extremely limited at the hosting locations, and at many of them IPv6 was the only option. Our systems deal with this transparently; however, communicating with the outside world can sometimes be problematic because, despite the fact that it is 2023, GitHub still does not have an IPv6 presence. Most other large sites/services do, and as we go forward we expect to have fewer problems with this issue.
Regarding PSC – the site underwent a power outage on Sunday, we are putting it back together.
By far the easiest way to connect VMs to each other is to use the FABNetv4 (or FABNetv6) services. These will not allow the VMs to see the outside world through those interfaces, but they will ‘just work’ regardless of which sites you are on. You really do not need to rely on L2 services as much as you did in GENI, since FABRIC L3 services require far less configuration.
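Conceptually, a FABNet service hands your slice a private subnet, and each VM interface gets an address from it. The sketch below uses only the Python standard library to illustrate that address-assignment idea; the subnet 10.128.1.0/24 and node names are made-up examples, not real FABNet allocations, and this is not the fablib API itself.

```python
from ipaddress import IPv4Network

# Illustration only: a FABNet network provides a private subnet and each
# VM interface is assigned an address from it. The 10.128.1.0/24 subnet
# below is an invented example, not a real FABRIC allocation.
subnet = IPv4Network("10.128.1.0/24")
hosts = subnet.hosts()
next(hosts)  # skip the first usable address, commonly reserved for a gateway

vms = ["node1", "node2", "node3"]  # hypothetical slice nodes
plan = {vm: str(next(hosts)) for vm in vms}
for vm, addr in plan.items():
    print(f"{vm}: {addr}/{subnet.prefixlen}")
```

In a real slice, fablib exposes the network's actual subnet and available addresses, so you never pick them by hand; the point here is only that the addressing is private and internal to the testbed.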
You are showing a screenshot from the portal, but I assume you are using the notebooks to build your slices – working from the portal requires a lot of manual steps to get things running.
The connectivity-to-the-outside-world problem you are referring to is most likely due to landing on sites that have IPv6 management network connectivity – in this case many services (like yum/deb repos and GitHub) are not reachable directly, and you need to modify your DNS configuration to allow the use of NAT64 (there is a notebook about this, and this article discusses it as well).
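To see why a DNS64/NAT64 setup lets an IPv6-only VM reach IPv4-only services: the DNS64 resolver synthesizes an IPv6 address by embedding the target's IPv4 address into a NAT64 prefix, and the NAT64 gateway translates traffic sent to that address. The sketch below shows the embedding with the RFC 6052 well-known prefix 64:ff9b::/96; the IPv4 address is an arbitrary documentation example, not a specific FABRIC or GitHub address.

```python
from ipaddress import IPv4Address, IPv6Address

# Sketch of what a DNS64 resolver does: embed an IPv4 address into the
# RFC 6052 well-known NAT64 prefix 64:ff9b::/96. The address 192.0.2.1
# is a documentation example, not a real service address.
NAT64_PREFIX = IPv6Address("64:ff9b::")

def synthesize(ipv4: str) -> IPv6Address:
    """Return the NAT64-synthesized IPv6 address for an IPv4 address."""
    return IPv6Address(int(NAT64_PREFIX) + int(IPv4Address(ipv4)))

print(synthesize("192.0.2.1"))  # 64:ff9b::c000:201
```

This is why the fix is a DNS change: once your VM resolves names through a DNS64 resolver, IPv4-only destinations appear as reachable IPv6 addresses under the NAT64 prefix.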
These calls can sometimes take a long time (the results of calls are cached, so the first caller sees a delay but subsequent callers do not for a while). However, for the past couple of days we have been seeing connectivity issues between our Jupyter Hub hosted in Google and the rest of the testbed, manifesting as various connection retries, which can also cause additional delays. We are investigating the reasons for it. It appears to be specific to the Jupyter Hub environment.
Note that the latest ‘Bleeding Edge’ container includes an example notebook that shows how to push extra SSH public keys into a slice at creation time.
I guess I’m still not sure what you are experiencing, let’s please break it down:
1. Which notebook are you using to set up the experiment? (Please be sure to specify the version of the notebooks, the type of container you are using in Jupyter Hub, and the reported version of fablib – shown in the table in the first cell of the notebook.)
2. Are you saying it works in the notebook, but you are trying to issue (presumably the same) commands from the console and it doesn’t?
3. Which host are you trying to ping from which other host, and using what command? Please provide the output of ip addr list and ip route list for each host, and also provide the slice ID. Can you please show some command outputs and describe which addresses you are trying to ping?
This article may be useful to explain what interfaces you should expect in your VM.
Thank you for reporting this – there has been a change in the underlying Nominatim API we use to convert addresses to lat/lon that we failed to notice. We will update the Fall 2023 and Bleeding Edge containers with an updated version of the underlying library that tracks this change. The 1.4.6 container is already end-of-life – we will not update it, and code in it will continue to report 0 values for lat and lon.
We’ll look into it. Certainly not anything we did on purpose.