Forum Replies Created
Re: pinging errors. Can you check both nodes and see if the IP addresses are configured?
Run: ip addr list
This is a race condition between the existing testbed version and the currently deployed FABlib version. If you see that one of the IPs is not set, then it is probably this issue.
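If you want to check this from a notebook rather than by eye, here is a minimal sketch. It assumes you pass in the text printed by "ip addr list" (for example the stdout from node1.execute("ip addr list"), which returns stdout and stderr as in the other examples in this thread). The helper names and the sample output are made up for illustration; the parser only looks for "inet" (IPv4) lines under each interface header.

```python
import re

# An interface header looks like "2: ens7: <BROADCAST,...> mtu 1500 ...";
# container-style names such as "eth0@if12" are cut at the "@".
HEADER_RE = re.compile(r"^\d+:\s+([^:@\s]+)")
# IPv4 address lines look like "    inet 192.168.1.2/24 scope global ens7".
# "inet6" lines do not match because "inet" must be followed by whitespace.
INET_RE = re.compile(r"^\s+inet\s+(\S+)")


def ipv4_by_interface(ip_addr_output):
    """Map each interface name to the IPv4 CIDRs listed under it."""
    addrs = {}
    current = None
    for line in ip_addr_output.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            addrs[current] = []
            continue
        inet = INET_RE.match(line)
        if inet and current is not None:
            addrs[current].append(inet.group(1))
    return addrs


def interfaces_missing_ipv4(ip_addr_output, ignore=("lo",)):
    """Return interfaces (other than those in `ignore`) that have no IPv4 address."""
    return [name for name, cidrs in ipv4_by_interface(ip_addr_output).items()
            if not cidrs and name not in ignore]


# Made-up sample output: ens8 has only a link-local IPv6 address.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 scope global ens7
       valid_lft forever preferred_lft forever
3: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:00:00:00:00:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1/64 scope link
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    print(interfaces_missing_ipv4(SAMPLE))  # ['ens8']
```

On a real node you would pass the stdout of node1.execute("ip addr list") instead of SAMPLE.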
@Xusheng
This error message is less than informative. It is generated when fablib validates the network type (in this case, one chosen automatically) against the interfaces you are connecting. A newer version of fablib (coming soon, probably this week) handles error messages better.
The problem is that a few rules dictate which combinations of interfaces are possible on each type of network. In this case, you are trying to create a network that will not work given the interface types and locations.
The underlying root cause is that, currently, Basic NICs (SR-IOV virtual functions on a Mellanox ConnectX-6 card) that are participating in a wide-area L2 network are not able to pass traffic to each other if they are on the same physical host machine. We are looking for the best way to fix this but all solutions have tradeoffs.
Generally, this shouldn’t be a problem because most reasonable experiments will avoid putting many nodes on a wide-area broadcast network. Instead, they will have an explicit endpoint at each end of a wide-area connection and will switch or route traffic onto a local network with a larger number of nodes. Usually, there will be exactly two nodes on a wide-area broadcast network, each of which is a switch or router.
You have a few options.
- Use dedicated NICs. This will work for your current request but ultimately won’t scale because there are a limited number of dedicated NICs at each site. As the number of FABRIC users/experiments grows, it will be even more difficult to use a lot of dedicated NICs.
- Use Basic NICs and explicitly set a different physical host for each node (see the “node.set_host(host_name)” method). This is still limited by the number of physical hosts at each site. (Also, this might require the next fablib version to work.)
- Use wide-area networks to connect pairs of Basic NICs on nodes configured as switches/routers that switch/route traffic between a wide-area link and a larger local network.
- *** Use a separate network for each wide-area connection, with Basic NICs connecting each pair of nodes. Each node may have several Basic NICs connected to other nodes. You probably don’t want to use this method if you need to connect all pairs of nodes.
Given what I know about your experiment, I think this last solution is the one you want. I think you are ultimately looking to design a wide-area topology of NDN routers. I don’t think you need or want this topology to be fully connected. Your small tests might be fully connected, but as you scale up you will want to design a topology where each router has a small-ish number of direct connections to its neighbors. I think creating a topology like this is what a large NDN system running on dedicated physical infrastructure would look like.
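To see why a sparse topology is easier to build than a fully connected one with this approach, here is a small sketch that counts the Basic NICs each node needs when every wide-area link gets its own two-node network (one NIC per link end). The helper names and router names are hypothetical; nothing here calls fablib.

```python
from collections import Counter
from itertools import combinations


def nics_per_node(links):
    """One Basic NIC per link end: count how many NICs each node needs."""
    counts = Counter()
    for a, b in links:
        if a == b:
            raise ValueError(f"a link must join two different nodes, got {a!r} twice")
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


def full_mesh(nodes):
    """Every pair of nodes gets its own network."""
    return list(combinations(nodes, 2))


def ring(nodes):
    """Each node links only to its two neighbors (needs at least 3 nodes)."""
    if len(nodes) < 3:
        raise ValueError("a ring needs at least 3 nodes")
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


if __name__ == "__main__":
    routers = [f"router{i}" for i in range(1, 7)]
    for name, links in (("full mesh", full_mesh(routers)), ("ring", ring(routers))):
        nics = nics_per_node(links)
        print(f"{name}: {len(links)} networks, max {max(nics.values())} NICs per node")
    # full mesh: 15 networks, max 5 NICs per node
    # ring: 6 networks, max 2 NICs per node
```

With six routers, the full mesh already needs five NICs per node, while the ring needs two no matter how many routers you add.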
@brandon
If you just want to turn on simple forwarding, you can try something like the following code. If you want a router that runs more interesting routing protocols, you will need to run something like Quagga (https://www.quagga.net/).
# Get 3 random sites
[site1, site2, site3] = fablib.get_random_sites(count=3)
print(f"{[site1, site2, site3]}")
# Create Slice
slice = fablib.new_slice(name="MySlice")
# Node1
node1 = slice.add_node(name='node1', site=site1)
iface1 = node1.add_component(model='NIC_Basic', name='nic1').get_interfaces()[0]
# Node2
node2 = slice.add_node(name='node2', site=site2)
iface2 = node2.add_component(model='NIC_Basic', name='nic1').get_interfaces()[0]
# Node3
router = slice.add_node(name='router', site=site3)
router_iface1 = router.add_component(model='NIC_Basic', name='nic1').get_interfaces()[0]
router_iface2 = router.add_component(model='NIC_Basic', name='nic2').get_interfaces()[0]
# Networks
net1 = slice.add_l2network(name='net1', interfaces=[iface1, router_iface1])
net2 = slice.add_l2network(name='net2', interfaces=[iface2, router_iface2])
# Submit Slice Request
slice_id = slice.submit()
Wait for the nodes to boot, then run this:
from ipaddress import ip_address, IPv4Address, IPv6Address, IPv4Network, IPv6Network
# subnet1
subnet1 = IPv4Network("192.168.1.0/24")
subnet1_available_ips = list(subnet1)[1:]
# subnet2
subnet2 = IPv4Network("192.168.2.0/24")
subnet2_available_ips = list(subnet2)[1:]
# Get IPs
router_ip_addr1 = subnet1_available_ips.pop(0)
router_ip_addr2 = subnet2_available_ips.pop(0)
node1_ip_addr = subnet1_available_ips.pop(0)
node2_ip_addr = subnet2_available_ips.pop(0)
# Get Slice
slice = fablib.get_slice(name="MySlice")
# Router
router = slice.get_node(name='router')
router_iface1 = router.get_interface(network_name='net1')
router_iface1.ip_addr_add(addr=router_ip_addr1, subnet=subnet1)
router_iface2 = router.get_interface(network_name='net2')
router_iface2.ip_addr_add(addr=router_ip_addr2, subnet=subnet2)
# Node1
node1 = slice.get_node(name='node1')
node1_iface = node1.get_interface(network_name='net1')
node1_iface.ip_addr_add(addr=node1_ip_addr, subnet=subnet1)
# Node2
node2 = slice.get_node(name='node2')
node2_iface = node2.get_interface(network_name='net2')
node2_iface.ip_addr_add(addr=node2_ip_addr, subnet=subnet2)
# Turn on forwarding in the router
stdout, stderr = router.execute('sudo sysctl -w net.ipv4.ip_forward=1')
print(f"{stdout}")
print(f"{stderr}")
# Set node1 route to subnet2 via router_ip_addr1
node1.ip_route_add(subnet=subnet2, gateway=router_ip_addr1)
# Set node2 route to subnet1 via router_ip_addr2
node2.ip_route_add(subnet=subnet1, gateway=router_ip_addr2)
# Test ping
stdout, stderr = node1.execute(f"ping -c 5 {node2_ip_addr}")
print(f"{stdout}")
print(f"{stderr}")
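The last step only prints the ping output. If you want the notebook to fail loudly instead, a minimal sketch like the one below can check the summary line that ping prints (for example "5 packets transmitted, 5 received, 0% packet loss, time 4005ms"). The helpers are hypothetical and assume the Linux iputils output format; stdout is what node1.execute(...) returned above.

```python
import re

# Matches the Linux summary line, e.g.
# "5 packets transmitted, 5 received, 0% packet loss, time 4005ms".
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def ping_counts(stdout):
    """Return (transmitted, received) parsed from ping output, or None if absent."""
    match = SUMMARY_RE.search(stdout)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def ping_succeeded(stdout, min_received=1):
    """True if at least `min_received` replies came back."""
    counts = ping_counts(stdout)
    return counts is not None and counts[1] >= min_received


# Hypothetical usage after the ping step above:
# stdout, stderr = node1.execute(f"ping -c 5 {node2_ip_addr}")
# assert ping_succeeded(stdout), stderr
```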
- This reply was modified 2 years, 8 months ago by Paul Ruth.
Each interface can only be attached to one network. If you want a node to attach to several networks, you will need to add several NIC components to it.
A triangle example might look like the following.
Note that you don’t need to specify the network type if you don’t want to. FABlib will pick an appropriate type based on where the nodes are located. In the future, other network properties might require you to specify a particular network type, but in most cases the automatically selected type is what you want. For now, the choice depends only on the number and location of the nodes. One big advantage of doing it this way is that you can change your experiment simply by changing the site locations.
If you want to add more nodes/networks, you will probably need to add more NICs to the current nodes. Keep in mind that you can create many NIC_Basic NICs but only a few dedicated NICs. However, each dedicated NIC provides two interfaces, so you would need only half as many.
# Get 3 random sites
[site1, site2, site3] = fablib.get_random_sites(count=3)
print(f"{[site1, site2, site3]}")
# Create Slice
slice = fablib.new_slice(name="MySlice1")
# Node1
node1 = slice.add_node(name='node1', site=site1)
iface1a = node1.add_component(model='NIC_Basic', name='nic1').get_interfaces()[0]
iface1b = node1.add_component(model='NIC_Basic', name='nic2').get_interfaces()[0]
# Node2
node2 = slice.add_node(name='node2', site=site2)
iface2a = node2.add_component(model='NIC_Basic', name='nic1').get_interfaces()[0]
iface2b = node2.add_component(model='NIC_Basic', name='nic2').get_interfaces()[0]
# Node3
node3 = slice.add_node(name='node3', site=site3)
iface3a = node3.add_component(model='NIC_Basic', name='nic1').get_interfaces()[0]
iface3b = node3.add_component(model='NIC_Basic', name='nic2').get_interfaces()[0]
# Networks
net1 = slice.add_l2network(name='net1', interfaces=[iface1a, iface2a])
net2 = slice.add_l2network(name='net2', interfaces=[iface1b, iface3a])
net3 = slice.add_l2network(name='net3', interfaces=[iface2b, iface3b])
# Submit Slice Request
slice_id = slice.submit()
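The NIC arithmetic is easy to get wrong as the topology grows. In the triangle example each node attaches to two networks, so it needs two NIC_Basic components, or one dedicated NIC, since a dedicated NIC provides two interfaces. The sketch below just encodes that rule; "dedicated" is a placeholder label, not a fablib model name.

```python
import math

# Interfaces provided by one NIC, per the description above.
# "dedicated" is a placeholder label, not an actual fablib model string.
INTERFACES_PER_NIC = {"NIC_Basic": 1, "dedicated": 2}


def nics_needed(networks_per_node, model):
    """Each interface attaches to exactly one network, so round up."""
    return math.ceil(networks_per_node / INTERFACES_PER_NIC[model])


if __name__ == "__main__":
    # Triangle: every node joins 2 networks.
    print(nics_needed(2, "NIC_Basic"))   # 2
    print(nics_needed(2, "dedicated"))   # 1
    # A node joining 3 networks still needs 2 dedicated NICs.
    print(nics_needed(3, "dedicated"))   # 2
```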
- This reply was modified 2 years, 8 months ago by Paul Ruth.
I think I have narrowed this down to STAR and UTAH. I am able to use jumbo frames between these sites on a FABRIC L2 data plane network, but not over the management network (i.e., the public Internet).
What is your data plane network configuration? Which network/IPs are you using for your application? I’m wondering if you are trying to use jumbo frames across the management network. If you use the management network to connect nodes from different sites, your traffic will go over the public Internet. We probably can’t fix MTU issues on the public Internet.
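You can test with the following command. Find the MTU by increasing the packet size until the ping starts failing. The -s value is the ICMP payload size, so for IPv4 the path MTU is the largest size that works plus 28 bytes (a 20-byte IP header and an 8-byte ICMP header).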
ping -M do -s <packet size> <destination IP>
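If you would rather not step through sizes by hand, here is a sketch of a binary search over the -s payload size. It assumes Linux iputils ping (where -M do forbids fragmentation and -W sets the reply timeout in seconds), IPv4 (for the 28-byte overhead), and a path that behaves monotonically, meaning every size below the limit gets through. A single lost packet looks the same as a too-large packet, so rerun it if the result looks odd. The function names are hypothetical.

```python
import subprocess

IPV4_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def linux_ping_probe(dest):
    """Return a probe(payload) -> bool that sends one unfragmentable ping."""
    def probe(payload):
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), dest],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe


def find_path_mtu(probe, low=0, high=9000):
    """Binary-search the largest payload `probe` accepts and return it as an IPv4 MTU.

    Invariant: probe(low) is True and the largest working payload lies in [low, high].
    """
    if not probe(low):
        raise RuntimeError("the smallest probe failed; check basic connectivity first")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low + IPV4_ICMP_OVERHEAD


if __name__ == "__main__":
    # Offline check with a fake path whose MTU is 1500 (payload limit 1472).
    print(find_path_mtu(lambda size: size <= 1472))  # 1500
    # Against a real host (placeholder address):
    # print(find_path_mtu(linux_ping_probe("<destination IP>")))
```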
- This reply was modified 2 years, 8 months ago by Paul Ruth.
What are the source and destination sites/hosts for this test?
There might be a configuration error in a switch somewhere. Thanks for helping us find it.
Paul
You won’t be able to say image=<my_docker_image>. The lowest-level abstraction needs to be a VM. We are working on a feature for fablib where you will be able to say something like this (not ready yet):
node1 = slice.add_node(name='Node1', image='default_ubuntu_20', docker='<docker_hub_image_name>')
This would create a VM with the default Ubuntu 20 image and then automatically run a Docker container in the VM using that Docker image. In general, the philosophy is that VMs are good for multiplexing hardware and containers are good for packaging software.
We are working on a way to streamline deployment of docker images into VMs on FABRIC. Watch for some coming updates to the FABLib library.
The way to do this is to boot a VM then pull the docker image from somewhere like DockerHub. You can do this yourself right now.
In fablib you can do something like:
node1.execute("sudo apt-get -y install docker.io")
node1.execute("sudo docker pull <docker hub image name>")
node1.execute("sudo docker run .... ")
Note that on IPv6 sites you will need to specify the IPv6 Docker Hub registry like this:
docker pull registry.ipv6.docker.com/<image name>
More info on DockerHub and IPv6 is here: https://www.docker.com/blog/beta-ipv6-support-on-docker-hub-registry/
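If you drive these steps from a notebook, a small sketch like this can choose the registry prefix based on whether the node's management IP is IPv6. The helper names are hypothetical, and node1.get_management_ip() is an assumption about your fablib version, so check that it exists before relying on it. The docker run arguments are left to you.

```python
from ipaddress import ip_address


def is_ipv6(addr):
    """True if `addr` is an IPv6 address literal."""
    return ip_address(addr).version == 6


def docker_setup_commands(image, ipv6=False):
    """Shell commands to install Docker and pull `image` on an Ubuntu VM."""
    # IPv6 sites pull through the IPv6 Docker Hub registry endpoint.
    ref = f"registry.ipv6.docker.com/{image}" if ipv6 else image
    return [
        "sudo apt-get update",
        "sudo apt-get -y install docker.io",
        f"sudo docker pull {ref}",
    ]


# Hypothetical usage (assumes node1.get_management_ip() exists in your fablib):
# use_v6 = is_ipv6(str(node1.get_management_ip()))
# for cmd in docker_setup_commands("<docker hub image name>", ipv6=use_v6):
#     stdout, stderr = node1.execute(cmd)
```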
You can’t set the management IP. It is picked from a pool managed by OpenStack.
In general, we discourage using the management network for experimentation. For now it is acceptable because we have not yet developed a way to peer a data plane network with the public Internet. This service is coming soon.
For your application, is it possible to boot the node, then find the IPv6 management IP, and finally start the service? It seems that if it works with a known IPv6 address that was assigned, then it should work if you automate getting the IPv6 management IP then starting the service.
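Automating that could look roughly like the sketch below: read the node's global IPv6 addresses after boot, then hand one to your service. The helpers are hypothetical, the sample output in the usage comment is made up, and the service command is a placeholder since it depends on your application. Depending on your fablib version, node1.get_management_ip() may give you the address directly.

```python
import re
from ipaddress import ip_address

# In "ip -6 -o addr show scope global" output each address is on one line, e.g.
# "2: ens3    inet6 2001:db8::5/64 scope global dynamic ..."
INET6_RE = re.compile(r"inet6\s+([0-9a-fA-F:]+)/\d+")


def global_ipv6_addresses(ip_output):
    """Return IPv6 addresses from `ip` output, skipping link-local and loopback."""
    found = []
    for text in INET6_RE.findall(ip_output):
        addr = ip_address(text)
        if not (addr.is_link_local or addr.is_loopback):
            found.append(str(addr))
    return found


def endpoint(addr, port):
    """IPv6 literals need brackets when combined with a port, e.g. [2001:db8::5]:8080."""
    return f"[{addr}]:{port}" if ip_address(addr).version == 6 else f"{addr}:{port}"


# Hypothetical usage; replace <your service command> with your application:
# stdout, stderr = node1.execute("ip -6 -o addr show scope global")
# addr = global_ipv6_addresses(stdout)[0]
# node1.execute(f"<your service command> {endpoint(addr, 8080)}")
```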
If this doesn’t seem possible, can you describe your experiment in a bit more detail? I might be able to figure out the best way to get your experiment working.
Paul
We are aware of this and are working on it.
Thanks for the bug report. If you see any other issues, feel free to post them here.
This is no problem.
We might have to have a zoom call to see what is going on here.
Have you ever had a slice work? If not, something might have gone wrong with your account creation.
It’s worth mentioning that starting Monday there will be a two-week scheduled maintenance outage. This might need to wait until after that. There will also be a lot of updates during the outage, so your problem might be fixed by then.
Paul
Are you able to run the “Hello, FABRIC” example notebook?
In the FABRIC JupyterHub it should just work. The tokens do expire, and it is possible to invalidate your token. If you lost your token, you will need to log out and back in to get a new token.
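If you suspect an expired token, a quick local check is possible, assuming the token is a standard JWT (three base64url segments with an "exp" claim in the middle one). This sketch decodes the payload without verifying the signature, so it only tells you about expiry, not validity. How you load the token string depends on your setup and is not assumed here; the function names are hypothetical.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the 'exp' claim (seconds since the epoch) from a JWT, without verifying it."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def token_expired(token, now=None, margin=60):
    """True if the token has expired or expires within `margin` seconds of `now`."""
    now = time.time() if now is None else now
    return jwt_expiry(token) <= now + margin


# Hypothetical usage; my_token_string is whatever token you loaded:
# print(token_expired(my_token_string))
```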
Paul
The bastion key is only needed when you ssh to your VMs. Invalid token is a separate issue.
Are you using the FABRIC JupyterHub or trying to install the API on your own system?
If the token is correct, then it should work. One quirk of Jupyter notebooks is that if you change something outside of an open notebook, you likely need to force the notebook to re-read the update. I often see this when I update a library (e.g., pip install X) or fetch a new token.
Depending on your setup, you will at least need to restart the Jupyter kernel by clicking the button with the circular arrow. If you are running this in a custom setup, you might need to reload the browser tab that is running the notebook.
Let me know if this solves the problem.
Paul