FABRIC General Questions and Discussion › Questions about node ports and bandwidth between sites
- This topic has 6 replies, 2 voices, and was last updated 3 years, 5 months ago by Paul Ruth.
August 3, 2021 at 1:14 pm #475
I noticed that each node has two ports: eth0 and eth1.
What is the difference between them, and which one should I use for my experiment?
I ask because, using iperf, I measured up to 10Gbps between two nodes at the same site, but the bandwidth between nodes at different sites is very low. I suspect I have not set them up correctly. (Or is it due to the difference between “L2Bridge” and “L2STS”?)
Any comments would be highly appreciated!
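To compare same-site and cross-site runs on equal terms, it can help to read the receiver-side throughput from iperf3's machine-readable report instead of the console. A minimal sketch, assuming iperf3 (not iperf2) is run as a TCP test with `-J`, where the summary sits under `end.sum_received.bits_per_second`; the function name and the sample value are illustrative only, not real measurements:

```python
import json


def iperf3_throughput_gbps(json_text: str) -> float:
    """Return receiver-side TCP throughput in Gbit/s from `iperf3 -J` output.

    Assumes a TCP test, whose JSON report has end.sum_received.bits_per_second.
    """
    report = json.loads(json_text)
    bits_per_second = report["end"]["sum_received"]["bits_per_second"]
    return bits_per_second / 1e9


# Hand-written sample in the same shape as an iperf3 -J report (not real data).
sample = '{"end": {"sum_received": {"bits_per_second": 9.41e9}}}'
print(f"{iperf3_throughput_gbps(sample):.2f} Gbit/s")
```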
August 3, 2021 at 1:34 pm #477
Generally, eth0 is used as a management interface. This is the network you use when you ssh to the node from the Internet. You should avoid using this network for experiments.
The interfaces numbered eth1 (or higher) will be the ones associated with the network component(s) that you have added to your node. These are the ones you should use for experiments.
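Following that convention, a script running inside a VM could pick its experiment interfaces by skipping loopback and the management NIC. This is a hypothetical helper, assuming the management interface is always eth0 and that data-plane NICs show up as eth1, eth2, ... as described above; `socket.if_nameindex()` is the standard-library call that lists interface names on the host:

```python
from __future__ import annotations

import socket


def experiment_interfaces(names: list[str], management: str = "eth0") -> list[str]:
    """Return data-plane interface names, skipping loopback and the management NIC.

    Hypothetical helper: assumes eth0 is management and eth1+ are experiment NICs.
    """
    return [n for n in names if n not in (management, "lo") and n.startswith("eth")]


if __name__ == "__main__":
    # if_nameindex() returns (index, name) pairs for every interface on this host.
    names = [name for _, name in socket.if_nameindex()]
    print(experiment_interfaces(names))
```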
Re: the slow performance of the experimental network: our initial deployment does not yet use the dedicated L1 circuits; we will move to those as they become available. Instead, it uses Internet2 (I2) AL2S. However, even with AL2S you should be able to get much higher bandwidth. I would expect you could get over 10Gbps (maybe even as much as 100Gbps). There are a couple of possible issues:
- Our network deployment may not yet be configured/tuned correctly. This is such low bandwidth that I suspect something in the path is dropping packets. What slice configuration did you use? I assume you have one node at UKY and one at LBNL; is this true? Also, which components did you include on the nodes?
- Your end hosts need to be tuned for high-latency, high-bandwidth data transfers. From the ifconfig info I can see that your nodes are not using jumbo frames. There are probably some other tuning optimizations you can make. ESnet has a great resource for learning about this: https://fasterdata.es.net/host-tuning/linux/
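Much of the host tuning for long paths comes down to the bandwidth-delay product: TCP needs roughly bandwidth × round-trip time bytes in flight to keep the pipe full, and the socket buffers must be at least that large. A minimal sketch of the arithmetic; the 10 Gbit/s target and 60 ms RTT are assumed example values, not measured UKY–LBNL figures:

```python
def tcp_buffer_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    bits_in_flight = bandwidth_gbps * 1e9 * (rtt_ms / 1e3)
    return int(bits_in_flight / 8)


# Assumed numbers: a 10 Gbit/s target over a path with a 60 ms round-trip time.
needed = tcp_buffer_bytes(10, 60)
print(f"{needed / 2**20:.1f} MiB")  # prints "71.5 MiB"
```

A result in this range would then inform Linux socket-buffer limits such as `net.core.rmem_max` and `net.ipv4.tcp_rmem`; the ESnet page above gives recommended settings.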
Please let us know which components you are using in the VMs. I would like to try this myself and see if I have the same issues.
Paul
August 3, 2021 at 3:02 pm #478
Update:
I tried this myself and was able to get ~6Gbps, but only after tuning as suggested by the ESnet site. I did this with VMs at UKY and LBNL. Both VMs were bigger than the default (32 cores, 64 GB RAM; this is probably bigger than necessary).
I also found that jumbo frames are not yet possible between these sites. We are working on making this possible soon.
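One reason jumbo frames matter at these rates is per-packet work: with a larger MTU each frame carries more payload, so far fewer packets per second are needed to fill the link. A rough sketch, assuming IPv4 and TCP headers without options and standard Ethernet per-frame overhead (14-byte header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap):

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap (bytes)
IP_TCP_HEADERS = 20 + 20        # IPv4 + TCP headers, assuming no options (bytes)


def tcp_goodput_efficiency(mtu: int) -> float:
    """Fraction of line rate that carries TCP payload when every frame is full-size."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def packets_per_second(line_rate_gbps: float, mtu: int) -> float:
    """Full-size frames per second needed to saturate a link of the given rate."""
    return line_rate_gbps * 1e9 / ((mtu + ETH_OVERHEAD) * 8)


for mtu in (1500, 9000):
    print(mtu, f"{tcp_goodput_efficiency(mtu):.3f}", f"{packets_per_second(10, mtu):,.0f} pkt/s")
```

The efficiency gain is modest, but a 9000-byte MTU needs about six times fewer packets per second than 1500 bytes for the same 10 Gbit/s, which eases the load on the VM's CPU and NIC.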
August 4, 2021 at 8:01 pm #511
I used the management interface, eth0, for the testing. I believe this is why the connection is so slow.
Could you share some resources about how to configure eth1 correctly? I don’t know how to use eth1 on one VM to connect to eth1 on another VM.
In addition, could you share with me some resources about the difference between “L2Bridge” and “L2STS”? Which one should I use in the test?
August 6, 2021 at 4:17 pm #516
The best examples will be in the JupyterHub environment. We have pre-installed a suite of example notebooks from the following GitHub repo. The notebooks are currently in development and will improve over time. You may need to do a git pull on the repo to get the newest example notebooks.
https://github.com/fabric-testbed/jupyter-examples
Specifically, the examples in this folder will be useful:
https://github.com/fabric-testbed/jupyter-examples/tree/master/fabric_examples/basic_examples
These notebooks are very new and may be incomplete. I will try to update them today with the most current information.
More generally, there are 3 types of layer2 “Network Services” on FABRIC (layer3 services are still in development):
- L2Bridge: These bridges are like a local network switch/bridge that connects any number of local nodes within a single site. These local bridges are directly connected to the nodes, so your bandwidth will be limited by the maximum bandwidth of the NICs that you are using (e.g., ConnectX_6 NICs provide 100Gbps). This bridge is not programmable and only performs simple MAC learning. The key limitation of these bridges is that they can only connect nodes within a single FABRIC site.
- L2P2P (Peer-to-peer): These are peer-to-peer layer2 circuits that connect exactly 2 nodes. These nodes must be on different FABRIC sites. These circuits will have user-specified QoS. QoS is currently in development and is not yet available. Once QoS is available, users will be able to request dedicated bandwidth on these circuits.
- L2S2S (Site-to-site): Site-to-site is a hybrid of L2Bridge and L2P2P with some limitations. With S2S, a user gets a pair of L2Bridges on different FABRIC sites that have a circuit connecting them. Any number of nodes on either site can be connected to the single S2S Network Service. All nodes connected to the S2S service are on the same L2 network and can use the same layer3 subnet. One big limitation of S2S is that the wide-area circuit will be best effort and will not have any guaranteed QoS. If you want guaranteed QoS, you will need to use L2P2P and set up your own routing or switching on each end.
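The selection rules above can be condensed into a small decision function. This is a hypothetical helper, not part of any FABRIC library: it takes a mapping of node name to site name (site names here are placeholders) and returns the service names used in this post, ignoring QoS since it is not yet available:

```python
from __future__ import annotations


def choose_network_service(node_sites: dict[str, str]) -> str:
    """Pick a layer-2 service for a set of nodes, following the rules in this thread.

    node_sites maps node name -> FABRIC site name. Hypothetical helper; the
    returned strings are the names used in this post, not library constants.
    """
    if len(node_sites) < 2:
        raise ValueError("need at least two nodes to connect")
    sites = set(node_sites.values())
    if len(sites) == 1:
        return "L2Bridge"  # every node is at one site: local bridge
    if len(sites) > 2:
        raise ValueError("a single L2P2P or L2S2S service spans only two sites")
    if len(node_sites) == 2:
        return "L2P2P"  # exactly two nodes on two different sites
    return "L2S2S"  # more than two nodes spread over two sites (best effort)
```

For example, `choose_network_service({"n1": "UKY", "n2": "LBNL"})` returns `"L2P2P"`, while adding a third node at either site switches the answer to `"L2S2S"`.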
Although the notebooks include examples of how to configure and use the data plane networks, the context below might be helpful.
When you add an interface to a network service you have a couple of options: the interface can be tagged or untagged with VLANs. By default, adding an interface to a network service results in an untagged interface.
- untagged interfaces: These behave like an access port. In other words, any FABRIC-assigned VLAN tags will be stripped from the layer2 traffic before it is passed to the user’s node. The user’s node will not need to process VLAN tags.
- tagged interfaces: These behave like a trunk port. VLAN tags are left on the L2 traffic that enters the node. In this case, the user must process the VLAN tags within the node. FABRIC allows the user to specify the VLAN tag that should be on the traffic as it enters the node. There is a separate namespace for these tags per interface, so users are free to use any VLAN tag they wish and there will be no conflict with other users.
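To make the access/trunk distinction concrete: on a tagged interface, each frame the VM receives still carries the 4-byte IEEE 802.1Q tag right after the two MAC addresses, while on an untagged interface FABRIC has already removed it. The byte-level sketch below shows what that removal amounts to. It is illustrative only, assuming a single tag (no QinQ) and no FCS; in a real VM the Linux VLAN driver does this work, not user code:

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def pop_vlan_tag(frame: bytes) -> Tuple[Optional[int], bytes]:
    """Return (vlan_id, frame_without_tag); untagged frames come back unchanged."""
    if len(frame) < 18:
        return None, frame
    (tpid,) = struct.unpack_from("!H", frame, 12)  # field after dst + src MACs
    if tpid != TPID_8021Q:
        return None, frame
    (tci,) = struct.unpack_from("!H", frame, 14)
    vlan_id = tci & 0x0FFF  # VLAN ID is the low 12 bits of the Tag Control Information
    return vlan_id, frame[:12] + frame[16:]  # drop the 4-byte TPID + TCI


dst = bytes.fromhex("ffffffffffff")
src = bytes.fromhex("020000000001")
tag = struct.pack("!HH", TPID_8021Q, 200)  # priority 0, VLAN ID 200
inner = struct.pack("!H", 0x0800) + b"payload"  # IPv4 EtherType + dummy payload
vid, untagged = pop_vlan_tag(dst + src + tag + inner)
print(vid, untagged == dst + src + inner)
```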
Tagged example that applies VLAN tag 200:
# Assumes t is an ExperimentTopology and n1, n2 are nodes already added to it.
from fim.user import ComponentModelType, ServiceType

n1.add_component(model_type=ComponentModelType.SmartNIC_ConnectX_6, name='n1-nic1')
n2.add_component(model_type=ComponentModelType.SmartNIC_ConnectX_5, name='n2-nic1')
n1_iface = n1.interface_list[0]
n2_iface = n2.interface_list[0]
t.add_network_service(name='ptp1', nstype=ServiceType.L2PTP, interfaces=[n1_iface, n2_iface])

# Set VLAN tag 200 on each interface; each label is written back to the interface it came from.
if_labels = n1_iface.get_property(pname="labels")
if_labels.vlan = "200"
n1_iface.set_properties(labels=if_labels)
if_labels = n2_iface.get_property(pname="labels")
if_labels.vlan = "200"
n2_iface.set_properties(labels=if_labels)
Let me know if this helps.
August 7, 2021 at 6:05 am #536
Thanks for the detailed explanation!
If I understand correctly (please correct me if I have misunderstood :) ):
- After I use an L2Bridge to connect two interfaces on two nodes,
- the two “eth1” interfaces on these nodes are “physically” connected by a “cable”.
- Therefore, I can assign an IP address/subnet mask to each “eth1” and manually set routes on each node.
- Then, I can ping from one node to the other through “eth1”.
- If two nodes are located at different sites, I should use L2P2P (with QoS).
- If I have more than two nodes at two sites, I should use L2S2S (without QoS).
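The addressing step in that summary can be planned with Python's `ipaddress` module. A minimal sketch, assuming every node's experiment NIC is eth1 and that all nodes share one L2 segment; the node names and the 192.168.10.0/24 subnet are hypothetical placeholders, and the generated iproute2 commands would be run as root on each node:

```python
from __future__ import annotations

import ipaddress
import itertools


def addressing_commands(nodes: list[str], subnet: str, ifname: str = "eth1") -> dict[str, list[str]]:
    """Give each node one host address from `subnet` and return iproute2 commands.

    Hypothetical helper: subnet, interface and node names are experimenter-chosen.
    """
    net = ipaddress.ip_network(subnet)
    addrs = list(itertools.islice(net.hosts(), len(nodes)))
    if len(addrs) < len(nodes):
        raise ValueError(f"{subnet} has too few host addresses for {len(nodes)} nodes")
    return {
        node: [
            f"ip addr add {addr}/{net.prefixlen} dev {ifname}",
            f"ip link set dev {ifname} up",
        ]
        for node, addr in zip(nodes, addrs)
    }


for node, cmds in addressing_commands(["node1", "node2"], "192.168.10.0/24").items():
    print(node, cmds)
```

On Linux, adding an address with a prefix length normally installs the connected route for that subnet, so nodes on the same L2 segment can usually ping each other without extra routes; manual routes matter mainly for reaching other subnets.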
August 9, 2021 at 1:35 pm #537
Yes, that is all correct. The only additional thing to think about is whether you add VLAN tags to the interfaces. If there are VLAN tags, you need to create the virtual (VLAN) interfaces inside the VM.
Also, if you are using an L2P2P service you must use a VLAN tag. If you are not seeing traffic, this could be why.
Paul
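Creating that virtual interface inside the VM is typically done with iproute2's VLAN link type. A sketch that generates the commands, assuming the tagged experiment NIC is eth1, reusing VLAN tag 200 from the earlier example, and using a hypothetical address; the commands need root, and the kernel's 8021q module is usually loaded automatically when the VLAN link is created:

```python
import ipaddress
from typing import List


def vlan_subinterface_commands(parent: str, vlan_id: int, cidr: str) -> List[str]:
    """Return iproute2 commands that create and address a tagged sub-interface."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("802.1Q VLAN IDs must be between 1 and 4094")
    ipaddress.ip_interface(cidr)  # raises ValueError if the address/prefix is malformed
    sub = f"{parent}.{vlan_id}"
    return [
        f"ip link set dev {parent} up",
        f"ip link add link {parent} name {sub} type vlan id {vlan_id}",
        f"ip addr add {cidr} dev {sub}",
        f"ip link set dev {sub} up",
    ]


for cmd in vlan_subinterface_commands("eth1", 200, "192.168.10.1/24"):
    print(cmd)
```

Traffic sent through eth1.200 then leaves eth1 carrying tag 200, which must match the tag set on the FABRIC interface.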