L2Bridge without MAC learning? (FABRIC General Questions and Discussion forum)
- This topic has 12 replies, 5 voices, and was last updated 1 year, 3 months ago by Fraida Fund.
January 28, 2023 at 11:30 pm #3683
Hello,
It is not mentioned in the Network Services in FABRIC article, but it seems as if packets are filtered by MAC learning on the L2Bridge type network.
Is it possible to get an L2Bridge network (Ethernet service connecting multiple interfaces in a single site) without MAC learning?
- This topic was modified 1 year, 10 months ago by Fraida Fund.
January 30, 2023 at 9:48 am #3689
"it seems as if packets are filtered by MAC learning on the L2Bridge type network"
What observation led you to this conclusion?
What are you trying to do, how did it behave, and how do you expect it to behave?
January 30, 2023 at 10:55 am #3694
Well, one observation is that Paul said so elsewhere in the forum:
"L2Bridge: These bridges are like a local network switch/bridge that connects any number of local nodes within a single site. These local bridges are directly connected to the nodes, so your bandwidth will be limited by the maximum bandwidth of the NICs that you are using (e.g., ConnectX_6 NICs will provide 100Gbps). This bridge is not programmable and only performs simple MAC learning. The key point about these bridges is that they can only connect to nodes within a single FABRIC site."
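For readers unfamiliar with the term, here is a minimal sketch of what "simple MAC learning" means for a bridge, written as a toy Python model. Port names and MAC addresses are made up, and this is not FABRIC's or any vendor's implementation. Once a host's source MAC has been learned, unicast frames addressed to it leave only through that host's port, which is consistent with the tcpdump observation that follows.

```python
from typing import Dict, List, Optional

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Toy model of a bridge that performs simple MAC learning."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.table: Dict[str, str] = {}  # learned MAC address -> port

    def forward(self, in_port: str, src: str, dst: str) -> List[str]:
        """Return the ports a frame arriving on in_port is sent out of."""
        # Learn (or refresh) which port the source address lives behind.
        self.table[src] = in_port
        out: Optional[str] = self.table.get(dst)
        if dst == BROADCAST or out is None:
            # Broadcast or unknown unicast: flood to every other port.
            return [p for p in self.ports if p != in_port]
        # Known unicast: only the learned port (never back out of in_port).
        return [] if out == in_port else [out]


br = LearningBridge(["p1", "p2", "p3", "p4"])
br.forward("p2", src="02:00:00:00:00:02", dst=BROADCAST)  # flooded; host 2's port learned
print(br.forward("p1", src="02:00:00:00:00:01", dst="02:00:00:00:00:02"))  # ['p2']
```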
Another observation: suppose I create 4 VMs with a basic NIC on each, and connect each basic NIC to an L2Bridge-type network. I capture traffic on each of the NICs with tcpdump. A frame sent by host 1 with host 2’s address as the destination MAC only appears at the NIC on host 2, and not on host 3 or host 4.
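The experiment described above can be reproduced with a raw-socket sender along the lines of this sketch (Linux only; sending needs root or CAP_NET_RAW). The interface name and MAC addresses are placeholders. EtherType 0x88B5 is the IEEE local experimental value, chosen here only so the test frames are easy to pick out, for example with "tcpdump -e -i <iface> ether proto 0x88b5" on each receiving host.

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst: str, src: str, payload: bytes, ethertype: int = 0x88B5) -> bytes:
    """Ethernet II frame: dst MAC, src MAC, EtherType, payload padded to 46 bytes."""
    header = mac_bytes(dst) + mac_bytes(src) + struct.pack("!H", ethertype)
    return header + payload.ljust(46, b"\x00")


def send_frame(ifname: str, frame: bytes) -> None:
    """Send one raw frame out of ifname via an AF_PACKET socket (Linux, root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as s:
        s.bind((ifname, 0))
        s.send(frame)


# Host 1 sends a frame addressed to host 2 (placeholder interface and MACs):
frame = build_frame("02:00:00:00:00:02", "02:00:00:00:00:01", b"hello")
# send_frame("ens7", frame)
```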
Another observation: suppose I create a Linux switch connecting multiple hosts, using an L2Bridge between my Linux switch and each of the connected hosts (as in, e.g., this example). Non-broadcast frames sent from the hosts don't make it to the Linux switch interfaces, so the bridge does not work.
What I am trying to do: I am trying to connect multiple basic NIC interfaces with a network link, so that any frame sent by any NIC on the link appears at every other NIC on the link. Like the way an Ethernet segment behaves, or Ethernet segments connected by a hub, or Ethernet segments connected by a switch with MAC learning disabled.
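For contrast, the behavior being asked for is that of a shared segment or hub. In this toy model (hypothetical host names and MACs), the frame physically reaches every other NIC, and each receiving NIC does its own destination filtering, which promiscuous mode (what tcpdump enables by default) switches off.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


def segment_receivers(nics: Dict[str, str], promisc: Dict[str, bool],
                      sender: str, dst_mac: str) -> List[str]:
    """Shared Ethernet segment or hub: every other NIC sees the frame on the
    wire, and each NIC keeps it if it is addressed to it or if that NIC is
    in promiscuous mode."""
    received = []
    for name, mac in nics.items():
        if name == sender:
            continue  # a NIC does not receive its own transmission
        if promisc.get(name, False) or dst_mac in (mac, BROADCAST):
            received.append(name)
    return received


nics = {"h1": "02:00:00:00:00:01", "h2": "02:00:00:00:00:02",
        "h3": "02:00:00:00:00:03", "h4": "02:00:00:00:00:04"}
# h1 sends to h2 while h3 runs a capture in promiscuous mode:
print(segment_receivers(nics, {"h3": True}, "h1", nics["h2"]))  # ['h2', 'h3']
```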
January 30, 2023 at 11:12 am #3695
NIC_Basic is a Virtual Function (VF) on the ConnectX-6 Ethernet adapter. The hardware Ethernet adapter is shared among many VFs, and it determines which VF shall receive an incoming packet by matching the destination address. Therefore, NIC_Basic cannot receive Ethernet frames whose destination address differs from its own address.
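The difference from a real segment is where the filtering happens. A toy model of the explanation above (VF names and MACs are hypothetical, multicast is left out): the shared adapter picks the receiving VF by destination MAC before any VM sees the frame, so a promiscuous capture inside a VM has nothing extra to receive.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


def vfs_receiving(vf_macs: Dict[str, str], dst_mac: str) -> List[str]:
    """Toy model of a shared adapter choosing which VF(s) get a frame arriving
    from the uplink. The decision is made inside the NIC by destination MAC,
    so no setting inside a VM appears in this function at all."""
    if dst_mac == BROADCAST:
        return list(vf_macs)
    return [vf for vf, mac in vf_macs.items() if mac == dst_mac]


# Hypothetical VFs handed to three VMs on the same worker.
vfs = {"vf0": "02:00:00:00:00:01", "vf1": "02:00:00:00:00:02", "vf2": "02:00:00:00:00:03"}
print(vfs_receiving(vfs, "02:00:00:00:00:02"))  # ['vf1']
print(vfs_receiving(vfs, "02:00:00:00:00:09"))  # [] (no VF owns this address)
```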
January 30, 2023 at 12:10 pm #3696
Fraida,
Coincidentally, I ran into this issue recently when putting together an example that I intend to share with you in our meeting with Kate this week. I have a working prototype that looks like your example that uses a 5th VM to run a software OVS switch (https://witestlab.poly.edu/blog/basic-ethernet-switch-operation/).
There are actually a couple of issues going on here that I had to work around, and it's super impressive that Yoursunny identified the trickiest part.
The main issue is the one that Yoursunny pointed out related to the Basic NICs being SRIOV virtual functions on a ConnectX-6.
You can think of the ConnectX-6 as a mini-switch that uses its physical port(s) as trunks between itself and the bigger dataplane switch. The mini-switch then has several access ports (i.e., SRIOV virtual functions) that are passed through to the various VMs. The traffic on each of these access ports is basically a "pseudo wire" going through the ConnectX-6 between the VM and the dataplane switch. The problem is that the ConnectX-6 "mini-switch" is also doing MAC learning on the "pseudo wires" and is filtering the traffic. I think this is an unforeseen problem with our SRIOV configuration that just needs to be changed in the future. We are working on this.
The effect this has on your example is that an OVS VM that is using 4 Basic NICs connected to 4 other hosts will not see traffic sent directly from one host to another. The ARP request will go through because it is a broadcast, but the ARP reply is filtered by the ConnectX-6 "mini-switch". Without the ARPs, we don't get very far.
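The ARP failure can be traced step by step with a toy model of the topology described above: four hosts, each joined to one Basic NIC on the OVS VM by its own L2Bridge. The MAC addresses are hypothetical, and the VF filter is the simplified "own address or broadcast" rule from the earlier explanation of NIC_Basic, not a model of the actual ConnectX-6 firmware.

```python
from typing import List, Tuple

BROADCAST = "ff:ff:ff:ff:ff:ff"

# Hypothetical MACs for hosts 1-4 and for the OVS VM's four Basic NICs (VFs).
HOST = {i: f"02:00:00:00:00:0{i}" for i in range(1, 5)}
OVS_VF = {i: f"02:00:00:00:01:0{i}" for i in range(1, 5)}


def vf_accepts(vf_mac: str, dst_mac: str) -> bool:
    """Simplified Basic NIC rule: own unicast address or broadcast only."""
    return dst_mac in (vf_mac, BROADCAST)


def trace_arp(requester: int, target: int) -> List[Tuple[str, bool]]:
    """Which legs of an ARP exchange through the OVS VM get delivered."""
    return [
        # 1. Request is broadcast; it arrives at the OVS VF facing the requester.
        ("request", vf_accepts(OVS_VF[requester], BROADCAST)),
        # 2. OVS floods it; the target host's VF accepts the broadcast.
        ("flooded to target", vf_accepts(HOST[target], BROADCAST)),
        # 3. Reply is unicast to the requester's MAC, arriving at the OVS VF
        #    facing the target. That VF's MAC is not the destination: dropped.
        ("reply", vf_accepts(OVS_VF[target], HOST[requester])),
    ]


print(trace_arp(1, 2))
# [('request', True), ('flooded to target', True), ('reply', False)]
```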
My workaround is to use dedicated ConnectX-5s for the OVS switch VM (the hosts can use Basic NICs). The dedicated NICs are on access ports connected directly to the dataplane switch, so there is no "mini-switch" filtering packets in between. This isn't a great solution because it limits the degree of your OVS switch and uses a much scarcer resource type. The better long-term solution is for us to turn off MAC learning on the ConnectX-6 "mini-switches".
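On a dedicated NIC there is no shared adapter choosing a VF in front of the VM, so promiscuous mode requested by software inside the VM actually takes effect. As a sketch, a Linux raw socket can request it with PACKET_ADD_MEMBERSHIP (root required; the interface name is up to you). The numeric constants are the Linux values from linux/if_packet.h and linux/socket.h, written out here rather than relying on which ones Python's socket module exports.

```python
import socket
import struct

SOL_PACKET = 263            # linux/socket.h
PACKET_ADD_MEMBERSHIP = 1   # linux/if_packet.h
PACKET_MR_PROMISC = 1       # linux/if_packet.h
ETH_P_ALL = 0x0003          # receive frames of every protocol


def promisc_mreq(ifindex: int) -> bytes:
    """Pack a struct packet_mreq asking for promiscuous mode on ifindex:
    { int mr_ifindex; unsigned short mr_type; unsigned short mr_alen;
      unsigned char mr_address[8]; }"""
    return struct.pack("iHH8s", ifindex, PACKET_MR_PROMISC, 0, b"")


def open_promisc_socket(ifname: str) -> socket.socket:
    """Raw socket that puts ifname into promiscuous mode while it stays open."""
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    s.bind((ifname, 0))
    s.setsockopt(SOL_PACKET, PACKET_ADD_MEMBERSHIP,
                 promisc_mreq(socket.if_nametoindex(ifname)))
    return s
```

Membership requested this way is tied to the socket, so the interface leaves promiscuous mode again when the socket is closed, unlike a persistent interface-wide setting.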
I can tell you more about this later this week when we talk with Kate.
Paul
January 30, 2023 at 1:31 pm #3697
Yes, let's discuss further. I can think of a bunch of scenarios where we would want the interfaces to be in "promiscuous mode", and in some of them it will not be practical to use dedicated interfaces (we need too many interfaces in "promiscuous mode").
January 30, 2023 at 2:17 pm #3701
We will open an internal ticket about it. The VFs are created on the worker node at boot and then given out by the Control Framework to the virtual machines, and we need to check what options are set on them at creation time (typically they cannot be changed once created).
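For background on the VF creation step: on Linux, a physical function's VFs are created by writing a count to its sriov_numvfs sysfs attribute, and the kernel refuses to change a nonzero count without first writing 0, which makes recreating VFs with different options on a running worker disruptive. A small sketch for inspecting the current state; the PF name is a placeholder and this is not FABRIC's provisioning code.

```python
from pathlib import Path
from typing import Tuple


def sriov_vf_counts(pf: str, sysfs_net: str = "/sys/class/net") -> Tuple[int, int]:
    """Return (VFs currently created, maximum VFs supported) for a physical
    function, read from the standard Linux SR-IOV sysfs attributes."""
    dev = Path(sysfs_net) / pf / "device"
    num = int((dev / "sriov_numvfs").read_text().strip())
    total = int((dev / "sriov_totalvfs").read_text().strip())
    return num, total


# On a worker node (placeholder PF name):
# print(sriov_vf_counts("enp1s0f0"))
```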
@yoursunny may be right, and it may not be possible for us to change this behavior; we will report here once we know more. Thank you all for your feedback.
March 31, 2023 at 11:53 am #4011
Hi! I wanted to follow up on this. This functionality is used in educational materials that I am transitioning ahead of the imminent retirement of InstaGENI, and I need to decide which platform to move them to.
Is this issue expected to be fixable? If so, is there a rough timeline? (Is it likely to be fixed before InstaGENI is retired?)
April 1, 2023 at 4:54 pm #4020
Ezra and I looked at this a while back, and it seemed that the Mellanox cards were not handling these frames the way we expected given the config options that we used. We'll need to revisit this. I'll get back to you about this.
thanks,
Paul
August 24, 2023 at 10:30 am #5114
Just to bring this back up: we are working with NVIDIA/Mellanox engineering support on this. Their engineers are able to reproduce the problem (which is good news). They are trying to figure out the difference between a working setup and a non-working setup.
August 24, 2023 at 10:38 am #5115
Thanks, I appreciate the update!
September 18, 2023 at 11:05 am #5330
Fraida, after some back-and-forth with NVIDIA/Mellanox, it appears the desired bridged virtual function forwarding behavior is not something currently supported. We are continuing to discuss alternatives with them to see if there is a solution we can support. Apologies for this issue dragging out for so long.
September 18, 2023 at 12:11 pm #5332
Thanks for keeping me informed!