Bandwidth on FABRIC links

#3955
Fraida Fund
Participant

Hello!

Do any of the network types on FABRIC offer better-than-best-effort bandwidth, either on links within the same site or between sites?

What should our expectations be regarding bandwidth on FABRIC links? How does this vary depending on network interface type (real or “basic”), link type, and single-site vs. multi-site?

#3956
Paul Ruth
Keymaster

Bandwidth QoS provisioning of WAN links is still being developed. Stay tuned.

• Basic NICs: The existing Basic NICs are implemented as SR-IOV virtual functions on a 100Gbps ConnectX-6. The only limitation is that the bandwidth is shared with the other Basic NICs on that port.
• ConnectX-6/5s: The dedicated ConnectX-6s come with two 100Gbps ports, while the ConnectX-5s have two 25Gbps ports. The dedicated ConnectX-6/5s are fully dedicated to a single VM and have full bandwidth to the switch.

Currently, there is little competition for bandwidth, and you can see very nearly the full bandwidth in most cases (even with Basic NICs). This is especially true for connections that stay within a site.

WAN links vary in performance. Eventually, nearly all of them will be on 100+ Gbps L1 connections owned by FABRIC; we are deploying those as fast as we can. In the meantime, many of the links are AL2S or other L2 services while we wait for the real links to be deployed. You will likely see lower bandwidth on these links.

Also, there are some quirks we are trying to work out where some SR-IOV NICs occasionally only get 25-30Gbps. It seems like they are being left in a weird state by a previous experiment. We are trying to figure out how to detect and reset these cases.

Generally, you should expect at least 25Gbps and will often get close to 100Gbps. Note that in order to get these speeds you will need a bit more memory and cores than the default. Also, the app will need to be multi-threaded, and many tools like iperf3 are single-threaded even if you use ‘-P’ (https://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf/multi-stream-iperf3/).
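
For example, a minimal sketch of the multi-process approach described on that fasterdata page, launching one iperf3 client per port and summing the results. The destination IP, ports, and stream count here are illustrative, and matching iperf3 servers are assumed to already be listening on those ports on the receiving node:

```python
import subprocess

# A single iperf3 process runs in one thread even with -P, so launch
# several client processes in parallel, one per server port.
DEST = "10.0.0.2"                 # example dataplane IP of the receiving node
PORTS = [5201, 5202, 5203, 5204]  # one "iperf3 -s -p <port>" per port on the receiver

procs = [
    subprocess.Popen(
        ["iperf3", "-c", DEST, "-p", str(port), "-t", "30", "-J"],
        stdout=subprocess.PIPE,
    )
    for port in PORTS
]

# Wait for all clients; the aggregate throughput is the sum of the
# per-process results reported in each JSON output.
for p in procs:
    out, _ = p.communicate()
    print(out.decode()[:120], "...")
```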

         

#3960
Fraida Fund
Participant

Thanks! Could you clarify this point –

“Basic NICs: The existing Basic NICs are implemented as SR-IOV virtual functions on a 100Gbps ConnectX-6. The only limitation is that the bandwidth is shared with the other Basic NICs on that port.”

This means that 100 Gbps is divided among all of the Basic NICs on that port, and the port may be shared not only by Basic NICs in my slice but also by other users’ slices? Hypothetically, if all 128 SR-IOV VFs on the port are used, then the bandwidth could max out at ~780 Mbps per VF? (And I don’t have any visibility into how many SR-IOV VFs are on the port.)
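
For reference, the worst-case arithmetic behind that figure, assuming 128 VFs evenly share a single 100 Gbps port (the VF count is just the number quoted above):

```python
# Worst-case per-VF share if 128 SR-IOV VFs evenly split one 100 Gbps port
port_gbps = 100
num_vfs = 128
print(f"{port_gbps / num_vfs * 1000:.0f} Mbps per VF")  # ~781 Mbps
```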

#3961
Paul Ruth
Keymaster

The Basic NICs/VFs are all best effort through the NIC itself, with a cap of 100Gbps shared among all VFs on a physical host. It is possible that each would max out at ~780Mbps, but it is extremely unlikely. In practice you will likely see 10s of Gbps. Currently, your max bandwidth will likely be very close to 100Gbps.

If you want dedicated bandwidth, you will need to reserve dedicated NICs. These NICs are dedicated to your experiment and are limited only by their hardware. When QoS reservations are available, they will be on the WAN links between the sites. Most of these links will be 100Gbps and can be divided and allocated to individual experiments. The “super core” links will be 1200Gbps and can be divided and allocated as well. One of the main uses of the “super core” will be to allocate many dedicated 100Gbps QoS links to individual experiments.
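
For anyone wanting to try this from fablib, below is a rough sketch of requesting a shared Basic NIC on one node and a dedicated ConnectX-6 on another. The component model strings (NIC_Basic, NIC_ConnectX_6) and exact fablib calls may differ between fablib versions, and the site name, slice name, and resource sizes are only examples:

```python
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.new_slice(name="nic-bandwidth-demo")   # example slice name

# Node 1: shared Basic NIC (SR-IOV VF, best-effort share of the 100Gbps port)
node1 = slice.add_node(name="node1", site="TACC", cores=8, ram=16, disk=50)
basic_iface = node1.add_component(model="NIC_Basic", name="nic_basic").get_interfaces()[0]

# Node 2: dedicated ConnectX-6 (whole card passed to the VM, full bandwidth to the switch)
node2 = slice.add_node(name="node2", site="TACC", cores=8, ram=16, disk=50)
cx6_iface = node2.add_component(model="NIC_ConnectX_6", name="nic_cx6").get_interfaces()[0]

# Site-local L2 network between the two interfaces
slice.add_l2network(name="net1", interfaces=[basic_iface, cx6_iface])
slice.submit()
```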

#3962
Fraida Fund
Participant

Thanks. Did I get this right –

• A dedicated ConnectX-6/5 has its full bandwidth within a site (even in a hypothetical situation where the site has high utilization).
• A dedicated ConnectX-6/5 is currently best effort between sites, but eventually we’ll be able to reserve bandwidth on these links between sites.
• Basic NICs have (and only ever will have) best-effort bandwidth, with a ~780 Mbps minimum in the hypothetical where the site has high utilization.
              #3963
              Paul Ruth
              Keymaster

Yes, that is correct.

It might be useful to add that with the SR-IOV NICs the virtual switching in the host is done in hardware. In most other cloud-ish systems the multi-tenant switching is done in software. These software switches use a lot of CPU, which causes performance interference between the computation and networking (even between experiments). The SR-IOV solution should isolate the performance a bit more.

Also, it should be rare that there would be enough other active network-heavy experiments that you would see less than 10 or 20Gbps on the Basic NICs (I’m interested in testing this when we have more users). I think you could get a lot of good experiments done by using tc to rate limit your own traffic to some modest amount (10-20 Gbps). I suspect a lot of the network contention would be from your own experiment. Then, when you are ready, we could move you to a dedicated NIC and you could try higher bandwidths.
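
As a concrete starting point for the tc idea, here is a rough sketch of capping your own egress with a token bucket filter. The interface name, rate, and tbf burst/latency values are examples that would need tuning for your VM:

```python
import subprocess

IFACE = "ens7"   # example name of the experiment dataplane interface in the VM

# Cap our own egress to ~10 Gbps with a token bucket filter (tbf).
subprocess.run(
    ["sudo", "tc", "qdisc", "add", "dev", IFACE, "root",
     "tbf", "rate", "10gbit", "burst", "1m", "latency", "50ms"],
    check=True,
)

# Remove the limit later with:
#   sudo tc qdisc del dev <iface> root
```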
