Difference between throughput after maintenance
Tagged: evaluation, Throughput
- This topic has 6 replies, 2 voices, and was last updated 11 months, 2 weeks ago by Paul Ruth.
January 12, 2024 at 12:46 pm #6325
I have noticed that since the maintenance/update of FABlib to version 1.6, the throughput between some nodes/sites appears to be limited.
Below is the link to a document (GDocs) showing the topology and tests carried out yesterday (01/11/2024) using iperf3.
https://docs.google.com/document/d/1DSZDqPEBVjQ_we717I_5n0nedCiT0JqbKwAD7-K0LaQ/edit?usp=sharing
Is there any new feature that could cause this limitation?
January 16, 2024 at 8:05 am #6341
I recreated the same slice and got the following results.
h1 > r1
[ 5] 0.00-0.94 sec 50.2 MBytes 448 Mbits/sec 9 sender
[ 5] 0.00-0.98 sec 48.7 MBytes 416 Mbits/sec receiver
r1 > r2
[ 5] 0.00-10.43 sec 50.6 MBytes 40.7 Mbits/sec 44 sender
[ 5] 0.00-10.47 sec 46.8 MBytes 37.5 Mbits/sec receiver
r1 > r3
[ 5] 0.00-82.03 sec 50.1 MBytes 5.13 Mbits/sec 356 sender
[ 5] 0.00-82.07 sec 49.8 MBytes 5.09 Mbits/sec receiver
r1 > r5
[ 5] 0.00-454.26 sec 50.1 MBytes 925 Kbits/sec 3455 sender
[ 5] 0.00-454.30 sec 49.8 MBytes 919 Kbits/sec receiver
r2 > r3
[ 5] 0.00-0.37 sec 50.8 MBytes 1.17 Gbits/sec 0 sender
[ 5] 0.00-0.41 sec 50.0 MBytes 1.03 Gbits/sec receiver
r3 > r4
[ 5] 0.00-1.16 sec 50.2 MBytes 362 Mbits/sec 0 sender
[ 5] 0.00-1.21 sec 49.4 MBytes 343 Mbits/sec receiver
r4 > r5
[ 5] 0.00-0.48 sec 51.2 MBytes 892 Mbits/sec 0 sender
[ 5] 0.00-0.52 sec 50.2 MBytes 811 Mbits/sec receiver
r4 > h2
[ 5] 0.00-0.13 sec 51.0 MBytes 3.34 Gbits/sec 0 sender
[ 5] 0.00-0.17 sec 49.7 MBytes 2.52 Gbits/sec receiver
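The bitrates in these results can be cross-checked from the transfer size and duration on each line. The sketch below is only illustrative; the helper name `bitrate_bps` is made up here, and it relies on iperf3 reporting "MBytes" in units of 2**20 bytes while reporting bitrates in decimal units. Because the transfer sizes are rounded to three digits, the recomputed values agree with the reported ones only to within rounding.

```python
MIB = 1024 * 1024  # iperf3 reports "MBytes" in units of 2**20 bytes


def bitrate_bps(mbytes: float, seconds: float) -> float:
    """Average bitrate in bits per second for a transfer of `mbytes` MBytes."""
    return mbytes * MIB * 8 / seconds


# Sender-side (MBytes, seconds) pairs copied from the results above.
samples = {
    "r1 > r2": (50.6, 10.43),
    "r1 > r3": (50.1, 82.03),
    "r1 > r5": (50.1, 454.26),
}

for link, (mbytes, seconds) in samples.items():
    print(f"{link}: {bitrate_bps(mbytes, seconds) / 1e6:.3f} Mbits/sec")
```

Note that every test moves roughly the same 50 MB, so the fast links finish in well under a second while the slow ones run for minutes.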
January 16, 2024 at 1:29 pm #6345
Edgard,
I don’t think there would be anything that would limit you to bandwidth that low. All but a few sites should support 100 Gbps (a few can only provide 10 Gbps). I would expect much higher bandwidth than you are seeing. Even using multiple software routers, I would expect 10s of Gbps.
What NIC types are you using?
What VM size are you using?
How are you forwarding traffic in your routers?
Are you tuning the TCP/IP configuration of your nodes (congestion control algorithm, MTU, buffer sizes, etc)?
Also, are you pinning your nodes to the NIC’s NUMA domain? NUMA pinning example.
Paul
- This reply was modified 1 year ago by Paul Ruth.
January 16, 2024 at 2:22 pm #6347
Hi Paul, thanks for answering me!
What NIC types are you using?
I’m using NIC_ConnectX_5 NICs in this test.
What VM size are you using?
All nodes are default_rocky_8 with 2 cores and 8 GB RAM.
How are you forwarding traffic in your routers?
In these tests, I’m using static routes and different routes with TOS. All tests are run with iperf3 (TCP).
Are you tuning the TCP/IP configuration of your nodes (congestion control algorithm, MTU, buffer sizes, etc)?
We are investigating congestion control with the Cubic and BBR algorithms. These tests are using Cubic. All other settings have not been changed.
Also, are you pinning your nodes to the NIC’s NUMA domain? NUMA pinning example.
One of the main objectives of this experiment, in addition to investigating congestion control, is the replication property of all tests.
Apparently, in the topology presented, the main bottleneck is the node in Los Angeles (LOSA).
January 16, 2024 at 6:17 pm #6360
One more question… were you able to replicate this before? By replicate I mean run it once in one slice, then delete that slice and run it again in a new slice.
I think the main issue here is a combination of VMs that are too small (memory/cores) to achieve good bandwidth and the fact that you are not pinning cores to NUMA domains. Without pinning you are unlikely to get repeatable performance. The issue is that if your VM cores are not in the same NUMA domain as your NIC, you will get worse performance. This is especially true for the router nodes. When you create a slice, your virtual cores will float between the available physical cores. Since there are other users on the host, you will not know anything about the placement of your virtual cores.
I suggest using much larger VMs and pinning the cores to the appropriate NUMA domains.
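To make the NUMA point concrete, here is a minimal sketch of how one might check, on a Linux machine, which NUMA node a NIC is attached to and which CPUs belong to that node. It only reads the standard sysfs files; it does not perform any pinning (on FABRIC that is done with the NUMA pinning example mentioned earlier). The function names and the `sysfs` parameter are illustrative assumptions, and inside a VM the NIC may simply report node -1, meaning no NUMA information is exposed.

```python
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Expand a sysfs CPU list such as '0-3,8-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def nic_numa_cpus(iface: str, sysfs: Path = Path("/sys")) -> tuple[int, set[int]]:
    """Return (NUMA node of the NIC, CPUs on that node).

    A node of -1 means the kernel exposes no NUMA affinity for the device;
    in that case the CPU set is empty.
    """
    node_file = sysfs / "class/net" / iface / "device/numa_node"
    node = int(node_file.read_text())
    if node < 0:
        return node, set()
    cpulist = sysfs / "devices/system/node" / f"node{node}" / "cpulist"
    return node, parse_cpulist(cpulist.read_text())
```

Comparing the returned CPU set with the CPUs the workload actually runs on (for example `os.sched_getaffinity(0)` on Linux) shows whether the two are on the same node.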
One more thing: which version of iPerf3 are you using? The iPerf3 that is available in most Linux repos is single-threaded. I recommend using the newer version supported by ESnet (https://github.com/esnet/iperf).
February 7, 2024 at 4:15 am #6537
Hi Paul! Thanks for the feedback.
Sorry for the delay in responding to you. Our group was developing an article with the theme “Sliced WANs for Data-Intensive Science” and all experiments were carried out in FABRIC. After we submitted the paper, I found some time to go back to running other experiments.
I split the previous topology [LINK] into two different ones.
Topology 1: SEAT (h1), MASS (r1), SALT (r2), STAR (r3), NEWY (h2).
Topology 2: LOSA (h1), DALL (r1), ATLA (r2), WASH (r3), NEWY (h2).
In topology 2, I obtained the following results (100 MB per TCP flow, iPerf v. 3.5.0):
h1 > r1
[ 5] 0.00-230.51 sec 100 MBytes 3.64 Mbits/sec 697 sender
[ 5] 0.00-230.55 sec 99.6 MBytes 3.62 Mbits/sec receiver
r1 > r2
[ 5] 0.00-0.69 sec 101 MBytes 1.23 Gbits/sec 0 sender
[ 5] 0.00-0.73 sec 100 MBytes 1.15 Gbits/sec receiver
r2 > r3
[ 5] 0.00-0.53 sec 101 MBytes 1.60 Gbits/sec 0 sender
[ 5] 0.00-0.57 sec 99.5 MBytes 1.47 Gbits/sec receiver
r3 > h2
[ 5] 0.00-0.22 sec 100 MBytes 3.88 Gbits/sec 172 sender
[ 5] 0.00-0.26 sec 99.4 MBytes 3.26 Gbits/sec receiver
h1 > h2
[ 5] 0.00-459.85 sec 100 MBytes 1.83 Mbits/sec 740 sender
[ 5] 0.00-459.92 sec 99.5 MBytes 1.82 Mbits/sec receiver
Again, results that pass through LOSA show a decrease in the transmission rate.
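To put the gap between the LOSA hop and the other hops in numbers, the receiver-side bitrates above can be converted to a common unit and ranked. This is a small illustrative sketch; the names `to_bps`, `receiver_rates`, and `ranked` are made up for the example, and the values are copied from the topology 2 results.

```python
import re

# Multipliers for the bitrate units iperf3 prints (decimal bits per second).
UNIT = {"Kbits/sec": 1e3, "Mbits/sec": 1e6, "Gbits/sec": 1e9}
RATE = re.compile(r"([\d.]+) ([KMG]bits/sec)")


def to_bps(text: str) -> float:
    """Convert a string such as '3.62 Mbits/sec' to bits per second."""
    match = RATE.search(text)
    if match is None:
        raise ValueError(f"no bitrate found in {text!r}")
    return float(match.group(1)) * UNIT[match.group(2)]


# Receiver-side bitrates from the topology 2 results (h1 is the LOSA node).
receiver_rates = {
    "h1 > r1": "3.62 Mbits/sec",
    "r1 > r2": "1.15 Gbits/sec",
    "r2 > r3": "1.47 Gbits/sec",
    "r3 > h2": "3.26 Gbits/sec",
    "h1 > h2": "1.82 Mbits/sec",
}

# Slowest path first.
ranked = sorted(receiver_rates, key=lambda link: to_bps(receiver_rates[link]))
for link in ranked:
    print(f"{link}: {to_bps(receiver_rates[link]) / 1e6:,.2f} Mbits/sec")
```

The two slowest entries are the ones that start at h1 (LOSA), and the h1 > r1 hop is more than two orders of magnitude slower than the next hop, r1 > r2.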
One question I still have: is there a difference between the L2 overlays L2STS and L2PTP for throughput tests?
- This reply was modified 11 months, 2 weeks ago by Edgard da Cunha Pontes.
February 8, 2024 at 10:21 am #6546
There is nothing in particular that is different about LOSA.
What jumps out at me about these results is that they are at least an order of magnitude too low. With dedicated ConnectX-5 cards you should be seeing nearly 25 Gbps. I suspect that your test case is too small. Your 100 MB test probably doesn’t get out of the TCP ramp-up phase of the connection. You should try transferring several hundred GB… or better yet, run the tests for a set amount of time (at least 1 min). You should also use much larger VMs, set the MTUs to 9000, and consider adjusting your buffer sizes.
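A rough back-of-the-envelope calculation shows why a 100 MB transfer can be too small. The sketch below uses a deliberately simplified slow-start model (initial window of 10 segments, window doubling every round trip, no loss, no receiver-window limit). The 30 ms round-trip time is a purely hypothetical value, not a measured LOSA to DALL figure, and the MSS is approximated as MTU minus 40 bytes of headers.

```python
import math


def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return link_bps * rtt_s / 8


def slow_start_rtts(bdp: float, mss: int, init_segments: int = 10) -> int:
    """Round trips of window doubling until the window reaches the BDP."""
    return max(0, math.ceil(math.log2(bdp / (init_segments * mss))))


def slow_start_bytes(rtts: int, mss: int, init_segments: int = 10) -> int:
    """Total bytes sent over `rtts` round trips of window doubling."""
    return init_segments * mss * (2**rtts - 1)


LINK_BPS = 25e9  # ConnectX-5 rate mentioned above
RTT_S = 0.030    # ASSUMPTION: hypothetical round-trip time, not measured

bdp = bdp_bytes(LINK_BPS, RTT_S)
for mtu in (1500, 9000):
    mss = mtu - 40  # ASSUMPTION: IPv4 + TCP headers without options
    rtts = slow_start_rtts(bdp, mss)
    sent = slow_start_bytes(rtts, mss)
    print(f"MTU {mtu}: BDP {bdp / 1e6:.1f} MB, {rtts} RTTs of ramp-up, "
          f"{sent / 1e6:.1f} MB sent while ramping up")
```

Under these assumptions the window needed to fill the link is already close to 100 MB, so a 100 MB transfer would end at about the time the ramp-up does. The same BDP figure is a reasonable starting point when choosing socket buffer sizes.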
Try running the example iPerf3 notebook but manually set the sites to LOSA and DALL. You should see much higher bandwidths. Then tweak that test, in small steps, with your desired configuration and see what causes the bandwidth to drop.
I think your tests are really testing the performance capabilities of the VMs, buffers, etc. but not the network.
Also, if you really want repeatability, you will need to use the NUMA pinning examples. Without explicitly choosing the NUMA domain for your cores, you will get random physical cores, which may result in much lower performance.
For reference, here is the output of the example iPerf3 notebook using LOSA and DALL. Note that you can get nearly 100 Gbps if you increase the VM size and pin the cores to the correct NUMA domain:
<pre>Connecting to host 10.137.3.2, port 5201
[ 5] local 10.133.130.2 port 56288 connected to 10.137.3.2 port 5201
[ 7] local 10.133.130.2 port 56294 connected to 10.137.3.2 port 5201
[ 9] local 10.133.130.2 port 56310 connected to 10.137.3.2 port 5201
[ 11] local 10.133.130.2 port 56318 connected to 10.137.3.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-10.01 sec 14.3 GBytes 12.3 Gbits/sec 11207 52.6 MBytes (omitted)
[ 7] 0.00-10.01 sec 15.4 GBytes 13.2 Gbits/sec 12714 63.5 MBytes (omitted)
[ 9] 0.00-10.01 sec 15.6 GBytes 13.4 Gbits/sec 11597 64.3 MBytes (omitted)
[ 11] 0.00-10.01 sec 20.3 GBytes 17.4 Gbits/sec 31095 201 MBytes (omitted)
[SUM] 0.00-10.01 sec 65.5 GBytes 56.2 Gbits/sec 66613 (omitted)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 0.00-10.01 sec 11.4 GBytes 9.77 Gbits/sec 2531 84.9 MBytes
[ 7] 0.00-10.01 sec 15.7 GBytes 13.4 Gbits/sec 3213 123 MBytes
[ 9] 0.00-10.01 sec 17.7 GBytes 15.2 Gbits/sec 3833 143 MBytes
[ 11] 0.00-10.01 sec 18.4 GBytes 15.8 Gbits/sec 3280 145 MBytes
[SUM] 0.00-10.01 sec 63.2 GBytes 54.2 Gbits/sec 12857
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 10.01-20.01 sec 11.4 GBytes 9.79 Gbits/sec 0 89.5 MBytes
[ 7] 10.01-20.01 sec 16.4 GBytes 14.1 Gbits/sec 0 124 MBytes
[ 9] 10.01-20.01 sec 18.7 GBytes 16.1 Gbits/sec 0 144 MBytes
[ 11] 10.01-20.01 sec 18.7 GBytes 16.0 Gbits/sec 0 142 MBytes
[SUM] 10.01-20.01 sec 65.2 GBytes 56.0 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 20.01-30.00 sec 11.0 GBytes 9.43 Gbits/sec 3639 86.7 MBytes
[ 7] 20.01-30.00 sec 15.7 GBytes 13.5 Gbits/sec 5665 124 MBytes
[ 9] 20.01-30.00 sec 17.9 GBytes 15.4 Gbits/sec 6044 139 MBytes
[ 11] 20.01-30.00 sec 17.6 GBytes 15.1 Gbits/sec 6159 139 MBytes
[SUM] 20.01-30.00 sec 62.1 GBytes 53.4 Gbits/sec 21507
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.00 sec 33.8 GBytes 9.66 Gbits/sec 6170 sender
[ 5] 0.00-30.05 sec 33.6 GBytes 9.61 Gbits/sec receiver
[ 7] 0.00-30.00 sec 47.7 GBytes 13.7 Gbits/sec 8878 sender
[ 7] 0.00-30.05 sec 48.0 GBytes 13.7 Gbits/sec receiver
[ 9] 0.00-30.00 sec 54.3 GBytes 15.5 Gbits/sec 9877 sender
[ 9] 0.00-30.05 sec 54.5 GBytes 15.6 Gbits/sec receiver
[ 11] 0.00-30.00 sec 54.7 GBytes 15.7 Gbits/sec 9439 sender
[ 11] 0.00-30.05 sec 54.6 GBytes 15.6 Gbits/sec receiver
[SUM] 0.00-30.00 sec 190 GBytes 54.5 Gbits/sec 34364 sender
[SUM] 0.00-30.05 sec 191 GBytes 54.5 Gbits/sec receiver
</pre>
- This reply was modified 11 months, 2 weeks ago by Paul Ruth.