Difference between throughput after maintenance
Tagged: evaluation, Throughput
January 12, 2024 at 12:46 pm #6325
I have noticed that, after the maintenance and the update of FABlib to version 1.6, the throughput between some nodes/sites appears to be limited.
Below is the link to a document (GDocs) showing the topology and tests carried out yesterday (01/11/2024) using iperf3.
https://docs.google.com/document/d/1DSZDqPEBVjQ_we717I_5n0nedCiT0JqbKwAD7-K0LaQ/edit?usp=sharing
Is there any new feature that could cause this limitation?
January 16, 2024 at 8:05 am #6341
I recreated the same slice and got the following results.
<pre>
h1 > r1   [ 5]   0.00-0.94   sec  50.2 MBytes   448 Mbits/sec     9   sender
          [ 5]   0.00-0.98   sec  48.7 MBytes   416 Mbits/sec         receiver
r1 > r2   [ 5]   0.00-10.43  sec  50.6 MBytes  40.7 Mbits/sec    44   sender
          [ 5]   0.00-10.47  sec  46.8 MBytes  37.5 Mbits/sec         receiver
r1 > r3   [ 5]   0.00-82.03  sec  50.1 MBytes  5.13 Mbits/sec   356   sender
          [ 5]   0.00-82.07  sec  49.8 MBytes  5.09 Mbits/sec         receiver
r1 > r5   [ 5]   0.00-454.26 sec  50.1 MBytes   925 Kbits/sec  3455   sender
          [ 5]   0.00-454.30 sec  49.8 MBytes   919 Kbits/sec         receiver
r2 > r3   [ 5]   0.00-0.37   sec  50.8 MBytes  1.17 Gbits/sec     0   sender
          [ 5]   0.00-0.41   sec  50.0 MBytes  1.03 Gbits/sec         receiver
r3 > r4   [ 5]   0.00-1.16   sec  50.2 MBytes   362 Mbits/sec     0   sender
          [ 5]   0.00-1.21   sec  49.4 MBytes   343 Mbits/sec         receiver
r4 > r5   [ 5]   0.00-0.48   sec  51.2 MBytes   892 Mbits/sec     0   sender
          [ 5]   0.00-0.52   sec  50.2 MBytes   811 Mbits/sec         receiver
r4 > h2   [ 5]   0.00-0.13   sec  51.0 MBytes  3.34 Gbits/sec     0   sender
          [ 5]   0.00-0.17   sec  49.7 MBytes  2.52 Gbits/sec         receiver
</pre>
January 16, 2024 at 1:29 pm #6345
Edgard,
I don’t think there is anything that would limit you to bandwidth that low. All but a few sites should support 100 Gbps (a few can only provide 10 Gbps), so I would expect much higher bandwidth than you are seeing. Even with multiple software routers in the path, I would expect tens of Gbps.
What NIC types are you using?
What VM size are you using?
How are you forwarding traffic in your routers?
Are you tuning the TCP/IP configuration of your nodes (congestion control algorithm, MTU, buffer sizes, etc.)?
Also, are you pinning your nodes to the NIC’s NUMA domain? NUMA pinning example.
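If it helps, here is a rough sketch of how those settings can be checked from a FABlib notebook (the slice, node, and interface names below are placeholders, not taken from your slice):
<pre>
# Rough sketch: inspect current TCP/NIC settings on one slice node via FABlib.
# The slice name, node name, and interface name are placeholders.
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.get_slice(name="my-throughput-slice")
node = slice.get_node(name="r1")

# Congestion control algorithm in use, and the ones available
node.execute("sysctl net.ipv4.tcp_congestion_control net.ipv4.tcp_available_congestion_control")

# Maximum socket buffer sizes and TCP autotuning limits
node.execute("sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem")

# MTU of the dataplane interface (assumed here to be ens7)
node.execute("ip link show dev ens7")
</pre>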
Paul
January 16, 2024 at 2:22 pm #6347
Hi Paul, thanks for answering me!
What NIC types are you using?
I’m using NIC_ConnectX_5 NICs in this test.
What VM size are you using?
All nodes are default_rocky_8 with 2 cores and 8 GB of RAM.
How are you forwarding traffic in your routers?
In these tests I’m using static routes, with different routes selected by TOS (see the sketch at the end of this post). All tests are run with iperf3 (TCP).
Are you tuning the TCP/IP configuration of your nodes (congestion control algorithm, MTU, buffer sizes, etc)?
We are investigating congestion control with the Cubic and BBR algorithms; these tests use Cubic. All other TCP/IP settings are unchanged.
Also, are you pinning your nodes to the NIC’s NUMA domain? NUMA pinning example.
One of the main objectives of this experiment, besides investigating congestion control, is the reproducibility of all tests.
Apparently, in the topology presented, the main bottleneck is the node in Los Angeles (LOSA).
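As mentioned above, forwarding is plain static routing with a TOS-selected alternate path. A simplified sketch of what that looks like on one router (the addresses, table ID, and TOS value are illustrative, not my exact configuration):
<pre>
# Simplified sketch of TOS-based static routing on one router node.
# Addresses, the table ID, and the TOS value are illustrative only.
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.get_slice(name="my-throughput-slice")
router = slice.get_node(name="r1")

# Default static route towards the destination subnet
router.execute("sudo ip route add 10.0.5.0/24 via 10.0.2.2")

# Alternate path for traffic marked with TOS 0x10: extra routing table plus rule
router.execute("sudo ip route add 10.0.5.0/24 via 10.0.3.2 table 100")
router.execute("sudo ip rule add tos 0x10 lookup 100")

# The iperf3 client can then mark its traffic, e.g. 'iperf3 -c 10.0.5.2 --tos 0x10'
</pre>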
January 16, 2024 at 6:17 pm #6360
One more question… were you able to replicate this before? By replicate I mean run it once in one slice, then delete that slice and run it again in a new slice.
I think the main issue here is a combination of VMs that are too small (memory/cores) to achieve good bandwidth and the fact that you are not pinning cores to NUMA domains. Without pinning, you are unlikely to get repeatable performance: if your VM cores are not in the same NUMA domain as your NIC, you will get worse performance, and this is especially true for the router nodes. When you create a slice, your virtual cores will float between the available physical cores, and since there are other users on the host, you will not know anything about the placement of your virtual cores.
I suggest using much larger VMs and pinning the cores to the appropriate NUMA domains.
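Roughly, the pinning in that example looks like the sketch below. The method names are taken from the FABlib 1.6-era NUMA pinning notebook and should be treated as assumptions; check the example notebook for your FABlib version.
<pre>
# Sketch based on the FABRIC NUMA pinning example; method names are assumptions
# from FABlib 1.6-era notebooks and may differ in other versions.
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.get_slice(name="my-throughput-slice")   # placeholder slice name

for node in slice.get_nodes():
    # Pin the VM's vCPUs to the NUMA domain of its NIC (component name assumed to be "nic1")
    node.pin_cpu(component_name="nic1")
    # Pin the VM's memory to the same NUMA domain
    node.numa_tune()
    # Reboot the VM so the pinning takes effect
    node.os_reboot()
</pre>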
One more thing: which version of iPerf3 are you using? The iPerf3 available in most Linux repos is single-threaded. I recommend using the new version supported by ESnet (https://github.com/esnet/iperf).
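For example, something along these lines builds it from source on each node (package names assume a Rocky/EL8 image; the slice name is a placeholder):
<pre>
# Sketch: build the ESnet-maintained iperf3 from source on every node in the slice.
# Package names assume a Rocky/EL8 image; the slice name is a placeholder.
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.get_slice(name="my-throughput-slice")

build_cmds = (
    "sudo dnf install -y git gcc make && "
    "git clone https://github.com/esnet/iperf.git && "
    "cd iperf && ./configure && make && sudo make install && sudo ldconfig"
)
for node in slice.get_nodes():
    node.execute(build_cmds)
    node.execute("iperf3 --version")  # confirm the freshly built binary is the one found
</pre>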
February 7, 2024 at 4:15 am #6537
Hi Paul! Thanks for the feedback.
Sorry for the delay in responding. Our group was writing an article on “Sliced WANs for Data-Intensive Science”, and all of its experiments were carried out on FABRIC. After we submitted the paper, I found some time to go back to running other experiments.
I split the previous topology [LINK] into two different ones.
Topology 1: SEAT (h1), MASS (r1), SALT (r2), STAR (r3), NEWY (h2).
Topology 2: LOSA (h1), DALL (r1), ATLA (r2), WASH (r3), NEWY (h2).
In topology 2, I obtained the following results (100 MB per TCP flow, iPerf3 v3.5.0):
<pre>
h1 > r1   [ 5]   0.00-230.51 sec   100 MBytes  3.64 Mbits/sec   697   sender
          [ 5]   0.00-230.55 sec  99.6 MBytes  3.62 Mbits/sec         receiver
r1 > r2   [ 5]   0.00-0.69   sec   101 MBytes  1.23 Gbits/sec     0   sender
          [ 5]   0.00-0.73   sec   100 MBytes  1.15 Gbits/sec         receiver
r2 > r3   [ 5]   0.00-0.53   sec   101 MBytes  1.60 Gbits/sec     0   sender
          [ 5]   0.00-0.57   sec  99.5 MBytes  1.47 Gbits/sec         receiver
r3 > h2   [ 5]   0.00-0.22   sec   100 MBytes  3.88 Gbits/sec   172   sender
          [ 5]   0.00-0.26   sec  99.4 MBytes  3.26 Gbits/sec         receiver
h1 > h2   [ 5]   0.00-459.85 sec   100 MBytes  1.83 Mbits/sec   740   sender
          [ 5]   0.00-459.92 sec  99.5 MBytes  1.82 Mbits/sec         receiver
</pre>
Again, results that pass through LOSA show a decrease in the transmission rate.
One question I still have: is there a difference between the L2STS and L2PTP L2 overlay services in terms of throughput?
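For context, this is roughly how I would request each service type when building the slice; the type argument values are my reading of the FABlib documentation, and the sites and NIC names are just examples.
<pre>
# Sketch: explicitly requesting an L2PTP (or L2STS) service when building a slice.
# The type values are my reading of the FABlib docs; sites and names are examples.
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.new_slice(name="l2-service-comparison")

h1 = slice.add_node(name="h1", site="LOSA")
h2 = slice.add_node(name="h2", site="DALL")
if1 = h1.add_component(model="NIC_ConnectX_5", name="nic1").get_interfaces()[0]
if2 = h2.add_component(model="NIC_ConnectX_5", name="nic1").get_interfaces()[0]

# Point-to-point service between the two sites; use type="L2STS" for the site-to-site service
slice.add_l2network(name="net1", interfaces=[if1, if2], type="L2PTP")
slice.submit()
</pre>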
February 8, 2024 at 10:21 am #6546
There is nothing in particular that is different about LOSA.
What jumps out at me about these results is that they are at least an order of magnitude too low. With dedicated ConnectX-5 cards you should be seeing nearly 25 Gbps. I suspect that your test case is too small: a 100 MB transfer probably never gets out of the TCP ramp-up phase of the connection. You should try transferring several hundred GB, or better yet, run the tests for a set amount of time (at least 1 minute). You should also use much larger VMs, set the MTUs to 9000, and consider adjusting your buffer sizes.
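Concretely, something like the sketch below applies those changes and runs a time-based test (node and interface names, buffer sizes, and the server IP are placeholders):
<pre>
# Sketch: jumbo MTU, larger socket buffers, then a 60-second multi-stream iperf3 run.
# Node names, the interface name, buffer sizes, and the server IP are placeholders.
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.get_slice(name="my-throughput-slice")
server = slice.get_node(name="h2")
client = slice.get_node(name="h1")

for node in (server, client):
    # Jumbo frames on the dataplane interface (assumed to be ens7)
    node.execute("sudo ip link set dev ens7 mtu 9000")
    # Larger maximum socket buffers so TCP can open a large enough window
    node.execute("sudo sysctl -w net.core.rmem_max=536870912 net.core.wmem_max=536870912")
    node.execute("sudo sysctl -w net.ipv4.tcp_rmem='4096 87380 268435456' net.ipv4.tcp_wmem='4096 65536 268435456'")

# Time-based test (60 seconds, 4 parallel streams) instead of a fixed 100 MB transfer
server.execute("iperf3 -s -D")
client.execute("iperf3 -c 10.0.5.2 -t 60 -P 4")
</pre>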
Try running the example iPerf3 notebook but manually set the sites to LOSA and DALL. You should see much higher bandwidths. Then tweak that test, in small steps, with your desired configuration and see what causes the bandwidth to drop.
I think your tests are really testing the performance capabilities of the VMs, buffers, etc. but not the network.
Also, if you really want repeatability, you will need to use the NUMA pinning examples. Without explicitly choosing the NUMA domain for your cores, you will get random physical cores, which may result in much lower performance.
For reference, here is the output of the example iPerf3 notebook using LOSA and DALL. Note that you can get nearly 100 Gbps if you increase the VM size and pin the cores to the correct NUMA domain:
<pre>
Connecting to host 10.137.3.2, port 5201
[  5] local 10.133.130.2 port 56288 connected to 10.137.3.2 port 5201
[  7] local 10.133.130.2 port 56294 connected to 10.137.3.2 port 5201
[  9] local 10.133.130.2 port 56310 connected to 10.137.3.2 port 5201
[ 11] local 10.133.130.2 port 56318 connected to 10.137.3.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-10.01  sec  14.3 GBytes  12.3 Gbits/sec  11207   52.6 MBytes       (omitted)
[  7]   0.00-10.01  sec  15.4 GBytes  13.2 Gbits/sec  12714   63.5 MBytes       (omitted)
[  9]   0.00-10.01  sec  15.6 GBytes  13.4 Gbits/sec  11597   64.3 MBytes       (omitted)
[ 11]   0.00-10.01  sec  20.3 GBytes  17.4 Gbits/sec  31095    201 MBytes       (omitted)
[SUM]   0.00-10.01  sec  65.5 GBytes  56.2 Gbits/sec  66613                     (omitted)
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   0.00-10.01  sec  11.4 GBytes  9.77 Gbits/sec   2531   84.9 MBytes
[  7]   0.00-10.01  sec  15.7 GBytes  13.4 Gbits/sec   3213    123 MBytes
[  9]   0.00-10.01  sec  17.7 GBytes  15.2 Gbits/sec   3833    143 MBytes
[ 11]   0.00-10.01  sec  18.4 GBytes  15.8 Gbits/sec   3280    145 MBytes
[SUM]   0.00-10.01  sec  63.2 GBytes  54.2 Gbits/sec  12857
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  10.01-20.01  sec  11.4 GBytes  9.79 Gbits/sec      0   89.5 MBytes
[  7]  10.01-20.01  sec  16.4 GBytes  14.1 Gbits/sec      0    124 MBytes
[  9]  10.01-20.01  sec  18.7 GBytes  16.1 Gbits/sec      0    144 MBytes
[ 11]  10.01-20.01  sec  18.7 GBytes  16.0 Gbits/sec      0    142 MBytes
[SUM]  10.01-20.01  sec  65.2 GBytes  56.0 Gbits/sec      0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  20.01-30.00  sec  11.0 GBytes  9.43 Gbits/sec   3639   86.7 MBytes
[  7]  20.01-30.00  sec  15.7 GBytes  13.5 Gbits/sec   5665    124 MBytes
[  9]  20.01-30.00  sec  17.9 GBytes  15.4 Gbits/sec   6044    139 MBytes
[ 11]  20.01-30.00  sec  17.6 GBytes  15.1 Gbits/sec   6159    139 MBytes
[SUM]  20.01-30.00  sec  62.1 GBytes  53.4 Gbits/sec  21507
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  33.8 GBytes  9.66 Gbits/sec   6170   sender
[  5]   0.00-30.05  sec  33.6 GBytes  9.61 Gbits/sec          receiver
[  7]   0.00-30.00  sec  47.7 GBytes  13.7 Gbits/sec   8878   sender
[  7]   0.00-30.05  sec  48.0 GBytes  13.7 Gbits/sec          receiver
[  9]   0.00-30.00  sec  54.3 GBytes  15.5 Gbits/sec   9877   sender
[  9]   0.00-30.05  sec  54.5 GBytes  15.6 Gbits/sec          receiver
[ 11]   0.00-30.00  sec  54.7 GBytes  15.7 Gbits/sec   9439   sender
[ 11]   0.00-30.05  sec  54.6 GBytes  15.6 Gbits/sec          receiver
[SUM]   0.00-30.00  sec   190 GBytes  54.5 Gbits/sec  34364   sender
[SUM]   0.00-30.05  sec   191 GBytes  54.5 Gbits/sec          receiver
</pre>