Get Physical Topology of Slice

This topic has 15 replies, 4 voices, and was last updated 2 years, 9 months ago by Ertza.

October 13, 2022 at 2:38 pm #3308
Ertza
Participant
Hi, I have a 6-node slice on MAX named “ultima7” with slice-id aff3273b-5837-4010-b1b4-543bf1a28d22.

I would like to know the physical topology connecting my 6 nodes. Are they all hosted on a single server? How many switches are involved, and what do they connect? Any information about the topology would be really helpful.

Thank you.
October 14, 2022 at 11:02 am #3309
Ilya Baldin
Participant
Ertza,

The worker node assignment should be available in the slice model when you print out slice or sliver details (sorry don’t remember off the top of my head). All workers are connected to a single switch, either a Cisco 5500 or a 5700. That info is available in the site advertisement, but not in the slice model.
October 24, 2022 at 7:55 pm #3355
Ertza
Participant
Hi Ilya, where can I print out the slice/sliver details? I see slice details in “Experiments->My Slices” and can click on the slice to view its details, but it does not show how the nodes are physically connected. I also couldn’t find the site advertisement. Can you help me find that as well, so I can tell whether they’re connected by a Cisco 5500 or 5700? Thanks a ton.
October 24, 2022 at 7:58 pm #3356
Ertza
Participant
Additionally, just to be clear, I am interested in the connectivity and topology of the ConnectX NICs, which I assigned IPs to myself and connected using the “slice.add_l2network(name=network_name, interfaces=ifaces)” command, not the management connections. So in that case, are the nodes simply connected by a VLAN or something similar on top of the Cisco 5500?
October 25, 2022 at 10:14 am #3357
Ilya Baldin
Participant
Short answer is yes – when you connect ConnectX interfaces via any services, locally it looks like a VLAN. What happens after depends on the service type. L2Bridge is indeed a local VLAN, P2P and S2S services are more complicated because they connect sites.

And you *should not* use management connections for your experiments. That is very expressly discouraged.

You may find this article useful https://learn.fabric-testbed.net/knowledge-base/network-interfaces-in-fabric-vms/

I will ask Paul to respond further on the details of fablib APIs.
October 25, 2022 at 10:52 am #3358
Paul Ruth
Keymaster
@Ertza – If you are using an L2Bridge at a site, then your VMs are directly connected with a VLAN on one of the Cisco switches. If you have 6 nodes, you can find the host each of them is on. The topology is simply all of those hosts connected directly to a single Cisco 5500 or 5700. The Cisco switch model depends on the site: most sites have a 5700, and a few have a 5500.
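[Editor's sketch] The host lookup described above can be approximated in Python. This is a minimal sketch, not an official recipe: it assumes each fablib node object exposes `get_name()` and `get_host()`, and that `FablibManager.get_slice(name=...)` and `Slice.get_nodes()` behave as in current fablib releases. The helper names `group_nodes_by_host`, `describe_topology`, and `load_nodes` are made up for illustration, and the switch label is only a placeholder because the switch model is not part of the slice model.

```python
from collections import defaultdict


def group_nodes_by_host(nodes):
    """Map each physical worker host to the names of the VMs placed on it.

    `nodes` is any iterable of objects with get_name() and get_host();
    fablib Node objects are assumed (not verified here) to qualify.
    """
    placement = defaultdict(list)
    for node in nodes:
        placement[node.get_host()].append(node.get_name())
    return {host: sorted(names) for host, names in placement.items()}


def describe_topology(placement, switch="site switch"):
    """Render the star topology: every worker host hangs off one switch."""
    return [
        f"{switch} <-> {host}: {', '.join(placement[host])}"
        for host in sorted(placement)
    ]


def load_nodes(slice_name):
    """Fetch the nodes of an existing slice (needs FABRIC credentials).

    The import is kept inside the function so the helpers above can be
    used and tested without fablib installed.
    """
    from fabrictestbed_extensions.fablib.fablib import FablibManager

    fablib = FablibManager()
    return fablib.get_slice(name=slice_name).get_nodes()
```

With credentials configured, something like `print("\n".join(describe_topology(group_nodes_by_host(load_nodes("ultima7")))))` would list which VMs share a worker. VMs on the same worker may not traverse the switch in the same way as VMs on different workers, so the grouping is worth checking before interpreting drop counters.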

Do you need more info? Are you asking to know the model of the specific switch you are using?
- This reply was modified 2 years, 9 months ago by Paul Ruth.
October 28, 2022 at 2:26 pm #3401
yoursunny
Participant
I agree that knowing the physical topology may be useful, especially when the slice spans multiple sites.
For example, if I create a slice with L2PTP link between STAR and UCSD, there are multiple possible paths, and it’s good to know which path my traffic is taking.

Adding to this, it’s useful to have observability into other traffic on the same links.
The link between WASH and STAR has 100 Gbps capacity.
When I benchmark my app running over L2PTP link between these two sites, I may receive throughput of 60 Gbps.
This could be caused by issues in my application (e.g. congestion control algorithm not tuned for >10ms RTT), or caused by other competing traffic on the link.
If FABRIC portal can include a diagram of near-realtime link utilization, I can better understand which one is the more likely cause.
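[Editor's sketch] To make the first possible cause concrete: the amount of data a sender must keep in flight grows with RTT, so a window sized for short paths caps throughput on long ones. The sketch below computes the bandwidth-delay product and the resulting window-limited rate. The 10 ms RTT and the window size are illustrative values only, not measurements of the WASH-STAR link, and the function names are hypothetical.

```python
def bdp_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    # bits/s * ms = millibits; divide by 1000 for bits, then by 8 for bytes.
    return rate_bps * rtt_ms // 8000


def window_limited_gbps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on throughput if at most window_bytes are sent per RTT."""
    return window_bytes * 8 * 1000 / rtt_ms / 1e9


if __name__ == "__main__":
    # Illustrative numbers: a 100 Gbps link with an assumed 10 ms RTT.
    print(bdp_bytes(100_000_000_000, 10))       # bytes needed in flight
    print(window_limited_gbps(75_000_000, 10))  # cap for a 75 MB window
```

Under these assumed numbers, filling a 100 Gbps path at 10 ms needs 125 MB in flight, and a 75 MB window would cap the flow at 60 Gbps even on an otherwise idle link. That is why a utilization view would help separate the two explanations.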

From another angle, it’s also useful to be able to control the physical topology.
Suppose I want to develop my congestion control algorithm over a high-latency link; the longest RTT that can be achieved today is between MASS and UCSD.
If I can specify the physical topology as part of slice definition, I could make a link to take a scenic route such as WASH-ATLA-DALL-LOSA-SALT-KANS-STAR, which provides a longer natural latency.
October 28, 2022 at 3:53 pm #3403
Ilya Baldin
Participant
There are a couple of things we have or are working on that may be of assistance here:

1. We do have support for mirror ports (not yet included in fablib) that lets you listen in on traffic on any physical port at any site, so that way you can in principle monitor what is going on.
2. We have on the roadmap support for ERO (explicit route objects) so that you can have your traffic take the ‘scenic route’ as you say 🙂
3. You can on your own add delay to your traffic using traditional Linux tc tools.
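[Editor's sketch] For item 3, the usual tool is the `netem` queueing discipline of `tc`. The helper below only builds the command strings; it does not run them. The function name `netem_delay_commands` and the interface name in the usage note are hypothetical, and the commands need root on the VM. A later reply in this thread cautions that this approach may not hold up above 10 Gbps.

```python
import shlex


def netem_delay_commands(iface: str, delay_ms: int, jitter_ms: int = 0) -> dict:
    """Build tc/netem commands that add, inspect, and remove a fixed delay.

    The delay applies to packets leaving `iface`, so it adds to the RTT once;
    apply it on both endpoints to delay each direction.
    """
    if delay_ms <= 0:
        raise ValueError("delay_ms must be positive")
    if jitter_ms < 0:
        raise ValueError("jitter_ms must not be negative")
    spec = f"{delay_ms}ms" + (f" {jitter_ms}ms" if jitter_ms else "")
    dev = shlex.quote(iface)  # guard against odd characters in the name
    return {
        "add": f"sudo tc qdisc add dev {dev} root netem delay {spec}",
        "show": f"tc qdisc show dev {dev}",
        "remove": f"sudo tc qdisc del dev {dev} root",
    }
```

For example, `netem_delay_commands("enp7s0", 50)["add"]` yields a command that could be run on the VM over SSH (or, assuming fablib's `node.execute()` is available, from a notebook) to add 50 ms of one-way delay on that experiment interface. Check the real interface name with `ip link` first, and never apply this to the management interface.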
October 29, 2022 at 4:23 pm #3408
Ertza
Participant
@Paul – Yes, it’d be great to know which switches the MAX and TACC sites have: are they 5500 or 5700? Thanks.
October 29, 2022 at 4:24 pm #3409
Ertza
Participant
Thanks Ilya. Yes, I am not using management interfaces; I mentioned them only to be clear about what I am interested in.
October 29, 2022 at 6:15 pm #3410
Ertza
Participant
Also, any way to check the drop stats etc. in the switch? Or any way to directly connect 2 VMs? I am seeing some packet drops and upon checking NIC counters it seems the drops are occurring in the switch… Any suggestions on how to avoid that, other than congestion control? Maybe doing some config at switch side?
October 30, 2022 at 9:55 pm #3412
Ertza
Participant
Hi all, running iperf I saw that the max bandwidth I can get in VMs is around 19-20 Gbps. Has this been set by FABRIC, and is there any way to increase it?
October 31, 2022 at 8:38 am #3413
yoursunny
Participant
“We have on the roadmap support for ERO (explicit route objects) so that you can have your traffic take the ‘scenic route’ as you say”

This will be very useful.

“You can on your own add delay to your traffic using traditional Linux tc tools.”

This would not work well at speeds higher than 10 Gbps.

“running Iperf I saw the max bandwidth I can get in VMs is around 19-20 Gbps”

You might get higher speeds if you request a dedicated NIC.
I can get 60 Gbps on ConnectX-6 dual-port 100 Gbps and 33 Gbps on ConnectX-5 dual-port 25 Gbps, tested in SALT location.
This number is “goodput” reported by my DPDK-based program, and it’s the sum of traffic in both directions.
iperf3 generates unidirectional traffic, so its result could be half of this.
October 31, 2022 at 9:58 am #3414
Ilya Baldin
Participant
Remember that the speed you get heavily depends on the flavor of the VM you are using. Tuning performance to get close to 100Gbps is not trivial, though I do believe Paul has some recipes for this.
October 31, 2022 at 5:41 pm #3417
Paul Ruth
Keymaster
Ertza – We are able to get close to 100G on most links. Are you using iperf3? iperf3 is single-threaded, so the highest bandwidth you will see from one process is ~25Gbps. You will need to create multiple processes and add their results.

See: https://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf/multi-stream-iperf3/
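[Editor's sketch] The "multiple processes, add their results" approach can be scripted. This is a rough sketch under stated assumptions: one `iperf3 -s -p PORT` server is already listening on each port on the far end, ports start at 5201, and each client report is TCP JSON output (`-J`) containing `end.sum_received.bits_per_second`. The function names are made up for illustration.

```python
import json
import subprocess


def iperf3_client_commands(server, streams, base_port=5201, seconds=10):
    """One iperf3 client command per process, each aimed at its own port."""
    return [
        ["iperf3", "-c", server, "-p", str(base_port + i),
         "-t", str(seconds), "-J"]
        for i in range(streams)
    ]


def received_bps(report_json):
    """Extract receiver-side throughput from one iperf3 JSON report."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"]


def total_gbps(reports):
    """Add up the per-process results, in Gbps."""
    return sum(received_bps(r) for r in reports) / 1e9


def run_parallel(server, streams, base_port=5201, seconds=10):
    """Start all clients at once, wait for them, and return the aggregate."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for cmd in iperf3_client_commands(server, streams, base_port, seconds)
    ]
    reports = [proc.communicate()[0] for proc in procs]
    return total_gbps(reports)
```

Calling `run_parallel("10.0.0.2", 4)` (the address is a placeholder for the peer's experiment-network IP) would launch four clients against ports 5201 to 5204 and return their combined rate. Pinning each process to its own core, as the linked ESnet page discusses, is left out here for brevity.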