Get Physical Topology of Slice

  #3308
    Ertza
    Participant

      Hi, I have a 6-node slice on MAX named “ultima7” with slice-id aff3273b-5837-4010-b1b4-543bf1a28d22.

      I am interested in the physical topology of my 6 nodes: are they all on a single server? How many switches connect which nodes? Any and all information about their topology would be really helpful.

      Thank you.

      #3309
      Ilya Baldin
      Participant

        Ertza,

        The worker node assignment should be available in the slice model when you print out slice or sliver details (sorry, I don’t remember the exact call off the top of my head). All workers are connected to a single switch, either a Cisco 5500 or a 5700. That info is available in the site advertisement, but not in the slice model.
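A minimal sketch of how the worker assignment could be summarized once you have it. It assumes you have already collected (node name, host name) pairs; the helper `group_nodes_by_host` is hypothetical and only groups what you pass in.

```python
from collections import defaultdict


def group_nodes_by_host(assignments):
    """Group VM names by the worker host they were placed on.

    `assignments` is an iterable of (node_name, host_name) pairs.
    Returns {host_name: sorted list of node names}.
    """
    by_host = defaultdict(list)
    for node_name, host_name in assignments:
        by_host[host_name].append(node_name)
    # Sort names so the summary is stable regardless of input order.
    return {host: sorted(names) for host, names in by_host.items()}
```

With fablib, the pairs could be gathered as `[(n.get_name(), n.get_host()) for n in fablib.get_slice(name="ultima7").get_nodes()]`, where `fablib` is a `FablibManager` instance; check that your fablib version provides these methods. If the result has a single key, all six VMs share one server.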

        #3355
        Ertza
        Participant

          Hi Ilya, where can I print out the slice/sliver details? I see the slice details in “Experiments->My Slices” and can click on the slice to view them, but that view does not show how the nodes are physically connected. I also couldn’t find the site advertisement. Can you help me find it, so I can tell whether the nodes are connected by a Cisco 5500 or a 5700? Thanks a ton.

          #3356
          Ertza
          Participant

            Just to add: I am interested in the connectivity and topology of the ConnectX NICs that I assigned IPs to myself and connected with the “slice.add_l2network(name=network_name, interfaces=ifaces)” command, not in the management connections. In that case, are the nodes simply connected by a VLAN or something similar on top of the Cisco 5500?

            #3357
            Ilya Baldin
            Participant

              Short answer is yes – when you connect ConnectX interfaces via any services, locally it looks like a VLAN. What happens after depends on the service type. L2Bridge is indeed a local VLAN, P2P and S2S services are more complicated because they connect sites.

              And you *should not* use management connections for your experiments. That is very expressly discouraged.

              You may find this article useful https://learn.fabric-testbed.net/knowledge-base/network-interfaces-in-fabric-vms/

              I will ask Paul to respond further on the details of fablib APIs.

              #3358
              Paul Ruth
              Keymaster

                @Ertza – If you are using an L2Bridge at a site, then your VMs are directly connected with a VLAN on one of the Cisco switches. If you have 6 nodes, you can find the host each of them is on. The topology is simply all of those hosts connected directly to a single Cisco 5500 or 5700. The Cisco switch model depends on the site: most sites have a 5700, and a few have a 5500.
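The single-site L2Bridge layout described above can be written down as a simple star. This is a sketch assuming you already have a VM-to-host mapping; the switch label `site-switch` is a placeholder, since the slice model does not name the switch. It models physical adjacency only and says nothing about whether traffic between two VMs on the same host actually leaves the NIC.

```python
def l2bridge_star_edges(node_to_host, switch="site-switch"):
    """Return the edges of a simplified single-site L2Bridge topology.

    `node_to_host` maps VM name -> worker host name. Each VM links to its
    host, and each distinct host links to the one site switch.
    """
    edges = set()
    for node, host in node_to_host.items():
        edges.add((node, host))    # VM attached to its worker host
        edges.add((host, switch))  # worker host attached to the site switch
    return sorted(edges)
```

For three VMs spread over two hosts this yields five edges: three VM-to-host links and two host-to-switch links.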

                Do you need more info? Are you asking to know the model of the specific switch you are using?

                • This reply was modified 2 years, 2 months ago by Paul Ruth.
                #3401
                yoursunny
                Participant

                  I agree that knowing the physical topology may be useful, especially when the slice spans multiple sites.
                  For example, if I create a slice with L2PTP link between STAR and UCSD, there are multiple possible paths, and it’s good to know which path my traffic is taking.

                  In addition, it’s useful to have observability into other traffic on the same links.
                  The link between WASH and STAR has 100 Gbps capacity.
                  When I benchmark my app running over L2PTP link between these two sites, I may receive throughput of 60 Gbps.
                  This could be caused by issues in my application (e.g. congestion control algorithm not tuned for >10ms RTT), or caused by other competing traffic on the link.
                  If the FABRIC portal could include a diagram of near-real-time link utilization, I could better tell which cause is more likely.
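A quick back-of-the-envelope check that can help separate the two causes is the bandwidth-delay product (BDP): the amount of data a sender must keep in flight to fill the link. The 100 Gbps and 10 ms figures come from the post; the helper names are hypothetical.

```python
def bdp_bytes(rate_gbps, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return rate_gbps * 1e9 * (rtt_ms / 1e3) / 8


def window_limited_gbps(window_bytes, rtt_ms):
    """Upper bound on one flow's throughput when its window is window_bytes."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e9
```

At 100 Gbps and a 10 ms RTT, about 125 MB must be in flight. A flow whose window stays well below that is capped at window/RTT regardless of competing traffic, which would point at host or congestion-control tuning rather than the link.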

                  From another angle, it’s also useful to be able to control the physical topology.
                  Suppose I want to develop my congestion control algorithm over a high-latency link: the longest RTT that can be achieved today is between MASS and UCSD.
                  If I can specify the physical topology as part of slice definition, I could make a link to take a scenic route such as WASH-ATLA-DALL-LOSA-SALT-KANS-STAR, which provides a longer natural latency.
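A rough sketch of why a scenic route adds latency. It assumes light travels through fiber at about 200,000 km/s (roughly two thirds of c, or about 5 microseconds per km one way) and ignores switching, queuing, and serialization delay. The segment lengths are inputs you would need to supply; the actual fiber distances between FABRIC sites are not given here.

```python
# Assumed propagation speed in optical fiber: ~200,000 km/s = 200 km per ms.
FIBER_KM_PER_MS = 200.0


def fiber_rtt_ms(segment_km):
    """Estimate round-trip propagation delay over a path of fiber segments.

    `segment_km` is an iterable of segment lengths in km along one direction.
    """
    one_way_ms = sum(segment_km) / FIBER_KM_PER_MS
    return 2 * one_way_ms
```

Under this assumption, every 1,000 km of fiber on the path adds about 10 ms of RTT, so a detour through more sites raises latency roughly in proportion to its extra fiber length.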

                  #3403
                  Ilya Baldin
                  Participant

                    There are a couple of things we have or are working on that may be of assistance here:

                    1. We do have support for mirror ports (not yet included in fablib) that lets you listen in on traffic on any physical port at any site, so that, in principle, you can monitor what is going on.
                    2. We have on the roadmap support for ERO (explicit route objects) so that you can have your traffic take the ‘scenic route’ as you say 🙂
                    3. You can on your own add delay to your traffic using traditional Linux tc tools.
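For item 3, the usual Linux tool is the netem qdisc. A minimal sketch that only builds the argument list for `tc qdisc add dev <iface> root netem delay <N>ms`; the interface name `ens7` below is a hypothetical placeholder for your VM's dataplane interface, and running the command requires root. Netem delays egress traffic, so adding it on one end of a link raises the RTT by the configured delay.

```python
def netem_delay_cmd(iface, delay_ms, limit_packets=None):
    """Build a `tc` command that adds a fixed egress delay on `iface`.

    `limit_packets` raises netem's queue limit (default 1000 packets),
    which matters when delay x rate means many packets are held at once.
    """
    cmd = ["tc", "qdisc", "add", "dev", iface, "root", "netem",
           "delay", f"{delay_ms}ms"]
    if limit_packets is not None:
        cmd += ["limit", str(limit_packets)]
    return cmd
```

For example, `subprocess.run(netem_delay_cmd("ens7", 50, limit_packets=100000), check=True)` would apply a 50 ms delay, and `tc qdisc del dev ens7 root` removes it.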

                    #3408
                    Ertza
                    Participant

                      @Paul – Yes, it would be great to know whether the switches at the MAX and TACC sites are 5500s or 5700s. Thanks.

                      #3409
                      Ertza
                      Participant

                        Thanks, Ilya. Yes, I am not using the management interfaces; I only mentioned them to be clear about what I am interested in.

                        #3410
                        Ertza
                        Participant

                          Also, is there any way to check the drop stats etc. on the switch? Or any way to connect 2 VMs directly? I am seeing some packet drops, and from the NIC counters it looks like the drops are occurring in the switch. Any suggestions on how to avoid that, other than congestion control? Maybe some configuration on the switch side?
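On the host side, one way to see where drops show up is to snapshot NIC counters before and after a test run. A minimal sketch that parses the text output of `ethtool -S <iface>` and keeps nonzero counters whose names contain "drop" or "discard"; counter names differ between NIC drivers, so the keyword filter is an assumption to adjust. It only sees the NIC, not counters inside the switch.

```python
def drop_counters(ethtool_s_output, keywords=("drop", "discard")):
    """Return {counter: value} for nonzero drop-like counters in `ethtool -S` output."""
    counters = {}
    for line in ethtool_s_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip().isdigit():
            continue  # skips the "NIC statistics:" header and malformed lines
        if any(k in name.lower() for k in keywords) and int(value) > 0:
            counters[name] = int(value)
    return counters


def counter_delta(before, after):
    """Per-counter increase between two snapshots (missing counters count as 0)."""
    return {k: after[k] - before.get(k, 0)
            for k in after if after[k] != before.get(k, 0)}
```

Taking one snapshot before and one after an iperf run and comparing them with `counter_delta` shows which counters moved during the test.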

                          #3412
                          Ertza
                          Participant

                            Hi all, running iperf, I saw that the max bandwidth I can get in VMs is around 19-20 Gbps. Is this a limit set by FABRIC, and is there any way to increase it?

                            #3413
                            yoursunny
                            Participant

                              We have on the roadmap support for ERO (explicit route objects) so that you can have your traffic take the ‘scenic route’ as you say

                              This will be very useful.

                              You can on your own add delay to your traffic using traditional Linux tc tools.

                              This would not work well at speeds higher than 10 Gbps.

                              running Iperf I saw the max bandwidth I can get in VMs is around 19-20 Gbps

                              You might get higher speeds if you request a dedicated NIC.
                              I can get 60 Gbps on ConnectX-6 dual-port 100 Gbps and 33 Gbps on ConnectX-5 dual-port 25 Gbps, tested in SALT location.
                              This number is “goodput” reported by my DPDK-based program, and it’s the sum of traffic in both directions.
                              iperf3 is unidirectional traffic so it could be half of this.

                              #3414
                              Ilya Baldin
                              Participant

                                Remember that the speed you get heavily depends on the flavor of the VM you are using. Tuning performance to get close to 100Gbps is not trivial; I do believe Paul has some recipes for this, though.

                                #3417
                                Paul Ruth
                                Keymaster

                                  Ertza – We are able to get close to 100G on most links. Are you using iperf3? iperf3 is single threaded and the highest bandwidth you will see is ~25Gbps. You will need to create multiple processes and add their results.

                                  See: https://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf/multi-stream-iperf3/
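Following the multi-stream approach in the linked page, a minimal sketch that builds one iperf3 client command per port and sums the receiver-side rates from each process's JSON report (`iperf3 -J`). It assumes a matching server was started with `iperf3 -s -p <port>` for each port; the base port 5201 (iperf3's default), the test duration, and the helper names are illustrative. The field path `end.sum_received.bits_per_second` matches what iperf3 reports for TCP tests, but verify it against your version.

```python
import json


def iperf3_client_cmds(server, n_procs, base_port=5201, seconds=30):
    """One iperf3 client per port, so each stream runs in its own process."""
    return [
        ["iperf3", "-c", server, "-p", str(base_port + i), "-t", str(seconds), "-J"]
        for i in range(n_procs)
    ]


def total_gbps(json_reports):
    """Sum receiver-side throughput across the JSON outputs of several iperf3 runs."""
    total_bps = 0.0
    for report in json_reports:
        data = json.loads(report)
        total_bps += data["end"]["sum_received"]["bits_per_second"]
    return total_bps / 1e9
```

In practice, each command could be launched concurrently with `subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)`, and the collected stdout strings passed to `total_gbps`.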
