IPv6 on FABRIC: A hop with a low MTU

    #1535
    Justin Presley
    Participant

      I (we) deployed our application for a few minutes and found that, for some reason, our packets were being dropped when using IPv6 with packets of around ~8300 bytes. However, with smaller packets (~1400 bytes), it works. Our conclusion is that a hop between our nodes is dropping the larger packets.

      The nodes themselves do not have an MTU limit (their interfaces are set to 9000, I believe), which means it is likely a hop between the two nodes that is dropping the packets. Is this a known issue / is it a concern?

      I can of course limit the packet size in our application (and that does work), but it would be nice to use larger packets. I was told by other people that FABRIC supports jumbo (“super-size”) packets; perhaps this can be sorted out via our slice configuration?

       

      Kind regards,

      Justin

      #1536
      Paul Ruth
      Keymaster

        What are the source and destination sites/hosts for this test?

        There might be a configuration error in a switch somewhere. Thanks for helping us find it.

        Paul

         

        #1537
        Justin Presley
        Participant

          I have the IPv6 addresses of the two nodes, in case you can identify the sites from them (only my partner can access the slice):

          • [2001:1948:417:7:f816:3eff:fe94:5413]
          • [2001:400:a100:3030:f816:3eff:feef:dba1]

          If I find out the sites, I will reply again. Also, we have only tested two sites, so there could well be more hops that limit packet sizes. We will let you know as we test more! Thank you.

           

          Kind regards,

          Justin

          #1542
          Justin Presley
          Participant

            We are experiencing a low-MTU hop when communicating between the sites “UTAH” and “STAR”.

            #1543
            Paul Ruth
            Keymaster

              I think I have narrowed this down to STAR and UTAH. I am able to use jumbo frames between these sites on a FABRIC L2 data plane network, but not over the management network (i.e. the public Internet).

              What is your data plane network configuration?   Which network/IPs are you using for your application? I’m wondering if you are trying to use jumbo frames across the management network.   If you use the management network to connect nodes from different sites, your traffic will go over the public Internet.  We probably can’t fix MTU issues on the public Internet.

              You can test by trying the following command. You can find the path MTU by increasing the packet size until the ping starts failing; the largest size that still succeeds, plus the IP and ICMP header overhead, is the path MTU.

              ping -M do -s <packet size> <destination IP>
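
              For reference, the on-wire packet is the ICMP payload plus header overhead: 28 bytes for IPv4 (20-byte IP header + 8-byte ICMP header) and 48 bytes for IPv6, so a successful ping -M do -s 1472 over IPv4 corresponds to a 1500-byte path MTU. Below is a minimal sketch (not an official FABRIC tool) that automates the probing by stepping the payload size; the destination address and the candidate payload sizes are placeholders to adjust for your slice.

              # Sketch: probe the path MTU by pinging with "do not fragment" set and
              # increasing payload sizes. DEST and the candidate sizes are placeholders.
              import subprocess

              DEST = "192.168.8.1"    # placeholder: an address on your data plane network
              OVERHEAD = 28           # IPv4: 20-byte IP + 8-byte ICMP header (use 48 for IPv6)

              def ping_ok(payload: int) -> bool:
                  """Return True if a single don't-fragment ping with this payload succeeds."""
                  result = subprocess.run(
                      ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload), DEST],
                      stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
                  return result.returncode == 0

              best = None
              for payload in (1472, 4072, 8920, 8972):   # ascending candidate payload sizes
                  if ping_ok(payload):
                      best = payload

              if best is None:
                  print("No probe succeeded; check basic connectivity first.")
              else:
                  print(f"Largest working payload {best} -> path MTU ~ {best + OVERHEAD}")

              The same command can also be driven from FABlib with node.execute(), as shown later in this thread.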

               

              #4059
              yoursunny
              Participant

                I’m seeing MTU issues on the data plane network between SALT and UTAH.
                The scenario uses NIC_ConnectX_5 NICs and the L2PTP network service.

                I increased the MTU of the VLAN netifs to 9000 and assigned IPv4 addresses to both ends.
                The maximum ICMP ping payload that can pass through is 1472 bytes (a 1500-byte packet):

                ubuntu@9bf529e4-3efc-4988-a9f8-5089ccfa08af-nb:~$ ping -M do -c 4 -s 1472 192.168.8.1
                PING 192.168.8.1 (192.168.8.1) 1472(1500) bytes of data.
                1480 bytes from 192.168.8.1: icmp_seq=1 ttl=64 time=0.348 ms
                1480 bytes from 192.168.8.1: icmp_seq=2 ttl=64 time=0.225 ms
                1480 bytes from 192.168.8.1: icmp_seq=3 ttl=64 time=0.227 ms
                1480 bytes from 192.168.8.1: icmp_seq=4 ttl=64 time=0.192 ms
                
                --- 192.168.8.1 ping statistics ---
                4 packets transmitted, 4 received, 0% packet loss, time 3051ms
                rtt min/avg/max/mdev = 0.192/0.248/0.348/0.059 ms
                
                ubuntu@9bf529e4-3efc-4988-a9f8-5089ccfa08af-nb:~$ ping -M do -c 4 -s 1473 192.168.8.1
                PING 192.168.8.1 (192.168.8.1) 1473(1501) bytes of data.
                
                --- 192.168.8.1 ping statistics ---
                4 packets transmitted, 0 received, 100% packet loss, time 3074ms
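
                For context, a minimal sketch of how this kind of setup can be configured from FABlib follows; it assumes an already-submitted slice with two nodes, and the slice name, node names, VLAN netif device name, and addresses below are placeholders rather than values from the slice above.

                # Sketch (placeholder names): raise the MTU on the data-plane VLAN netifs of an
                # existing FABRIC slice, assign IPv4 addresses, then probe with a don't-fragment ping.
                from fabrictestbed_extensions.fablib.fablib import FablibManager

                fablib = FablibManager()
                slice = fablib.get_slice(name="mtu-test")            # placeholder slice name

                for name, addr in [("node1", "192.168.8.1/24"), ("node2", "192.168.8.2/24")]:
                    node = slice.get_node(name=name)
                    # "enp7s0.1234" stands in for the VLAN netif created for the L2PTP service.
                    node.execute("sudo ip link set dev enp7s0.1234 mtu 9000")
                    node.execute(f"sudo ip addr add {addr} dev enp7s0.1234")

                # The same don't-fragment ping as above, run on node1 toward node2.
                node1 = slice.get_node(name="node1")
                stdout, stderr = node1.execute("ping -M do -c 4 -s 8972 192.168.8.2")
                print(stdout)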
                
                
                
                #4080
                yoursunny
                Participant

                  An MTU issue was discovered between MASS and STAR on the experiment network.
                  I increased the MTU of every netif to 9000, but the largest IPv4 ping payload that can pass through is 1424 bytes (a 1452-byte packet).
                  Slice ID: 3b8d1e30-8c17-45b2-9e78-4e59f69cfc3e

                  ubuntu@NA:~$ ping -M do -c 4 -s 1424 192.168.8.2
                  PING 192.168.8.2 (192.168.8.2) 1424(1452) bytes of data.
                  1432 bytes from 192.168.8.2: icmp_seq=1 ttl=64 time=26.8 ms
                  1432 bytes from 192.168.8.2: icmp_seq=2 ttl=64 time=26.7 ms
                  1432 bytes from 192.168.8.2: icmp_seq=3 ttl=64 time=26.7 ms
                  1432 bytes from 192.168.8.2: icmp_seq=4 ttl=64 time=26.7 ms
                  
                  --- 192.168.8.2 ping statistics ---
                  4 packets transmitted, 4 received, 0% packet loss, time 3005ms
                  rtt min/avg/max/mdev = 26.659/26.695/26.765/0.041 ms
                  ubuntu@NA:~$ ping -M do -c 4 -s 1425 192.168.8.2
                  PING 192.168.8.2 (192.168.8.2) 1425(1453) bytes of data.
                  
                  --- 192.168.8.2 ping statistics ---
                  4 packets transmitted, 0 received, 100% packet loss, time 3050ms

                   

                  #4083
                  Ilya Baldin
                  Participant

                    The connection to MASS unfortunately goes through providers that do not support jumbo frames – it is concatenated from two L2 services from regional providers, and no other options exist. We will start updating the topology advertisements you see to indicate the MTU a given link can support; until then, feel free to ask us.

                    #4084
                    Ilya Baldin
                    Participant

                      UTAH-SALT we will look into – that is an L1 connection, so this is surprising. Thank you for letting us know, @yoursunny.

                      #4094
                      Ilya Baldin
                      Participant

                        @yoursunny – we looked into SALT-UTAH – you are correct, the MTU was incorrectly set on the switch on the 25Gbps ports (for 100G it seems to be correct). We will remedy this next week and will check settings on other sites so the experimenters can get as consistent an experience as possible. Thank you for reporting this.

                        #4106
                        Ilya Baldin
                        Participant

                          @yoursunny I tested SALT to UTAH and this is what I have as the max MTU:

                          try:
                              node1 = slice.get_node(name=node1_name)
                              stdout, stderr = node1.execute(f'ping -M do -c 5 -s 8928 {node2_addr}')
                          except Exception as e:
                              print(f"Exception: {e}")
                          
                          PING 192.168.1.2 (192.168.1.2) 8928(8956) bytes of data.
                          8936 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.268 ms
                          8936 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.214 ms
                          8936 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.225 ms
                          8936 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=0.239 ms
                          8936 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=0.246 ms
                          
                          --- 192.168.1.2 ping statistics ---
                          5 packets transmitted, 5 received, 0% packet loss, time 4118ms
                          rtt min/avg/max/mdev = 0.214/0.238/0.268/0.023 ms
                          #4112
                          yoursunny
                          Participant

                            MTU is good now (except to MASS).
                            I made a slice in every available location with the FABNetv4 network service and tested ping with a few MTUs (256, 1280, 1420, 1500, 8900, 8948, 9000).
                            All site pairs can support MTU 8948 (IPv4 ping -s 8920), but not MTU 9000 (IPv4 ping -s 8972).

                            IPv4 ping: maximum path MTU (bytes) and RTT (ms)
                            src\dst  |   CERN   |   UCSD   |   DALL   |   NCSA   |   CLEM   |   TACC   |   MAX    |   WASH   |   GPN    |   INDI   |   FIU    |   MICH   |   MASS   |   UTAH   |   SALT   |   STAR  
                            ---------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------
                            CERN     | 9000   0 | 8948 148 | 8948 122 | 8948 105 | 8948 105 | 8948 128 | 8948  91 | 8948  88 | 8948 156 | 8948 107 | 8948 115 | 8948 108 | 1420 101 | 8948 133 | 8948 133 | 8948 102
                            UCSD     | 8948 148 | 9000   0 | 8948  43 | 8948  48 | 8948  76 | 8948  49 | 8948  62 | 8948  59 | 8948  37 | 8948  50 | 8948  86 | 8948  50 | 1420  72 | 8948  14 | 8948  14 | 8948  45
                            DALL     | 8948 122 | 8948  43 | 9000   0 | 8948  22 | 8948  50 | 8948   5 | 8948  36 | 8948  34 | 8948  51 | 8948  24 | 8948  60 | 8948  25 | 1420  46 | 8948  28 | 8948  28 | 8948  19
                            NCSA     | 8948 105 | 8948  48 | 8948  22 | 9000   0 | 8948  32 | 8948  28 | 8948  19 | 8948  16 | 8948  56 | 8948   7 | 8948  43 | 8948   7 | 1420  29 | 8948  33 | 8948  33 | 8948   2
                            CLEM     | 8948 105 | 8948  76 | 8948  50 | 8948  32 | 9000   0 | 8948  56 | 8948  19 | 8948  16 | 8948  84 | 8948  35 | 8948  43 | 8948  35 | 1420  28 | 8948  61 | 8948  61 | 8948  30
                            TACC     | 8948 128 | 8948  49 | 8948   5 | 8948  28 | 8948  56 | 9000   0 | 8948  42 | 8948  39 | 8948  57 | 8948  30 | 8948  66 | 8948  30 | 1420  52 | 8948  34 | 8948  34 | 8948  25
                            MAX      | 8948  91 | 8948  62 | 8948  36 | 8948  19 | 8948  19 | 8948  42 | 9000   0 | 8948   2 | 8948  70 | 8948  21 | 8948  29 | 8948  22 | 1420  15 | 8948  47 | 8948  47 | 8948  17
                            WASH     | 8948  88 | 8948  59 | 8948  34 | 8948  16 | 8948  16 | 8948  39 | 8948   2 | 9000   0 | 8948  67 | 8948  18 | 8948  26 | 8948  19 | 1420  12 | 8948  44 | 8948  44 | 8948  14
                            GPN      | 8948 156 | 8948  37 | 8948  51 | 8948  56 | 8948  84 | 8948  57 | 8948  70 | 8948  67 | 9000   0 | 8948  58 | 8948  94 | 8948  58 | 1420  80 | 8948  22 | 8948  23 | 8948  53
                            INDI     | 8948 107 | 8948  50 | 8948  24 | 8948   7 | 8948  35 | 8948  30 | 8948  21 | 8948  18 | 8948  58 | 9000   0 | 8948  45 | 8948   9 | 1420  31 | 8948  35 | 8948  35 | 8948   4
                            FIU      | 8948 115 | 8948  86 | 8948  60 | 8948  43 | 8948  43 | 8948  66 | 8948  29 | 8948  26 | 8948  94 | 8948  45 | 9000   0 | 8948  46 | 1420  39 | 8948  71 | 8948  71 | 8948  40
                            MICH     | 8948 108 | 8948  50 | 8948  25 | 8948   7 | 8948  35 | 8948  30 | 8948  22 | 8948  19 | 8948  58 | 8948   9 | 8948  46 | 9000   0 | 1420  31 | 8948  36 | 8948  35 | 8948   5
                            MASS     | 1420 101 | 1420  72 | 1420  46 | 1420  29 | 1420  28 | 1420  52 | 1420  15 | 1420  12 | 1420  80 | 1420  31 | 1420  39 | 1420  31 | 9000   0 | 1420  57 | 1420  57 | 1420  26
                            UTAH     | 8948 133 | 8948  14 | 8948  28 | 8948  33 | 8948  61 | 8948  34 | 8948  47 | 8948  44 | 8948  22 | 8948  35 | 8948  71 | 8948  36 | 1420  57 | 9000   0 | 8948   0 | 8948  30
                            SALT     | 8948 133 | 8948  14 | 8948  28 | 8948  33 | 8948  61 | 8948  34 | 8948  47 | 8948  44 | 8948  23 | 8948  35 | 8948  71 | 8948  35 | 1420  57 | 8948   0 | 9000   0 | 8948  30
                            STAR     | 8948 102 | 8948  45 | 8948  19 | 8948   2 | 8948  30 | 8948  25 | 8948  17 | 8948  14 | 8948  53 | 8948   4 | 8948  40 | 8948   5 | 1420  26 | 8948  30 | 8948  30 | 9000   0
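
                            For readers unfamiliar with FABNetv4, a minimal sketch of the kind of slice behind one cell of this matrix is shown below; the slice name, node names, and NIC model are placeholders, and this is not the linked mtu.py script. After submission, each node still needs an address from its FABNetv4 subnet (e.g. via network.get_available_ips() and node.ip_addr_add()) and an MTU of 9000 on its data-plane interface before running the ping.

                            # Sketch (placeholder names): two nodes at different sites, each attached
                            # to the FABNetv4 routed IPv4 service, as used for the measurements above.
                            from fabrictestbed_extensions.fablib.fablib import FablibManager

                            fablib = FablibManager()
                            slice = fablib.new_slice(name="mtu-fabnetv4-demo")

                            for site in ("UTAH", "STAR"):                # two example sites from the table
                                node = slice.add_node(name=f"node-{site}", site=site)
                                iface = node.add_component(model="NIC_Basic", name="nic").get_interfaces()[0]
                                slice.add_l3network(name=f"net-{site}", interfaces=[iface], type="IPv4")

                            slice.submit()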
                            
                            #4113
                            Ilya Baldin
                            Participant

                              Very nice! We are discussing internally the change we need to make to allow full 9000 MTU – we plan to implement the change after the workshop next week.

                              #4168
                              Tom Lehman
                              Participant

                                @yoursunny Thanks for the nice testing table/results. We do have some links in the network whose MTU needs to be set higher to allow hosts to use an MTU of 9000. We need to do some more testing on all the links in the network to see if we can find a single value that works everywhere. We are also adding many new wide-area links over the next month, so we will probably update the MTU settings across the network sometime in June. We will report back here on the forum with more details once we have made this change. For now, hopefully you can use MTU 8948, as you show in your table, until we make the change.

                                The MASS site/links should now support a host MTU of 8948. Please let us know if you still have issues with the MASS node.

                                #4183
                                yoursunny
                                Participant

                                  “We need to do some more testing for all the links in the network to see if we can find a single value that works everywhere.”

                                  Use my script:

                                  https://github.com/yoursunny/fabric/blob/5d434c3117314730a9ab38ffd4eefcab70f13779/util/mtu.py
