lost management network connection
  • #4184
    Fengping Hu
    Participant

      Two VMs (node1 and node4) out of the four in a slice lost their management network connections. This is a long-running slice, and we are trying to understand what might have caused this. We can still get into the VMs via the dataplane public IPs, so this is more about understanding the issue than an actual problem affecting the application we run. Please let us know if you need any other information for troubleshooting.

      Here’s the slice information:
      Slice
      ID 37dffa35-36ee-4da1-a1bd-e84f36e2f69f
      Name ServiceXSlice
      Lease Expiration (UTC) 2023-04-07 22:56:21 +0000
      Lease Start (UTC) 2023-04-06 22:56:23 +0000
      Project ID aac04e0e-e0fe-4421-8985-068c117d7437
      State StableOK
      Nodes
      All nodes: 60 cores, 384 GB RAM, 100 GB disk, image default_ubuntu_20 (qcow2), site CERN, username ubuntu, state Active
      SSH Command: ssh username@{Management IP}
      Public SSH Key File: /home/fabric/work/fabric_config/slice_key.pub
      Private SSH Key File: /home/fabric/work/fabric_config/slice_key

      ID                                    Name   Host                        Management IP
      978f6eff-42e7-48c1-86fc-e694ccb367e6  node1  cern-w4.fabric-testbed.net  2001:400:a100:3090:f816:3eff:fe5c:dd28
      010fab90-cf7a-40be-aa74-1bfb2442a611  node2  cern-w2.fabric-testbed.net  2001:400:a100:3090:f816:3eff:fe2d:38d0
      bf79de38-f8c2-413c-82ae-0daf50ebf76f  node3  cern-w6.fabric-testbed.net  2001:400:a100:3090:f816:3eff:feff:b39d
      a1001ee4-c1d5-4157-a60d-d5dbc491ff05  node4  cern-w3.fabric-testbed.net  2001:400:a100:3090:f816:3eff:fe99:a061

      Here is the “ip -6 nei” output from the two nodes that lost their connection:
      fe80::f816:3eff:feac:1ca0 dev ens3 lladdr fa:16:3e:ac:1c:a0 router STALE

      Thanks,
      Fengping

      #4185
      Mert Cevik
      Moderator

        Hello Fengping,

        From one VM, we need to see the output of “ip a”, “ip route”, and “ip -6 route”. It would also be good to see the status of the network service (network or NetworkManager, whichever is active).

        #4186
        Fengping Hu
        Participant

          Hi Mert,

          Here are the outputs of those commands.

          Thanks,
          Fengping

          root@node1:/home/ubuntu# ip a
          1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
          valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
          valid_lft forever preferred_lft forever
          2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
          link/ether fa:16:3e:5c:dd:28 brd ff:ff:ff:ff:ff:ff
          inet 10.30.6.141/23 brd 10.30.7.255 scope global dynamic ens3
          valid_lft 76573sec preferred_lft 76573sec
          inet6 2001:400:a100:3090:f816:3eff:fe5c:dd28/64 scope global dynamic mngtmpaddr noprefixroute
          valid_lft 86351sec preferred_lft 14351sec
          inet6 fe80::f816:3eff:fe5c:dd28/64 scope link
          valid_lft forever preferred_lft forever
          3: ens7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 02:34:c0:a0:19:80 brd ff:ff:ff:ff:ff:ff
          inet 10.143.1.2/24 scope global ens7
          valid_lft forever preferred_lft forever
          inet6 fe80::34:c0ff:fea0:1980/64 scope link
          valid_lft forever preferred_lft forever
          4: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 02:78:91:60:1e:80 brd ff:ff:ff:ff:ff:ff
          inet6 2602:fcfb:100::10/64 scope global
          valid_lft forever preferred_lft forever
          inet6 fe80::78:91ff:fe60:1e80/64 scope link
          valid_lft forever preferred_lft forever
          5: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 0a:39:34:14:6a:36 brd ff:ff:ff:ff:ff:ff
          inet6 2602:fcfb:1d:2::2/64 scope global
          valid_lft forever preferred_lft forever
          inet6 fe80::839:34ff:fe14:6a36/64 scope link
          valid_lft forever preferred_lft forever
          6: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
          link/ether f2:c7:6e:00:6e:83 brd ff:ff:ff:ff:ff:ff
          inet 10.233.0.1/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.0.3/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.42.207/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.29.52/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.45.34/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.16.205/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.61.6/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.17.21/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.46.5/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.51.50/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.15.146/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.8.168/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.60.15/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.24.72/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.57.146/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.11.198/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.39.214/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.30.20/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.23.34/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.53.112/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.55.86/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.7.33/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.1.29/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet 10.233.32.37/32 scope global kube-ipvs0
          valid_lft forever preferred_lft forever
          inet6 2602:fcfb:1d:2::31/128 scope global
          valid_lft forever preferred_lft forever
          inet6 fd85:ee78:d8a6:8607::11d7/128 scope global
          valid_lft forever preferred_lft forever
          inet6 2602:fcfb:1d:2::30/128 scope global
          valid_lft forever preferred_lft forever
          inet6 fd85:ee78:d8a6:8607::1417/128 scope global
          valid_lft forever preferred_lft forever
          10: cali5260a6a960b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
          link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-f19391b0-0d22-5e0a-091b-f26982131f4a
          inet6 fe80::ecee:eeff:feee:eeee/64 scope link
          valid_lft forever preferred_lft forever
          11: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
          link/ether e6:a9:23:78:b6:d9 brd ff:ff:ff:ff:ff:ff
          inet 169.254.25.10/32 scope global nodelocaldns
          valid_lft forever preferred_lft forever
          23: califc30ed8b084@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
          link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-0e75fb3c-1182-2e6e-16c9-80ba6ce3844f
          inet6 fe80::ecee:eeff:feee:eeee/64 scope link
          valid_lft forever preferred_lft forever
          821: cali0894f966cdf@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
          link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-ee474322-80e1-7721-a798-bc29a08f2339
          inet6 fe80::ecee:eeff:feee:eeee/64 scope link
          valid_lft forever preferred_lft forever
          855: cali68b9b56a958@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
          link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-e232ac9b-3c1d-fe2e-dd33-cc9f06d121a2
          inet6 fe80::ecee:eeff:feee:eeee/64 scope link
          valid_lft forever preferred_lft forever
          741: cali3de775f9cb2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
          link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-3b3f6547-73f2-7f83-add2-d43a23bc1a83
          inet6 fe80::ecee:eeff:feee:eeee/64 scope link
          valid_lft forever preferred_lft forever
          747: cali0fe3bc102a1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
          link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-9d7ab47a-4e5c-cb1b-86da-65fb323732fb
          inet6 fe80::ecee:eeff:feee:eeee/64 scope link
          valid_lft forever preferred_lft forever
          754: calif913c9d9a21@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
          link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-7b67f49b-0c8a-cf9d-e976-0439a6057cdd
          inet6 fe80::ecee:eeff:feee:eeee/64 scope link
          valid_lft forever preferred_lft forever
          767: calieb3159a0e6d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
          link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-0841de8e-b94c-b3cb-50d0-60b2156da119
          inet6 fe80::ecee:eeff:feee:eeee/64 scope link
          valid_lft forever preferred_lft forever
          root@node1:/home/ubuntu# ip route
          default via 10.143.1.1 dev ens7
          10.30.6.0/23 dev ens3 proto kernel scope link src 10.30.6.141
          10.143.1.0/24 dev ens7 proto kernel scope link src 10.143.1.2
          10.233.71.0/26 via 10.143.1.4 dev ens7 proto bird
          10.233.74.64/26 via 10.143.1.5 dev ens7 proto bird
          10.233.75.0/26 via 10.143.1.3 dev ens7 proto bird
          blackhole 10.233.102.128/26 proto bird
          10.233.102.129 dev cali5260a6a960b scope link
          10.233.102.134 dev cali68b9b56a958 scope link
          10.233.102.139 dev califc30ed8b084 scope link
          10.233.102.161 dev cali3de775f9cb2 scope link
          10.233.102.166 dev cali0fe3bc102a1 scope link
          10.233.102.173 dev calif913c9d9a21 scope link
          10.233.102.185 dev cali0894f966cdf scope link
          10.233.102.188 dev calieb3159a0e6d scope link
          root@node1:/home/ubuntu# ip -6 route
          ::1 dev lo proto kernel metric 256 pref medium
          2001:400:a100:3090::/64 dev ens3 proto ra metric 100 expires 86399sec pref medium
          2602:fcfb:1d:2::/64 dev ens9 proto kernel metric 256 pref medium
          2602:fcfb:1d:2::/64 dev ens9 metric 1024 pref medium
          2602:fcfb:100::/64 dev ens8 proto kernel metric 256 pref medium
          2602:fcfb:100::/64 dev ens8 metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:340/122 via 2602:fcfb:1d:2::5 dev ens9 proto bird metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:6800/122 via 2602:fcfb:1d:2::3 dev ens9 proto bird metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:8700/122 via 2602:fcfb:1d:2::4 dev ens9 proto bird metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:a681 dev cali5260a6a960b metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:a686 dev cali68b9b56a958 metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:a68b dev califc30ed8b084 metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:a6a1 dev cali3de775f9cb2 metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:a6a6 dev cali0fe3bc102a1 metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:a6ad dev calif913c9d9a21 metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:a6b9 dev cali0894f966cdf metric 1024 pref medium
          fd85:ee78:d8a6:8607::1:a6bc dev calieb3159a0e6d metric 1024 pref medium
          blackhole fd85:ee78:d8a6:8607::1:a680/122 dev lo proto bird metric 1024 pref medium
          fe80::a9fe:a9fe via fe80::f816:3eff:feac:1ca0 dev ens3 proto ra metric 1024 expires 299sec pref medium
          fe80::/64 dev ens3 proto kernel metric 256 pref medium
          fe80::/64 dev ens7 proto kernel metric 256 pref medium
          fe80::/64 dev ens8 proto kernel metric 256 pref medium
          fe80::/64 dev ens9 proto kernel metric 256 pref medium
          fe80::/64 dev cali5260a6a960b proto kernel metric 256 pref medium
          fe80::/64 dev califc30ed8b084 proto kernel metric 256 pref medium
          fe80::/64 dev cali3de775f9cb2 proto kernel metric 256 pref medium
          fe80::/64 dev cali0fe3bc102a1 proto kernel metric 256 pref medium
          fe80::/64 dev calif913c9d9a21 proto kernel metric 256 pref medium
          fe80::/64 dev calieb3159a0e6d proto kernel metric 256 pref medium
          fe80::/64 dev cali0894f966cdf proto kernel metric 256 pref medium
          fe80::/64 dev cali68b9b56a958 proto kernel metric 256 pref medium
          default via 2602:fcfb:1d:2::1 dev ens9 metric 10 pref medium
          default via fe80::f816:3eff:feac:1ca0 dev ens3 proto ra metric 100 expires 299sec mtu 9000 pref medium
          root@node1:/home/ubuntu# ip -6 rule show
          0: from all lookup local
          32761: from 2602:fcfb:100::/64 lookup v6peering
          32762: from 2001:400:a100:3090::/64 lookup admin
          32766: from all lookup main
          root@node1:/home/ubuntu# ip -6 route show table admin
          default via fe80::f816:3eff:feac:1ca0 dev ens3 metric 1024 pref medium

          root@node1:/home/ubuntu# systemctl status systemd-networkd
          ● systemd-networkd.service - Network Service
          Loaded: loaded (/lib/systemd/system/systemd-networkd.service; enabled; vendor preset: enabled)
          Active: active (running) since Fri 2023-04-07 06:46:29 UTC; 1 months 2 days ago
          TriggeredBy: ● systemd-networkd.socket
          Docs: man:systemd-networkd.service(8)
          Main PID: 313692 (systemd-network)
          Status: "Processing requests..."
          Tasks: 1 (limit: 464085)
          Memory: 21.9M
          CGroup: /system.slice/systemd-networkd.service
          └─313692 /lib/systemd/systemd-networkd

          May 09 19:29:21 node1 systemd-networkd[313692]: califcd611d2e5a: Lost carrier
          May 09 19:29:23 node1 systemd-networkd[313692]: cali6d43fe7c75d: Link DOWN
          May 09 19:29:23 node1 systemd-networkd[313692]: cali6d43fe7c75d: Lost carrier
          May 09 19:54:08 node1 systemd-networkd[313692]: cali0905f2a129d: Link DOWN
          May 09 19:54:08 node1 systemd-networkd[313692]: cali0905f2a129d: Lost carrier
          May 09 19:54:45 node1 systemd-networkd[313692]: calidba7a9afee0: Link DOWN
          May 09 19:54:45 node1 systemd-networkd[313692]: calidba7a9afee0: Lost carrier
          May 09 19:55:37 node1 systemd-networkd[313692]: cali68b9b56a958: Link UP
          May 09 19:55:37 node1 systemd-networkd[313692]: cali68b9b56a958: Gained carrier
          May 09 19:55:39 node1 systemd-networkd[313692]: cali68b9b56a958: Gained IPv6LL

          #4187
          Mert Cevik
          Moderator

            Thank you for the output. I will look at this carefully later, but at a quick glance it looks like the default IPv6 route has been switched to the dataplane network (possibly FABv6); see the output from “ip -6 route”. Management (SSH) access is routed through the ens3 interface (IPv6 2001:400:a100:3090:f816:3eff:fe5c:dd28/64), and the default IPv6 gateway for this subnet is 2001:400:a100:3090::1. It looks like a configuration change occurred.
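
            One way to check which next hop the kernel actually selects for management-sourced traffic is “ip -6 route get” with an explicit source address. A minimal sketch, using the addresses from the outputs above (the destination is just a documentation-prefix address for illustration):

            # Which route applies to traffic sourced from the management address?
            ip -6 route get 2001:db8::1 from 2001:400:a100:3090:f816:3eff:fe5c:dd28
            # Which route applies to traffic with no fixed source (should pick the dataplane default)?
            ip -6 route get 2001:db8::1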

            #4188
            Fengping Hu
            Participant

              Yep. We configured the default route to be on the public network on the dataplane; the management network is routed via policy-based routing.
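
              For reference, a minimal sketch of this kind of setup, reconstructed from the rule and table outputs above (the table name “admin”, the rule priority, and the gateway addresses come from those outputs; the table number is an assumption):

              # Define the extra routing table (number is arbitrary; name matches the rule output)
              echo "100 admin" >> /etc/iproute2/rt_tables

              # Replies sourced from the management subnet go back out ens3
              ip -6 rule add from 2001:400:a100:3090::/64 lookup admin priority 32762
              ip -6 route add default via fe80::f816:3eff:feac:1ca0 dev ens3 table admin

              # Everything else defaults to the dataplane interface
              ip -6 route add default via 2602:fcfb:1d:2::1 dev ens9 metric 10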

              #4189
              Mert Cevik
              Moderator

                It seems the root cause of the connectivity issue is known, then; you probably need to review your policy-based routing. Do you still need anything from us on this?

                #4190
                Fengping Hu
                Participant

                  It appears the link-local address of the router is stale, so it looks as if the node lost its layer 2 connection. The policy-based routing seems OK, and it’s the same as on the other two nodes that are working.

                  ip -6 nei
                  fe80::f816:3eff:feac:1ca0 dev ens3 lladdr fa:16:3e:ac:1c:a0 router STALE
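
                  A STALE entry is not by itself fatal (the kernel re-probes it on next use), but one way to force fresh neighbor resolution of the gateway is to ping its link-local address on ens3 and re-check the cache. A sketch, using the addresses above:

                  # Trigger Neighbor Discovery for the gateway's link-local address
                  ping -6 -c 3 fe80::f816:3eff:feac:1ca0%ens3
                  # Re-check the cache; REACHABLE means layer 2 resolution works again
                  ip -6 neigh show dev ens3
                  # If the entry is stuck, flush it so the kernel must re-resolve
                  ip -6 neigh flush dev ens3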

                  #4192
                  David Bank
                  Moderator

                    Fengping,

                    This is Ubuntu?

                    You wrote: “We configured the default route to be on the public network on the dataplane; the management network is routed via policy-based routing.”

                    Can you post that config? Are the PBR rules based on interface? IP?

                    #4196
                    Fengping Hu
                    Participant

                      Hi David,

                      Yep. It’s Ubuntu.

                      root@node1:/home/ubuntu# ip -6 rule list
                      0: from all lookup local
                      32761: from 2602:fcfb:100::/64 lookup v6peering
                      32762: from 2001:400:a100:3090::/64 lookup admin
                      32766: from all lookup main
                      root@node1:/home/ubuntu# ip -6 route show table admin
                      default via fe80::f816:3eff:feac:1ca0 dev ens3 metric 1024 pref medium

                      Thanks,
                      Fengping
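
                      Since these rules were added with ip(8), they do not survive reboots or link resets on their own. With systemd-networkd active (see the status output above), one way to make them persistent is a .network file with a RoutingPolicyRule section. A hypothetical sketch; the file name and table number are assumptions:

                      # /etc/systemd/network/10-ens3.network (hypothetical file name)
                      [Match]
                      Name=ens3

                      [Network]
                      IPv6AcceptRA=yes

                      # Re-creates the "from management subnet, lookup admin" rule on every link event
                      [RoutingPolicyRule]
                      From=2001:400:a100:3090::/64
                      Table=100
                      Priority=32762

                      # Default route in the admin table via the management gateway
                      [Route]
                      Gateway=fe80::f816:3eff:feac:1ca0
                      Table=100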

                      #4215
                      Paul Ruth
                      Keymaster

                        @Fengping

                        This seems like an issue with the policy-based routing. Is there a way to make your setup more resilient to external factors? The management network connects to the local campus or provider, and I suspect we can’t control everything on it.

                        Paul

                        #4220
                        Fengping Hu
                        Participant

                          Thanks for the clarification. We still have access to the VM via the dataplane public IP, so I think our setup is resilient in that sense.

                          Thanks,
                          Fengping

                          #4221
                          Ilya Baldin
                          Participant

                            @Fengping: our authority at individual sites w.r.t. management plane connections ends at our switch, so if the campus network blinks, we have no control over that. Is there a way for you to force PBR to pick the default via the management connection again?

                            #4229
                            Fengping Hu
                            Participant

                              Thanks, Ilya. I tried bringing the link down and up with “ip link”, and that brought the management network back online. I would just need to re-add the PBR configuration, since taking the link down seems to clear it, but that’s not a problem. Given what you said, I think I will just handle potential blinks manually and not worry about it too much.

                              Fengping
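
                              A small recovery sketch for this kind of blink, assuming the table name, rule priority, and gateway from the outputs earlier in the thread (bouncing the link clears the admin table, so its contents are re-added afterward):

                              #!/bin/sh
                              # Bounce the management interface
                              ip link set ens3 down
                              ip link set ens3 up
                              sleep 5   # give RA/DHCP a moment to reconfigure the interface

                              # Restore the PBR pieces the link reset cleared
                              ip -6 rule add from 2001:400:a100:3090::/64 lookup admin priority 32762 2>/dev/null
                              ip -6 route add default via fe80::f816:3eff:feac:1ca0 dev ens3 table admin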
