lost management network connection
This topic has 12 replies, 5 voices, and was last updated 2 years, 5 months ago by Fengping Hu.
May 9, 2023 at 3:48 pm #4184
Two VMs (node1 and node4) out of four in a slice lost their management network connections. This is a long-running slice, and we are trying to understand what might have caused this. We can still get into the VMs via the dataplane public IPs, so this is more a request to understand the issue than an actual problem affecting the application we run. Please let us know if you need any other information for troubleshooting. Here is the slice information:
 Slice
   ID: 37dffa35-36ee-4da1-a1bd-e84f36e2f69f
   Name: ServiceXSlice
   Lease Expiration (UTC): 2023-04-07 22:56:21 +0000
   Lease Start (UTC): 2023-04-06 22:56:23 +0000
   Project ID: aac04e0e-e0fe-4421-8985-068c117d7437
   State: StableOK

 Nodes (all at the CERN site; each with 60 cores, 384 GB RAM, 100 GB disk, default_ubuntu_20 qcow2 image, username ubuntu; SSH command: ssh ubuntu@{Management IP} with key files /home/fabric/work/fabric_config/slice_key.pub and /home/fabric/work/fabric_config/slice_key):
   node1  978f6eff-42e7-48c1-86fc-e694ccb367e6  host cern-w4.fabric-testbed.net  management IP 2001:400:a100:3090:f816:3eff:fe5c:dd28  Active
   node2  010fab90-cf7a-40be-aa74-1bfb2442a611  host cern-w2.fabric-testbed.net  management IP 2001:400:a100:3090:f816:3eff:fe2d:38d0  Active
   node3  bf79de38-f8c2-413c-82ae-0daf50ebf76f  host cern-w6.fabric-testbed.net  management IP 2001:400:a100:3090:f816:3eff:feff:b39d  Active
   node4  a1001ee4-c1d5-4157-a60d-d5dbc491ff05  host cern-w3.fabric-testbed.net  management IP 2001:400:a100:3090:f816:3eff:fe99:a061  Active

 "ip -6 nei" output from the two nodes that lost connection:
 fe80::f816:3eff:feac:1ca0 dev ens3 lladdr fa:16:3e:ac:1c:a0 router STALE

Thanks,
Fengping

May 9, 2023 at 4:04 pm #4185
Hello Fengping,
From one VM we need to see the output of "ip a", "ip route", and "ip -6 route". It would also be good to see the status of the network service (network or NetworkManager, whichever is active).

May 9, 2023 at 5:15 pm #4186
Hi Mert,
Here are the outputs of those commands.

Thanks,
Fengping

 root@node1:/home/ubuntu# ip a
 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
 inet6 ::1/128 scope host
 valid_lft forever preferred_lft forever
 2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
 link/ether fa:16:3e:5c:dd:28 brd ff:ff:ff:ff:ff:ff
 inet 10.30.6.141/23 brd 10.30.7.255 scope global dynamic ens3
 valid_lft 76573sec preferred_lft 76573sec
 inet6 2001:400:a100:3090:f816:3eff:fe5c:dd28/64 scope global dynamic mngtmpaddr noprefixroute
 valid_lft 86351sec preferred_lft 14351sec
 inet6 fe80::f816:3eff:fe5c:dd28/64 scope link
 valid_lft forever preferred_lft forever
 3: ens7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
 link/ether 02:34:c0:a0:19:80 brd ff:ff:ff:ff:ff:ff
 inet 10.143.1.2/24 scope global ens7
 valid_lft forever preferred_lft forever
 inet6 fe80::34:c0ff:fea0:1980/64 scope link
 valid_lft forever preferred_lft forever
 4: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
 link/ether 02:78:91:60:1e:80 brd ff:ff:ff:ff:ff:ff
 inet6 2602:fcfb:100::10/64 scope global
 valid_lft forever preferred_lft forever
 inet6 fe80::78:91ff:fe60:1e80/64 scope link
 valid_lft forever preferred_lft forever
 5: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
 link/ether 0a:39:34:14:6a:36 brd ff:ff:ff:ff:ff:ff
 inet6 2602:fcfb:1d:2::2/64 scope global
 valid_lft forever preferred_lft forever
 inet6 fe80::839:34ff:fe14:6a36/64 scope link
 valid_lft forever preferred_lft forever
 6: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
 link/ether f2:c7:6e:00:6e:83 brd ff:ff:ff:ff:ff:ff
 inet 10.233.0.1/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.0.3/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.42.207/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.29.52/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.45.34/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.16.205/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.61.6/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.17.21/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.46.5/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.51.50/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.15.146/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.8.168/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.60.15/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.24.72/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.57.146/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.11.198/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.39.214/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.30.20/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.23.34/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.53.112/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.55.86/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.7.33/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.1.29/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet 10.233.32.37/32 scope global kube-ipvs0
 valid_lft forever preferred_lft forever
 inet6 2602:fcfb:1d:2::31/128 scope global
 valid_lft forever preferred_lft forever
 inet6 fd85:ee78:d8a6:8607::11d7/128 scope global
 valid_lft forever preferred_lft forever
 inet6 2602:fcfb:1d:2::30/128 scope global
 valid_lft forever preferred_lft forever
 inet6 fd85:ee78:d8a6:8607::1417/128 scope global
 valid_lft forever preferred_lft forever
 10: cali5260a6a960b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
 link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-f19391b0-0d22-5e0a-091b-f26982131f4a
 inet6 fe80::ecee:eeff:feee:eeee/64 scope link
 valid_lft forever preferred_lft forever
 11: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
 link/ether e6:a9:23:78:b6:d9 brd ff:ff:ff:ff:ff:ff
 inet 169.254.25.10/32 scope global nodelocaldns
 valid_lft forever preferred_lft forever
 23: califc30ed8b084@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
 link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-0e75fb3c-1182-2e6e-16c9-80ba6ce3844f
 inet6 fe80::ecee:eeff:feee:eeee/64 scope link
 valid_lft forever preferred_lft forever
 821: cali0894f966cdf@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
 link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-ee474322-80e1-7721-a798-bc29a08f2339
 inet6 fe80::ecee:eeff:feee:eeee/64 scope link
 valid_lft forever preferred_lft forever
 855: cali68b9b56a958@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
 link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-e232ac9b-3c1d-fe2e-dd33-cc9f06d121a2
 inet6 fe80::ecee:eeff:feee:eeee/64 scope link
 valid_lft forever preferred_lft forever
 741: cali3de775f9cb2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
 link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-3b3f6547-73f2-7f83-add2-d43a23bc1a83
 inet6 fe80::ecee:eeff:feee:eeee/64 scope link
 valid_lft forever preferred_lft forever
 747: cali0fe3bc102a1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
 link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-9d7ab47a-4e5c-cb1b-86da-65fb323732fb
 inet6 fe80::ecee:eeff:feee:eeee/64 scope link
 valid_lft forever preferred_lft forever
 754: calif913c9d9a21@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
 link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-7b67f49b-0c8a-cf9d-e976-0439a6057cdd
 inet6 fe80::ecee:eeff:feee:eeee/64 scope link
 valid_lft forever preferred_lft forever
 767: calieb3159a0e6d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
 link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-0841de8e-b94c-b3cb-50d0-60b2156da119
 inet6 fe80::ecee:eeff:feee:eeee/64 scope link
 valid_lft forever preferred_lft forever
 root@node1:/home/ubuntu# ip route
 default via 10.143.1.1 dev ens7
 10.30.6.0/23 dev ens3 proto kernel scope link src 10.30.6.141
 10.143.1.0/24 dev ens7 proto kernel scope link src 10.143.1.2
 10.233.71.0/26 via 10.143.1.4 dev ens7 proto bird
 10.233.74.64/26 via 10.143.1.5 dev ens7 proto bird
 10.233.75.0/26 via 10.143.1.3 dev ens7 proto bird
 blackhole 10.233.102.128/26 proto bird
 10.233.102.129 dev cali5260a6a960b scope link
 10.233.102.134 dev cali68b9b56a958 scope link
 10.233.102.139 dev califc30ed8b084 scope link
 10.233.102.161 dev cali3de775f9cb2 scope link
 10.233.102.166 dev cali0fe3bc102a1 scope link
 10.233.102.173 dev calif913c9d9a21 scope link
 10.233.102.185 dev cali0894f966cdf scope link
 10.233.102.188 dev calieb3159a0e6d scope link
 root@node1:/home/ubuntu# ip -6 route
 ::1 dev lo proto kernel metric 256 pref medium
 2001:400:a100:3090::/64 dev ens3 proto ra metric 100 expires 86399sec pref medium
 2602:fcfb:1d:2::/64 dev ens9 proto kernel metric 256 pref medium
 2602:fcfb:1d:2::/64 dev ens9 metric 1024 pref medium
 2602:fcfb:100::/64 dev ens8 proto kernel metric 256 pref medium
 2602:fcfb:100::/64 dev ens8 metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:340/122 via 2602:fcfb:1d:2::5 dev ens9 proto bird metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:6800/122 via 2602:fcfb:1d:2::3 dev ens9 proto bird metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:8700/122 via 2602:fcfb:1d:2::4 dev ens9 proto bird metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:a681 dev cali5260a6a960b metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:a686 dev cali68b9b56a958 metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:a68b dev califc30ed8b084 metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:a6a1 dev cali3de775f9cb2 metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:a6a6 dev cali0fe3bc102a1 metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:a6ad dev calif913c9d9a21 metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:a6b9 dev cali0894f966cdf metric 1024 pref medium
 fd85:ee78:d8a6:8607::1:a6bc dev calieb3159a0e6d metric 1024 pref medium
 blackhole fd85:ee78:d8a6:8607::1:a680/122 dev lo proto bird metric 1024 pref medium
 fe80::a9fe:a9fe via fe80::f816:3eff:feac:1ca0 dev ens3 proto ra metric 1024 expires 299sec pref medium
 fe80::/64 dev ens3 proto kernel metric 256 pref medium
 fe80::/64 dev ens7 proto kernel metric 256 pref medium
 fe80::/64 dev ens8 proto kernel metric 256 pref medium
 fe80::/64 dev ens9 proto kernel metric 256 pref medium
 fe80::/64 dev cali5260a6a960b proto kernel metric 256 pref medium
 fe80::/64 dev califc30ed8b084 proto kernel metric 256 pref medium
 fe80::/64 dev cali3de775f9cb2 proto kernel metric 256 pref medium
 fe80::/64 dev cali0fe3bc102a1 proto kernel metric 256 pref medium
 fe80::/64 dev calif913c9d9a21 proto kernel metric 256 pref medium
 fe80::/64 dev calieb3159a0e6d proto kernel metric 256 pref medium
 fe80::/64 dev cali0894f966cdf proto kernel metric 256 pref medium
 fe80::/64 dev cali68b9b56a958 proto kernel metric 256 pref medium
 default via 2602:fcfb:1d:2::1 dev ens9 metric 10 pref medium
 default via fe80::f816:3eff:feac:1ca0 dev ens3 proto ra metric 100 expires 299sec mtu 9000 pref medium
 root@node1:/home/ubuntu# ip -6 rule show
 0: from all lookup local
 32761: from 2602:fcfb:100::/64 lookup v6peering
 32762: from 2001:400:a100:3090::/64 lookup admin
 32766: from all lookup main
 root@node1:/home/ubuntu# ip -6 route show table admin
 default via fe80::f816:3eff:feac:1ca0 dev ens3 metric 1024 pref medium
 root@node1:/home/ubuntu# systemctl status systemd-networkd
 ● systemd-networkd.service - Network Service
 Loaded: loaded (/lib/systemd/system/systemd-networkd.service; enabled; vendor preset: enabled)
 Active: active (running) since Fri 2023-04-07 06:46:29 UTC; 1 months 2 days ago
 TriggeredBy: ● systemd-networkd.socket
 Docs: man:systemd-networkd.service(8)
 Main PID: 313692 (systemd-network)
 Status: “Processing requests…”
 Tasks: 1 (limit: 464085)
 Memory: 21.9M
 CGroup: /system.slice/systemd-networkd.service
 └─313692 /lib/systemd/systemd-networkd

 May 09 19:29:21 node1 systemd-networkd[313692]: califcd611d2e5a: Lost carrier
 May 09 19:29:23 node1 systemd-networkd[313692]: cali6d43fe7c75d: Link DOWN
 May 09 19:29:23 node1 systemd-networkd[313692]: cali6d43fe7c75d: Lost carrier
 May 09 19:54:08 node1 systemd-networkd[313692]: cali0905f2a129d: Link DOWN
 May 09 19:54:08 node1 systemd-networkd[313692]: cali0905f2a129d: Lost carrier
 May 09 19:54:45 node1 systemd-networkd[313692]: calidba7a9afee0: Link DOWN
 May 09 19:54:45 node1 systemd-networkd[313692]: calidba7a9afee0: Lost carrier
 May 09 19:55:37 node1 systemd-networkd[313692]: cali68b9b56a958: Link UP
 May 09 19:55:37 node1 systemd-networkd[313692]: cali68b9b56a958: Gained carrier
 May 09 19:55:39 node1 systemd-networkd[313692]: cali68b9b56a958: Gained IPv6LL

May 9, 2023 at 5:30 pm #4187
Thank you for the output. I will look at this carefully later, but at a quick glance it looks like the default IPv6 route has switched to the dataplane (possibly FABv6) network (see the output from ip -6 route). Management (SSH) access is routed via the ens3 interface (IPv6 2001:400:a100:3090:f816:3eff:fe5c:dd28/64), and the default IPv6 gateway for this subnet is 2001:400:a100:3090::1. It looks like a configuration change occurred.

May 9, 2023 at 5:48 pm #4188
Yep. We configured the default route to be on the public network on the dataplane. The management network is routed via policy-based routing.

May 9, 2023 at 7:17 pm #4189
It seems the root cause of the connectivity issue is known then; you probably need to review your policy-based routing. Do you still need anything from us on this?

May 9, 2023 at 7:51 pm #4190
It appears the link-local address of the router is stale, so it looks as if it lost the layer 2 connection. The policy-based routing seems OK, and it is the same as on the other two nodes that are still working.

 ip -6 nei
 fe80::f816:3eff:feac:1ca0 dev ens3 lladdr fa:16:3e:ac:1c:a0 router STALE

May 9, 2023 at 8:43 pm #4192
Fengping,
This is Ubuntu?

"We configured the default route to be on the public network on the dataplane. The management network is routed via policy based routing."

Can you post that config? Are the PBR rules based on interface? IP?

May 10, 2023 at 2:12 pm #4196
Hi David,
Yep, it's Ubuntu.

 root@node1:/home/ubuntu# ip -6 rule list
 0: from all lookup local
 32761: from 2602:fcfb:100::/64 lookup v6peering
 32762: from 2001:400:a100:3090::/64 lookup admin
 32766: from all lookup main
 root@node1:/home/ubuntu# ip -6 route show table admin
 default via fe80::f816:3eff:feac:1ca0 dev ens3 metric 1024 pref medium

Thanks,
Fengping

May 11, 2023 at 3:38 pm #4215
@Fengping This seems like an issue with the policy-based routing. Is there a way to make your setup more resilient to external factors? The management network connects to the local campus or provider, and I suspect we can't control everything on the management network.
Paul

May 11, 2023 at 3:55 pm #4220
Thanks for the clarification. We do still have access to the VMs via the dataplane public IPs, so I think our setup is resilient in that sense.

Thanks,
Fengping

May 11, 2023 at 4:33 pm #4221
@Fengping Our authority at individual sites with respect to management-plane connections ends at our switch, so if the campus network blinks, we have no control over that. Is there a way for you to force PBR to pick the default via the management connection again?

May 12, 2023 at 6:04 pm #4229
Thanks Ilya. I tried bringing the link down and up with ip link, and this brought the management network online again. I would just need to re-add the PBR configuration, since it seems that bringing the link down clears it, but that's not a problem. Given what you said, I think I will just deal with potential blinks manually and not worry about it too much.
Fengping
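For readers hitting the same situation, the policy-based routing shown in the thread can be re-created after a link flap with a few iproute2 commands. This is a minimal sketch based on the "ip -6 rule list" and "table admin" outputs posted above, not the poster's actual script: it assumes the named table "admin" is already mapped to a number in /etc/iproute2/rt_tables, and the gateway link-local address is the one observed in the "ip -6 nei" output, so adjust both for your own slice.

```shell
#!/bin/sh
# Sketch: restore management-plane PBR after "ip link set ens3 down/up"
# clears it. Names below are taken from the outputs in this thread.

MGMT_IF=ens3
MGMT_PREFIX=2001:400:a100:3090::/64        # management subnet on ens3
MGMT_GW=fe80::f816:3eff:feac:1ca0          # router link-local from "ip -6 nei"

# Traffic sourced from the management prefix consults the "admin" table
# (priority matches the rule shown in the thread; "admin" must exist in
# /etc/iproute2/rt_tables, e.g. a line such as "100 admin").
ip -6 rule add from "$MGMT_PREFIX" lookup admin priority 32762

# The admin table's only route: default via the management gateway on ens3.
ip -6 route add default via "$MGMT_GW" dev "$MGMT_IF" table admin

# Verify the rule and the table contents.
ip -6 rule show
ip -6 route show table admin
```

Because the main table's default route stays on the dataplane (ens9), this keeps management replies leaving via ens3 without disturbing the dataplane default.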
