Unable to SSH into my Nodes
April 2, 2025 at 12:06 pm #8407
Hello,
I am having a problem SSHing into all of the nodes in my slice.
ID: 214f735b-7760-4efd-88c5-93c3c739836f
Name: P4DPDK_HH20
Lease Expiration (UTC): 2025-04-12 19:03:34 +0000
Lease Start (UTC): 2025-01-19 20:03:34 +0000
Project ID: 8eaa3ec2-65e7-49a3-8c09-e1761141a6ad
State: StableOK
Error message when I SSH:
Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
choueiri_0000118746@bastion.fabric-testbed.net: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
Error message when I run commands through Jupyter:
AuthenticationException: Authentication failed.
April 2, 2025 at 3:34 pm #8409
Hi Samia,
I verified that all the VMs in your slice are accessible via SSH. The error you are seeing is probably caused by expired bastion keys. Could you please try re-executing the notebook
jupyter-examples-rel1.8.1/configure_and_validate/configure_and_validate.ipynb?
This should renew your bastion keys. If you are using SSH from your laptop, please download the renewed bastion keys from the
/home/fabric/work/fabric_config
directory after executing the notebook above, and use them to replace the keys in your .ssh directory.
Please let me know if you run into any issues or have questions.
Thanks,
Komal
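For readers connecting from a laptop, the key replacement step described above can be sketched in Python. This is only an illustration: the helper `install_bastion_key` is hypothetical, and the file name `fabric_bastion_key` is an assumption that does not come from this thread. Adjust both to match the files you actually downloaded from the fabric_config directory.

```python
import os
import shutil
from pathlib import Path


def install_bastion_key(downloaded_dir, ssh_dir, key_name="fabric_bastion_key"):
    """Copy a freshly downloaded bastion key into ssh_dir with 0600 permissions.

    key_name is an assumed file name; use whatever name your key file has.
    Returns the path of the installed private key.
    """
    src = Path(downloaded_dir) / key_name
    if not src.is_file():
        raise FileNotFoundError(f"no key named {key_name!r} in {downloaded_dir}")

    dst_dir = Path(ssh_dir)
    dst_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    dst = dst_dir / key_name

    # Overwrite the expired key, then restrict permissions: ssh refuses
    # private keys that are readable by other users.
    shutil.copyfile(src, dst)
    os.chmod(dst, 0o600)

    # Copy the matching public key too, if it was downloaded alongside.
    pub = src.with_name(key_name + ".pub")
    if pub.is_file():
        shutil.copyfile(pub, dst_dir / pub.name)
    return dst
```

A call such as `install_bastion_key(Path.home() / "Downloads" / "fabric_config", Path.home() / ".ssh")` would install the key under your .ssh directory; both paths are placeholders.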
April 2, 2025 at 4:24 pm #8410
Thank you, Komal. I am using the JupyterHub for now and it works.
May 20, 2025 at 4:49 pm #8510
Does anyone know how to reboot a node when neither ping nor SSH to that node is working?
May 21, 2025 at 5:54 am #8511
Hi Ajay,
You can use the following code snippet to reboot the node:
slice = fablib.get_slice(slice_name)
node = slice.get_node(node_name)
node.os_reboot()
Also, please share your slice ID so we can take a look at it.
Thanks,
Komal
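The snippet above assumes that a `fablib` manager object already exists, as it typically does in the FABRIC example notebooks. A slightly fuller sketch wraps the same three calls in a hypothetical helper, `reboot_node`. The helper name and the commented import line are assumptions; only `get_slice`, `get_node`, and `os_reboot` come from the reply above.

```python
def reboot_node(fablib, slice_name, node_name):
    """Reboot one node of a slice using the three calls from the reply above.

    fablib is a FablibManager-like object; slice_name and node_name are the
    names of your slice and of the node inside it.
    """
    slice_obj = fablib.get_slice(slice_name)  # look up the slice by name
    node = slice_obj.get_node(node_name)      # look up the node in that slice
    return node.os_reboot()                   # the reboot call named in the reply


# Possible use inside a FABRIC notebook. The import path is an assumption
# based on the FABRIC example notebooks and is not executed here:
#
#   from fabrictestbed_extensions.fablib.fablib import FablibManager
#   reboot_node(FablibManager(), "my_slice", "Node4")
```

Passing the manager in as an argument keeps the helper independent of how the notebook created it.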
May 21, 2025 at 12:36 pm #8515
Thank you so much, Komal, it worked for me.
Following up on that, I noticed that my interface (enp9s0) is no longer found; it was there earlier. I use this interface to connect to the other nodes in the cluster. Could you please help me bring it up again?
(base) ubuntu@Node4:~$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
RX packets 179 bytes 18635 (18.6 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 168 bytes 22384 (22.3 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 110 bytes 8928 (8.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 110 bytes 8928 (8.9 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
May 21, 2025 at 12:40 pm #8517
Please share your slice ID and also the output of the command:
ifconfig -a
Thanks,
Komal
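Alongside `ifconfig -a`, a quick programmatic check can show which expected interfaces are missing on a node. The sketch below uses Python's standard `socket.if_nameindex()` to list the interfaces the operating system currently knows about. The helper name `missing_interfaces` is hypothetical, and `enp9s0` is simply the interface named in this thread.

```python
import socket


def missing_interfaces(expected, present=None):
    """Return the names in expected that are absent from present.

    If present is None, ask the OS for its current interface names
    (socket.if_nameindex() is available on Unix-like systems).
    """
    if present is None:
        present = [name for _index, name in socket.if_nameindex()]
    present = set(present)
    return [name for name in expected if name not in present]
```

Run on the node itself, `missing_interfaces(["enp3s0", "enp9s0"])` lists whichever of the two names the OS does not report. With `present=["docker0", "enp3s0", "lo"]`, it returns `["enp9s0"]`.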
May 21, 2025 at 12:42 pm #8518My Slice ID: 09255c48-5512-4e3c-bdc6-ad7d4fd37d07
Output of the ifconfig -a command:
(base) ubuntu@Node4:~$ ifconfig -a
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
RX packets 541 bytes 53260 (53.2 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 417 bytes 56772 (56.7 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 114 bytes 9436 (9.4 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 114 bytes 9436 (9.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
May 21, 2025 at 1:14 pm #8519
Could you please check your VM again?
All PCI devices had been disconnected. I have reconnected them to your VM. Please check it.
Also, could you please share the sequence of operations that led your VM to this state?
That would help us see whether anything needs to be fixed in our control software.
Thanks,
Komal
May 21, 2025 at 2:31 pm #8521
Thank you very much, it works fine now. Double hands up for your help, Komal.