Unable to SSH into my Nodes

This topic has 9 replies, 3 voices, and was last updated 2 months, 2 weeks ago by Ajay Kumar.

Viewing 10 posts - 1 through 10 (of 10 total)

Author

Posts

April 2, 2025 at 12:06 pm #8407

Samia Choueiri

Participant

Hello,
I am unable to SSH into any of the nodes in my slice.

ID	214f735b-7760-4efd-88c5-93c3c739836f
Name	P4DPDK_HH20
Lease Expiration (UTC)	2025-04-12 19:03:34 +0000
Lease Start (UTC)	2025-01-19 20:03:34 +0000
Project ID	8eaa3ec2-65e7-49a3-8c09-e1761141a6ad
State	StableOK

Error message when I SSH:
Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
choueiri_0000118746@bastion.fabric-testbed.net: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535

Error message when I run commands through Jupyter:
AuthenticationException: Authentication failed.

April 2, 2025 at 3:34 pm #8409

Komal Thareja

Participant

Hi Samia,

I verified that all the VMs in your slice are accessible via SSH. The error you are seeing is probably caused by expired bastion keys. Could you please re-run the notebook jupyter-examples-rel1.8.1/configure_and_validate/configure_and_validate.ipynb?

This will renew your bastion keys. If you are connecting via SSH from your laptop, run the notebook above first, then download the renewed bastion keys from the /home/fabric/work/fabric_config directory and use them to replace the keys in your .ssh directory.

Please let me know if you run into any issues or have questions.

Thanks,

Komal
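Editor's note: the key replacement step described above can be sketched in Python for anyone connecting from a laptop. This is a minimal sketch, not an official FABRIC tool. The function name install_bastion_key and the file name in the usage comment are hypothetical; use the name of the key file you actually downloaded. The sketch copies the key into the SSH directory and restricts it to the owner, since OpenSSH refuses private keys that other users can read.

```python
import os
import shutil
import stat
from pathlib import Path


def install_bastion_key(downloaded_key, ssh_dir):
    """Copy a downloaded private key into ssh_dir and make it owner-only."""
    src = Path(downloaded_key).expanduser()
    dest_dir = Path(ssh_dir).expanduser()
    # Create the SSH directory if it does not exist yet.
    dest_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # Overwrites an older key of the same name, which is the point here.
    shutil.copyfile(src, dest)
    # Mode 0600: read/write for the owner, nothing for anyone else.
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)
    return dest


# Example call (hypothetical file name, adjust to your download):
# install_bastion_key("~/Downloads/fabric_bastion_key", "~/.ssh")
```

If the SSH configuration on the laptop refers to the key by a different file name, either rename the downloaded file first or update the configuration to match.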

April 2, 2025 at 4:24 pm #8410

Samia Choueiri

Participant

Thank you, Komal. I am using JupyterHub for now, and it works.

May 20, 2025 at 4:49 pm #8510

Ajay Kumar

Participant

Does anyone know how to reboot a node when neither ping nor SSH to that node is working?

May 21, 2025 at 5:54 am #8511

Komal Thareja

Participant

Hi Ajay,

You can use the following code snippet to reboot the node:
slice = fablib.get_slice(slice_name)
node = slice.get_node(node_name)
node.os_reboot()

Also, please share your slice ID so we can take a look at it.

Thanks,

Komal
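Editor's note: the snippet above assumes that a fablib object and the two name variables already exist in the notebook. A minimal self-contained sketch follows. It uses only the three calls from the post (get_slice, get_node, os_reboot); the wrapper function reboot_node and the placeholder names in the usage comment are assumptions for illustration. Passing the manager in as an argument keeps the function usable in any notebook that has already created one.

```python
def reboot_node(fablib, slice_name, node_name):
    """Look up a node by slice name and node name, then request a reboot."""
    slice_obj = fablib.get_slice(slice_name)
    node = slice_obj.get_node(node_name)
    # The reboot is requested through fablib, not over an SSH session
    # to the VM, so it can be issued while the node is unreachable.
    node.os_reboot()
    return node


# Usage inside a FABRIC Jupyter notebook (names are placeholders):
# from fabrictestbed_extensions.fablib.fablib import FablibManager
# fablib = FablibManager()
# reboot_node(fablib, "MySlice", "Node4")
```

After the reboot, give the VM some time to come back before retrying ping or SSH.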

May 21, 2025 at 12:36 pm #8515

Ajay Kumar

Participant

Thank you so much, Komal, it worked for me.

Following up on that, I noticed that my interface (enp9s0) is no longer found, although it was there earlier. I was using this interface to connect to the other nodes in the cluster. Could you please help me bring it up again?

(base) ubuntu@Node4:~$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
        inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
        inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
        RX packets 179 bytes 18635 (18.6 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 168 bytes 22384 (22.3 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 110 bytes 8928 (8.9 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 110 bytes 8928 (8.9 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

May 21, 2025 at 12:40 pm #8517

Komal Thareja

Participant

Please share your slice ID and also the output of the command: ifconfig -a

Thanks,

Komal

May 21, 2025 at 12:42 pm #8518

Ajay Kumar

Participant

My Slice ID: 09255c48-5512-4e3c-bdc6-ad7d4fd37d07
Output of ifconfig -a command:

(base) ubuntu@Node4:~$ ifconfig -a
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
        inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
        inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
        RX packets 541 bytes 53260 (53.2 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 417 bytes 56772 (56.7 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 114 bytes 9436 (9.4 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 114 bytes 9436 (9.4 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
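Editor's note: the output above lists only docker0, enp3s0, and lo, so enp9s0 is absent even with the -a flag, which also shows interfaces that are down. A check like this can be scripted. The sketch below is an illustration only; the helper names interface_names and missing_interfaces are hypothetical, and it assumes the ifconfig format shown in this thread, where each interface block starts with "name: flags=".

```python
import re


def interface_names(ifconfig_output):
    """Return interface names in the order they appear in ifconfig output."""
    # Each interface block begins with "<name>: flags=" at the start of a line.
    return re.findall(r"^(\S+): flags=", ifconfig_output, flags=re.MULTILINE)


def missing_interfaces(ifconfig_output, expected):
    """Return the expected interface names that the output does not list."""
    present = set(interface_names(ifconfig_output))
    return [name for name in expected if name not in present]


# Example: paste the text of `ifconfig -a` into a string, then
# missing_interfaces(text, ["enp3s0", "enp9s0"])
```

An interface that is missing from ifconfig -a is not just administratively down; the device itself is not visible to the operating system, which matches the disconnected PCI devices reported later in the thread.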

May 21, 2025 at 1:14 pm #8519

Komal Thareja

Participant

Could you please check your VM again?

All PCI devices had been disconnected. I have reconnected them to your VM. Please check it.

Also, could you please share the sequence of operations that led your VM to this state?

It would help us determine whether anything needs to be fixed in our control software.

Thanks,

Komal

May 21, 2025 at 2:31 pm #8521

Ajay Kumar

Participant

Thank you very much, now it works fine, double hands up for your help, Komal.
