Lost SSH login to a node

This topic has 6 replies, 3 voices, and was last updated 1 month, 2 weeks ago by Mert Cevik.

Viewing 7 posts - 1 through 7 (of 7 total)

Author

Posts
February 9, 2026 at 8:10 pm #9491
Khawar Shehzad
Participant
Hi,

I suddenly lost my SSH connection to a VM today (slice ID: fe17fbc7-d6ad-4d71-9305-8457f52e9ba4).
Sliver and bastion seem to be fine, since I can access my other experiments.
I’ve also tried logging in via FABRIC, without success. Is there a way to salvage the SSH connection to this slice? I have some important results stored there.

Thanks,
Khawar
February 10, 2026 at 8:36 am #9492
Mert Cevik
Moderator
Hello Khawar,

Your VM was shut down by the hypervisor, and I have now started it again. Please let us know if you have any other issues. We will investigate the root cause of this shutdown internally.

Best regards,
Mert
February 10, 2026 at 11:03 am #9493
Khawar Shehzad
Participant
Thanks, Mert, for having a look. However, I’m still unable to log in. Here’s the error I’m getting now:

ssh -F ~/.ssh/fabric_ssh_config -i ~/.ssh/sliver ubuntu@2001:400:a100:3030:f816:3eff:fe30:9bac

Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
channel 0: open failed: connect failed: Connection refused
stdio forwarding failed
Connection closed by UNKNOWN port 65535

Best,
K
February 10, 2026 at 12:41 pm #9494
Mert Cevik
Moderator
I checked your VM and found it in a crashed state. Without digging into the logs, I can’t say why, when, or how it crashed or rebooted, but the worker node it’s running on (star-w2) is fully occupied with VMs, and we will look into possible out-of-memory issues on the hypervisor. It would be good to re-create this VM on another worker node on STAR, or to use a smaller flavor if you run it on star-w2.
February 10, 2026 at 3:00 pm #9495
Khawar Shehzad
Participant
I’m not sure I understand correctly. To give some context, I have one node serving as the master node in the cluster, connected to an 8-node setup on the CloudLab side. While the CloudLab side of the cluster is working fine, I’ve lost access to the master node on FABRIC, which had all the data.
Coming back to your reply above, I’m not sure which node or setup you are referring to by “star-w2”. I did, however, create another experiment with a similar topology, which is working fine.
Still, I would much appreciate it if the node I lost access to could be restored, since it has some critical data.

Thanks.
February 10, 2026 at 5:34 pm #9496
Komal Thareja
Participant
@Mert / @Khawar,

I attempted to recover the VM last night and shut it down as part of the process. During the investigation, I noticed that the /home/ubuntu/.ssh directory was missing from the VM. I tried to restore the SSH keys to regain access, but subsequently found that the VM was no longer bootable and consistently failed with filesystem errors.

Further inspection showed that /etc/fstab on the VM had been modified:
```
LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1
LABEL=UEFI /boot/efi vfat umask=0077 0 1
vm0:/myvol /gss glusterfs defaults,_netdev,nofail 0 0
```
I attempted to revert the /etc/fstab changes, but was unable to recover to a bootable state. It appears these modifications may have been introduced as part of your experiment, possibly unintentionally.

Please be mindful when making system-level changes during experiments. In some cases, recovery is not possible if the VM state has been significantly altered and the changes are not fully known.
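
As a rough illustration of that kind of caution, here is a minimal Python sketch that parses fstab-style text and flags network mounts that lack `_netdev` or `nofail`. The names `parse_fstab`, `risky_network_mounts`, and `NETWORK_FS_TYPES` are hypothetical and not part of any FABRIC tooling, and the list of network filesystem types is an assumption. It does not explain why this particular VM stopped booting: the glusterfs line quoted above already carries both options, so the sketch reports nothing for it.

```python
from dataclasses import dataclass
from typing import List

# Assumption: filesystem types treated as "network" mounts for this check.
NETWORK_FS_TYPES = {"glusterfs", "nfs", "nfs4", "cifs"}


@dataclass
class FstabEntry:
    device: str
    mount_point: str
    fs_type: str
    options: List[str]
    dump: int
    fsck_pass: int


def parse_fstab(text: str) -> List[FstabEntry]:
    """Parse fstab-style text, skipping blank lines and comments."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if not 4 <= len(fields) <= 6:
            raise ValueError(
                f"line {lineno}: expected 4 to 6 fields, got {len(fields)}"
            )
        # The dump and fsck-pass fields default to 0 when omitted.
        fields += ["0"] * (6 - len(fields))
        device, mount_point, fs_type, opts, dump, fsck_pass = fields
        entries.append(
            FstabEntry(device, mount_point, fs_type, opts.split(","),
                       int(dump), int(fsck_pass))
        )
    return entries


def risky_network_mounts(entries: List[FstabEntry]) -> List[FstabEntry]:
    """Return network mounts that lack either _netdev or nofail."""
    required = {"_netdev", "nofail"}
    return [
        e for e in entries
        if e.fs_type in NETWORK_FS_TYPES and not required <= set(e.options)
    ]


# The three entries quoted earlier in this thread.
SAMPLE = """\
LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1
LABEL=UEFI /boot/efi vfat umask=0077 0 1
vm0:/myvol /gss glusterfs defaults,_netdev,nofail 0 0
"""

if __name__ == "__main__":
    for entry in risky_network_mounts(parse_fstab(SAMPLE)):
        print(f"check options for {entry.mount_point}: {entry.options}")
```

Running a check like this against a copy of `/etc/fstab` before rebooting, and keeping a backup of the original file, makes it easier to know exactly what was changed if the VM later needs recovery.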

Best,

Komal
February 11, 2026 at 10:20 am #9498
Mert Cevik
Moderator
Thank you Komal for the information.

Khawar, can you please describe the directory where your “critical data” resides on the VM?