Lost SSH login to a node
- This topic has 6 replies, 3 voices, and was last updated 5 days, 16 hours ago by
Mert Cevik.
February 9, 2026 at 8:10 pm #9491
Hi,
I suddenly lost my SSH connection to a VM today. Slice ID: fe17fbc7-d6ad-4d71-9305-8457f52e9ba4
Sliver and bastion seem to be fine, since I can access my other experiments.
I also tried logging in via FABRIC, without success. Is there a way to restore SSH access to this slice? I have some important results stored there.
Thanks,
Khawar
February 10, 2026 at 8:36 am #9492
Hello Khawar,
Your VM was shut down by the hypervisor, and I have now started it again. Please let us know if you have any other issues. We will investigate the main cause of this shutdown internally.
Best regards,
Mert
February 10, 2026 at 11:03 am #9493
Thanks, Mert, for having a look. However, I'm still unable to log in. Here's the error that I'm getting now:
ssh -F ~/.ssh/fabric_ssh_config -i ~/.ssh/sliver ubuntu@2001:400:a100:3030:f816:3eff:fe30:9bac
Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
channel 0: open failed: connect failed: Connection refused
stdio forwarding failed
Connection closed by UNKNOWN port 65535
Best,
K
February 10, 2026 at 12:41 pm #9494
I checked your VM and found it in a crashed state. Without digging into the logs, I'm not sure why, when, or how it crashed or rebooted. However, the worker node it's running on (star-w2) is fully occupied with VMs, and we will look into possible out-of-memory issues on the hypervisor. It would be a good idea to re-create this VM on another worker node at STAR, or to use a smaller flavor if you run the VM on star-w2.
February 10, 2026 at 3:00 pm #9495
I'm not sure I understand correctly. To give some context, I have one node serving as the master node in the cluster, connected to an 8-node setup on the CloudLab side. The CloudLab side of the cluster is working fine; I have only lost access to the master node on FABRIC, which had all the data.
Coming back to your reply above, I'm not sure which node/setup you are referring to by “star-w2”. I did, however, create another experiment with a similar topology, which is working fine.
Still, I would much appreciate it if the node that I lost access to could be restored, since it has some critical data.
Thanks.
February 10, 2026 at 5:34 pm #9496
@Mert / @Khawar,
I attempted to recover the VM last night and shut it down as part of the process. During the investigation, I noticed that the /home/ubuntu/.ssh directory was missing from the VM. I tried to restore the SSH keys to regain access, but subsequently found that the VM was no longer bootable and consistently failed with filesystem errors.
Further inspection showed that /etc/fstab on the VM had been modified:
LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1
LABEL=UEFI /boot/efi vfat umask=0077 0 1
vm0:/myvol /gss glusterfs defaults,_netdev,nofail 0 0
I attempted to revert the /etc/fstab changes, but was unable to recover to a bootable state. It appears these modifications may have been introduced as part of your experiment, possibly unintentionally.
Please be mindful when making system-level changes during experiments. In some cases, recovery is not possible if the VM state has been significantly altered and the changes are not fully known.
Best,
Komal
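For readers who want to audit an /etc/fstab before rebooting a VM, here is a minimal Python sketch. It parses the three entries quoted in the post above and flags those that depend on the network. The helper names (parse_fstab, network_entries, missing_nofail) and the list of network filesystem types are illustrative assumptions, not part of any FABRIC tooling. The check does not explain why this particular VM failed to boot.

```python
from typing import NamedTuple

# Illustrative, non-exhaustive set of filesystem types that need the network.
NETWORK_FS_TYPES = {"glusterfs", "nfs", "nfs4", "cifs", "ceph"}


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: tuple
    dump: int
    fsck_pass: int


def parse_fstab(text):
    """Parse fstab text into entries, skipping blank lines and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"malformed fstab line: {line!r}")
        # The fifth and sixth fields are optional and default to 0.
        dump = int(fields[4]) if len(fields) > 4 else 0
        fsck_pass = int(fields[5]) if len(fields) > 5 else 0
        entries.append(FstabEntry(fields[0], fields[1], fields[2],
                                  tuple(fields[3].split(",")), dump, fsck_pass))
    return entries


def network_entries(entries):
    """Entries that need the network: known network fs type or _netdev option."""
    return [e for e in entries
            if e.fs_type in NETWORK_FS_TYPES or "_netdev" in e.options]


def missing_nofail(entries):
    """Network entries without nofail, which can hold up boot if unreachable."""
    return [e for e in network_entries(entries) if "nofail" not in e.options]


# The entries quoted in the post above.
SAMPLE = """\
LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1
LABEL=UEFI /boot/efi vfat umask=0077 0 1
vm0:/myvol /gss glusterfs defaults,_netdev,nofail 0 0
"""

if __name__ == "__main__":
    for entry in network_entries(parse_fstab(SAMPLE)):
        print(entry.mount_point, entry.fs_type, "nofail" in entry.options)
```

For the quoted file, only the /gss GlusterFS entry is flagged as network-dependent, and it already carries the nofail option, so missing_nofail returns an empty list.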
February 11, 2026 at 10:20 am #9498
Thank you, Komal, for the information.
Khawar, can you please describe the directory where your “critical data” resides on the VM?