Lost SSH login to a node

  • #9491
    Khawar Shehzad
    Participant

      Hi,

      I’ve suddenly lost the SSH connection to a VM today (slice ID: fe17fbc7-d6ad-4d71-9305-8457f52e9ba4).
      Sliver and bastion seem to be fine, since I can access my other experiments.
      I’ve also tried logging in via FABRIC, with no success. Is there a way to salvage the SSH connection to this slice? I have some important results stored there.

      Thanks,
      Khawar

      #9492
      Mert Cevik
      Moderator

        Hello Khawar,

        Your VM was shut down by the hypervisor, and I have now started it again. Please let us know if you have any other issues. We will be investigating the root cause of this shutdown internally.

        Best regards,
        Mert

        #9493
        Khawar Shehzad
        Participant

          Thanks, Mert, for having a look. I’m still unable to log in, however. Here’s the error that I’m getting now:

          ssh -F ~/.ssh/fabric_ssh_config -i ~/.ssh/sliver ubuntu@2001:400:a100:3030:f816:3eff:fe30:9bac

          Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
          channel 0: open failed: connect failed: Connection refused
          stdio forwarding failed
          Connection closed by UNKNOWN port 65535
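
In OpenSSH, the "channel 0: open failed: connect failed" line is generally reported when the jump host accepted the session but could not open a TCP connection to the target, and "stdio forwarding failed" means the forwarding used for the jump did not come up. A minimal Python sketch of that reading follows; the function name `diagnose_jump_error` and the wording of the hints are hypothetical, and the sketch only pattern-matches the text shown above rather than contacting any host.

```python
import re

# Stderr text copied from the failed login attempt above.
SAMPLE_STDERR = """\
channel 0: open failed: connect failed: Connection refused
stdio forwarding failed
Connection closed by UNKNOWN port 65535
"""


def diagnose_jump_error(stderr_text):
    """Return plain-language hints for common jump-host failure lines.

    Hypothetical helper: the patterns cover only the messages in this thread.
    """
    hints = []
    match = re.search(
        r"channel \d+: open failed: connect failed: (.+)", stderr_text
    )
    if match:
        reason = match.group(1).strip()
        hints.append(
            "jump host was reached, but its connection to the target failed: "
            + reason
        )
    if "stdio forwarding failed" in stderr_text:
        hints.append("forwarding through the jump host was not established")
    return hints


for hint in diagnose_jump_error(SAMPLE_STDERR):
    print("-", hint)
```

If this reading applies, the bastion and the keys are likely not the problem; the SSH service on the VM itself is not accepting connections.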

          Best,
          K

          #9494
          Mert Cevik
          Moderator

            I checked your VM and found it in a crashed state. Without digging into the logs, I’m not sure why, when, or how it crashed or rebooted, but the worker node it’s running on (star-w2) is fully occupied with VMs, and we will look into possible out-of-memory issues on the hypervisor. It would be good to re-create this VM on another worker node at STAR, or to use a smaller flavor if you run a VM on star-w2.

            #9495
            Khawar Shehzad
            Participant

              I’m not sure I understand correctly. To give some context, I have one node serving as the master node of the cluster, connected to an 8-node setup on the CloudLab side. While the CloudLab side of the cluster is working fine, I’ve just lost access to the master node on FABRIC, which had all the data.
              Coming back to your reply above, I’m not sure which node/setup you are referring to by “star-w2”. I did, however, create another experiment with a similar topology, which is working fine.
              That said, I would much appreciate it if the node that I lost access to could be restored, since it has some critical data.

              Thanks.

              #9496
              Komal Thareja
              Participant

                @Mert / @Khawar,

                I attempted to recover the VM last night and shut it down as part of the process. During the investigation, I noticed that the /home/ubuntu/.ssh directory was missing from the VM. I tried to restore the SSH keys to regain access, but subsequently found that the VM was no longer bootable and consistently failed with filesystem errors.

                Further inspection showed that /etc/fstab on the VM had been modified:

                LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1
                LABEL=UEFI /boot/efi vfat umask=0077 0 1
                vm0:/myvol /gss glusterfs defaults,_netdev,nofail 0 0
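
For readers who want to review an fstab like the one above before rebooting, here is a minimal Python sketch that splits each entry into its six fields and lists the network-backed mounts together with whether they carry `nofail`. The names `FstabEntry`, `parse_fstab`, `network_mounts`, and the `NETWORK_FS` set are hypothetical and chosen for illustration. The sketch only reports what the file contains and does not establish why this VM failed to boot.

```python
from dataclasses import dataclass

# Assumption: an illustrative, non-exhaustive set of network filesystem types.
NETWORK_FS = {"glusterfs", "nfs", "nfs4", "cifs", "ceph"}

# The three entries quoted in the post above.
FSTAB = """\
LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1
LABEL=UEFI /boot/efi vfat umask=0077 0 1
vm0:/myvol /gss glusterfs defaults,_netdev,nofail 0 0
"""


@dataclass
class FstabEntry:
    device: str
    mountpoint: str
    fstype: str
    options: list
    dump: int
    passno: int


def parse_fstab(text):
    """Parse fstab text into entries, skipping blank lines and comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # too few fields to be a usable entry
        dump = int(fields[4]) if len(fields) > 4 else 0
        passno = int(fields[5]) if len(fields) > 5 else 0
        entries.append(
            FstabEntry(fields[0], fields[1], fields[2],
                       fields[3].split(","), dump, passno)
        )
    return entries


def network_mounts(entries):
    """Return entries that use a network filesystem or the _netdev option."""
    return [e for e in entries
            if e.fstype in NETWORK_FS or "_netdev" in e.options]


for entry in network_mounts(parse_fstab(FSTAB)):
    status = "has nofail" if "nofail" in entry.options else "missing nofail"
    print(f"{entry.mountpoint} ({entry.fstype} from {entry.device}): {status}")
```

Run against the entries quoted above, the sketch flags only the `/gss` GlusterFS mount, which is the one line not labeled as part of the cloud image.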
                

                I attempted to revert the /etc/fstab changes, but was unable to recover to a bootable state. It appears these modifications may have been introduced as part of your experiment, possibly unintentionally.

                Please be mindful when making system-level changes during experiments. In some cases, recovery is not possible if the VM state has been significantly altered and the changes are not fully known.

                Best,

                Komal

                #9498
                Mert Cevik
                Moderator

                  Thank you Komal for the information.

                  Khawar, can you please describe the directory where your “critical data” resides on the VM?
