1. Cannot SSH into NS1 and NS5 nodes, need to preserve data (PhD simulations)

  • #9188

    Hello,

    I am user daniloassis@utfpr.edu.br, working on the project “Vanets and Fanets simulations for research”.

    I have several slices/nodes in this project (NS1, NS2, NS3, NS4, NS5, NS6, NS7, NS8, NS9, Dados, etc.). Until recently, I was able to SSH into all of them normally using my current SSH key.

    However, now I can no longer log into nodes NS1 and NS5, while all the other nodes in the same project are still accessible with the same SSH configuration and key.

    The issue started after the workstation I use to access FABRIC had a crash in the graphical interface (desktop). After rebooting my local machine, I was still able to SSH into the other nodes in the project, but NS1 and NS5 became inaccessible.

    When I try to SSH into one of the problematic nodes (for example NS5 at 2607:f018:110:11:f816:3eff:fefe:d0e9), I get the following error (with -vvv):

    Authentications that can continue: publickey
    Next authentication method: publickey
    Offering public key: /root/.ssh/silver ECDSA …
    Auth methods that can continue: publickey
    No more authentication methods to try.
    ubuntu@2607:f018:110:11:f816:3eff:fefe:d0e9: Permission denied (publickey).

    The same SSH key and configuration work correctly for the other nodes in the project (e.g., Dados, NS2, NS3, NS4, NS6, NS7, NS8, NS9).
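One way to confirm which nodes accept the key is to probe each of them non-interactively. The sketch below is illustrative and not part of the original post: the function names `build_probe_command` and `probe` are hypothetical, the default user `ubuntu` is taken from the error output above, and the optional `config_path` assumes an SSH config file is used to reach the nodes. `BatchMode=yes` makes `ssh` fail immediately instead of prompting, and `IdentitiesOnly=yes` restricts it to the key passed with `-i`.

```python
import subprocess


def build_probe_command(host, key_path, user="ubuntu", config_path=None):
    """Build an ssh command that only tests public key login and then exits."""
    cmd = [
        "ssh",
        "-i", key_path,                # offer this specific private key
        "-o", "BatchMode=yes",         # never prompt; fail instead
        "-o", "IdentitiesOnly=yes",    # do not try other keys from the agent
        "-o", "ConnectTimeout=10",
    ]
    if config_path is not None:
        cmd += ["-F", config_path]     # assumption: a custom ssh config is in use
    cmd += [f"{user}@{host}", "true"]  # run a no-op command on the node
    return cmd


def probe(host, key_path, user="ubuntu", config_path=None):
    """Return (ok, stderr) for a single login attempt."""
    result = subprocess.run(
        build_probe_command(host, key_path, user, config_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()
```

Calling `probe(address, "/root/.ssh/silver")` for each node address would then show which nodes reject the key; a failing node is expected to report `Permission denied (publickey)` in the returned stderr text.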

    Additionally, when I run the addkey POA for my slices, it succeeds on most nodes but fails specifically on NS1 and NS5. The error messages are:

    For NS5:

    POA – 95a4f6ea-7516-4f92-b9e3-2a37134f6ecf/addkey failed with error:
    Exception during poa for unit: cfe0da27-fa20-4ee9-8087-3d4a7e7ba283
    msg Playbook has failed tasks: All items completed

    For NS1:

    POA – cd4bde74-7a9f-4ca6-aec7-0963ad7b47ce/addkey failed with error:
    Exception during poa for unit: b8a6f13a-667c-4bbf-b6c5-eba2a0eda936
    msg Playbook has failed tasks: All items completed

    These nodes contain important simulation data for my PhD research, so I cannot delete or reprovision NS1 and NS5, as I need to preserve the data stored on their disks.

    Could you please:

    1. Investigate why SSH public key authentication is failing only on NS1 and NS5, while it works on the other nodes in the same project, and why the addkey POA is failing for these two nodes; and

    2. Either:

       • force-install my current SSH public key for user ubuntu on NS1 and NS5 so I can log in again, or

       • attach/mount the disks of NS1 and NS5 on a new VM/slice so that I can recover and copy my simulation data.

    If you need more details (exact slice names, site names, or timestamps), I will be happy to provide them.

    Thank you very much for your help.

    Best regards,
    Danilo Assis
    UTFPR – PhD student

    #9189
    Komal Thareja
    Participant

      Hi Danilo,

      I found that the authorized_keys file on both NS1 and NS5 was empty. That is why SSH was failing, whether through the admin key or the Control Framework, and why the POA addkey operation failed as a result. It seems this may have happened unintentionally as part of the experiment.

      I’ve manually restored SSH access so the Control Framework should now function properly, including POA. Could you please try adding your keys to these VMs again using POA? That should re-establish your SSH access.

      Please be careful not to remove or overwrite the authorized_keys file in the process.

      Best,

      Komal
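The warning above about not overwriting authorized_keys can be illustrated with a small helper for experiment scripts that manage keys on a VM. This is a minimal sketch, not the Control Framework's method: the function name `ensure_authorized_key` is hypothetical, and the path passed to it is assumed to be the login user's `~/.ssh/authorized_keys`. The point is to open the file in append mode so that existing entries (including any keys the Control Framework depends on) are never truncated.

```python
import os
from pathlib import Path


def ensure_authorized_key(public_key_line, authorized_keys_path):
    """Append a public key if it is missing. Returns True if a line was added.

    The file is only ever opened for reading or appending, so existing
    entries are preserved.
    """
    key = public_key_line.strip()
    if not key:
        raise ValueError("empty public key")

    path = Path(authorized_keys_path)
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)

    existing = path.read_text() if path.exists() else ""
    if key in (line.strip() for line in existing.splitlines()):
        return False  # already present, nothing to do

    with open(path, "a") as fh:  # "a" appends; "w" would wipe the file
        if existing and not existing.endswith("\n"):
            fh.write("\n")  # keep the new key on its own line
        fh.write(key + "\n")

    os.chmod(path, 0o600)  # sshd may ignore files with looser permissions
    return True
```

Writing keys with a truncating mode such as `"w"`, or with a shell redirect `>` instead of `>>`, is one common way an authorized_keys file ends up empty.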

      #9190

      Yes, I managed to connect. Thanks for your help.
