1. Unable to SSH into my Nodes

  • #8407
    Samia Choueiri
    Participant

      Hello,
      I am unable to SSH into any of the nodes in my slice.

      ID 214f735b-7760-4efd-88c5-93c3c739836f
      Name P4DPDK_HH20
      Lease Expiration (UTC) 2025-04-12 19:03:34 +0000
      Lease Start (UTC) 2025-01-19 20:03:34 +0000
      Project ID 8eaa3ec2-65e7-49a3-8c09-e1761141a6ad
      State StableOK

      error message when I SSH:
      Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
      choueiri_0000118746@bastion.fabric-testbed.net: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
      kex_exchange_identification: Connection closed by remote host
      Connection closed by UNKNOWN port 65535

      error message when I run commands through jupyter:
      AuthenticationException: Authentication failed.

      #8409
      Komal Thareja
      Participant

        Hi Samia,

        I verified that all the VMs in your slice are accessible via SSH. The error you are seeing is probably caused by expired bastion keys. Could you please try re-running the notebook jupyter-examples-rel1.8.1/configure_and_validate/configure_and_validate.ipynb?

        This will renew your bastion keys. If you are connecting via SSH from your laptop, please download the renewed bastion keys from the /home/fabric/work/fabric_config directory after running the notebook above, and use them to replace the keys in your .ssh directory.

        Please let me know if you run into any issues or have questions.

        Thanks,

        Komal
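For the laptop case described above, replacing the expired keys can be scripted. The following is only a minimal sketch: the helper name `install_keys` and the key file names in the usage note are assumptions for illustration, not something FABRIC provides, so adjust them to match the files you actually downloaded.

```python
import os
import shutil
import stat
from pathlib import Path


def install_keys(src_dir, dest_dir, key_names):
    """Copy the named key files from src_dir into dest_dir, replacing old copies.

    Private keys get mode 0600 (ssh refuses keys that other users can read);
    files ending in .pub get mode 0644.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(mode=0o700, parents=True, exist_ok=True)
    installed = []
    for name in key_names:
        source_file = src / name
        if not source_file.is_file():
            raise FileNotFoundError(f"missing key file: {source_file}")
        target = dest / name
        # Remove the old (expired) key first so a read-only file does not block the copy.
        target.unlink(missing_ok=True)
        shutil.copyfile(source_file, target)
        mode = 0o644 if name.endswith(".pub") else 0o600
        os.chmod(target, mode)
        installed.append(target)
    return installed
```

A call might look like `install_keys("Downloads/fabric_config", Path.home() / ".ssh", ["fabric_bastion_key", "fabric_bastion_key.pub"])`, where both the download folder and the file names are hypothetical.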

        #8410
        Samia Choueiri
        Participant

          Thank you Komal, I am using the jupyter hub for now and it works.

          #8510
          Ajay Kumar
          Participant

            Does anyone know how to reboot a node when neither ping nor SSH to that node is working?

            #8511
            Komal Thareja
            Participant

              Hi Ajay,

              You can use the following code snippet to reboot the node:

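              from fabrictestbed_extensions.fablib.fablib import FablibManager
              fablib = FablibManager()  # skip if fablib is already set up in your notebook
              slice = fablib.get_slice(slice_name)  # slice_name: the name of your slice
              node = slice.get_node(node_name)      # node_name: the name of the node to reboot
              node.os_reboot()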

              Also, please share your slice ID so we can take a look at it.

              Thanks,

              Komal
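After a reboot request, the node usually needs some time before it answers again. One way to handle that in a notebook is to poll until a check succeeds. This is a generic sketch, not part of FABlib: `wait_until` and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_until(check, timeout=300.0, interval=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns a truthy value.

    Returns True as soon as check() succeeds, or False once `timeout`
    seconds have passed. Exceptions raised by check() are treated as
    "not ready yet" (for example, SSH still refusing connections).
    """
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # the node is probably still booting
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a notebook, `check` could be a small function that runs a harmless command on the node (for instance through `node.execute("hostname")`) and returns `True` when it succeeds. How that call behaves while the node is down is an assumption here, which is why the helper swallows exceptions.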

              #8515
              Ajay Kumar
              Participant

                Thank you so much, Komal, it worked for me.

                Following up on that, I noticed that my interface (enp9s0) is no longer found, although it was there earlier. I have been using this interface to connect to the other nodes in the cluster. Could you please help me bring it back up?

                (base) ubuntu@Node4:~$ ifconfig
                docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
                inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
                ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
                RX packets 0 bytes 0 (0.0 B)
                RX errors 0 dropped 0 overruns 0 frame 0
                TX packets 0 bytes 0 (0.0 B)
                TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
                inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
                inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
                ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
                RX packets 179 bytes 18635 (18.6 KB)
                RX errors 0 dropped 0 overruns 0 frame 0
                TX packets 168 bytes 22384 (22.3 KB)
                TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
                inet 127.0.0.1 netmask 255.0.0.0
                inet6 ::1 prefixlen 128 scopeid 0x10<host>
                loop txqueuelen 1000 (Local Loopback)
                RX packets 110 bytes 8928 (8.9 KB)
                RX errors 0 dropped 0 overruns 0 frame 0
                TX packets 110 bytes 8928 (8.9 KB)
                TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                #8517
                Komal Thareja
                Participant

                  Please share your slice ID and also the output of the command: ifconfig -a

                  Thanks,

                  Komal

                  #8518
                  Ajay Kumar
                  Participant

                    My Slice ID: 09255c48-5512-4e3c-bdc6-ad7d4fd37d07
                    Output of ifconfig -a command:

                    (base) ubuntu@Node4:~$ ifconfig -a
                    docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
                    inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
                    ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
                    RX packets 0 bytes 0 (0.0 B)
                    RX errors 0 dropped 0 overruns 0 frame 0
                    TX packets 0 bytes 0 (0.0 B)
                    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                    enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
                    inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
                    inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
                    ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
                    RX packets 541 bytes 53260 (53.2 KB)
                    RX errors 0 dropped 0 overruns 0 frame 0
                    TX packets 417 bytes 56772 (56.7 KB)
                    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
                    inet 127.0.0.1 netmask 255.0.0.0
                    inet6 ::1 prefixlen 128 scopeid 0x10<host>
                    loop txqueuelen 1000 (Local Loopback)
                    RX packets 114 bytes 9436 (9.4 KB)
                    RX errors 0 dropped 0 overruns 0 frame 0
                    TX packets 114 bytes 9436 (9.4 KB)
                    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
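Output like the above can be checked mechanically for a missing interface. The sketch below assumes the `name: flags=...` header format shown in this thread; the function names are hypothetical, and the sample text is abbreviated from the output posted here.

```python
import re

# Matches interface header lines such as "enp3s0: flags=4163<UP,...> mtu 9000".
HEADER = re.compile(r"^\s*([\w.@-]+): flags=")


def interface_names(ifconfig_output):
    """Return the interface names found in `ifconfig -a` output, in order."""
    names = []
    for line in ifconfig_output.splitlines():
        match = HEADER.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_interfaces(ifconfig_output, expected):
    """Return the expected interface names that do not appear in the output."""
    present = set(interface_names(ifconfig_output))
    return [name for name in expected if name not in present]


SAMPLE = """\
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
"""

print(interface_names(SAMPLE))                           # ['docker0', 'enp3s0', 'lo']
print(missing_interfaces(SAMPLE, ["enp3s0", "enp9s0"]))  # ['enp9s0']
```

Because `ifconfig -a` also lists interfaces that are down, a name missing from this list means the device is not visible to the VM at all, rather than merely being down.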

                    #8519
                    Komal Thareja
                    Participant

                      Could you please check your VM again?

                      All PCI devices had been disconnected. I have reconnected them to your VM. Please check it.

                      Also, could you please share the sequence of operations that led your VM to this state?

                      It would help us see whether anything needs to be fixed in our control software.

                      Thanks,

                      Komal

                      #8521
                      Ajay Kumar
                      Participant

                        Thank you very much, now it works fine, double hands up for your help, Komal.
