1. 3/4 nodes in slice not accessible via SSH

  • #8335
    Pete Stenger
    Participant

      When I try to connect to my setup, I get an output like:

      
      VM Name: wgclient
      VM IP: 192.168.1.2
      Reservation Active
      SSH working? True
      
      VM Name: wgnet-1
      VM IP: 192.168.1.3
      Reservation Active
      SSH working? False
      
      VM Name: wgnet-2
      VM IP: 192.168.1.4
      Reservation Active
      SSH working? False
      
      VM Name: wgnet-3
      VM IP: 192.168.1.5
      Reservation Active
      SSH working? False
      

      Project ID: “2aaaea18-5cf9-497a-ade0-b4f51112a34d”

      I am running locally using a token in my token.json I generated through the credential manager (not using JupyterHub). The configuration is shown in the screenshot below.

       

      These are the created nodes / slices:

      This is the creation code:

      This is the code that checks whether the reservations are active and SSH is working. I also tried variants of node.os_reboot() and node.config().


      for vm_name in ["wgclient", "wgnet-1", "wgnet-2", "wgnet-3"]:
          print('VM Name:', vm_name)
          node = slice.get_node(vm_name)
          print(f"VM IP: {node.get_interfaces()[0].get_ip_addr()}")
          print('Reservation', node.get_reservation_state())
          print('SSH working?', node.test_ssh())

      I’m not sure exactly what’s going on, but I can only access the “wgclient” VM via ssh, not my “wgnet-*” VMs.

      When I recreate the slice, and try to connect, it shows all 4 are available, but ~30 minutes later, only the “wgclient” VM is available. I run a script on each VM installing some tooling, and it takes ~7 minutes per VM to do this. After they all have the tooling installed, I can only connect to the “wgclient” VM via SSH.
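      One way to narrow down which provisioning step breaks SSH is to run the steps one at a time and probe SSH after each. The following is only a minimal sketch, assuming the install script can be split into individual shell commands. `first_breaking_step`, `run`, and `probe` are hypothetical names; with fablib the two callables might be backed by `node.execute` and by the `node.test_ssh` call already used in the check above.

```python
from typing import Callable, Optional, Sequence


def first_breaking_step(
    steps: Sequence[str],
    run: Callable[[str], object],
    probe: Callable[[], bool],
) -> Optional[str]:
    """Run each step in order and return the first one after which probe() fails.

    Returns None if the probe still succeeds after every step.
    """
    for step in steps:
        try:
            run(step)
        except Exception:
            # The step itself may kill the connection; let the probe decide.
            pass
        if not probe():
            return step
    return None


# Hypothetical usage with a fablib node (assumption, not run here):
# culprit = first_breaking_step(install_steps, node.execute, node.test_ssh)
```

      Running the steps on a single VM first keeps the other VMs reachable while the culprit is being identified.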

      My guess is that the way I set up the network interfaces is somehow incorrect. I also made sure that my Bastion and Sliver SSH keys haven’t expired, on the site under my profile > SSH Keys.

      Full stack trace in /tmp/fablib/fablib.log

      
      [22:51:05] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/fablib.py:1158} INFO - orchestrator_host=orchestrator.fabric-testbed.net,credmgr_host=cm.fabric-testbed.net,core_api_host=uis.fabric-testbed.net,am_host=artifacts.fabric-testbed.net,project_id=b24ba048-5b54-4034-b49f-16f8fbf3e35f,token_location=/home/retep/repos/cs538-project/local/token.json,initialize=True,scope='all'
      [22:51:06] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/fabrictestbed/token_manager/token_manager.py:164} INFO - Project Id/Name not specified, trying to determine it from the token
      [22:51:16] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/fablib.py:955} INFO - Fetching User's information
      [22:51:17] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/fablib.py:987} INFO - User: peteras4@illinois.edu bastion key is valid!
      [22:51:31] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection refused: Connect failed
      [22:51:31] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1944} ERROR - Exception (client): Error reading SSH protocol banner
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - Traceback (most recent call last):
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py", line 2369, in _check_banner
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     buf = self.packetizer.readline(timeout)
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/packet.py", line 395, in readline
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     buf += self._read_timeout(timeout)
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/packet.py", line 665, in _read_timeout
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     raise EOFError()
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - EOFError
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - During handling of the above exception, another exception occurred:
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - Traceback (most recent call last):
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py", line 2185, in run
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     self._check_banner()
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py", line 2373, in _check_banner
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     raise SSHException(
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - paramiko.ssh_exception.SSHException: Error reading SSH protocol banner
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: Error reading SSH protocol banner
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1944} ERROR - Exception (client): Error reading SSH protocol banner
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - Traceback (most recent call last):
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py", line 2369, in _check_banner
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     buf = self.packetizer.readline(timeout)
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/packet.py", line 395, in readline
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     buf += self._read_timeout(timeout)
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/packet.py", line 665, in _read_timeout
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     raise EOFError()
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - EOFError
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - During handling of the above exception, another exception occurred:
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - Traceback (most recent call last):
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py", line 2185, in run
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     self._check_banner()
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -   File "/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py", line 2373, in _check_banner
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR -     raise SSHException(
      [22:51:32] {/home/retep/repos/cs538-project/local/.venv/lib/python3.11/site-packages/paramiko/transport.py:1942} ERROR - paramiko.ssh_exception.SSHException: Error reading SSH protocol banner
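      Most of the log above is paramiko traceback noise; the lines that carry the signal are the `Attempt N failed: ...` warnings. As a rough sketch, the log could be condensed as below. The regular expressions are assumptions based only on the line format visible in this excerpt (`[time] {file:line} LEVEL - message`), and `summarize_failures` is a hypothetical helper, not part of fablib.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt: [HH:MM:SS] {path:lineno} LEVEL - message
LINE_RE = re.compile(
    r"^\s*\[(?P<time>[\d:]+)\]\s+\{(?P<src>[^}]+)\}\s+(?P<level>[A-Z]+)\s+-\s?(?P<msg>.*)$"
)
ATTEMPT_RE = re.compile(r"Attempt (\d+) failed: (.*)")


def summarize_failures(lines):
    """Count the distinct 'Attempt N failed' reasons, skipping traceback lines."""
    reasons = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a log record in the assumed format
        attempt = ATTEMPT_RE.search(match.group("msg"))
        if attempt:
            reasons[attempt.group(2).strip()] += 1
    return reasons


# Example use against the log file named in the post (not run here):
# with open("/tmp/fablib/fablib.log") as fh:
#     for reason, count in summarize_failures(fh).items():
#         print(count, reason)
```

      For the excerpt above this would leave two reasons, a refused connection and a failed SSH banner read, both of which point to the SSH service on the VM rather than to the bastion or the token.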
      
      • This topic was modified 3 weeks, 4 days ago by Pete Stenger.
      #8341
      Hussam Nasir
      Moderator

        Hi,

        I looked at logs on one of the failed nodes and found that the last command before the node failed was

        “sudo /usr/sbin/ldconfig /home/ubuntu/openssl/build/lib64/”

        This command breaks the sshd daemon running on the machine, which causes you to lose your SSH connection. A reboot would fix SSH, because the library you built is not loaded afterwards.
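        A possible way to avoid the problem, sketched here under assumptions: instead of registering the custom OpenSSL directory system-wide with ldconfig, expose it only to the process that needs it through `LD_LIBRARY_PATH`, so that sshd keeps resolving the system libraries. `with_private_libs` is a hypothetical helper, the directory is the one from the command quoted above, and `./my_tool` is a placeholder for whatever binary needs the custom build.

```python
import shlex


def with_private_libs(lib_dir: str, command: str) -> str:
    """Return a shell command that runs `command` with lib_dir on LD_LIBRARY_PATH.

    The variable is set for that one process only, so system daemons such as
    sshd are unaffected (unlike a system-wide ldconfig registration).
    """
    return f"LD_LIBRARY_PATH={shlex.quote(lib_dir)} {command}"


# "./my_tool" is a placeholder binary, not something named in this thread.
cmd = with_private_libs("/home/ubuntu/openssl/build/lib64/", "./my_tool --version")
print(cmd)
```

        The resulting string can then be run on the VM in place of the ldconfig step of the install script.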

        #8349
        Pete Stenger
        Participant

          Thank you!

        • The topic ‘3/4 nodes in slice not accessible via SSH’ is closed to new replies.