1. Unable to SSH to one node in a 3-node slice



Viewing 4 posts - 1 through 4 (of 4 total)
  • #9562
    Tanay Maheshwari
    Participant

      Hello, I have the following slice:

      Slice:

      ID f761a02e-dae0-4122-b0a1-40b6cffc84e6
      Name CEPH_DOCA_POC
      Lease Expiration (UTC) 2026-03-14 00:00:29 +0000
      Lease Start (UTC) 2026-02-25 00:53:21 +0000
      Project ID 42b3494b-982f-4fe8-b160-26f28c3e33c0
      State StableOK
      Email mahesh88@purdue.edu
      UserId 14e40626-117b-43fe-a9dd-89b0063d126d

      It has 3 nodes. I am able to SSH into 2 of them, but not into the third one.
      Node details:

      Node:

      ID 4cdae64a-1527-49fb-8be4-564216a16102
      Name node3-dpu
      Cores 16
      RAM 16
      Disk 100
      Image default_ubuntu_24
      Image Type qcow2
      Host hawi-w3.fabric-testbed.net
      Site HAWI
      Username ubuntu
      Management IP 2607:f278:1:202:f816:3eff:fe8b:7638
      State Active
      Error
      SSH Command ssh -i /home/fabric/work/fabric_config/slice_key -F /home/fabric/work/fabric_config/ssh_config ubuntu@2607:f278:1:202:f816:3eff:fe8b:7638
      Public SSH Key File /home/fabric/work/fabric_config/slice_key.pub
      Private SSH Key File /home/fabric/work/fabric_config/slice_key

      I was able to SSH into it with no issues a day ago, so I am wondering what went wrong.

      Thanks,
      Tanay

      EDIT: I am also able to SSH into the DPU inside node3, which means the node is definitely up and working. I believe this is a networking issue. Maybe the node lost its IP 192.168.50.2 on the NIC?

      #9566
      Tanay Maheshwari
      Participant

        I also tried the following:

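        from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

        fablib = fablib_manager()

        # Look up the slice by name and print its summary
        slice = fablib.get_slice(name="CEPH_DOCA_POC")
        slice.show()
        # slice.delete()
        DPU_NODE_NAME = "node3-dpu"

        # Look up the unreachable node and try to run commands on it over SSH
        node = slice.get_node(name=DPU_NODE_NAME)
        node.show()
        node.execute("ip addr")
        # Try to re-add the address that may have been lost on the NIC
        node.execute("sudo ip addr add 192.168.50.2/24 dev enp8s0")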

        Fabric returned this error:

        File /opt/conda/lib/python3.11/site-packages/paramiko/transport.py:1130, in Transport.open_channel(self, kind, dest_addr, src_addr, window_size, max_packet_size, timeout)
           1128 if e is None:
           1129     e = SSHException("Unable to open channel.")
        -> 1130 raise e
        
        ChannelException: ChannelException(2, 'Connect failed')

        I believe the instance lost power and restarted, and somehow some of the important networking config was lost. Any help would be really appreciated!!
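
        A note for anyone hitting the same traceback: the `2` in `ChannelException(2, 'Connect failed')` is the reason code the SSH server returns when it refuses to open a channel. The sketch below decodes those codes using the values defined in RFC 4254, section 5.1. The helper name `describe_channel_failure` is made up for illustration and is not part of paramiko or fablib. Assuming the connection is relayed through a jump host (which the `-F ssh_config` option in the SSH command above suggests), code 2 would typically mean the jump host accepted the session but could not open a TCP connection to the node's management IP, which is consistent with the node being unreachable on its management interface rather than a key or login problem.

```python
# Reason codes for SSH_MSG_CHANNEL_OPEN_FAILURE, as defined in RFC 4254, section 5.1.
CHANNEL_OPEN_FAILURE_REASONS = {
    1: "administratively prohibited",
    2: "connect failed",
    3: "unknown channel type",
    4: "resource shortage",
}


def describe_channel_failure(code: int) -> str:
    """Return a readable description of a channel-open failure code.

    Hypothetical helper for illustration only; not a paramiko or fablib API.
    """
    reason = CHANNEL_OPEN_FAILURE_REASONS.get(code, "unknown reason code")
    return f"channel open failed ({code}): {reason}"


# The code reported in the traceback above.
print(describe_channel_failure(2))
```

        This only interprets the error. It does not tell you why the management interface stopped responding.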

         

        #9567
        Tanay Maheshwari
        Participant

          Any help would be appreciated here! Thank you

          #9569
          Tanay Maheshwari
          Participant

            closing.

          • The topic ‘Unable to SSH to one node in a 3-node slice’ is closed to new replies.