1. channel 0: open failed: connect failed: No route to host


  #8756
    Ajay Kumar
    Participant

      I am getting the error below when connecting to a node of a cluster (Slice ID: 683229dc-53c7-4723-ba9a-93ef3481339c).

      Error:
      Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
      channel 0: open failed: connect failed: No route to host
      stdio forwarding failed
      Connection closed by UNKNOWN port 65535

       

      When I try to reboot that node from a Python notebook, it fails with:

      Exception: POA - 6f55c268-03da-49fc-b4f7-3d6600d2546c/reboot failed with error: - Exception during poa for unit: eee04aa7-c811-477d-8881-bfb60e3df919 msg Playbook has failed tasks: non-zero return code
      
      
      
      #8757
      Komal Thareja
      Participant

        Hi Ajay,

        Your VM was in a shutoff state, which I’ve now restored. Could you please share the notebook that outlines the type of workload you’re running on this VM? We’ve observed similar instances with your slices in the past, so having this information would help us identify the root cause of your VMs shutting down.

        Thanks,
        Komal

        #8758
        Ajay Kumar
        Participant

          slice_name = 'GPU_Variant_Calling_FIU'
          node_name = 'Node3'
          slice = fablib.get_slice(slice_name)
          node = slice.get_node(node_name)
          node.os_reboot()

          This is the code that produced the reboot error. Now that the VM is back up, I can access the node. Thank you very much, Komal.

          #8759
          Komal Thareja
          Participant

            Hi Ajay,

            node.os_reboot() is recommended only when you are doing CPU pinning or NUMA tuning. It failed here because your VM was already in a shutoff state. If the intent is just to reboot the VM, please run sudo reboot via node.execute() instead.

            Also, what kind of workload is your application/experiment running? We are noticing some kernel-level CPU locks on the host where your VM is running, and we want to investigate whether something in your experiment is triggering them. Could you please share more details about the experiment workload being executed on this VM?

            Appreciate your help with this!

            Thanks,

            Komal
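
For readers following along, the suggestion above (run sudo reboot through node.execute() rather than calling node.os_reboot()) might look roughly like the sketch below. The helper name reboot_via_shell is hypothetical and not part of fablib. The sketch assumes node is the object returned by slice.get_node(node_name) as in the earlier post, and that node.execute() returns a (stdout, stderr) pair. It cannot help when the VM is already in a shutoff state, since the command has to run inside the VM.

```python
from typing import Any, Tuple


def reboot_via_shell(node: Any, command: str = "sudo reboot") -> Tuple[bool, str]:
    """Ask the guest OS to reboot by running a shell command on the node.

    `node` is assumed to expose execute(command) -> (stdout, stderr),
    as a fablib node object does. This helper is illustrative only.
    """
    try:
        stdout, stderr = node.execute(command)
    except Exception as exc:
        # The call did not complete cleanly. This can mean the VM is
        # unreachable (for example, shut off), or that the SSH session
        # dropped while the VM was going down. This sketch does not
        # try to tell those two cases apart.
        return False, f"{type(exc).__name__}: {exc}"
    # The command was delivered; return any output for inspection.
    return True, (stderr or stdout or "").strip()
```

With the variables from the earlier post, this would be called as ok, detail = reboot_via_shell(node). If ok is False and detail mentions "No route to host", the VM is probably not running, and the reboot command never reached it.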
