channel 0: open failed: connect failed: No route to host

Tagged: node reboot failed

This topic has 3 replies, 2 voices, and was last updated 7 months ago by Komal Thareja.

Author

Posts
August 5, 2025 at 11:20 am #8756
Ajay Kumar
Participant
I am getting the error below when connecting to a node of a cluster (Slice ID: 683229dc-53c7-4723-ba9a-93ef3481339c).

Error:
Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
channel 0: open failed: connect failed: No route to host
stdio forwarding failed
Connection closed by UNKNOWN port 65535

When I try to reboot that node from a Python notebook, I get:
```
Exception: POA - 6f55c268-03da-49fc-b4f7-3d6600d2546c/reboot failed with error: - Exception during poa for unit: eee04aa7-c811-477d-8881-bfb60e3df919 msg Playbook has failed tasks: non-zero return code
```
August 5, 2025 at 11:32 am #8757
Komal Thareja
Participant
Hi Ajay,

Your VM was in a shutoff state, which I’ve now restored. Could you please share the notebook that outlines the type of workload you’re running on this VM? We’ve observed similar instances with your slices in the past, so having this information would help us identify the root cause of your VMs shutting down.

Thanks,
Komal
August 5, 2025 at 1:19 pm #8758
Ajay Kumar
Participant
slice_name='GPU_Variant_Calling_FIU'
node_name='Node3'
slice = fablib.get_slice(slice_name)
node = slice.get_node(node_name)
node.os_reboot()

This piece of code generated the error. Now that the VM is live again, I can access the node. Thank you very much, Komal.
August 5, 2025 at 2:27 pm #8759
Komal Thareja
Participant
Hi Ajay,

We recommend running node.os_reboot() only if you are doing CPU pinning or NUMA tuning. It failed here because your VM was already in a shutoff state. If the intent is just to reboot the VM, please run sudo reboot via node.execute().

Also, what kind of workload is your application/experiment running? We are noticing some kernel-level CPU locks on the host where your VM is running, and we want to investigate whether something from your experiment is triggering them. Could you please share more details about the experiment workload being executed on this VM?
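
For readers following along, here is a minimal sketch of the two reboot paths described above. It assumes `node` is the object returned by slice.get_node() in the earlier snippet; the helper name `reboot_node` and the `tuned` flag are hypothetical and not part of fablib. It only uses the two calls named in this thread, node.os_reboot() and node.execute().

```python
def reboot_node(node, tuned=False):
    """Reboot a VM using the method suggested in this thread.

    `node` is assumed to be a fablib node (from slice.get_node()).
    `tuned` is a hypothetical flag: True if CPU pinning or NUMA tuning
    was applied, which is the case where os_reboot() is recommended.
    Returns the name of the method that was used.
    """
    if tuned:
        # Reboot through the testbed; per the thread, this fails
        # if the VM is already in a shutoff state.
        node.os_reboot()
        return "os_reboot"
    try:
        # Plain reboot issued from inside the guest OS.
        node.execute("sudo reboot")
    except Exception as exc:
        # Assumption: the SSH session may drop while the guest goes down,
        # so an error here does not necessarily mean the reboot failed.
        print(f"Session ended during reboot: {exc}")
    return "sudo reboot"
```

With the `node` from the earlier snippet, the call would be reboot_node(node) for an ordinary reboot. Note that neither path can bring back a VM that is already shut off; that case needed operator intervention here.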

Appreciate your help with this!

Thanks,

Komal