channel 0: open failed: connect failed: No route to host
Tagged: node reboot failed
I am getting the error below when connecting to a node in my cluster (Slice ID: 683229dc-53c7-4723-ba9a-93ef3481339c).
Error:
Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
channel 0: open failed: connect failed: No route to host
stdio forwarding failed
Connection closed by UNKNOWN port 65535
When I try to reboot that node from a Python notebook, it reports:
Exception: POA - 6f55c268-03da-49fc-b4f7-3d6600d2546c/reboot failed with error: - Exception during poa for unit: eee04aa7-c811-477d-8881-bfb60e3df919 msg Playbook has failed tasks: non-zero return code
Hi Ajay,
Your VM was in a shutoff state, which I’ve now restored. Could you please share the notebook that outlines the type of workload you’re running on this VM? We’ve observed similar instances with your slices in the past, so having this information would help us identify the root cause of your VMs shutting down.
Thanks,
Komal
slice_name = 'GPU_Variant_Calling_FIU'
node_name = 'Node3'
slice = fablib.get_slice(slice_name)
node = slice.get_node(node_name)
node.os_reboot()
The code above produced the error. Now that the VM is running again, I can access the node. Thank you very much, Komal.
Hi Ajay,
node.os_reboot() is recommended only if you are doing CPU pinning or NUMA tuning. It failed here because your VM was already in a shutoff state. If the intent is just to reboot the VM, please run sudo reboot via node.execute().
Also, what kind of workload is your application/experiment running? We are noticing some kernel-level CPU locks on the host where your VM is running, and we want to investigate whether something in your experiment is triggering them. Could you please share more details about the workload being executed on this VM?
Appreciate your help with this!
Thanks,
Komal
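For anyone landing on this thread later, here is a minimal sketch of the in-guest reboot Komal describes, i.e. running sudo reboot through node.execute() instead of calling node.os_reboot(). The helper name reboot_in_guest is hypothetical (it is not part of fablib), and the sketch assumes only that the node object exposes execute(command), as fablib nodes do. How the SSH session behaves while the guest goes down may vary, so a False result means "could not confirm", not necessarily "did not reboot".

```python
def reboot_in_guest(node, command="sudo reboot"):
    """Ask the guest OS to reboot by running a shell command on the node.

    `node` is assumed to be a fablib node, or any object with an
    execute(command) method. This helper is illustrative only and is
    not part of fablib.

    Returns True if the command was sent without raising, else False.
    """
    try:
        node.execute(command)
    except Exception as exc:
        # A shut-off or unreachable VM cannot accept the command, and the
        # SSH session may also drop while the guest is going down.
        print(f"Could not confirm reboot command: {exc}")
        return False
    return True


# Usage in a FABRIC notebook (slice and node names taken from this thread):
# from fabrictestbed_extensions.fablib.fablib import FablibManager
# fablib = FablibManager()
# node = fablib.get_slice("GPU_Variant_Calling_FIU").get_node("Node3")
# reboot_in_guest(node)
```

If the VM is already in a shutoff state, as it was here, neither approach can bring it back from the notebook; that case still needs the FABRIC team to restore the VM.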