Unable to reach a node at GATECH
- This topic has 6 replies, 3 voices, and was last updated 7 months, 2 weeks ago by Hussam Nasir.
April 4, 2024 at 1:17 pm #6890
Hello,
I have a node at GATECH which I was able to perform execute() on just this morning but now I get the error below when I try to execute():
---------------------------------------------------------------------------
ChannelException                          Traceback (most recent call last)
Cell In[22], line 2
      1 # Restart services for clean operations
----> 2 stdout, stderr = node1.execute("sudo pkill -9 python3")
      3 stdout, stderr = node2.execute("sudo systemctl restart nginx")
      4 stdout, stderr = node3.execute("sudo pkill -9 python3")

File ~/anaconda3/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1564, in Node.execute(self, command, retry, retry_interval, username, private_key_file, private_key_passphrase, quiet, read_timeout, timeout, output_file)
   1559 logging.warning(
   1560     f"Exception in node.execute() (attempt #{attempt} of {retry}): {e}"
   1561 )
   1563 if attempt + 1 == retry:
-> 1564     raise e
   1566 # Fail, try again
   1567 if self.get_fablib_manager().get_log_level() == logging.DEBUG:

File ~/anaconda3/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1424, in Node.execute(self, command, retry, retry_interval, username, private_key_file, private_key_passphrase, quiet, read_timeout, timeout, output_file)
   1417 bastion.connect(
   1418     self.get_fablib_manager().get_bastion_host(),
   1419     username=bastion_username,
   1420     key_filename=bastion_key_file,
   1421 )
   1423 bastion_transport = bastion.get_transport()
-> 1424 bastion_channel = bastion_transport.open_channel(
   1425     "direct-tcpip", dest_addr, src_addr
   1426 )
   1428 client = paramiko.SSHClient()
   1429 # client.load_system_host_keys()
   1430 # client.set_missing_host_key_policy(paramiko.MissingHostKeyPolicy())

File ~/anaconda3/lib/python3.11/site-packages/paramiko/transport.py:1101, in Transport.open_channel(self, kind, dest_addr, src_addr, window_size, max_packet_size, timeout)
   1099 if e is None:
   1100     e = SSHException("Unable to open channel.")
-> 1101 raise e

ChannelException: ChannelException(2, 'Connect failed')
When I try to ssh to the node I get the following error:
Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
channel 0: open failed: connect failed: No route to host
stdio forwarding failed
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
Regards,
Acheme
April 4, 2024 at 1:40 pm #6891
I ran into these errors recently and tried a few things; I hope they help.
For the "Connect failed" or "Connection closed by UNKNOWN port 65535" errors, I usually check whether my slice is in the StableOK state, or I go to Jupyter -> File -> Hub Control Panel and stop and restart my server.
If those do not work, try accessing the VM through the portal and running the command there instead of using node.execute().
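Before restarting anything, it can help to find out which nodes are actually affected. The following is only a minimal sketch: `probe_nodes` is a hypothetical helper, not part of FABlib, and it assumes nothing more than that each node object has an `execute(command)` method, as the `node1`, `node2`, `node3` objects in the traceback above do.

```python
from typing import Any, Dict, Iterable, Tuple


def probe_nodes(nodes: Iterable[Tuple[str, Any]], command: str = "hostname") -> Dict[str, str]:
    """Run a harmless command on each node and report which ones fail.

    `nodes` is an iterable of (label, node) pairs. Each node only needs an
    execute(command) method; this helper is hypothetical, not a FABlib API.
    """
    report = {}
    for name, node in nodes:
        try:
            node.execute(command)
            report[name] = "ok"
        except Exception as exc:  # e.g. ChannelException(2, 'Connect failed')
            # Keep going so one dead node does not hide the state of the others.
            report[name] = f"unreachable: {exc!r}"
    return report
```

Calling `probe_nodes([("node1", node1), ("node2", node2), ("node3", node3)])` would then show whether the failure is limited to one VM (which points at that VM or its host) or hits every node (which points more towards the bastion or the Jupyter server).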
April 4, 2024 at 2:09 pm #6892
Can you please provide some details about your slice and VM? What is the IP of the node you are trying to SSH into?
April 4, 2024 at 2:50 pm #6893
Yes. Slice ID: 97c1aa87-f5e5-4ccb-9b63-a73c1be85825, VM at GATECH, management IP: 2610:148:1f00:9f01:f816:3eff:fe49:cd8d
April 4, 2024 at 4:09 pm #6894
Can you check now? The VM had crashed. We restored it, but it may be missing the IP configuration on its data plane NICs, which you may have to restore.
April 4, 2024 at 4:23 pm #6895
I configured the IP address as it was before the crash, but I cannot ping the other nodes over the data plane. Is there something else I should do?
April 5, 2024 at 8:56 am #6896
The physical node has crashed again. I am working on restoring it now. Please restore your data plane IPs in about 2 hours, check again, and let me know.
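For restoring the data plane IPs after a VM comes back, one option is to keep a small record of what each interface had and regenerate the commands from it. This is a rough sketch under assumptions: `restore_commands` is a hypothetical helper, and the device name and address in the usage note are placeholders, not values from this slice. It only builds standard iproute2 command strings; it does not run them.

```python
import ipaddress
from typing import Dict, List


def restore_commands(saved: Dict[str, str]) -> List[str]:
    """Build iproute2 commands that re-apply saved data plane addresses.

    `saved` maps a device name to an address in CIDR form. Both are supplied
    by the caller; nothing here is discovered from the VM itself.
    """
    commands = []
    for dev, cidr in saved.items():
        # Validate early: raises ValueError on a malformed address or prefix.
        iface = ipaddress.ip_interface(cidr)
        # "replace" adds the address, or updates it if it is already present.
        commands.append(f"sudo ip addr replace {iface.with_prefixlen} dev {dev}")
        # A restored VM may come back with the link down, so bring it up too.
        commands.append(f"sudo ip link set dev {dev} up")
    return commands
```

For example, `restore_commands({"enp7s0": "192.168.1.10/24"})` yields two command strings that could each be passed to `node.execute()`. If the addresses are back and pings over the data plane still fail, the cause may be on the host side rather than in the VM's configuration.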