Can’t SSH to resources at MICH

  • #5159
    Fraida Fund
    Participant

      Hello! This morning, I’m not able to SSH to resources at MICH (either via SSH directly, or with fablib). I don’t think it’s a problem with my keys, because I am able to SSH to resources at other sites.

      Examples of slice IDs with this problem – 09cfd15c-268e-4ec8-b398-e4c47e3b535d, 68abcbf0-c0ab-4f62-9385-836088a0c233.

      Example of the error raised in fablib when I attempt to execute something on a resource –

      ChannelException Traceback (most recent call last)
      Cell In[7], line 1
      ----> 1 kernel = tx_node.execute("uname -r")[0].strip()
      2 data_dir = kernel + "_" + exp_factors['cc'][0] + "_" + exp_factors['cc'][1]
      3 tx_node.execute("mkdir -p " + data_dir)
      
      File /opt/conda/lib/python3.10/site-packages/fabrictestbed_extensions/fablib/node.py:1542, in Node.execute(self, command, retry, retry_interval, username, private_key_file, private_key_passphrase, quiet, read_timeout, timeout, output_file)
      1537 logging.warning(
      1538 f"Exception in node.execute() (attempt #{attempt} of {retry}): {e}"
      1539 )
      1541 if attempt + 1 == retry:
      -> 1542 raise e
      1544 # Fail, try again
      1545 if self.get_fablib_manager().get_log_level() == logging.DEBUG:
      
      File /opt/conda/lib/python3.10/site-packages/fabrictestbed_extensions/fablib/node.py:1402, in Node.execute(self, command, retry, retry_interval, username, private_key_file, private_key_passphrase, quiet, read_timeout, timeout, output_file)
      1395 bastion.connect(
      1396 self.get_fablib_manager().get_bastion_public_addr(),
      1397 username=bastion_username,
      1398 key_filename=bastion_key_file,
      1399 )
      1401 bastion_transport = bastion.get_transport()
      -> 1402 bastion_channel = bastion_transport.open_channel(
      1403 "direct-tcpip", dest_addr, src_addr
      1404 )
      1406 client = paramiko.SSHClient()
      1407 # client.load_system_host_keys()
      1408 # client.set_missing_host_key_policy(paramiko.MissingHostKeyPolicy())
      
      File /opt/conda/lib/python3.10/site-packages/paramiko/transport.py:1085, in Transport.open_channel(self, kind, dest_addr, src_addr, window_size, max_packet_size, timeout)
      1083 if e is None:
      1084 e = SSHException("Unable to open channel.")
      -> 1085 raise e
      
      ChannelException: ChannelException(2, 'Connect failed')
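A note on reading the traceback above: `bastion.connect(...)` returned, and the failure came from `open_channel("direct-tcpip", ...)`, so the bastion accepted the connection but could not open a TCP connection onward to the node. The number in `ChannelException(2, 'Connect failed')` is the SSH channel-open failure code, and the four codes are defined in RFC 4254, section 5.1. The following is a minimal sketch, not part of fablib, of how such an exception could be turned into a hint. The helper name `explain_channel_failure` is hypothetical, and it only assumes that the exception carries a `code` attribute, as `paramiko.ChannelException` does.

```python
# Channel-open failure codes from RFC 4254, section 5.1.
CHANNEL_OPEN_FAILURES = {
    1: "administratively prohibited",
    2: "connect failed",
    3: "unknown channel type",
    4: "resource shortage",
}


def explain_channel_failure(exc):
    """Return a short hint for an exception raised while opening an SSH channel.

    Hypothetical helper: duck-typed on the `code` attribute so it runs
    without importing paramiko.
    """
    code = getattr(exc, "code", None)
    reason = CHANNEL_OPEN_FAILURES.get(code)
    if reason is None:
        # Not a channel-open failure we recognize; show it unchanged.
        return f"unrecognized failure: {exc!r}"
    if code == 2:
        # The bastion session exists, but the onward TCP connection failed.
        return (
            "connect failed: the bastion accepted the session but could not "
            "reach the node; suspect the site or network rather than local keys"
        )
    return f"{reason}: the bastion refused to open the forwarding channel"


# Possible use around a fablib call (tx_node is the node from the post above):
# try:
#     tx_node.execute("uname -r")
# except Exception as e:
#     print(explain_channel_failure(e))
```

Code 2 is consistent with the site itself being unreachable, which matches the observation that the same keys still work at other sites.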
      
      #5163
      Hussam Nasir
      Moderator

        Hi Fraida,

        The MICH site is currently down. We are trying to determine why, but no ETA has been provided.

        #5166
        David Bank
        Moderator

          UPDATE

          Our University of Michigan partners are currently experiencing a campus-wide network disruption. As a result, all FABRIC services in the MICH Hank are unavailable.

          We do not expect this Hank to become available before 6pm (US Eastern Time) today (8/28).

          #5167
          Ilya Baldin
          Participant

            Given it is only a network outage, VM slivers should still be there, assuming they do not time out in the interim.
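Whether a sliver survives the outage comes down to comparing the slice's lease end with the time the site is expected back. A minimal sketch of that comparison, using only the standard library, follows. The function name `lease_outlasts_outage`, the one-hour safety margin, and the example timestamps (including the year) are assumptions for illustration; the real lease end should be taken from the slice itself.

```python
from datetime import datetime, timedelta, timezone


def lease_outlasts_outage(lease_end, expected_recovery, margin=timedelta(hours=1)):
    """Return True if the lease ends at least `margin` after the expected recovery.

    Both arguments must be timezone-aware so that a lease end in UTC and a
    recovery estimate in local time are compared correctly.
    """
    if lease_end.tzinfo is None or expected_recovery.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return lease_end - expected_recovery >= margin


# Hypothetical values: 6pm US Eastern (UTC-4 in summer) on 8/28, year assumed.
eastern_daylight = timezone(timedelta(hours=-4))
expected_recovery = datetime(2023, 8, 28, 18, 0, tzinfo=eastern_daylight)

# Hypothetical lease end for a slice, in UTC.
lease_end = datetime(2023, 8, 29, 12, 0, tzinfo=timezone.utc)

if lease_outlasts_outage(lease_end, expected_recovery):
    print("Lease should outlast the expected outage window.")
else:
    print("Lease may expire first; consider renewing once the site is reachable.")
```

Because the recovery time here is only an estimate, a larger margin is the safer choice when the outage has no firm end.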

            #5176
            Ilya Baldin
            Participant

              MICH continues to be unreachable for us because of a UMich campus network outage of unknown duration. We’ve placed the site into maintenance and will post an update on the FABRIC Announcements forum when it becomes available again.

              #5191
              Ilya Baldin
              Participant

                The UMich campus network is reachable again. We are cleaning up the MICH site and preparing to take it out of maintenance.
