1. node.execute() hangs in FABRIC notebook


  • #9516

    Dear FABRIC team,

    I hope you are doing well. In my FABRIC notebook, when I run commands on nodes using node.execute(), the command sometimes works normally, but other times it just hangs indefinitely without returning any error. The node itself remains reachable via SSH, and the same command runs immediately from the terminal. I have tried restarting the notebook server, but the issue persists. Do you have any suggestions on how to solve this problem?

    Thank you.

    Best regards,

    Fatih Berkay Sarpkaya

    #9517
    Komal Thareja
    Participant

      Hi Fatih,

      Could you please check if you see any errors in the /tmp/fablib/fablib.log?

      Also, could you please share which Jupyter container you are using?

      Best,

      Komal

      #9518

      Hi Komal,

      Thank you for your reply. I checked /tmp/fablib/fablib.log, and I see the following errors:

      [15:51:05] {/opt/conda/lib/python3.11/site-packages/paramiko/transport.py:1938} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
      [15:51:05] {/opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1665} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
      [15:54:47] {/opt/conda/lib/python3.11/site-packages/paramiko/transport.py:1938} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
      [15:54:47] {/opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1665} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
      [16:05:26] {/opt/conda/lib/python3.11/site-packages/paramiko/transport.py:1938} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
      [16:05:26] {/opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1665} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
      [16:09:07] {/opt/conda/lib/python3.11/site-packages/paramiko/transport.py:1938} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
      [16:09:07] {/opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1665} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
      [16:11:26] {/opt/conda/lib/python3.11/site-packages/paramiko/transport.py:1938} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
      [16:11:26] {/opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1665} WARNING - Attempt 2 failed: ChannelException(2, 'Connect failed')
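
For anyone who wants to check their own log for the same pattern, here is a minimal sketch that counts ERROR and WARNING entries in /tmp/fablib/fablib.log. It assumes every entry follows the `[time] {source:line} LEVEL <separator> message` layout seen in the excerpt above. The function names (`parse_line`, `summarize`, `summarize_file`) are made up for this illustration and are not part of fablib.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed layout, taken from the excerpt in this thread:
#   [HH:MM:SS] {/path/to/file.py:LINE} LEVEL - message
# The separator is matched loosely so a hyphen or an en dash both work.
LOG_LINE = re.compile(
    r"^\s*\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+"
    r"\{(?P<source>[^}]*)\}\s+"
    r"(?P<level>[A-Z]+)\s+\S+\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def summarize(lines, levels=("ERROR", "WARNING")):
    """Count how often each (level, message) pair occurs among the given lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] in levels:
            counts[(record["level"], record["message"].strip())] += 1
    return counts


def summarize_file(path="/tmp/fablib/fablib.log"):
    """Summarize a log file; returns an empty Counter if the file is missing."""
    log_path = Path(path)
    if not log_path.exists():
        return Counter()
    with log_path.open(encoding="utf-8", errors="replace") as handle:
        return summarize(handle)


if __name__ == "__main__":
    for (level, message), count in summarize_file().most_common():
        print(f"{count:4d}  {level:8s} {message}")
```

If the hang has the same cause as in this thread, repeated `Secsh channel 0 open FAILED` and `ChannelException(2, 'Connect failed')` entries should dominate the summary.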

      My Jupyter container is: kthare10/jupyter-notebook:1.9.2r2

      Best regards,

      Fatih Berkay Sarpkaya

      #9519
      Komal Thareja
      Participant

        Hi Fatih,

        Could you please try changing the following files:

        /home/fabric/work/fabric_config/ssh_config

        /home/fabric/work/fabric_config/fabric_config

        Replace bastion.fabric-testbed.net with bastion-ncsa-1.fabric-testbed.net in both files.

        Then restart your notebook kernel and try node.execute() again.
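
The edit described above can also be scripted from a notebook cell. The following is only a sketch: it does a plain text substitution of the two hostnames named in this post, on the two file paths named in this post, and saves a `.bak` copy first. The helper name `switch_bastion` and the backup step are additions for illustration, not something FABRIC provides.

```python
import shutil
from pathlib import Path

OLD_HOST = "bastion.fabric-testbed.net"
NEW_HOST = "bastion-ncsa-1.fabric-testbed.net"

# The two files named in this thread.
CONFIG_FILES = [
    "/home/fabric/work/fabric_config/ssh_config",
    "/home/fabric/work/fabric_config/fabric_config",
]


def switch_bastion(path, old=OLD_HOST, new=NEW_HOST):
    """Replace every occurrence of `old` with `new` in one file.

    Returns the number of replacements made. A missing file, or a file
    that does not mention `old`, is left untouched and yields 0.
    """
    config = Path(path)
    if not config.is_file():
        return 0
    text = config.read_text()
    count = text.count(old)
    if count:
        # Keep a copy of the original next to it (hypothetical safety step).
        shutil.copy2(config, config.with_name(config.name + ".bak"))
        config.write_text(text.replace(old, new))
    return count


if __name__ == "__main__":
    for name in CONFIG_FILES:
        print(f"{name}: {switch_bastion(name)} replacement(s)")
```

Running it a second time reports 0 replacements, because the new hostname does not contain the old one. The notebook kernel still has to be restarted afterwards for the change to take effect.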

        Thanks,

        Komal

        #9520

        Hi Komal,

        It seems to be working now. Thank you very much for your help.

        Best regards,

        Fatih Berkay Sarpkaya

        #9533
        Hussam Nasir
        Participant

          May I ask which OS is running on your nodes? We are trying to narrow down the issue.

          #9534

          Hi,

          The nodes are running Ubuntu 22.04.5 LTS.

          Thank you.

          Best regards,

          Fatih Berkay Sarpkaya

          #9535
          Hussam Nasir
          Participant

            Which site was this VM on? Also, could you please share the notebook you were using when you encountered the issue? You can email it to help@fabric-testbed.net.

            #9546

            Hi,

            At this point, I am no longer experiencing the issue after applying the modification mentioned in this thread. However, when I initially encountered the problem, I was testing a slice at the SRI site. I will email the relevant portion of the notebook I used to help@fabric-testbed.net shortly.

            Best regards,

            Fatih Berkay Sarpkaya
