Slice submit via Jupyter gets stuck

  #8195
    Justas Balcas
    Participant

      Hi,

      I submitted a slice via a Jupyter notebook, and it gets stuck; see the images below. The FABRIC portal shows StableOK, while the notebook remains pending. I pressed Ctrl+C about 20 minutes after the portal showed StableOK and took a screenshot of where it was stuck. Additionally, if I print the SSH commands afterwards, SSH always fails:

      ```
      fabric@winter:JustasB-FreeRTR-86%$ ssh -i /home/fabric/work/fabric_config/slice_key -F /home/fabric/work/fabric_config/ssh_config ubuntu@2001:400:a100:3090:f816:3eff:fee2:79ca
      Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
      jbalcas_0000188368@bastion.fabric-testbed.net: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
      kex_exchange_identification: Connection closed by remote host
      Connection closed by UNKNOWN port 65535
      ```

      Slice name: FRR-cern, Slice ID: b81a4277-1239-4a23-9f29-e2d81a218361

      Could you check what is wrong with it?

      #8196
      Justas Balcas
      Participant

        Hello again. I just tried another slice at LOSA (not CERN), and it also got stuck in `slice.submit()` in the Jupyter notebook:

        ```
        10710a66-159b-4a25-b2d0-00fb98de85ab  l3_meas_net_LOSA  LOSA  network  Ticketed
        ```

        There have been no updates for the last 30 minutes. The FABRIC portal shows StableOK, while the notebook is still stuck.

        Slice name: FRR-losa, Slice ID: 0367f6f3-1331-49dc-9399-722616237a5b

        #8197
        Komal Thareja
        Participant

          Hi,

          Both of your slices are in a Stable state. This looks like a bug in fablib, or a race condition that causes fablib to think the slice is still Configuring.

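          I am trying to reproduce this on my end and will work on a fix. Apologies for the inconvenience!

          In the meantime, as a workaround, could you please run the following?

          ```python
          from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

          fablib = fablib_manager()
          slice_name = "FRR-losa"  # name of the stuck slice
          slice = fablib.get_slice(slice_name)
          slice.post_boot_config()
          slice.list_nodes();
          slice.list_interfaces();
          ```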


          ```
          Slice Name: FRR-losa
          Slice ID: 0367f6f3-1331-49dc-9399-722616237a5b
          Project ID: a57c7715-d871-4369-82e6-408c9a57a6e7
          Project Name: UCSD-FABRIC test
          Graph ID: 071abcd4-f292-449d-a69a-da4768780546
          Slice owner: { name: orchestrator, guid: orchestrator-guid, oidc_sub_claim: 91f5ecc3-16ff-4f09-95ac-dfeee0c3b1e3, email: jbalcas@es.net}
          Slice state: StableOK
          Lease time: 2025-02-07 14:24:38+00:00
          ```
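Because the portal showed StableOK while the notebook kept waiting, one defensive option is to poll the state yourself with an explicit deadline, so a stuck wait turns into an error instead of an indefinite hang. This is a minimal sketch under assumptions: `wait_for_state` is a hypothetical helper (not part of fablib), it assumes a `get_state` callable that returns the current slice state as a string, and the set of terminal state names is also an assumption.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    get_state: Callable[[], str],
    targets: Iterable[str] = ("StableOK", "StableError"),  # assumed terminal states
    timeout: float = 1800.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it returns one of `targets` or `timeout` elapses.

    Returns the terminal state; raises TimeoutError otherwise, so the caller
    gets an error instead of waiting forever.
    """
    targets = set(targets)
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in targets:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout:.0f}s")
        sleep(interval)
```

With fablib this might be called as `wait_for_state(lambda: fablib.get_slice(slice_name).get_state())`, assuming `get_state()` returns the current state in your fablib version. On its own it would not fix the hang in this thread, which turned out to be SSH authentication inside `post_boot_config()`.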

          Thanks,

          Komal

          #8198
          Komal Thareja
          Participant

            Also, could you please share which JupyterHub (JH) container you are using?

            Thanks,

            Komal

            #8200
            Justas Balcas
            Participant

              Hi, I tried your commands, and it gets stuck at `slice.post_boot_config()`; if I press Ctrl+C, it appears to be stuck here [1]. I also tried the SSH commands manually and am still not able to SSH in (as reported previously). As for JH: the edge container for FRR-CERN and Stable 1.8 for FRR-LOSA; both have the same issue.

              [1]

              ```
              File /opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/interface.py:1075, in Interface.get_ip_addr(self)
                 1073 if self.get_mac() is None:
                 1074     return None
              -> 1075 return self.get_ip_addr_ssh()

              File /opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/interface.py:885, in Interface.get_ip_addr_ssh(self, dev)
                  878 """
                  879 Gets the ip addr info for this interface.
                  880 
                  881 :return ip addr info
                  882 :rtype: str
                  883 """
                  884 try:
              --> 885     stdout, stderr = self.get_node().execute("ip -j addr list", quiet=True)
                  887     addrs = json.loads(stdout)
                  889     dev = self.get_device_name()

              File /opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1658, in Node.execute(self, command, retry, retry_interval, username, private_key_file, private_key_passphrase, quiet, read_timeout, timeout, output_file)
                 1653         logging.debug(
                 1654             f"SSH execute fail. Slice: {self.get_slice().get_name()}, Node: {self.get_name()}, trying again"
                 1655         )
                 1656         logging.debug(e, exc_info=True)
              -> 1658     time.sleep(retry_interval)
                 1659     pass
                 1661 # Clean-up of open connections and files.
                 1662 finally:
              ```

              #8201
              Komal Thareja
              Participant

                Could you please try this with the Beyond Bleeding Edge container? I wasn't able to reproduce the issue there. I am trying it with the 1.8 Stable container now.

                Thanks,

                Komal

                #8203
                Justas Balcas
                Participant

                  Hi. With Beyond Bleeding Edge, `slice.submit()` still gets stuck the same way: it shows state Configuring, with two networks in the Ticketed state, while the portal already shows StableOK. I stopped it in Jupyter and executed the commands you provided [1]. It gets stuck at `post_boot_config()`. I will keep it running (without cancelling it) and let you know if it moves forward.

                  [1]

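                  ```python
                  from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

                  fablib = fablib_manager()

                  site = "CERN"
                  slice_name = f"FRR-{site.lower()}"
                  print(1)
                  slice = fablib.get_slice(slice_name)
                  print(2)
                  slice.list_nodes();
                  print(3)
                  slice.post_boot_config()  # hangs here
                  print(4)
                  slice.list_nodes();
                  print(5)
                  slice.list_interfaces();
                  print(6)
                  ```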
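Since `post_boot_config()` can block indefinitely in this situation, a small wrapper can at least hand control back to the notebook after a deadline. A minimal sketch using only the standard library; `call_with_timeout` is a hypothetical helper, and any timeout value you pass is an arbitrary choice:

```python
import concurrent.futures
from typing import Any, Callable


def call_with_timeout(fn: Callable[[], Any], timeout: float) -> Any:
    """Run fn() in a worker thread and wait at most `timeout` seconds.

    On timeout this raises TimeoutError in the caller, but the worker thread
    itself is NOT stopped: the call keeps running in the background.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"call did not finish within {timeout}s") from None
    finally:
        # Do not block on a worker that may still be running.
        pool.shutdown(wait=False)
```

Usage might look like `call_with_timeout(slice.post_boot_config, timeout=600)`. The trade-off is that Python threads cannot be killed, so the hung call keeps retrying behind the scenes; the wrapper only turns a silent hang into a visible error.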

                  #8204
                  Komal Thareja
                  Participant

                    Thank you, Justas! I haven't been able to reproduce this even on the JH Stable 1.8 container. Could you please share the `/tmp/fablib/fablib.log` file from your container?

                    Also, please share the slice ID of your new slice.

                    Thanks,

                    Komal

                    #8206
                    Justas Balcas
                    Participant

                      I put the log here: `/home/fabric/work/JustasB-FreeRTR/fablib.log`. If you are not able to access it, let me know whether it is OK to upload it here (i.e., whether it contains no secret information).

                      Slice name: FRR-cern, slice ID: f01132de-75a7-4edd-bbb2-816ee76e824f

                      I see a lot of the following in the logs:

                      ```
                      [16:15:52] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:708} INFO - update : FRR-cern, count: 2
                      [16:15:52] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:592} INFO - update_slice: FRR-cern, count: 23
                      [16:15:53] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:634} INFO - update_topology: FRR-cern, count: 2
                      [16:15:56] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:2013} INFO - post_boot_config: slice_name: FRR-cern, slice_id f01132de-75a7-4edd-bbb2-816ee76e824f
                      [16:15:56] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:708} INFO - update : FRR-cern, count: 3
                      [16:15:56] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:592} INFO - update_slice: FRR-cern, count: 24
                      [16:15:56] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:634} INFO - update_topology: FRR-cern, count: 3
                      [16:15:57] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: Authentication failed.
                      [16:16:08] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 2 failed: Authentication failed.
                      [16:16:18] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 3 failed: Authentication failed.
                      [16:16:19] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: Authentication failed.
                      [16:16:30] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 2 failed: Authentication failed.
                      [16:16:41] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 3 failed: Authentication failed.
                      [16:16:41] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: Authentication failed.
                      [16:16:52] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 2 failed: Authentication failed.
                      [16:17:03] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 3 failed: Authentication failed.
                      [16:17:03] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/interface.py:914} WARNING - Authentication failed.
                      ```

                      and the authentication failures repeat many times.
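To check a container for this failure pattern, one option is to count the `Authentication failed` warnings in `/tmp/fablib/fablib.log`. A minimal sketch, assuming every log line has the `[time] {file:line} LEVEL - message` shape seen in the excerpt above; the separator is matched as either a hyphen or an en dash, and `count_auth_failures` is a hypothetical helper:

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape, e.g.:
# [16:16:08] {/path/to/fablib/node.py:1600} WARNING - Attempt 2 failed: Authentication failed.
LINE_RE = re.compile(
    r"^\[(?P<time>[\d:]+)\]\s+\{(?P<src>[^}]+)\}\s+(?P<level>[A-Z]+)\s+[-\u2013]\s+(?P<msg>.*)$"
)


def count_auth_failures(lines: Iterable[str]) -> Counter:
    """Count 'Authentication failed' messages per source location (file:line)."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "Authentication failed" in m.group("msg"):
            # Keep only the file name and line number, e.g. "node.py:1600".
            counts[m.group("src").rsplit("/", 1)[-1]] += 1
    return counts
```

For example: `with open("/tmp/fablib/fablib.log") as f: print(count_auth_failures(f))`. A steadily growing count at the same locations points to a credential problem rather than a slow slice.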

                      #8207
                      Komal Thareja
                      Participant

                        The "Authentication failed" messages would explain the SSH errors you are observing. Could you please re-run this notebook: `jupyter-examples-rel1.8.1/configure_and_validate/configure_and_validate.ipynb`?
                        It should renew any expired keys. Please try your slice again afterwards.

                        Thanks,
                        Komal

                        #8210
                        Justas Balcas
                        Participant

                          Interestingly, my bastion key had expired and been removed. I was still allowed to submit the slice, but fablib then looped endlessly trying to authenticate during `post_boot_config`. It would be nice if this error were reported back to the Jupyter notebook. From now on, I will always validate my configuration before using my notebook.

                          I can confirm that my new bastion key is now in use and has been verified. New slice submissions seem to work now. Thank you for your help!
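A quick pre-flight check in the same spirit is to confirm that the key file the SSH config uses for the bastion host still exists. A minimal sketch, assuming the bastion host name appears literally on a `Host` line of `/home/fabric/work/fabric_config/ssh_config`; wildcards, `Match` blocks, `Include` and `Key=Value` syntax are not handled, and both helper names are hypothetical. It only detects a missing key file; it cannot tell whether a key still on disk has expired, so the configure_and_validate notebook remains the real check.

```python
import os
import shlex
from typing import List


def identity_files_for_host(ssh_config_text: str, host: str) -> List[str]:
    """Return IdentityFile paths from Host blocks that list `host` literally."""
    files: List[str] = []
    in_block = False
    for raw in ssh_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        parts = shlex.split(line)
        key = parts[0].lower()
        if key == "host":
            # A new Host block starts; only exact (non-wildcard) matches count.
            in_block = host in parts[1:]
        elif key == "identityfile" and in_block and len(parts) > 1:
            files.append(os.path.expanduser(parts[1]))
    return files


def missing_identity_files(ssh_config_path: str, host: str) -> List[str]:
    """List IdentityFile paths configured for `host` that do not exist on disk."""
    with open(ssh_config_path) as f:
        paths = identity_files_for_host(f.read(), host)
    return [p for p in paths if not os.path.exists(p)]
```

For example: `missing_identity_files("/home/fabric/work/fabric_config/ssh_config", "bastion.fabric-testbed.net")` returns an empty list when every configured key file is present.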

                          #8211
                          Komal Thareja
                          Participant

                            Glad to hear that worked! We will work on addressing this and on adding support for interrupting and returning a meaningful error in such cases.
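The logs in this thread show three attempts per command, after which the next command starts its own three attempts, so an authentication failure never surfaces. One way to return a meaningful error is to treat such failures as fatal instead of retrying them. A minimal sketch of that retry pattern; `retry` and `FatalSSHError` are hypothetical names, not fablib's API, and in a paramiko-based client one might pass `fatal=(paramiko.AuthenticationException,)`:

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class FatalSSHError(Exception):
    """Hypothetical error type for failures that retrying cannot fix."""


def retry(
    fn: Callable[[], T],
    retries: int = 3,
    interval: float = 10.0,
    fatal: Tuple[Type[BaseException], ...] = (FatalSSHError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() up to `retries` times, sleeping `interval` seconds between attempts.

    Exceptions listed in `fatal` are re-raised immediately instead of being
    retried, so e.g. an authentication failure reaches the caller at once.
    """
    last_exc: Exception = RuntimeError("retries must be >= 1")
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except fatal:
            raise  # not transient: fail fast
        except Exception as exc:  # assumed transient: log and try again
            last_exc = exc
            logging.warning("Attempt %d failed: %s", attempt, exc)
            if attempt < retries:
                sleep(interval)
    raise last_exc
```

The design choice is to separate transient errors (timeouts, a node still booting) from permanent ones (a rejected key): only the former benefit from waiting.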

                            Thanks,

                            Komal

                          FABRIC invites nominations for four awards recognizing innovative uses of FABRIC resources—Best Published Paper, Best FABRIC Matrix, Best FABRIC Experiment, and Best Classroom Use of FABRIC — submissions due by **Monday, February 24 at 11:59 PM ET**, and winners announced at KNIT10. [>>>Submit Form](https://docs.google.com/forms/d/e/1FAIpQLSeTp3i2iDhB7bHgN8ryMxZci8ya87yjeQd7_JMZImUodNinVA/viewform)

                          KNIT10 Call for Demos Now Open! Submit your demo by **February 24**. [>>>Submit Demo](https://docs.google.com/forms/d/e/1FAIpQLScRIWqHliNP3DFWBCnalYN_fBXJXVM0PpP9YWWJdSebC95TvA/viewform)