Slice submit via Jupyter gets stuck
- This topic has 11 replies, 2 voices, and was last updated 2 weeks, 2 days ago by
Komal Thareja.
February 6, 2025 at 9:20 am #8195
Hi,
I submitted a slice from a Jupyter notebook and the submit call gets stuck (see the images below). The FABRIC portal shows the slice as StableOK, while the notebook remains pending. I pressed Ctrl+C about 20 minutes after the portal showed StableOK and took a screenshot of where it was stuck. Additionally, if I print the SSH commands afterwards, SSH always fails:
`
fabric@winter:JustasB-FreeRTR-86%$ ssh -i /home/fabric/work/fabric_config/slice_key -F /home/fabric/work/fabric_config/ssh_config ubuntu@2001:400:a100:3090:f816:3eff:fee2:79ca
Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
jbalcas_0000188368@bastion.fabric-testbed.net: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
`
Slice name: FRR-cern, Slice-ID: b81a4277-1239-4a23-9f29-e2d81a218361
Could you check what is wrong with it?
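A note on reading the SSH output above: the `Permission denied (publickey,...)` line is prefixed with the bastion login (`...@bastion.fabric-testbed.net`), which suggests the key was rejected at the bastion host before the connection reached the VM. The sketch below is a hypothetical helper, not part of fablib or OpenSSH; the function name and the return labels are assumptions made for illustration. It classifies an ssh stderr dump by which host the denial names.

```python
import re

# Matches OpenSSH's "user@host: Permission denied (methods)" stderr line.
# The lazy host group also tolerates IPv6 literals, which contain colons.
DENIED = re.compile(
    r"^(?P<user>[^\s@]+)@(?P<host>\S+?): Permission denied \((?P<methods>[^)]*)\)",
    re.MULTILINE,
)


def classify_ssh_failure(stderr: str,
                         bastion_host: str = "bastion.fabric-testbed.net") -> str:
    """Return 'bastion-auth', 'node-auth', or 'other' (hypothetical labels)."""
    match = DENIED.search(stderr)
    if match is None:
        return "other"
    # The host in the denial line tells us which hop rejected the key.
    return "bastion-auth" if match.group("host") == bastion_host else "node-auth"


# The stderr text reported in the post above.
sample = """\
Warning: Permanently added 'bastion.fabric-testbed.net' (ED25519) to the list of known hosts.
jbalcas_0000188368@bastion.fabric-testbed.net: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
"""
print(classify_ssh_failure(sample))
```

A `bastion-auth` result points at the bastion key rather than at the slice or the VM, so renewing or re-validating that key is the first thing to try.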
February 6, 2025 at 9:58 am #8196
Hello again,
I just tried another slice on LOSA (not CERN), and it also got stuck in the Jupyter notebook at slice.submit(). The notebook keeps showing this network in the Ticketed state:
10710a66-159b-4a25-b2d0-00fb98de85ab l3_meas_net_LOSA LOSA network Ticketed
There have been no updates for the last 30 minutes. The FABRIC portal shows StableOK, while the notebook is stuck.
Slice name: FRR-losa, Slice ID: 0367f6f3-1331-49dc-9399-722616237a5b
February 6, 2025 at 10:10 am #8197
Hi,
Both of your slices are in the StableOK state. This looks like a bug or a race condition in fablib that makes it think the slice is still Configuring.
I am trying to reproduce this on my end and will work on a fix. Apologies for the inconvenience!
As a workaround, could you please run the following?
slice=fablib.get_slice(slice_name)
slice.post_boot_config()
slice.list_nodes();
slice.list_interfaces();
Slice Name: FRR-losa Slice ID: 0367f6f3-1331-49dc-9399-722616237a5b Project ID: a57c7715-d871-4369-82e6-408c9a57a6e7 Project Name: UCSD-FABRIC test
Graph ID: 071abcd4-f292-449d-a69a-da4768780546
Slice owner: { name: orchestrator, guid: orchestrator-guid, oidc_sub_claim: 91f5ecc3-16ff-4f09-95ac-dfeee0c3b1e3, email: jbalcas@es.net}
Slice state: StableOK
Lease time: 2025-02-07 14:24:38+00:00
Thanks,
Komal
February 6, 2025 at 10:13 am #8198
Also, could you please share which JH container you are using?
Thanks,
Komal
February 6, 2025 at 10:44 am #8200
Hi,
I tried your commands, and execution gets stuck at slice.post_boot_config(). If I press Ctrl+C, it appears to be stuck here [1]. I also tried the SSH commands manually and still cannot SSH in (as reported previously). As for the JH containers: edge for FRR-CERN and stable 1.8 for FRR-LOSA; both show the same issue.
[1]
File /opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/interface.py:1075, in Interface.get_ip_addr(self)
   1073 if self.get_mac() is None:
   1074     return None
-> 1075 return self.get_ip_addr_ssh()

File /opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/interface.py:885, in Interface.get_ip_addr_ssh(self, dev)
    878 """
    879 Gets the ip addr info for this interface.
    880
    881 :return ip addr info
    882 :rtype: str
    883 """
    884 try:
--> 885     stdout, stderr = self.get_node().execute("ip -j addr list", quiet=True)
    887     addrs = json.loads(stdout)
    889     dev = self.get_device_name()

File /opt/conda/lib/python3.11/site-packages/fabrictestbed_extensions/fablib/node.py:1658, in Node.execute(self, command, retry, retry_interval, username, private_key_file, private_key_passphrase, quiet, read_timeout, timeout, output_file)
   1653 logging.debug(
   1654     f"SSH execute fail. Slice: {self.get_slice().get_name()}, Node: {self.get_name()}, trying again"
   1655 )
   1656 logging.debug(e, exc_info=True)
-> 1658 time.sleep(retry_interval)
   1659 pass
   1661 # Clean-up of open connections and files.
   1662 finally:
February 6, 2025 at 10:52 am #8201
Could you please try this with the Beyond Bleeding Edge container? I wasn't able to reproduce this issue there. I am trying it with the 1.8 Stable container now.
Thanks,
Komal
February 6, 2025 at 11:18 am #8203
Hi,
With Beyond Bleeding Edge, slice.submit() still gets stuck the same way: it shows state Configuring, with two networks in the Ticketed state, while the portal already shows StableOK. I stopped it in Jupyter and executed the commands you provided [1]. It gets stuck at post_boot_config(). I will leave it running (without cancelling it) and let you know if it moves forward.
[1]
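`
# Assumes `fablib` was already created earlier in the notebook.
# The numbered prints mark how far execution gets before it hangs.
site = "CERN"
slice_name = f'FRR-{site.lower()}'
print(1)
slice = fablib.get_slice(slice_name)
print(2)
slice.list_nodes();
print(3)
slice.post_boot_config()
print(4)
slice.list_nodes();
print(5)
slice.list_interfaces();
print(6)
`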
February 6, 2025 at 11:20 am #8204
Thank you, Justas! I haven't been able to reproduce this, even on the JH Stable 1.8 container. Could you please share the /tmp/fablib/fablib.log file from your container? Also, please share the slice ID of your new slice.
Thanks,
Komal
February 6, 2025 at 11:31 am #8206
I put the log here: /home/fabric/work/JustasB-FreeRTR/fablib.log (if you are not able to access it, let me know whether it is OK to upload it here and whether it contains any secret info).
Slice name: FRR-cern, Slice ID: f01132de-75a7-4edd-bbb2-816ee76e824f
I see a lot of the following in the logs:
[16:15:52] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:708} INFO – update : FRR-cern, count: 2
[16:15:52] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:592} INFO – update_slice: FRR-cern, count: 23
[16:15:53] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:634} INFO – update_topology: FRR-cern, count: 2
[16:15:56] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:2013} INFO – post_boot_config: slice_name: FRR-cern, slice_id f01132de-75a7-4edd-bbb2-816ee76e824f
[16:15:56] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:708} INFO – update : FRR-cern, count: 3
[16:15:56] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:592} INFO – update_slice: FRR-cern, count: 24
[16:15:56] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:634} INFO – update_topology: FRR-cern, count: 3
[16:15:57] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING – Attempt 1 failed: Authentication failed.
[16:16:08] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING – Attempt 2 failed: Authentication failed.
[16:16:18] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING – Attempt 3 failed: Authentication failed.
[16:16:19] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING – Attempt 1 failed: Authentication failed.
[16:16:30] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING – Attempt 2 failed: Authentication failed.
[16:16:41] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING – Attempt 3 failed: Authentication failed.
[16:16:41] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING – Attempt 1 failed: Authentication failed.
[16:16:52] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING – Attempt 2 failed: Authentication failed.
[16:17:03] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING – Attempt 3 failed: Authentication failed.
[16:17:03] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/interface.py:914} WARNING – Authentication failed.
and the authentication failures repeat many times.
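To see how often this pattern occurs, the log can be summarized with a few lines of Python. This is a minimal sketch that assumes the line layout shown in the excerpt above (`[HH:MM:SS] {path:line} LEVEL - message`); the function name `summarize_warnings` is hypothetical, and the separator is matched as either a hyphen or an en dash because the forum rendering may have altered it.

```python
import re
from collections import Counter

# One log record: [time] {source:line} LEVEL - message
LINE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] \{(?P<src>[^}]+)\} "
    r"(?P<level>[A-Z]+) [-–] (?P<msg>.*)$"
)


def summarize_warnings(lines):
    """Count WARNING messages, ignoring any 'Attempt N failed: ' prefix."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group("level") == "WARNING":
            msg = re.sub(r"^Attempt \d+ failed: ", "", m.group("msg"))
            counts[msg] += 1
    return counts


# A few records copied from the excerpt above (with plain hyphens).
sample = [
    "[16:15:56] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/slice.py:708} INFO - update : FRR-cern, count: 3",
    "[16:15:57] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: Authentication failed.",
    "[16:16:08] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 2 failed: Authentication failed.",
    "[16:17:03] {/home/fabric/fabrictestbed-extensions/fabrictestbed_extensions/fablib/interface.py:914} WARNING - Authentication failed.",
]
for message, count in summarize_warnings(sample).most_common():
    print(count, message)
```

To run it on the real file, pass an open file handle for /tmp/fablib/fablib.log (the path mentioned earlier in this thread) instead of `sample`. A single message dominating the warnings, as here, usually points at one root cause rather than many unrelated failures.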
February 6, 2025 at 11:40 am #8207
The "Authentication failed" warnings would explain the SSH errors you are observing. Could you please re-run this notebook: jupyter-examples-rel1.8.1/configure_and_validate/configure_and_validate.ipynb? This should renew any expired keys. Please try your slice again after this.
Thanks,
Komal
February 6, 2025 at 12:01 pm #8210
Interestingly, my bastion key had expired and been removed. I was still allowed to submit the slice, but fablib then failed in an endless loop trying to authenticate during post_boot_config. It would be nice to report this error back to Jupyter. From now on I will always validate my config before using my notebook.
I confirm that my new bastion key is now in use and has been verified. The new slice submission seems to have worked. Thank you for your help!
February 6, 2025 at 12:03 pm #8211
Glad to hear that worked! We will work to address this and add support for interrupting the loop and returning a meaningful error in such cases.
Thanks,
Komal
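The behaviour requested above, surfacing the failure instead of retrying forever, could look roughly like the following. This is a generic sketch and not fablib's implementation: `AuthError`, `run_with_retries`, and the parameter names are hypothetical. The idea is to retry transient errors a bounded number of times but to raise immediately on an authentication failure, since retrying with a rejected key cannot succeed.

```python
import time


class AuthError(Exception):
    """Hypothetical stand-in for an SSH authentication failure."""


def run_with_retries(action, retries=3, retry_interval=0.0):
    """Call action(); retry transient errors, but fail fast on AuthError."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except AuthError as e:
            # A rejected key will not start working on the next attempt,
            # so report the problem to the caller right away.
            raise RuntimeError(
                f"Authentication failed on attempt {attempt}; "
                "check that your bastion and slice keys are valid"
            ) from e
        except Exception as e:
            # Anything else is treated as transient and retried.
            last_error = e
            time.sleep(retry_interval)
    raise RuntimeError(f"Gave up after {retries} attempts") from last_error


def always_denied():
    """Simulates a command run with an expired key."""
    raise AuthError("Authentication failed.")


try:
    run_with_retries(always_denied)
except RuntimeError as err:
    print(err)
```

With this shape, a notebook cell ends with a readable error after the first rejected key rather than sleeping and retrying indefinitely.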