Forum Replies Created
-
AuthorPosts
-
Hi Sourya,
There was a power outage at SALT, and the site is currently being recovered. We will let you know as soon as the recovery is complete.
Thanks,
Komal
Hi Tanay,
We are targeting a release for either Summer or Fall and will share more details once our plans are finalized.
Thanks,
Komal
Hi Sankalpa,
Based on our logs, there were three renewal attempts for this slice (listed under Logs below). During the renewal attempt on 2025-02-15 13:46:54,320, Client3 (be97d870-3299-418e-ba17-a1ddcab06bdb) could not be renewed because a required component was likely allocated to another future slice. Because slices can be requested for a future time window, that allocation blocked the renewal of this particular VM. The other resources in the slice were extended successfully.
The lease for Client3 ended on 2025-02-19 05:05:37 UTC, after which the VM was closed/deleted. The latest renewal request was issued on 2025-02-19 22:04:39, which was after the lease expiration, making the VM ineligible for renewal.
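To make the timing concrete, here is a minimal sketch that compares the renewal attempts from the log with the lease end. It is only an illustration of the reasoning, not how the control framework decides eligibility, and it assumes the log timestamps are in UTC; the helper names are made up for this example.

```python
from datetime import datetime, timezone

# Lease end for Client3, taken from the reservation details (UTC).
LEASE_END = datetime(2025, 2, 19, 5, 5, 37, tzinfo=timezone.utc)

# Renewal attempt timestamps copied from the log lines.
# Assumption: the log clock is UTC; the log itself does not say so.
ATTEMPTS = [
    "2025-02-19 22:04:39,589",
    "2025-02-19 22:09:37,543",
    "2025-02-15 13:46:54,320",
]


def parse_log_time(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS,mmm' log timestamp and mark it as UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    return parsed.replace(tzinfo=timezone.utc)


def arrived_before_expiry(stamp: str, lease_end: datetime = LEASE_END) -> bool:
    """Return True if the attempt was made while the lease was still active."""
    return parse_log_time(stamp) < lease_end


for stamp in ATTEMPTS:
    state = "lease still active" if arrived_before_expiry(stamp) else "lease already expired"
    print(f"{stamp}: {state}")
```

Only the 2025-02-15 attempt falls before the lease end; both 2025-02-19 attempts come after it.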
Logs:
- 2025-02-19 22:04:39,589 – CFEL Slice event: Renewal attempt by prj:b3cffedd-ddb4-43ee-b57d-459b768e14ca (usr: sankalpatimilsina12@gmail.com)
- 2025-02-19 22:09:37,543 – CFEL Slice event: Renewal attempt by prj:b3cffedd-ddb4-43ee-b57d-459b768e14ca (usr: sankalpatimilsina12@gmail.com)
- 2025-02-15 13:46:54,320 – CFEL Slice event: Renewal attempt by prj:b3cffedd-ddb4-43ee-b57d-459b768e14ca (usr: sankalpatimilsina12@gmail.com)
Reservation Details:
- Reservation ID: be97d870-3299-418e-ba17-a1ddcab06bdb
- Slice ID: 6acbc4aa-4b6e-44e3-b7c0-8c2f33de46c4
- Resource Type: VM
- Status: Closed (Last update: Insufficient resources – Renew failed: Component of type ConnectX-6 with PCI Address 0000:a1:04.7 is already allocated to another reservation on node GDXYNF3).
- Start: 2025-01-24 17:18:11 UTC
- End: 2025-02-19 05:05:37 UTC
- Requested End: 2025-02-28 13:46:54 UTC
Let me know if you need further clarification.
Best,
Komal
Hi Luca,
I reviewed your slice and noticed that during the last renewal, two of the VMs could not be renewed due to insufficient resources. As a result, they were not extended and have now transitioned to a Closed state, meaning they have been deleted. Below is a snapshot for reference.
Please note that since users can request slices for future use, renewing an active slice can fail if the resources it needs are already reserved for a future allocation.
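As a rough illustration of why this happens, the sketch below models each allocation as a time window and checks whether extending an active one would run into a window already promised to someone else. This is a simplified model under my own assumptions, not the actual scheduler logic; the `Hold` class, `can_extend` function, and the dates used are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Hold:
    """A time window during which a resource is promised to one reservation."""
    owner: str
    start: datetime
    end: datetime


def can_extend(current: Hold, new_end: datetime, others: Iterable[Hold]) -> bool:
    """Return True if extending `current` to `new_end` overlaps no other hold."""
    for other in others:
        if other.owner == current.owner:
            continue
        # Half-open windows [start, end) overlap when each starts before the other ends.
        if other.start < new_end and current.start < other.end:
            return False
    return True


# Hypothetical example: an active VM and a future reservation for the same resource.
active = Hold("active-vm", datetime(2025, 1, 24), datetime(2025, 2, 19))
future = Hold("future-slice", datetime(2025, 2, 20), datetime(2025, 2, 25))

print(can_extend(active, datetime(2025, 2, 28), [future]))  # renewal runs into the future hold
print(can_extend(active, datetime(2025, 2, 20), [future]))  # renewal stops where the future hold starts
```

In this model, a shorter renewal that ends before the future window starts would still succeed, while a longer one is rejected.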
You can check the current state of your slice using the following code:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
fablib = fablib_manager()
slice_name = "Slice INT slice - DALL + LOSA"
slice = fablib.get_slice(slice_name)
slice.list_nodes()
Additionally, if the renewal was triggered via JupyterHub (JH), this information has also been communicated to you there.
Reservation ID: b0ff1824-011d-4225-a748-371ddf6eb5e4
Slice ID: 831a0115-8e9e-4854-bbcb-d12022a878aa
Resource Type: VM
Notices: Reservation b0ff1824-011d-4225-a748-371ddf6eb5e4 (Slice INT slice - DALL + LOSA(831a0115-8e9e-4854-bbcb-d12022a878aa) Graph Id:f6a2c692-b430-4a01-95d6-2f2343320dea Owner:s317694@studenti.polito.it) is in state (Closed,None_) (Last ticket update: Insufficient resources : ['ram', 'disk'])
Reservation ID: dc4456ff-f768-4915-8c7e-97696b2fcc21
Slice ID: 831a0115-8e9e-4854-bbcb-d12022a878aa
Resource Type: VM
Notices: Reservation dc4456ff-f768-4915-8c7e-97696b2fcc21 (Slice INT slice - DALL + LOSA(831a0115-8e9e-4854-bbcb-d12022a878aa) Graph Id:f6a2c692-b430-4a01-95d6-2f2343320dea Owner:s317694@studenti.polito.it) is in state (Closed,None_) (Last ticket update: Insufficient resources : ['ram', 'disk'])
Thanks,
Komal
Hi,
I was able to run the notebook.
Could you please share your Slice ID?
Additionally, could you post your inquiries in the FABRIC General Questions and Discussion forum?
Thanks,
Komal
Hi Yuanjun,
Could you please try to SSH into your VMs now? We were able to SSH into them successfully.
Additionally, could you post your inquiry in the FABRIC General Questions and Discussion forum?
Thanks,
Komal
Hi Yuanjun,
Unfortunately, the STAR resources you’re requesting are currently in use. Please try again later or consider scheduling your slices in advance using the notebook.
Additionally, could you post your inquiry in the FABRIC General Questions and Discussion forum?
Thanks,
Komal
Hi Yuanjun,
Unfortunately, the MAX resources you’re requesting are currently in use. Please try again later or consider scheduling your slices in advance using the notebook.
Additionally, could you post your inquiry in the FABRIC General Questions and Discussion forum?
Thanks,
Komal
Glad to hear that worked! We will work to address this and add support to interrupt or return a meaningful error in such cases.
Thanks,
Komal
Hi Kriti,
Could you please re-run this notebook?
jupyter-examples-rel1.8.1/configure_and_validate/configure_and_validate.ipynb
This should renew any expired keys. Please try your slice again after this. I want to rule out any SSH errors. If you continue to see the error, please share /tmp/fablib/fablib.log.
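If you would like to take a quick look at the log yourself before sharing it, something like the following may help. It is a minimal sketch using only the Python standard library; the function name is made up, and the default search string "Authentication failed" is an assumption about what an SSH key problem would look like in the log.

```python
from pathlib import Path


def find_lines(log_path, needle="Authentication failed", limit=20):
    """Return up to `limit` of the most recent lines in `log_path` containing `needle`."""
    path = Path(log_path)
    if not path.exists():
        # No log yet (for example, fablib has not been used in this container).
        return []
    with path.open(errors="replace") as handle:
        hits = [line.rstrip("\n") for line in handle if needle in line]
    return hits[-limit:]


if __name__ == "__main__":
    for line in find_lines("/tmp/fablib/fablib.log"):
        print(line)
```

An empty result only means the search string was not found, so please still share the full log if the error persists.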
Thanks,
Komal
"Authentication failed" would explain the SSH errors you are observing. Could you please re-run this notebook?
jupyter-examples-rel1.8.1/configure_and_validate/configure_and_validate.ipynb
This should renew any expired keys. Please try your slice again after this.
Thanks,
Komal
Hi,
Please try the following snippet. Please note that list_sites() should be invoked before show_site().
We will fix this to return a more meaningful error in the next version.
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
fablib = fablib_manager()
fablib.list_sites()
fablib.get_resources().show_site("HAWI")
Thanks,
Komal
Thank you Justas! I haven’t been able to reproduce this, even on the JH Stable 1.8 container. Could you please share the /tmp/fablib/fablib.log file from your container? Also, please share the slice ID of your new slice.
Thanks,
Komal
Could you please try this with the Beyond Bleeding Edge container? I wasn’t able to reproduce this issue there. I am trying it with the 1.8 Stable container now.
Thanks,
Komal
Hey Kriti,
Could you please share which JH container you were using when you noticed this issue?
Thanks,
Komal