Forum Replies Created
-
AuthorPosts
-
December 5, 2024 at 12:09 pm in reply to: How to delete an interface from a node using the interface name #7910
Please refer to this example for removing interfaces from a network as well as a node.
Thanks,
Komal
Possibly you changed the call from list_hosts to list_sites. Please see the snippet below.
No host at FIU has more than 3 GPUs, and even 3 can be requested only when that many are available.
The screenshot shows only the full capacity. You can also check this from the portal.
FIU per host information can be seen here: https://portal.fabric-testbed.net/sites/FIU
Thanks,
Komal
Hi Abdulhadi,
The GPU count you are referring to represents the total number of GPUs available at a site.
No single host at a site has more than 3 GPUs. In fact, only a few hosts are equipped with 3 GPUs. To check the per-host resource details, you can use the notebook:
jupyter-examples-main/fabric_examples/fablib_api/sites_and_resources/list_all_resources.ipynb
For convenience, the following code snippet can also be used:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

# Create the manager and print the active configuration.
fablib = fablib_manager()
fablib.show_config();

# Per-host GPU capacity columns to display.
fields = ['name', 'tesla_t4_capacity', 'rtx6000_capacity', 'a30_capacity', 'a40_capacity']
output_table = fablib.list_hosts(fields=fields)
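To illustrate why a site-level GPU count can be larger than what any single host offers, here is a minimal sketch that works on plain dictionaries shaped like the rows requested above. The host names and counts are made-up sample data, and `GPU_FIELDS` and `gpus_per_host` are hypothetical helper names, not part of fablib.

```python
# GPU capacity field names, taken from the list_hosts call above.
GPU_FIELDS = ["tesla_t4_capacity", "rtx6000_capacity", "a30_capacity", "a40_capacity"]


def gpus_per_host(row):
    """Total GPU capacity of one host row (missing or None fields count as 0)."""
    return sum(int(row.get(field) or 0) for field in GPU_FIELDS)


# Hypothetical sample rows; real values would come from fablib.list_hosts().
sample_hosts = [
    {"name": "host-a", "tesla_t4_capacity": 2, "rtx6000_capacity": 0,
     "a30_capacity": 0, "a40_capacity": 0},
    {"name": "host-b", "tesla_t4_capacity": 0, "rtx6000_capacity": 3,
     "a30_capacity": 0, "a40_capacity": 0},
    {"name": "host-c", "tesla_t4_capacity": 0, "rtx6000_capacity": 0,
     "a30_capacity": 1, "a40_capacity": 0},
]

# The site-level figure is the sum over hosts; a single VM is limited to one host.
site_total = sum(gpus_per_host(h) for h in sample_hosts)
largest_host = max(sample_hosts, key=gpus_per_host)

print("site total:", site_total)
print("largest single host:", largest_host["name"], gpus_per_host(largest_host))
```

With this sample data the site total is larger than the GPU count of the best-equipped host, which is the same situation as a site reporting many GPUs while no single host has more than 3.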
Thanks,
Komal
Hi Ilya,
I looked into your slice and found that it was partially renewed, with the VM on STAR not renewing completely.
This appears to be a side effect of the Kafka maintenance we conducted yesterday, which impacted STAR. During this time, renewal messages were not processed because the Kafka consumer had stopped. I’ve resolved the issue, and future renewals should now work as expected.
Thank you for bringing this to our attention and helping us identify and fix the problem.
Best regards,
Komal
P.S.: Another user also ran into this: https://learn.fabric-testbed.net/forums/topic/not-able-to-renew-the-slice/
Hi Sankalpa,
Both your slices were partially renewed. Each slice included a VM on STAR, where the renewal process was stuck.
We use a Kafka messaging bus, and there was a brief maintenance yesterday that impacted STAR. As a result, renewal messages were not processed because the Kafka consumer had stopped. I have resolved this issue, and all the slivers in your slices have been successfully renewed. Your slice is now in the StableOK state.
Thank you for reporting this and helping us identify and address the problem.
Best regards,
Komal
Please share the slice ID. The slice ID can be captured from the Portal as well as from JH.
Portal -> Experiments -> My Slices -> Copy the Slice ID.
Also, how are you renewing the slices – Portal or JH?
Thanks,
Komal
Hi,
Could you please share your slice id?
Thanks,
Komal
Hi Ilya,
Thank you for reporting this issue. It seems to be a bug, and I’m in the process of debugging it. In the meantime, I’ve closed your slice, so it should no longer show up as “Configuring.”
Best regards,
Komal
Hi Vaneshi,
The permission updates will be rolled out with Release 1.8 in January.
Thanks,
Komal
Hi Sourya,
It looks like the authorized_keys file is not correct. I am not even able to log in with the Nova SSH keys.
Could you please confirm whether you see a key that ends with Generated-by-Nova in /home/ubuntu/.ssh/authorized_keys?
Also, please share the output of the command:
ls -ltr /home/ubuntu/.ssh/
Thanks,
Komal
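As a rough aid for the check requested above, the following sketch looks for a key line ending in Generated-by-Nova. The function names `has_nova_key` and `file_has_nova_key` are hypothetical, and the sample file contents are made up for illustration; only the marker string and the file path come from the post.

```python
from pathlib import Path


def has_nova_key(authorized_keys_text, marker="Generated-by-Nova"):
    """Return True if any non-empty, non-comment line ends with the marker."""
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and line.endswith(marker):
            return True
    return False


def file_has_nova_key(path="/home/ubuntu/.ssh/authorized_keys"):
    """Check a file on disk; returns False if the file does not exist."""
    p = Path(path)
    return p.is_file() and has_nova_key(p.read_text())


# Made-up contents for illustration (key material shortened on purpose).
sample = (
    "ssh-rsa AAAAexample user@laptop\n"
    "ssh-rsa AAAAexample Generated-by-Nova\n"
)
print(has_nova_key(sample))
```

On the VM itself, calling `file_has_nova_key()` with its default path would run the same check against the real file.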
Correction: Both the CERN and CIEN racks have ConnectX-6 NICs and GPUs available on the same host. However, the CIEN rack is currently under maintenance as it is being transported back from SC.
You can proceed with your experiment on the CERN rack, subject to its availability.
Additionally, here’s a Fablib code snippet to help you check for specific resources on hosts:
fields = ['name', 'nic_connectx_6_capacity', 'nic_connectx_5_capacity', 'tesla_t4_capacity', 'rtx6000_capacity', 'a30_capacity', 'a40_capacity']
output_table = fablib.list_hosts(fields=fields)
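To narrow such a table down to hosts that have both a ConnectX-6 and a GPU, a small predicate over the same field names can help. This is a sketch on made-up sample rows; `has_cx6_and_gpu` is a hypothetical helper, not a fablib function. It is an assumption, to be verified against your fablib version, that `list_hosts` accepts a `filter_function` argument to which such a predicate could be passed.

```python
# Field names taken from the snippet above.
GPU_FIELDS = ["tesla_t4_capacity", "rtx6000_capacity", "a30_capacity", "a40_capacity"]


def has_cx6_and_gpu(row):
    """True if the host row reports at least one ConnectX-6 and at least one GPU."""
    has_nic = int(row.get("nic_connectx_6_capacity") or 0) > 0
    has_gpu = any(int(row.get(field) or 0) > 0 for field in GPU_FIELDS)
    return has_nic and has_gpu


# Hypothetical sample rows; real values would come from fablib.list_hosts().
sample_hosts = [
    {"name": "host-a", "nic_connectx_6_capacity": 2, "rtx6000_capacity": 1},
    {"name": "host-b", "nic_connectx_6_capacity": 0, "tesla_t4_capacity": 2},
    {"name": "host-c", "nic_connectx_6_capacity": 2},
]

matches = [h["name"] for h in sample_hosts if has_cx6_and_gpu(h)]
print(matches)

# Assumption: if list_hosts supports it, the same predicate might be used as
# fablib.list_hosts(fields=fields, filter_function=has_cx6_and_gpu)
```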
Thanks,
Komal
Hi Tanay,
As you mentioned, the current infrastructure supports GPUs with ConnectX-5. Unfortunately, GPUs with ConnectX-6 are not a feasible option at this time. I hope the available setup works well for your experiment.
Thanks,
Komal
I shared the BitFile flash status with Nishanth over email. He confirmed that FPGA slices at the ESnet sites are working as expected.
Thanks,
Komal
Hi Vaneshi,
We are working to update the permissions, but in the current release you need the Net.PortMirroring permission for InSlice PortMirroring to work.
Thanks,
Komal
Hi Vaneshi,
Your project needs the Net.PortMirroring permission for this to work. Could you please check whether your project has this permission? If not, please ask your Project Owner or Lead to request it from the portal.
More details on requesting permissions can be found here.
Thanks,
Komal