Forum Replies Created
Hi Nishanth,
This issue has been fixed for a while now, but the fix is currently only available in the Beyond Bleeding Edge container.
Could you please use that container for now? The fix should be available on PyPI with the next release.
Thanks,
Komal
Hi Acheme,
We investigated the possibility of enabling UEFI mode for users but encountered issues where GPUs do not function in that mode. Consequently, we have opted to keep the firmware updated to mitigate these errors for users. Could you please rerun your experiment and let us know if the error persists? I am happy to work with you on upgrading the firmware and addressing the issue.
Thanks,
Komal
May 31, 2024 at 8:02 am in reply to: Maintenance on FABRIC Network AM on 05/29/2024 9:00 PM – 10:00 PM EST #7034
The updated network model has been deployed and the maintenance is complete.
Hi Emmanuel,
It is not possible to recover a deleted slice, so unfortunately we may not be able to recover your data. However, you should be able to request renewal of an expired project.
Thanks,
Komal
May 17, 2024 at 11:43 am in reply to: Unable to create a slice : redeem predecessor reservation #7017
To clarify, requesting two VMs is acceptable. However, requesting VMs with GPUs and SmartNICs in the mentioned slice is invalid because no host has both a SmartNIC and a GPU available.
May 16, 2024 at 2:35 pm in reply to: Unable to create a slice : redeem predecessor reservation #7016
Hello Khawar,
Your slice is requesting 2 VMs:
- n1 – with an RTX-6000 GPU and a dedicated CX-6 NIC
- n2 – with two RTX-6000 GPUs
This is an unsupported configuration: on UTAH, we have two hosts, each with 3 GPUs, but neither of them has a dedicated CX-6, so the configuration requested for n1 cannot be satisfied.
I also checked, and all 6 RTX-6000 GPUs are currently in use. Please note that the resource usage displayed on the portal may be outdated by up to 30 minutes.
We have ongoing work to let users identify such invalid slice configurations via the fablib API. This should be available soon with the upcoming Release 1.7. We also plan to provide host-level resource usage details to users in 1.7, which may help with this as well. Hope this helps!
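As a rough illustration only (not the upcoming 1.7 validation API), a fablib request along the following lines keeps the GPU and the SmartNIC on separate nodes so each component can land on a host that actually offers it; the slice/node names are placeholders, and it assumes a host with a free CX-6 is available at the chosen site:

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()

# Placeholder request: one node with the GPU, a separate node with the SmartNIC.
slice = fablib.new_slice(name="gpu-and-nic-example")

n1 = slice.add_node(name="n1", site="UTAH")
n1.add_component(model="GPU_RTX6000", name="gpu1")

n2 = slice.add_node(name="n2", site="UTAH")
n2.add_component(model="NIC_ConnectX_6", name="nic1")

slice.submit()

Checking fablib.list_sites() beforehand can help confirm which components a site currently advertises.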
Thanks,
Komal
@Nirmala – Maintenance has been completed.
Hello Nirmala,
Apologies for the inconvenience. There is ongoing maintenance, which is why you are seeing the error on the portal.
We will keep you posted as soon as the maintenance is complete.
Thanks,
Komal
Hi Nirmala,
Could you please share your Slice ID or, if possible, your notebook? I can help tailor it to handle this scenario.
Thanks,
Komal
Hello Nirmala,
Over the weekend, we encountered memory failures on the WASH workers, necessitating their reboot. Unfortunately, this led to the loss of your VMs' IP addresses. Rest assured, we are actively addressing the memory failure issue to prevent further worker reboots.
In the meantime, you can run the following block in a notebook to restore your IP configuration without having to delete your slice. We apologize for any inconvenience this may have caused.
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
slice_name = "MySlice"  # replace with your slice name

try:
    slice = fablib.get_slice(name=slice_name)
    for node in slice.get_nodes():
        print(f"{node}")
        # Re-apply the node's network configuration without deleting the slice.
        node.config()
except Exception as e:
    print(f"Exception: {e}")
Thank you for your understanding,
Komal
@Vaiden, @Nirmala,
The issue has been resolved. Jupyter Hub is accessible now. Please let us know if you still run into any issues.
Thanks,
Komal
This issue has been resolved and Jupyter Hub is accessible again.
Thanks,
Komal
Hi Nirmala,
Thank you for reporting this. It looks like our K8s cluster hosting Jupyter Hub is down. We are working to resolve this and will keep you posted.
Thanks,
Komal
Hi Jacob,
I used nslookup to determine the FQDN for your server and can confirm that I can ping your host, as shown below.
SALT is an IPv6-only site. I will check and confirm whether the FABRIC NAT server config needs changes to enable reaching the host by its IPv4 address, but reachability is already working with the FQDN/hostname.
root@TransferNode:~# nslookup 129.114.108.207
207.108.114.129.in-addr.arpa name = chi-dyn-129-114-108-207.tacc.chameleoncloud.org.
root@TransferNode:~#
root@TransferNode:~# ping chi-dyn-129-114-108-207.tacc.chameleoncloud.org
PING chi-dyn-129-114-108-207.tacc.chameleoncloud.org(chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf)) 56 data bytes
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=1 ttl=35 time=113 ms
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=2 ttl=35 time=113 ms
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=3 ttl=35 time=113 ms
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=4 ttl=35 time=113 ms
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=5 ttl=35 time=113 ms
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=6 ttl=35 time=113 ms
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=7 ttl=35 time=113 ms
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=8 ttl=35 time=113 ms
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=9 ttl=35 time=113 ms
64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=10 ttl=35 time=113 ms
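As a side note, and purely as my own illustration (the address below is just the one from the nslookup above), a short Python snippet on the VM can perform the same reverse lookup and then resolve the name, letting DNS64/NAT64 on an IPv6-only site handle the translation:

import socket

# Reverse (PTR) lookup: map the IPv4 address to its FQDN.
ipv4_addr = "129.114.108.207"
hostname, _aliases, _addrs = socket.gethostbyaddr(ipv4_addr)
print(f"FQDN for {ipv4_addr}: {hostname}")

# On an IPv6-only VM with DNS64, resolving the name returns a synthesized
# AAAA record that NAT64 translates to the real IPv4 destination.
# Port 22 is chosen arbitrarily for illustration.
for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(hostname, 22, proto=socket.IPPROTO_TCP):
    print(family.name, sockaddr)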
Thanks,
Komal
Hi Jacob,
I noticed that /etc/resolv.conf was updated on your VM, probably via nat64.sh. I reverted it back to the default as shown below. Your original file is saved as /etc/resolv.conf.bkp.
With this change, I was able to ping github.com, an IPv4 domain, so IPv4 subnets should be reachable. Please note that nat64.sh is no longer required; I will update the knowledge base article to reflect this as well.

root@TransferNode:/etc# cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.53
options edns0 trust-ad
search openstacklocal
root@TransferNode:/etc# ping -c2 github.com
PING github.com(lb-140-82-112-4-iad.github.com (2600:2701:5000:5001::8c52:7004)) 56 data bytes
64 bytes from lb-140-82-112-4-iad.github.com (2600:2701:5000:5001::8c52:7004): icmp_seq=1 ttl=230 time=88.4 ms
64 bytes from lb-140-82-112-4-iad.github.com (2600:2701:5000:5001::8c52:7004): icmp_seq=2 ttl=230 time=87.4 ms
--- github.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 87.439/87.930/88.422/0.491 ms
root@TransferNode:/etc#
Thanks,
Komal