Komal Thareja

Forum Replies Created

Viewing 15 posts - 331 through 345 (of 557 total)
  • Komal Thareja
    Moderator

      The updated network model has been deployed and the maintenance is complete.

      in reply to: How can we restore our files from deleted nodes #7023
      Komal Thareja
      Moderator

        Hi Emmanuel,

        It is not possible to recover a deleted slice. Apologies, but we may not be able to recover your data. However, you should be able to request renewal of an expired project.

        Thanks,

        Komal

        Komal Thareja
        Moderator

          To clarify, requesting two VMs is acceptable. However, requesting VMs with GPUs and SmartNICs in the mentioned slice is invalid because none of the hosts have SmartNICs and GPUs available on the same host.

          Komal Thareja
          Moderator

            Hello Khawar,

            Your slice is requesting 2 VMs:

            • n1 – with an RTX-6000 GPU and a dedicated CX-6 NIC
            • n2 – with two RTX-6000 GPUs

            This is an unsupported configuration. On UTAH, we have two hosts, each with 3 GPUs, but neither of them has a dedicated CX-6. So your slice configuration appears to be unsupported.

            Also, I checked, and all 6 RTX-6000 GPUs are in use. Please note that the resource usage displayed on the portal may be outdated by up to 30 minutes.

            We have ongoing work to let users identify such invalid slice configurations using the fablib API. This should be available soon with the upcoming Release 1.7. We also plan to provide host-level resource usage details to users in 1.7, which may help with this too. Hope this helps!
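To illustrate why the slice above cannot be placed, here is a minimal sketch of a per-host fit check. It is not the fablib API and not how FABRIC performs the check; the `Host` class, the `hosts_that_fit` helper, and the host names `utah-w1`/`utah-w2` are hypothetical. The inventory only mirrors what is stated above: two hosts with 3 RTX-6000 GPUs each and no dedicated CX-6.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record: component model -> number of units on the host."""
    name: str
    capacity: dict = field(default_factory=dict)


def hosts_that_fit(request: dict, hosts: list) -> list:
    """Return names of hosts that can hold ALL requested components of one VM."""
    return [
        h.name
        for h in hosts
        if all(h.capacity.get(model, 0) >= count for model, count in request.items())
    ]


# Assumed inventory, based only on the description in the post above.
hosts = [
    Host("utah-w1", {"RTX-6000": 3}),
    Host("utah-w2", {"RTX-6000": 3}),
]

n1 = {"RTX-6000": 1, "CX-6": 1}  # GPU plus dedicated NIC on the same host
n2 = {"RTX-6000": 2}             # two GPUs on the same host

n1_hosts = hosts_that_fit(n1, hosts)
n2_hosts = hosts_that_fit(n2, hosts)
print("n1 candidates:", n1_hosts)
print("n2 candidates:", n2_hosts)
```

A VM must land on a single host, so the check requires every requested component to be present on the same host. This check looks at capacity only; current usage (for example, all GPUs being in use) would need a separate availability count.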

            Thanks,

            Komal

            in reply to: Assigned addresses lost in reserved slices #7008
            Komal Thareja
            Moderator

              @Nirmala – Maintenance has been completed.

              in reply to: Assigned addresses lost in reserved slices #7006
              Komal Thareja
              Moderator

                Hello Nirmala,

                Apologies for the inconvenience. Maintenance is currently in progress, which is why you are seeing the error on the portal.

                I will keep you posted as soon as maintenance is complete.

                Maintenance on the testbed – May 9 – 8am-12pm EST

                Thanks,

                Komal

                in reply to: Assigned addresses lost in reserved slices #7001
                Komal Thareja
                Moderator

                  Hi Nirmala,

                  Could you please share your slice ID or, if possible, your notebook? I can help tailor it to handle this scenario.

                  Thanks,

                  Komal

                  in reply to: Assigned addresses lost in reserved slices #6990
                  Komal Thareja
                  Moderator

                    Hello Nirmala,

                    Over the weekend, we encountered memory failures on the Wash workers, necessitating their reboot. Unfortunately, this led to the loss of IP addresses of your VMs. Rest assured, we are actively addressing the memory failure issue to prevent further worker reboots.

                    In the meantime, you can utilize the following block in a notebook to restore your IP configuration without having to delete your slice. We apologize for any inconvenience this may have caused.

                    
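                    # Assumes `fablib` and `slice_name` are already defined earlier in the notebook.
                    try:
                        # Look up the existing slice by name
                        slice = fablib.get_slice(name=slice_name)
                        for node in slice.get_nodes():
                            print(f"{node}")
                            # Re-run the node configuration to restore its IP setup
                            node.config()
                    except Exception as e:
                        print(f"Exception: {e}")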
                    

                    Thank you for your understanding,

                    Komal

                    in reply to: login to server failure #6988
                    Komal Thareja
                    Moderator

                      @Vaiden, @Nirmala,

                      The issue has been resolved. Jupyter Hub is accessible now. Please let us know if you still run into any issues.

                      Thanks,

                      Komal

                      in reply to: Outage at FABRIC Jupyter Hub #6987
                      Komal Thareja
                      Moderator

                        This issue has been resolved and Jupyter Hub is accessible again.

                        Thanks,

                        Komal

                        in reply to: login to server failure #6985
                        Komal Thareja
                        Moderator

                          Hi Nirmala,

                          Thank you for reporting this. It looks like our K8s cluster hosting Jupyter Hub is down. We are working to resolve this and will keep you posted.

                          Thanks,

                          Komal

                          in reply to: How to reach Nginx being hosted via IPv4 #6974
                          Komal Thareja
                          Moderator

                            Hi Jacob,

                            I used nslookup to determine the FQDN for your server and can confirm that I can ping your host as shown below.
                            SALT is an IPv6-only site. I will check and confirm whether the FABRIC NAT server config needs changes to enable this. However, reachability is working with the FQDN/hostname.


                            root@TransferNode:~# nslookup 129.114.108.207
                            207.108.114.129.in-addr.arpa name = chi-dyn-129-114-108-207.tacc.chameleoncloud.org.


                            root@TransferNode:~#
                            root@TransferNode:~#
                            root@TransferNode:~#
                            root@TransferNode:~# ping chi-dyn-129-114-108-207.tacc.chameleoncloud.org
                            PING chi-dyn-129-114-108-207.tacc.chameleoncloud.org(chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf)) 56 data bytes
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=1 ttl=35 time=113 ms
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=2 ttl=35 time=113 ms
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=3 ttl=35 time=113 ms
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=4 ttl=35 time=113 ms
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=5 ttl=35 time=113 ms
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=6 ttl=35 time=113 ms
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=7 ttl=35 time=113 ms
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=8 ttl=35 time=113 ms
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=9 ttl=35 time=113 ms
                            64 bytes from chi-dyn-129-114-108-207.tacc.chameleoncloud.org (2600:2701:5000:5001::8172:6ccf): icmp_seq=10 ttl=35 time=113 ms
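The nslookup call in the output above is a reverse (PTR) lookup. As a small illustration, Python's standard `ipaddress` module can show the `in-addr.arpa` name that is queried for that IPv4 address. This is only a sketch; it does not contact any DNS server.

```python
import ipaddress

# IPv4 address taken from the nslookup command above.
addr = ipaddress.ip_address("129.114.108.207")

# The octets are reversed and suffixed with in-addr.arpa to form the PTR query name.
ptr_name = addr.reverse_pointer
print(ptr_name)
```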

                            Thanks,
                            Komal

                            in reply to: How to reach Nginx being hosted via IPv4 #6970
                            Komal Thareja
                            Moderator

                              Hi Jacob,

                              I noticed that /etc/resolv.conf was updated on your VM, probably via nat64.sh. I reverted it to the default, as shown below. Your original file is saved as /etc/resolv.conf.bkp.

                              With this change, I was able to ping github.com, an IPv4 domain. IPv4 subnets should be reachable. Please note that nat64.sh is no longer required. I will also update the Knowledge Base article to reflect this.

                              root@TransferNode:/etc# cat /etc/resolv.conf
                              # This file is managed by man:systemd-resolved(8). Do not edit.
                              #
                              # This is a dynamic resolv.conf file for connecting local clients to the
                              # internal DNS stub resolver of systemd-resolved. This file lists all
                              # configured search domains.
                              #
                              # Run "resolvectl status" to see details about the uplink DNS servers
                              # currently in use.
                              #
                              # Third party programs must not access this file directly, but only through the
                              # symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
                              # replace this symlink by a static file or a different symlink.
                              #
                              # See man:systemd-resolved.service(8) for details about the supported modes of
                              # operation for /etc/resolv.conf.

                              nameserver 127.0.0.53
                              options edns0 trust-ad
                              search openstacklocal

                              root@TransferNode:/etc# ping -c2 github.com
                              PING github.com(lb-140-82-112-4-iad.github.com (2600:2701:5000:5001::8c52:7004)) 56 data bytes
                              64 bytes from lb-140-82-112-4-iad.github.com (2600:2701:5000:5001::8c52:7004): icmp_seq=1 ttl=230 time=88.4 ms
                              64 bytes from lb-140-82-112-4-iad.github.com (2600:2701:5000:5001::8c52:7004): icmp_seq=2 ttl=230 time=87.4 ms

                              --- github.com ping statistics ---
                              2 packets transmitted, 2 received, 0% packet loss, time 1002ms
                              rtt min/avg/max/mdev = 87.439/87.930/88.422/0.491 ms
                              root@TransferNode:/etc#
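In the ping output above, github.com is reached through an IPv6 address whose last 32 bits appear to carry the IPv4 address (the hostname `lb-140-82-112-4-iad` suggests 140.82.112.4). The sketch below extracts those low 32 bits with Python's standard `ipaddress` module. It assumes the NAT gateway embeds the IPv4 address in the low 32 bits, which matches the two addresses seen in this thread but is not confirmed here as the gateway's general behavior; `embedded_ipv4` is a hypothetical helper name.

```python
import ipaddress


def embedded_ipv4(ipv6_text: str) -> ipaddress.IPv4Address:
    """Return the IPv4 address formed by the low 32 bits of an IPv6 address."""
    v6 = ipaddress.IPv6Address(ipv6_text)
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


# Addresses copied from the ping outputs in this thread.
github_v4 = embedded_ipv4("2600:2701:5000:5001::8c52:7004")
chameleon_v4 = embedded_ipv4("2600:2701:5000:5001::8172:6ccf")
print(github_v4)
print(chameleon_v4)
```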

                              Thanks,

                              Komal

                              • This reply was modified 2 years, 2 months ago by Komal Thareja.
                              in reply to: How to reach Nginx being hosted via IPv4 #6965
                              Komal Thareja
                              Moderator

                                Hi Jacob,

                                Thank you for clarifying your setup. FABRIC is now running its own NAT gateway. All VMs are configured to use it for NAT resolution, so IPv4 addresses should be accessible.

                                Could you please share your slice ID? I’ll check your slice and share my findings.

                                Thanks,

                                Komal

                                • This reply was modified 2 years, 2 months ago by Komal Thareja.
                                in reply to: How to reach Nginx being hosted via IPv4 #6963
                                Komal Thareja
                                Moderator

                                  FABRIC only allows SSH and a few ICMP messages over the management interface. Hosting services on the management network is not recommended. Instead, we recommend using the data plane network for your service.

                                  FABRIC serves as a secure sandbox, allowing students and researchers to experiment with potentially disruptive and vulnerable software architectures in a protected environment. When connecting external devices, such as laptops or servers, to nodes within a slice, it is crucial to employ secure methods like SSH tunnels. A Jupyter notebook example illustrates how to create SSH tunnels through the FABRIC bastion host. Alternatively, users can utilize personal VPNs like Tailscale for secure connections.
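As a rough sketch of the SSH tunnel approach described above, the snippet below builds an `ssh` command that forwards a local port to a service on a slice node, jumping through a bastion host. It is not taken from the FABRIC notebook example; the helper `build_tunnel_command`, the host names, and the user names are placeholders, and it assumes your bastion and node keys are already set up in your SSH configuration.

```python
import shlex


def build_tunnel_command(bastion_user, bastion_host, node_user, node_host,
                         local_port, remote_port):
    """Build an ssh argv list that forwards local_port to remote_port on the node."""
    return [
        "ssh",
        "-N",                                            # no remote command, tunnel only
        "-L", f"{local_port}:localhost:{remote_port}",   # local port forward
        "-J", f"{bastion_user}@{bastion_host}",          # jump through the bastion host
        f"{node_user}@{node_host}",
    ]


# Placeholder values; replace them with your own bastion login and node address.
cmd = build_tunnel_command(
    bastion_user="my_bastion_user",
    bastion_host="bastion.example.org",
    node_user="ubuntu",
    node_host="node.example.org",
    local_port=8080,
    remote_port=80,
)
print(shlex.join(cmd))
```

Running the printed command would make a web server listening on port 80 of the node reachable at localhost:8080 on your laptop, without exposing the port on the management network.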

                                  Exposing ports to the entire Internet is restricted and reserved for exceptional cases where alternative solutions are not viable. Moreover, users who take on such capabilities are responsible for deploying, maintaining, and securing their experiments, much as they would be in a production data center. The IPv4Ext and IPv6Ext services facilitate these capabilities.

                                  For newcomers, getting acquainted with SSH tunnels is recommended due to their simplicity and security. If users have additional questions or require further guidance, they are encouraged to reach out.

                                  Thanks,

                                  Komal
