Ajay Kumar

Forum Replies Created

Viewing 13 posts - 1 through 13 (of 13 total)
  • in reply to: Have you tried LoomAI yet? #9845
    Ajay Kumar
    Participant

      I am trying to use LoomAI. I followed the instructions on GitHub and downloaded the Docker version. However, when I upload my FABRIC credential token in LoomAI, it shows an API error. Is there any way to resolve this? I understand it is at a very early stage, so errors are expected.

      Token upload failed: API error 502: <html> <head><title>502 Bad Gateway</title></head> <body> <center>502 Bad Gateway</center> <hr><center>nginx</center> </body> </html>

      ✗ API error 502: <html> <head><title>502 Bad Gateway</title></head> <body> <center>502 Bad Gateway</center> <hr><center>nginx</center> </body> </html>
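For anyone reading a message like this: a 502 from nginx generally means the proxy could not get a valid response from the service behind it. In a Docker setup that may point to a backend container that has not started or has exited, though that is only an assumption about this deployment. A minimal sketch for turning the raw HTML into a readable one-line summary (the function name `summarize_http_error` is hypothetical and not part of LoomAI):

```python
import re


def summarize_http_error(status, body):
    """Collapse an HTML error page into a one-line summary."""
    # Pull the page title out of the HTML body, if there is one.
    match = re.search(r"<title>(.*?)</title>", body, re.IGNORECASE | re.DOTALL)
    title = match.group(1).strip() if match else "no <title> found"
    # 502 and 504 are generated by a gateway or proxy, not by the application itself.
    if status in (502, 504):
        hint = "the proxy got no valid response from the backend"
    else:
        hint = "see response body"
    return f"HTTP {status}: {title} ({hint})"
```

Passing the status code and the HTML body from the failed upload to this function gives a short message that is easier to include in a bug report than the full page.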

      in reply to: Inquiry Regarding MAX Site Maintenance Completion Timeline #9699
      Ajay Kumar
      Participant

        Thank you so much, Mert, for the quick update. It’s working perfectly now.

        in reply to: Slices stuck at configuring…….state #9689
        Ajay Kumar
        Participant

          Thank you very much, Mert. I really appreciate your response on this. I have created slices on other nodes for my current work. However, I also need resources that MAX has. If you could suggest an approximate turnaround time, it would be highly appreciated.

          Ajay Kumar
          Participant

            I was bound by a deadline, so I deleted that slice and recreated it, and it is working correctly now. If I face the same issue again, I will raise it here in this thread.

            Ajay Kumar
            Participant

              I have also tried purging all CUDA and NVIDIA drivers and installing from scratch. This does not work either.

              Commands used:

              sudo apt-get purge -y '*nvidia*'

              sudo apt-get autoremove -y

              sudo apt-get autoclean

              sudo reboot
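The same sequence can be scripted so that it stops at the first failing step. This is only a sketch: the `run_steps` helper and its `dry_run` flag are hypothetical names, and the commands are the ones listed above. Note that `shlex.split` removes the straight quotes and passes the pattern `*nvidia*` to apt-get unexpanded, which is what the quoting achieves in a shell.

```python
import shlex
import subprocess

PURGE_STEPS = [
    "sudo apt-get purge -y '*nvidia*'",
    "sudo apt-get autoremove -y",
    "sudo apt-get autoclean",
    "sudo reboot",
]


def run_steps(steps, dry_run=True):
    """Split each command into argv form and, unless dry_run is set, run it.

    Returns the argv lists in order. With dry_run=False a failing command
    raises subprocess.CalledProcessError and the remaining steps are skipped.
    """
    planned = []
    for step in steps:
        argv = shlex.split(step)
        planned.append(argv)
        if not dry_run:
            subprocess.run(argv, check=True)
    return planned
```

Calling `run_steps(PURGE_STEPS)` only returns the planned commands; `run_steps(PURGE_STEPS, dry_run=False)` actually executes them, including the final reboot.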

               

              in reply to: channel 0: open failed: connect failed: No route to host #8758
              Ajay Kumar
              Participant

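                # Assumes the standard FABlib setup used in the FABRIC JupyterHub notebooks
                from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
                fablib = fablib_manager()

                slice_name = 'GPU_Variant_Calling_FIU'
                node_name = 'Node3'
                slice = fablib.get_slice(slice_name)  # look up the existing slice by name
                node = slice.get_node(node_name)      # pick the node to reboot
                node.os_reboot()                      # request a reboot of that node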

                This piece of code produced the error. Now that the node is live again, I can access it. Thank you very much, Komal.
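After a reboot, SSH can stay unreachable for a while, and connection attempts during that window fail with errors such as "No route to host". A minimal sketch of a wait loop that polls the SSH port before reconnecting; the helper name `wait_for_port` and its default values are assumptions, and it has to run on a machine that can reach the node's management address directly:

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or timeout expires.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers "No route to host", "Connection refused", and timeouts.
            time.sleep(interval)
    return False
```

A call such as `wait_for_port("<node-management-ip>")`, with the placeholder replaced by the node's address, returns once the port accepts connections, which is a reasonable point to retry SSH.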

                in reply to: Lost network interface after rebooting of vm3 in a cluster #8624
                Ajay Kumar
                Participant

                  Thank you very much, Komal. You are always a big help when working with FABRIC. I am not sure, but overloading the GPUs with tasks may have caused the node to crash, and the reboot then wiped the network interface settings and detached the PCI devices.

                  It’s working pretty well now, thank you so much 😊.

                  Ajay Kumar
                  Participant

                    Yes, I did! Anyway, it started working; I guess there was some timing issue at that point. It’s working perfectly now.

                    Ajay Kumar
                    Participant

                      Are there any ongoing issues with creating a cluster in FABRIC JupyterLab right now? It was working fine yesterday.

                      in reply to: Unable to SSH into my Nodes #8521
                      Ajay Kumar
                      Participant

                        Thank you very much, it works fine now. Double thumbs up for your help, Komal.

                        in reply to: Unable to SSH into my Nodes #8518
                        Ajay Kumar
                        Participant

                          My Slice ID: 09255c48-5512-4e3c-bdc6-ad7d4fd37d07
                          Output of ifconfig -a command:

                          (base) ubuntu@Node4:~$ ifconfig -a
                          docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
                          inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
                          ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
                          RX packets 0 bytes 0 (0.0 B)
                          RX errors 0 dropped 0 overruns 0 frame 0
                          TX packets 0 bytes 0 (0.0 B)
                          TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                          enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
                          inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
                          inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
                          ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
                          RX packets 541 bytes 53260 (53.2 KB)
                          RX errors 0 dropped 0 overruns 0 frame 0
                          TX packets 417 bytes 56772 (56.7 KB)
                          TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                          lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
                          inet 127.0.0.1 netmask 255.0.0.0
                          inet6 ::1 prefixlen 128 scopeid 0x10<host>
                          loop txqueuelen 1000 (Local Loopback)
                          RX packets 114 bytes 9436 (9.4 KB)
                          RX errors 0 dropped 0 overruns 0 frame 0
                          TX packets 114 bytes 9436 (9.4 KB)
                          TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                          in reply to: Unable to SSH into my Nodes #8515
                          Ajay Kumar
                          Participant

                            Thank you so much, Komal, it worked for me.

                            Following up on that, I noticed that my interface (enp9s0) is no longer found; it was there earlier. I used this interface to connect to the other nodes in the cluster. Could you please help me bring it up again?

                            (base) ubuntu@Node4:~$ ifconfig
                            docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
                            inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
                            ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
                            RX packets 0 bytes 0 (0.0 B)
                            RX errors 0 dropped 0 overruns 0 frame 0
                            TX packets 0 bytes 0 (0.0 B)
                            TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                            enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
                            inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
                            inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
                            ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
                            RX packets 179 bytes 18635 (18.6 KB)
                            RX errors 0 dropped 0 overruns 0 frame 0
                            TX packets 168 bytes 22384 (22.3 KB)
                            TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                            lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
                            inet 127.0.0.1 netmask 255.0.0.0
                            inet6 ::1 prefixlen 128 scopeid 0x10<host>
                            loop txqueuelen 1000 (Local Loopback)
                            RX packets 110 bytes 8928 (8.9 KB)
                            RX errors 0 dropped 0 overruns 0 frame 0
                            TX packets 110 bytes 8928 (8.9 KB)
                            TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
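A quick way to confirm which interfaces are present is to parse the output above rather than scan it by eye. A minimal sketch, assuming the `name: flags=...` layout shown in this output (the function name `interface_names` is hypothetical):

```python
import re


def interface_names(ifconfig_output):
    """Return the interface names found in ifconfig output, in order."""
    # Each interface block starts with "<name>: flags=...".
    return re.findall(r"^\s*([\w.@-]+): flags=", ifconfig_output, flags=re.MULTILINE)


sample = """docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
"""

names = interface_names(sample)
print(names)              # ['docker0', 'enp3s0', 'lo']
print("enp9s0" in names)  # False
```

Plain `ifconfig` lists only interfaces that are up, while `ifconfig -a` lists all of them. If enp9s0 is missing from the `-a` output as well, the device itself is not visible to the OS, rather than just being down.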

                            in reply to: Unable to SSH into my Nodes #8510
                            Ajay Kumar
                            Participant

                              Does anyone know how to reboot a node when both ping and SSH to that node are not working?
