
Ajay Kumar

Forum Replies Created

Viewing 11 posts - 1 through 11 (of 11 total)
  in reply to: Slices stuck at configuring…….state #9689
    Ajay Kumar
    Participant

      Thank you very much, Mert. I really appreciate your response. I have created a slice on other nodes for my current work. However, I also need resources that MAX has, so if you could suggest an approximate time frame, that would be highly appreciated.

      Ajay Kumar
      Participant

        I was bound by a deadline, so I deleted that slice and recreated it, and it is working correctly now. If I face the same issue again, I will raise it here in this thread.
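A slice that stays in the Configuring state can be handled more predictably with an explicit timeout, after which you delete and recreate it as described above. The sketch below is a generic polling helper, not part of fablib: `get_state` is any zero-argument callable that returns the slice's current state string (for example, a lambda that re-fetches the slice with fablib and reads its state; the exact accessor is an assumption), and the state names `StableOK`, `StableError`, `Closing` and `Dead` are assumptions about FABRIC's slice states that you should check against your fablib version.

```python
import time
from typing import Callable

# ASSUMPTION: FABRIC slice state names; verify against your fablib version.
READY_STATES = {"StableOK"}
FAILED_STATES = {"StableError", "Closing", "Dead"}


def wait_for_slice_state(
    get_state: Callable[[], str],
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until the slice is ready, has failed, or time runs out.

    Returns "ready", "failed" or "timeout"; on "failed" or "timeout" the caller
    can delete the slice and recreate it.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in READY_STATES:
            return "ready"
        if state in FAILED_STATES:
            return "failed"
        if clock() >= deadline:
            return "timeout"  # e.g. still "Configuring" after timeout_s seconds
        sleep(interval_s)
```

Returning a short status string instead of raising keeps the delete-and-recreate decision with the caller; the `sleep` and `clock` parameters exist only so the loop can be exercised without actually waiting.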

        Ajay Kumar
        Participant

          I have also tried purging all CUDA and NVIDIA drivers and installing them from scratch. That did not work either.

          Commands used:

          sudo apt-get purge -y '*nvidia*'

          sudo apt-get autoremove -y

          sudo apt-get autoclean

          sudo reboot
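The same purge sequence can be scripted so that it is repeatable across nodes. This is a minimal sketch that assumes passwordless sudo on the VM; each command is passed as an argument list, so the `*nvidia*` pattern reaches apt-get unchanged without any shell quoting (curly quotes pasted from a web page would otherwise break the command). `PURGE_STEPS` and `run_purge` are illustrative names, and with `dry_run=True` the script only prints the commands. The reinstall step is not included, since the commands for it are not listed here.

```python
import subprocess
from typing import List

# The purge steps listed above, as argument lists: no shell is involved, so the
# '*nvidia*' pattern is passed to apt-get as-is and needs no quoting.
PURGE_STEPS = [
    ["sudo", "apt-get", "purge", "-y", "*nvidia*"],
    ["sudo", "apt-get", "autoremove", "-y"],
    ["sudo", "apt-get", "autoclean"],
    ["sudo", "reboot"],
]


def run_purge(dry_run: bool = True) -> List[str]:
    """Print each step and, unless dry_run is set, run it.

    check=True raises CalledProcessError at the first failing command, so later
    steps (including the reboot) are skipped if the purge fails.
    """
    done = []
    for cmd in PURGE_STEPS:
        line = " ".join(cmd)
        print(("[dry-run] " if dry_run else "$ ") + line)
        if not dry_run:
            subprocess.run(cmd, check=True)
        done.append(line)
    return done
```

Running `run_purge()` first and checking the printed commands before calling `run_purge(dry_run=False)` avoids an accidental reboot.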


          in reply to: channel 0: open failed: connect failed: No route to host #8758
          Ajay Kumar
          Participant

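            from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

            fablib = fablib_manager()              # loads your FABRIC configuration
            slice_name = 'GPU_Variant_Calling_FIU'
            node_name = 'Node3'
            slice = fablib.get_slice(slice_name)   # look up the existing slice by name
            node = slice.get_node(node_name)       # the node that is unreachable over SSH
            node.os_reboot()                       # reboot the node's operating system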

            This code produced that error. Now that the node is back up, I can access it. Thank you very much, Komal.
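After a reboot the VM can take a while before SSH accepts connections, and attempts in the meantime fail with errors such as "No route to host". The sketch below, which is not a fablib function, simply retries a TCP connection to port 22 until it succeeds or a timeout expires. The host you pass is an assumption: it must be an address the machine running the check can reach directly, and since management SSH on FABRIC normally goes through the bastion host, a direct check from JupyterHub may not be possible.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout_s: float = 300.0,
                  interval_s: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    While the VM is rebooting, errors such as "No route to host" or
    "Connection refused" surface as OSError and are simply retried.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Each attempt waits at most interval_s for the TCP handshake.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A successful TCP connection only shows that sshd is listening; login can still fail if keys or the bastion configuration are wrong.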

            in reply to: Lost network interface after rebooting of vm3 in a cluster #8624
            Ajay Kumar
            Participant

              Thank you very much, Komal; you are always a big help when working with FABRIC. I am not sure, but overloading the GPUs with tasks may have caused the VM to crash; then, when we rebooted it, the network interface settings were lost and the PCI devices were detached.

              It's working well now, thank you so much 😊
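If a crash and reboot leaves the VM without its GPU, a quick check is whether the NVIDIA PCI device is still visible inside the VM. This sketch assumes `lspci` (from pciutils) is installed and simply filters its output for lines that mention NVIDIA; `find_nvidia_devices` and `gpus_visible` are hypothetical helper names.

```python
import subprocess
from typing import List


def find_nvidia_devices(lspci_output: str) -> List[str]:
    """Return the lines of lspci output that mention an NVIDIA device."""
    return [line.strip() for line in lspci_output.splitlines() if "NVIDIA" in line]


def gpus_visible() -> bool:
    """Run lspci inside the VM and report whether any NVIDIA PCI device is listed."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return bool(find_nvidia_devices(result.stdout))
```

An empty result after a reboot points to a detached PCI device rather than a driver problem, in which case reinstalling drivers inside the VM is unlikely to help.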

              Ajay Kumar
              Participant

                Yes, I did! Anyway, it started working; I guess there was some timing-related issue at that time. It's working perfectly now.

                Ajay Kumar
                Participant

                  Are there any ongoing issues with creating a cluster from the FABRIC JupyterLab right now? It was working fine yesterday.

                  in reply to: Unable to SSH into my Nodes #8521
                  Ajay Kumar
                  Participant

                    Thank you very much, Komal; it works fine now. Two thumbs up for your help!

                    in reply to: Unable to SSH into my Nodes #8518
                    Ajay Kumar
                    Participant

                      My Slice ID: 09255c48-5512-4e3c-bdc6-ad7d4fd37d07
                      Output of the ifconfig -a command:

                      (base) ubuntu@Node4:~$ ifconfig -a
                      docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
                      inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
                      ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
                      RX packets 0 bytes 0 (0.0 B)
                      RX errors 0 dropped 0 overruns 0 frame 0
                      TX packets 0 bytes 0 (0.0 B)
                      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                      enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
                      inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
                      inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
                      ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
                      RX packets 541 bytes 53260 (53.2 KB)
                      RX errors 0 dropped 0 overruns 0 frame 0
                      TX packets 417 bytes 56772 (56.7 KB)
                      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                      lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
                      inet 127.0.0.1 netmask 255.0.0.0
                      inet6 ::1 prefixlen 128 scopeid 0x10<host>
                      loop txqueuelen 1000 (Local Loopback)
                      RX packets 114 bytes 9436 (9.4 KB)
                      RX errors 0 dropped 0 overruns 0 frame 0
                      TX packets 114 bytes 9436 (9.4 KB)
                      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
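To compare this output against the interfaces a node is expected to have, the `ifconfig -a` text can be parsed for interface header lines of the form `name: flags=...`. This is a small hypothetical helper rather than a standard tool; applied to the output above, which lists only docker0, enp3s0 and lo, it would report enp9s0 as missing when enp9s0 is in the expected set.

```python
import re
from typing import List, Set

# Interface header lines in ifconfig output look like "enp3s0: flags=4163<UP,...>".
_IFACE_RE = re.compile(r"^\s*([\w.@-]+): flags=", re.MULTILINE)


def interface_names(ifconfig_output: str) -> List[str]:
    """Extract interface names, in order of appearance, from `ifconfig -a` output."""
    return _IFACE_RE.findall(ifconfig_output)


def missing_interfaces(ifconfig_output: str, expected: Set[str]) -> Set[str]:
    """Return the expected interfaces that do not appear in the output."""
    return set(expected) - set(interface_names(ifconfig_output))
```

Using `ifconfig -a` rather than plain `ifconfig` matters here, because only `-a` also lists interfaces that exist but are down.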

                      in reply to: Unable to SSH into my Nodes #8515
                      Ajay Kumar
                      Participant

                        Thank you so much, Komal, it worked for me.

                        Following up on that, I noticed that my interface enp9s0 is no longer found, although it was there earlier. I use this interface to connect to the other nodes in the cluster. Could you please help me bring it UP again?

                        (base) ubuntu@Node4:~$ ifconfig
                        docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
                        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
                        ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
                        RX packets 0 bytes 0 (0.0 B)
                        RX errors 0 dropped 0 overruns 0 frame 0
                        TX packets 0 bytes 0 (0.0 B)
                        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                        enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
                        inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
                        inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
                        ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
                        RX packets 179 bytes 18635 (18.6 KB)
                        RX errors 0 dropped 0 overruns 0 frame 0
                        TX packets 168 bytes 22384 (22.3 KB)
                        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                        lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
                        inet 127.0.0.1 netmask 255.0.0.0
                        inet6 ::1 prefixlen 128 scopeid 0x10<host>
                        loop txqueuelen 1000 (Local Loopback)
                        RX packets 110 bytes 8928 (8.9 KB)
                        RX errors 0 dropped 0 overruns 0 frame 0
                        TX packets 110 bytes 8928 (8.9 KB)
                        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
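Before trying to bring an interface UP, it helps to know whether the kernel still has the device at all. This sketch reads Linux sysfs under `/sys/class/net`: a missing directory means the device is absent (for example, the NIC is no longer attached to the VM, so `sudo ip link set dev enp9s0 up` cannot help), while an `operstate` of `down` means the device exists and can be brought up. `link_status` is a hypothetical helper, and its `sys_net` parameter exists only to make it testable.

```python
from pathlib import Path


def link_status(iface: str, sys_net: Path = Path("/sys/class/net")) -> str:
    """Return "absent" if the kernel has no such interface, else its operstate.

    Typical operstate values are "up", "down" and "unknown"; "unknown" is also
    returned here if the operstate file cannot be found.
    """
    dev = sys_net / iface
    if not dev.exists():
        return "absent"  # no network device with this name is registered
    state_file = dev / "operstate"
    return state_file.read_text().strip() if state_file.exists() else "unknown"
```

For example, `link_status("enp9s0")` returning "absent" on the node suggests the interface has to be re-attached on the FABRIC side rather than configured inside the VM.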

                        in reply to: Unable to SSH into my Nodes #8510
                        Ajay Kumar
                        Participant

                          Does anyone know how to reboot a node when neither ping nor SSH to that node is working?
