Ajay Kumar

Forum Replies Created

Viewing 10 posts - 1 through 10 (of 10 total)
  • Author
    Posts
  • Ajay Kumar
    Participant

      I was up against a deadline, so I deleted that slice and recreated it, and it is working correctly now. If I face the same issue again, I will raise it here in this thread.

      Ajay Kumar
      Participant

        I have also tried purging all CUDA and NVIDIA drivers and installing them from scratch. This does not work either.

        Commands used:

        sudo apt-get purge -y '*nvidia*'

        sudo apt-get autoremove -y

        sudo apt-get autoclean

        sudo reboot

         

        in reply to: channel 0: open failed: connect failed: No route to host #8758
        Ajay Kumar
        Participant

          slice_name='GPU_Variant_Calling_FIU'
          node_name='Node3'
          slice = fablib.get_slice(slice_name)
          node = slice.get_node(node_name)
          node.os_reboot()

          This piece of code generated this error. Now that it’s live, I can access this node. Thank you very much, Komal.

          in reply to: Lost network interface after rebooting of vm3 in a cluster #8624
          Ajay Kumar
          Participant

            Thank you very much, Komal. You are always a big help when working with FABRIC. I am not sure, but overloading the GPUs with tasks might have caused the node to crash, and then the reboot wiped the network interface settings and detached the PCI devices.

            It’s working well now, thank you so much 😊.

            Ajay Kumar
            Participant

              Yes, I did! Anyway, it started working. I guess there was some timing issue at that time. It’s working perfectly now.

              Ajay Kumar
              Participant

                Are there any ongoing issues with creating a cluster from the FABRIC JupyterLab right now? It was working fine yesterday.

                in reply to: Unable to SSH into my Nodes #8521
                Ajay Kumar
                Participant

                  Thank you very much. It works fine now, and two thumbs up for your help, Komal.

                  in reply to: Unable to SSH into my Nodes #8518
                  Ajay Kumar
                  Participant

                    My Slice ID: 09255c48-5512-4e3c-bdc6-ad7d4fd37d07
                    Output of ifconfig -a command:

                    (base) ubuntu@Node4:~$ ifconfig -a
                    docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
                    inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
                    ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
                    RX packets 0 bytes 0 (0.0 B)
                    RX errors 0 dropped 0 overruns 0 frame 0
                    TX packets 0 bytes 0 (0.0 B)
                    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                    enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
                    inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
                    inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
                    ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
                    RX packets 541 bytes 53260 (53.2 KB)
                    RX errors 0 dropped 0 overruns 0 frame 0
                    TX packets 417 bytes 56772 (56.7 KB)
                    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
                    inet 127.0.0.1 netmask 255.0.0.0
                    inet6 ::1 prefixlen 128 scopeid 0x10<host>
                    loop txqueuelen 1000 (Local Loopback)
                    RX packets 114 bytes 9436 (9.4 KB)
                    RX errors 0 dropped 0 overruns 0 frame 0
                    TX packets 114 bytes 9436 (9.4 KB)
                    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                    in reply to: Unable to SSH into my Nodes #8515
                    Ajay Kumar
                    Participant

                      Thank you so much, Komal, it worked for me.

                      Following on from that, I noticed that my interface (enp9s0) is no longer found, although it was there earlier. I use this interface to connect to the other nodes in the cluster. Could you please help me bring it up again?

                      (base) ubuntu@Node4:~$ ifconfig
                      docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
                      inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
                      ether a2:9a:8b:03:9c:61 txqueuelen 0 (Ethernet)
                      RX packets 0 bytes 0 (0.0 B)
                      RX errors 0 dropped 0 overruns 0 frame 0
                      TX packets 0 bytes 0 (0.0 B)
                      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                      enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
                      inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
                      inet6 fe80::f816:3eff:fe3a:e097 prefixlen 64 scopeid 0x20<link>
                      ether fa:16:3e:3a:e0:97 txqueuelen 1000 (Ethernet)
                      RX packets 179 bytes 18635 (18.6 KB)
                      RX errors 0 dropped 0 overruns 0 frame 0
                      TX packets 168 bytes 22384 (22.3 KB)
                      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                      lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
                      inet 127.0.0.1 netmask 255.0.0.0
                      inet6 ::1 prefixlen 128 scopeid 0x10<host>
                      loop txqueuelen 1000 (Local Loopback)
                      RX packets 110 bytes 8928 (8.9 KB)
                      RX errors 0 dropped 0 overruns 0 frame 0
                      TX packets 110 bytes 8928 (8.9 KB)
                      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
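
For anyone hitting the same thing, a check like the one below might help confirm which interfaces disappeared after a reboot. It is only a minimal sketch: the helper names (`interface_names`, `missing_interfaces`) and the shortened `SAMPLE` text are made up for illustration, and it assumes the classic net-tools `ifconfig` layout shown above, where every interface block starts with `name: flags=`.

```python
import re
from typing import List

# Matches the header line of each interface block,
# e.g. "enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000".
IFACE_HEADER = re.compile(r"^\s*([\w.@-]+): flags=", re.MULTILINE)


def interface_names(ifconfig_output: str) -> List[str]:
    """Return the interface names in the order they appear in the output."""
    return IFACE_HEADER.findall(ifconfig_output)


def missing_interfaces(ifconfig_output: str, expected: List[str]) -> List[str]:
    """Return the expected interface names that are absent from the output."""
    present = set(interface_names(ifconfig_output))
    return [name for name in expected if name not in present]


# Shortened copy of the output posted above (hypothetical test input).
SAMPLE = """\
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.20.4.248 netmask 255.255.254.0 broadcast 10.20.5.255
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
"""

if __name__ == "__main__":
    print(interface_names(SAMPLE))
    print(missing_interfaces(SAMPLE, ["enp3s0", "enp9s0"]))
```

Feeding it the output of `ifconfig -a` rather than plain `ifconfig` is the safer choice, since plain `ifconfig` lists only interfaces that are up, so a name missing from the `-a` output means the device itself is gone and not merely down.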

                      in reply to: Unable to SSH into my Nodes #8510
                      Ajay Kumar
                      Participant

                        Does anyone know how to reboot a node when both ping and SSH to that node are not working?
