slice active but node no longer accessible

FABRIC forums: General Questions and Discussion

    #942
    Fengping Hu
    Participant

      I have a slice I created on 11/4. The node is no longer accessible via SSH, even though the slice state still shows StableOK.

      I did extend the lease by 10 days, but the lease end was not updated, which I know may be a bug.

      How can I find out more about my slice? Was the lease not extended as requested? If it was, how do I troubleshoot an active slice that does not respond to SSH (or ping) on its management IP?

      This is the slice information:

      Slice Name : MySliceL2Bridge1
      ID : a1d0bc37-6e32-4de3-a5a0-2b2534a018df
      State : StableOK
      Lease End : 2021-11-05 02:18:44

      Thanks,

      Fengping
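A quick way to tell whether a renewal took effect is to compare the reported lease end with the end you asked for. The sketch below assumes the "Lease End" value is printed as `YYYY-MM-DD HH:MM:SS` in UTC (the timezone is an assumption; the slice information does not state one). The helper names are hypothetical and not part of any FABRIC tool, and the request time in the example is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Lease End" is printed as "YYYY-MM-DD HH:MM:SS" and is in UTC.
LEASE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_lease_end(text: str) -> datetime:
    """Parse a 'Lease End' string into a timezone-aware UTC datetime."""
    return datetime.strptime(text.strip(), LEASE_FORMAT).replace(tzinfo=timezone.utc)


def lease_expired(lease_end: str, now: datetime) -> bool:
    """True if the reported lease end is already in the past at time `now`."""
    return parse_lease_end(lease_end) <= now


def extension_applied(lease_end: str, requested_end: datetime,
                      tolerance: timedelta = timedelta(minutes=5)) -> bool:
    """True if the reported lease end reflects the requested extension,
    allowing a small tolerance for server-side rounding."""
    return parse_lease_end(lease_end) >= requested_end - tolerance


if __name__ == "__main__":
    reported = "2021-11-05 02:18:44"  # value shown in the slice information
    # Hypothetical request: 10 days added on 2021-11-04 at 03:00 UTC.
    requested = datetime(2021, 11, 4, 3, 0, tzinfo=timezone.utc) + timedelta(days=10)
    print(extension_applied(reported, requested))  # False
    print(lease_expired(reported, datetime(2021, 11, 8, tzinfo=timezone.utc)))  # True
```

If `extension_applied` returns False while the slice is still listed, the renewal most likely did not reach the slice record, which is worth reporting along with the slice ID.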

      #946
      Paul Ruth
      Keymaster

        Fengping,

        We are looking into this. It does look like your slice is up. We need to investigate further to see what is wrong with the VMs.

        I’ll add more info when I know it.

        Paul

        #949
        Ilya Baldin
        Participant

          The VMs are up, but for some reason not reachable. Operations is looking into it.

          aa8eda10-6c27-42a5-ad89-6e8a06948203-Node2 instance-00000534  management-2004=10.20.4.228, 63.239.135.87
          e52f1301-52f1-4eb4-bd26-c92fc1b66664-Node1 instance-00000533  management-2004=10.20.4.203, 63.239.135.114
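When VMs are reported as up but do not answer, a first check from the user side is whether TCP port 22 on each management IP completes a handshake at all. The sketch below is a hypothetical helper using only the Python standard library. It separates "refused" (usually the address answers but nothing is listening on the port) from "timeout" (packets are dropped or filtered somewhere, which usually points at the network or management path rather than the SSH daemon). It does not cover ping, since ICMP needs raw sockets and usually elevated privileges.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 3.0) -> str:
    """Try to open a TCP connection and report what happened.

    Returns "open" if the handshake succeeded, "refused" if the connection
    was actively rejected, "timeout" if nothing answered within `timeout`
    seconds, or "error: ..." for any other socket failure.
    """
    try:
        # create_connection performs the full TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `probe_tcp("63.239.135.87")` checks Node2's public management address from the listing above. An "open" result with a failing `ssh` login points at keys or the guest OS, while "timeout" on every address suggests the management path itself.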

          #953
          Fengping Hu
          Participant

            FYI

            Looks like I can log in to the nodes again today. However, the nodes have changed; for example, the Kubernetes installation is gone.

            #991
            Fengping Hu
            Participant

              Seems I have the opposite problem today: I am getting "slice not found", even though I can still log in to the node.

              This is a slice I created yesterday during the SLATE-FABRIC working sessions. I have been trying to extend its lease since yesterday. At first the request was blocked by a timeout when fetching the slice, which I attributed to maintenance. But now MAX has been announced as back up, and the error I get is that the slice can't be found. Should I give up on this slice and just create a new one?

              Failure: (404)
              Reason: NOT FOUND
              HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.1', 'Date': 'Thu, 11 Nov 2021 21:25:33 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '19', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'User# has no Slices'})
              HTTP response body: User# has no Slices
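The two failures described in this post look different from the client side: first a timeout with no HTTP response at all, then a 404 whose `X-Error` header reads `User# has no Slices`. A small helper like the hypothetical `diagnose` below can map those cases to a next step. The header name comes from the response above; the mapping and the advice strings are assumptions for illustration, not documented FABRIC behavior.

```python
from typing import Mapping, Optional


def diagnose(status: Optional[int], headers: Mapping[str, str]) -> str:
    """Classify a failed slice lookup (hypothetical helper).

    `status` is None when the request never got an HTTP response,
    for example because it timed out.
    """
    if status is None:
        return "no response: service may be down or under maintenance; retry later"
    # HTTP header names are case-insensitive, so normalise keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    reason = lowered.get("x-error", "")
    if status == 404 and "no slices" in reason.lower():
        return "slice record not found: ask the operators, or create a new slice"
    if status == 404:
        return "not found: " + (reason or "no X-Error detail")
    if 500 <= status < 600:
        return f"server error {status}: " + (reason or "no detail")
    return f"unexpected status {status}"
```

Keying on the status code first and the `X-Error` text second keeps the helper usable even if the server omits the header.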

              #992
              Ilya Baldin
              Participant

                Yes, this probably leaked during the restart. I suggest you start a new one. I will see if we can garbage-collect the old slice. Are there other slices?

                @Paul, this may be the answer to your question about leaked cards: even though I thought I had shut down all slices, I clearly missed something.

                #993
                Fengping Hu
                Participant

                  Yes, I had about 3 active slices and a few dead ones. Now they are all gone.

                  I will create a new slice for the SLATE-FABRIC project. Let me know if I need to do anything about the nodes that are still alive. I guess they will still be torn down when their leases end, even though their slices are no longer returned by the query.

                  Here are the management IPs:

                  63.239.135.116, 63.239.135.87

                  63.239.135.75, 63.239.135.121 (these two seem to have just died, since they were created at this time yesterday)

                  #994
                  Ilya Baldin
                  Participant

                    If you have names and sites for those slices it would help me look.

                    #995
                    Fengping Hu
                    Participant

                      They are all at the MAX site. The names are something like KubernetesSlice, MyKubernetesSlice, and KubernetesSlice-test. I'm not 100% sure of the names, since I didn't keep a record of them.

                      #996
                      Ilya Baldin
                      Participant

                        I’ve garbage collected everything that wasn’t active at MAX. Let me know if you can still reach your slice.

                        #1039
                        Fengping Hu
                        Participant

                          The "slice active but node no longer accessible" problem has happened again. My slice is at the MAX site. Please let me know if there's anything I can do to regain access. Thanks!

                          KubernetesSlice-slate:
                          ID : 9d7ee6d2-1db0-4e2c-a513-6b89801f7ed3
                          State : StableOK
                          Lease End : 2021-11-12 22:01:58

                          #1045
                          Ilya Baldin
                          Participant

                            We are looking into it.

                            #1046
                            Ilya Baldin
                            Participant

                              @Fengping, are you sure you were able to access this slice before? Logs indicate the VMs may not have fully booted, and must have been that way for a while.

                              #1047
                              Fengping Hu
                              Participant

                                Thanks for looking into this for me.

                                Yes, I was able to access this slice perfectly fine. I deployed it on Nov 11th and installed Kubernetes on it. On Friday I was able to deploy some applications on the Kubernetes cluster and reach them from the internet (via ingress through the management IP).

                                I also extended the lease of this slice by 30 days.

                                I found it no longer accessible this morning. I have not done anything to the slice over the weekend.

                                #1048
                                Ilya Baldin
                                Participant

                                  As far as the site's OpenStack is concerned, the VMs are active and presumably fine. We are trying to figure out why management access isn't working. As you know, this has happened at MAX before, and we are not sure why yet; it does not appear to affect other sites. We will continue looking into it.
