slice active but node no longer accessible
- This topic has 28 replies, 4 voices, and was last updated 3 years ago by Fengping Hu.
November 5, 2021 at 4:35 pm #942
I have a slice I created on 11/4. The node is no longer accessible over ssh even though the slice state still shows StableOK.
I extended the lease by 10 days, but the lease end shown has not been updated, which I know may be a bug.
How can I find out more about my slice? Was the lease not extended as requested? If it was, how do I troubleshoot an active slice that is not responding to ssh (or ping) via the management IP?
This is the slice information:
Slice Name : MySliceL2Bridge1
ID : a1d0bc37-6e32-4de3-a5a0-2b2534a018df
State : StableOK
Lease End : 2021-11-05 02:18:44
Thanks,
Fengping
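One quick check on the slice information above is whether the reported lease end has already passed. The following is a minimal sketch using only the Python standard library. The function name `lease_expired` is hypothetical, and it assumes the "Lease End" value is in UTC, which the thread does not confirm.

```python
from datetime import datetime, timezone

# Format of the "Lease End" field as it appears in the slice information.
LEASE_FORMAT = "%Y-%m-%d %H:%M:%S"


def lease_expired(lease_end: str, now: datetime) -> bool:
    """Return True if the reported lease end is at or before `now`.

    Assumption: the reported timestamp is UTC (not stated in the thread).
    `now` must be a timezone-aware datetime.
    """
    end = datetime.strptime(lease_end, LEASE_FORMAT).replace(tzinfo=timezone.utc)
    return end <= now
```

For example, `lease_expired("2021-11-05 02:18:44", datetime.now(timezone.utc))` compares the value shown above against the current time. A passed lease end alongside a StableOK state would be consistent with the display not reflecting the extension, but it cannot show whether the extension was actually applied.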
November 6, 2021 at 4:50 pm #946
Fengping,
We are looking into this. It does look like your slice is up. We need to investigate more to see what is wrong with the VMs.
I’ll add more info when I know it.
Paul
November 8, 2021 at 6:00 pm #949
The VMs are up, but for some reason not reachable. Operations is looking into it.
aa8eda10-6c27-42a5-ad89-6e8a06948203-Node2 instance-00000534 management-2004=10.20.4.228, 63.239.135.87
e52f1301-52f1-4eb4-bd26-c92fc1b66664-Node1 instance-00000533 management-2004=10.20.4.203, 63.239.135.114
November 9, 2021 at 5:09 pm #953
FYI
It looks like I can log in to the nodes again today. However, the nodes have changed; for example, the Kubernetes installation is gone.
November 11, 2021 at 4:36 pm #991
It seems I have the opposite problem today: I am getting "slice not found", even though I can still log in to the node.
This is a slice I created yesterday during the slate fabric working sessions. I have been trying to extend the lease of this slice since yesterday. At first the request was blocked by a timeout when fetching the slice, which I attributed to maintenance. But now that MAX has been announced as up, what I get is that the slice can’t be found. Should I give up on this slice and just create a new one?
Failure: (404)
Reason: NOT FOUND
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.1', 'Date': 'Thu, 11 Nov 2021 21:25:33 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '19', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'User# has no Slices'})
HTTP response body: User# has no Slices
November 11, 2021 at 4:37 pm #992
Yes, this probably leaked during the restart. I suggest you start a new one. I will see if we can garbage collect the old slice. Are there other slices?
@Paul this may be the answer to your question about leaked cards – even though I thought I shut down all slices, I clearly missed something.
November 11, 2021 at 4:50 pm #993
Yes, I had about 3 active slices and a few dead ones. Now they are all gone.
I will create a new slice for the slate fabric project. Let me know if I need to do anything to the nodes that are still alive. I guess they will still die at lease end even though the slice is no longer returned by the query.
Here are the management IPs:
63.239.135.116, 63.239.135.87
63.239.135.75, 63.239.135.121 (these two seem to have just died, since they were created at this time yesterday)
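To tell which of the management IPs listed above still accept ssh connections, a plain TCP probe of port 22 is enough. The following is a minimal sketch using only the Python standard library. The helper names `tcp_port_open` and `check_hosts` are hypothetical, and port 22 is assumed to be the ssh port on these nodes.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_hosts(hosts, port: int = 22, timeout: float = 3.0) -> dict:
    """Probe each host once and map it to True (port open) or False."""
    return {host: tcp_port_open(host, port, timeout) for host in hosts}
```

Calling `check_hosts(["63.239.135.116", "63.239.135.87", "63.239.135.75", "63.239.135.121"])` returns one boolean per address. A `False` result only means the port could not be reached from the machine running the probe; it does not distinguish a stopped VM from a network or firewall problem.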
November 11, 2021 at 4:51 pm #994
If you have names and sites for those slices it would help me look.
November 11, 2021 at 4:59 pm #995
They are all at the MAX site. The names are something like KubernetesSlice, MyKubernetesSlice, KubernetesSlice-test. I am not 100% sure about the names anymore since I didn’t keep them.
November 11, 2021 at 5:26 pm #996
I’ve garbage collected everything that wasn’t active at MAX. Let me know if you can still reach your slice.
November 15, 2021 at 10:56 am #1039
The "slice active but node no longer accessible" problem has happened again. My slice is at the MAX site. Please let me know if there’s anything I can do to regain access. Thanks!
KubernetesSlice-slate:
ID : 9d7ee6d2-1db0-4e2c-a513-6b89801f7ed3
State : StableOK
Lease End : 2021-11-12 22:01:58
November 15, 2021 at 11:38 am #1045
We are looking into it.
November 15, 2021 at 12:20 pm #1046
@Fengping, are you sure you were able to access this slice before? Logs indicate the VMs may not be fully booted, and must have been this way for a while.
November 15, 2021 at 12:29 pm #1047
Thanks for looking into this for me.
Yes, I was able to access this slice perfectly fine. I deployed this slice on Nov 11th. I installed Kubernetes on it, and on Friday I was able to deploy some applications on the Kubernetes cluster and access them over the internet (via ingress through the management IP).
I also extended the lease of this slice for 30 days.
I found it no longer accessible this morning. I have not done anything to the slice over the weekend.
November 15, 2021 at 12:35 pm #1048
As far as the site OpenStack is concerned, the VMs are active and presumably fine. We are trying to figure out why management access isn’t working. As you know, this has happened at MAX before; we are not sure why yet, and it does not appear to affect other sites. We will continue looking into it.