slice active but node no longer accessible

This topic has 28 replies, 4 voices, and was last updated 3 years, 8 months ago by Fengping Hu.

Viewing 15 posts - 1 through 15 (of 29 total)

November 5, 2021 at 4:35 pm #942
Fengping Hu
Participant
I have a slice I created on 11/4. The node is no longer accessible over SSH even though the state still shows StableOK.

I did extend the lease by 10 days. The lease end was not updated, which I know is possibly a bug.

How can I find out more about my slice? Was the lease not extended as requested? If it was, how do I troubleshoot an active slice that does not respond to SSH (or ping) via the management IP?

This is the slice information:

Slice Name : MySliceL2Bridge1
ID : a1d0bc37-6e32-4de3-a5a0-2b2534a018df
State : StableOK
Lease End : 2021-11-05 02:18:44

Thanks,

Fengping
November 6, 2021 at 4:50 pm #946
Paul Ruth
Keymaster
Fengping,

We are looking into this. It does look like your slice is up. We need to investigate more to see what is wrong with the VMs.

I’ll add more info when I know it.

Paul
November 8, 2021 at 6:00 pm #949
Ilya Baldin
Participant
The VMs are up, but for some reason not reachable. Operations is looking into it.

aa8eda10-6c27-42a5-ad89-6e8a06948203-Node2 instance-00000534 management-2004=10.20.4.228, 63.239.135.87
e52f1301-52f1-4eb4-bd26-c92fc1b66664-Node1 instance-00000533 management-2004=10.20.4.203, 63.239.135.114
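For anyone triaging a similar case from the user side, one quick check is whether a management IP accepts TCP connections on the SSH port at all. The following is a minimal sketch using only Python's standard library, not a FABRIC tool. The function name `ssh_port_open`, port 22, and the 3-second timeout are assumptions.

```python
import socket


def ssh_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `ssh_port_open("63.239.135.87")` or `ssh_port_open("63.239.135.114")` with the public addresses listed above would show whether the port answers. A `False` result does not say why: the VM, a firewall, or the site network could each be responsible.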
November 9, 2021 at 5:09 pm #953
Fengping Hu
Participant
FYI

It looks like I can log in to the nodes again today. However, the node has changed; for example, the Kubernetes installation is gone.
November 11, 2021 at 4:36 pm #991
Fengping Hu
Participant
It seems I have the opposite problem today: I am getting “slice not found” even though I can still log in to the node.

This is a slice I created yesterday during the slate fabric working sessions. I have been trying to extend its lease since yesterday. At first the request was blocked by a timeout when getting the slice, which I attributed to maintenance. But now that MAX has been announced to be up, what I get is that the slice can’t be found. Should I give up on this slice and just create a new one?

Failure: (404)
Reason: NOT FOUND
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.1', 'Date': 'Thu, 11 Nov 2021 21:25:33 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '19', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'User# has no Slices'})
HTTP response body: User# has no Slices
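The response above carries its real message in the `X-Error` header rather than in the status line. Below is a minimal sketch of how a client script might tell this "no slices for this user" answer apart from other failures. The function name `describe_failure` and the category strings are hypothetical, and the header text is matched only against what this one response shows, so other wordings may exist.

```python
from typing import Mapping


def describe_failure(status: int, headers: Mapping[str, str]) -> str:
    """Classify a failed slice query from its status code and headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    x_error = lowered.get("x-error", "")

    if status == 404 and "has no slices" in x_error.lower():
        # The server answered, but it holds no slice records for the user.
        return "no-slices-for-user"
    if status == 404:
        return "not-found"
    if status >= 500:
        return "server-error"
    return "other"
```

Given status 404 and the headers quoted above, this sketch returns `"no-slices-for-user"`, which is a different situation from a timeout or a server error: the service is reachable but no longer has a record of the slice.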
November 11, 2021 at 4:37 pm #992
Ilya Baldin
Participant
Yes, this probably leaked during the restart. I suggest you start a new one. I will see if we can garbage collect the old slice. Are there other slices?

@Paul this may be the answer to your question about leaked cards – even though I thought I shut down all slices, I clearly missed something.
November 11, 2021 at 4:50 pm #993
Fengping Hu
Participant
Yes I had about 3 active slices and a few dead ones. Now they are all gone.

I will create a new slice for the slate fabric project. Let me know if I need to do anything to the nodes that are still alive. I guess they will still die after the lease end even though the slice is not returned by the query.

Here are the management IPs:

63.239.135.116, 63.239.135.87

63.239.135.75, 63.239.135.121 (these two seem to have just died, since they were created at this time yesterday)
November 11, 2021 at 4:51 pm #994
Ilya Baldin
Participant
If you have names and sites for those slices it would help me look.
November 11, 2021 at 4:59 pm #995
Fengping Hu
Participant
They are all at the MAX site. The names are something like KubernetesSlice, MyKubernetesSlice, KubernetesSlice-test. I am not 100% sure about the names anymore since I didn’t keep them.
November 11, 2021 at 5:26 pm #996
Ilya Baldin
Participant
I’ve garbage collected everything that wasn’t active at MAX. Let me know if you can still reach your slice.
November 15, 2021 at 10:56 am #1039
Fengping Hu
Participant
The “slice active but node no longer accessible” problem has happened again. My slice is at the MAX site. Please let me know if there’s anything I can do to regain access. Thanks!

KubernetesSlice-slate:
ID : 9d7ee6d2-1db0-4e2c-a513-6b89801f7ed3
State : StableOK
Lease End : 2021-11-12 22:01:58
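The slice summary above shows a `Lease End` earlier than the date of the post. A small sketch for checking that from the printed summary is below. The helper names `parse_lease_end` and `lease_expired` are hypothetical, and treating the timestamp as UTC is an assumption, since the output does not state a time zone.

```python
from datetime import datetime, timezone

LEASE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_lease_end(summary: str) -> datetime:
    """Extract the 'Lease End' timestamp from a printed slice summary."""
    for line in summary.splitlines():
        # Split on the first colon only; the timestamp itself contains colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Lease End":
            # Assumption: the timestamp is in UTC (no zone is printed).
            parsed = datetime.strptime(value.strip(), LEASE_FORMAT)
            return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("no 'Lease End' line found")


def lease_expired(summary: str, now: datetime) -> bool:
    """Return True if the recorded lease end is earlier than `now`."""
    return parse_lease_end(summary) < now
```

Passing the four summary lines above together with `datetime.now(timezone.utc)` tells you whether the recorded lease end is already in the past. If it is while the state still reads StableOK, that mismatch is worth including in a report to the operators.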
November 15, 2021 at 11:38 am #1045
Ilya Baldin
Participant
We are looking into it.
November 15, 2021 at 12:20 pm #1046
Ilya Baldin
Participant
@Fengping, are you sure you were able to access this slice before? The logs indicate the VMs may not be fully booted, and they must have been this way for a while.
November 15, 2021 at 12:29 pm #1047
Fengping Hu
Participant
Thanks for looking into this for me.

Yes, I was able to access this slice perfectly fine. I deployed it on Nov 11th and installed Kubernetes on it. On Friday I was able to deploy some applications on the Kubernetes cluster and access them from the internet (via ingress through the management IP).

I also extended the lease of this slice for 30 days.

I found it no longer accessible this morning. I have not done anything to the slice over the weekend.
November 15, 2021 at 12:35 pm #1048
Ilya Baldin
Participant
As far as the site OpenStack is concerned, the VMs are active and presumably fine. We are trying to figure out why management access isn’t working. As you know, this has happened at MAX; we are not sure why yet, and it does not appear to affect other sites. We will continue looking into it.