Forum Replies Created
-
AuthorPosts
-
Maintenance is completed. CLEM node is available for experiments.
December 12, 2025 at 5:18 pm in reply to: Cannot SSH into NS2 and NS4 nodes, need to preserve data (PhD simulations) #9263
Hello Danilo,
I checked both VMs (NS2 and NS4) and they are up and online. However, there seem to be some changes to the SSH key(s) that are injected by the system, and I cannot log in to the VMs. I’m not sure about the root cause of the exceptions that you posted. They indicate that the FABRIC orchestration system cannot perform actions on the VMs, and the issue with the SSH keys may be the main cause. We need to learn from you what might have changed on the VMs.
I’m posting the reservation info here. Both VM reservations are valid until 12/25.
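A note on reading the records below: fields such as `Capacities` and `ReservationInfo` are JSON documents stored as strings, so they need a second decoding pass. Here is a minimal Python sketch, assuming a record has been saved as text exactly as posted. The helper name `summarize_reservation` is illustrative and not part of any FABRIC tool, and the sample is a trimmed copy of the NS2 record.

```python
import json
from datetime import datetime

# Trimmed copy of the NS2 record from this post (only the fields used here).
RECORD = r'''
{
  "sliver_id": "be7426fd-7ebe-4e6b-bc65-56e30d7e8e50",
  "end": "2025-12-25 18:11:58 +0000",
  "sliver": {
    "Name": "NS2",
    "Capacities": "{\"core\": 16, \"disk\": 1000, \"ram\": 16}",
    "ReservationInfo": "{\"reservation_id\": \"be7426fd-7ebe-4e6b-bc65-56e30d7e8e50\", \"reservation_state\": \"Active\"}",
    "Site": "STAR"
  }
}
'''


def summarize_reservation(text):
    """Hypothetical helper: pull the key facts out of one reservation record."""
    record = json.loads(text)
    sliver = record["sliver"]
    # These two fields are JSON encoded as strings, so decode them again.
    capacities = json.loads(sliver["Capacities"])
    info = json.loads(sliver["ReservationInfo"])
    # The timestamp format is taken from the record itself, e.g. "2025-12-25 18:11:58 +0000".
    end = datetime.strptime(record["end"], "%Y-%m-%d %H:%M:%S %z")
    return {
        "name": sliver["Name"],
        "site": sliver["Site"],
        "cores": capacities["core"],
        "state": info["reservation_state"],
        "end": end,
    }


summary = summarize_reservation(RECORD)
print(summary["name"], summary["site"], summary["state"], summary["end"].isoformat())
```

The parsed `end` value is timezone-aware, so it can be compared against `datetime.now(timezone.utc)` to see how much time is left on a reservation.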
{
  "sliver_id": "be7426fd-7ebe-4e6b-bc65-56e30d7e8e50",
  "slice_id": "53cfa2bd-5110-420d-8bdb-053c1af45801",
  "type": "VM",
  "notices": "Reservation be7426fd-7ebe-4e6b-bc65-56e30d7e8e50 (Slice NS2(53cfa2bd-5110-420d-8bdb-053c1af45801) Graph Id:2aced1b1-3843-4809-b9a2-9ed9e2ff0317 Owner:daniloassis@utfpr.edu.br) is in state (Active,None_) ",
  "start": "2025-08-02 12:44:10 +0000",
  "end": "2025-12-25 18:11:58 +0000",
  "requested_end": "2025-12-25 18:11:58 +0000",
  "units": 1,
  "state": 4,
  "pending_state": 11,
  "sliver": {
    "Name": "NS2",
    "Type": "VM",
    "Capacities": "{\"core\": 16, \"disk\": 1000, \"ram\": 16}",
    "CapacityHints": "{\"instance_type\": \"fabric.c16.m16.d1000\"}",
    "CapacityAllocations": "{\"core\": 16, \"disk\": 1000, \"ram\": 16}",
    "LabelAllocations": "{\"instance\": \"instance-000040ec\", \"instance_parent\": \"star-w1.fabric-testbed.net\"}",
    "ReservationInfo": "{\"reservation_id\": \"be7426fd-7ebe-4e6b-bc65-56e30d7e8e50\", \"reservation_state\": \"Active\"}",
    "NodeMap": "[\"e2b1e451-45b4-4691-9527-aae18cad3b19\", \"BBQSH63\"]",
    "StitchNode": "false",
    "ImageRef": "default_ubuntu_22,qcow2",
    "MgmtIp": "2001:400:a100:3030:f816:3eff:fed9:14ee",
    "Site": "STAR"
  }
}
{
  "sliver_id": "527af509-fb62-41ae-a0c0-191e3c7f6525",
  "slice_id": "059939f1-19be-49ee-ab5b-bf4504639c13",
  "type": "VM",
  "notices": "Reservation 527af509-fb62-41ae-a0c0-191e3c7f6525 (Slice NS4(059939f1-19be-49ee-ab5b-bf4504639c13) Graph Id:14a3d7f7-e797-4e75-bafb-00282ea63896 Owner:daniloassis@utfpr.edu.br) is in state (Active,None_) ",
  "start": "2025-08-02 12:44:10 +0000",
  "end": "2025-12-25 18:12:40 +0000",
  "requested_end": "2025-12-25 18:12:40 +0000",
  "units": 1,
  "state": 4,
  "pending_state": 11,
  "sliver": {
    "Name": "NS4",
    "Type": "VM",
    "Capacities": "{\"core\": 16, \"disk\": 1000, \"ram\": 16}",
    "CapacityHints": "{\"instance_type\": \"fabric.c16.m16.d1000\"}",
    "CapacityAllocations": "{\"core\": 16, \"disk\": 1000, \"ram\": 16}",
    "LabelAllocations": "{\"instance\": \"instance-00000cf6\", \"instance_parent\": \"toky-w2.fabric-testbed.net\"}",
    "ReservationInfo": "{\"reservation_id\": \"527af509-fb62-41ae-a0c0-191e3c7f6525\", \"reservation_state\": \"Active\"}",
    "NodeMap": "[\"e2b1e451-45b4-4691-9527-aae18cad3b19\", \"FW696S3\"]",
    "StitchNode": "false",
    "ImageRef": "default_ubuntu_22,qcow2",
    "MgmtIp": "133.69.160.21",
    "Site": "TOKY"
  }
}
December 5, 2025 at 5:14 pm in reply to: Maintenance on STAR, WASH, DALL, SALT, LOSA, KANS – on Dec 5 at 9AM EST #9252
Maintenance is completed.
Maintenance completed. FABRIC-AMST is back online. All current VMs are started.
Network outage is resolved. BRIST node is available for new slices.
FABRIC-BRIST node is back online.
August 18, 2025 at 3:06 pm in reply to: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. #8829
Ajay,
The sliver you reported the problem with was on FABRIC-FIU and has expired. I can see that you have recent slivers on the FIU node. Can you please let us know whether you still encounter the problem?
Hello,
Yesterday some services were affected by an outage, and the errors when loading projects and extending slices may have been caused by the affected services. The FABRIC team checked all services to confirm they are restored, and I also confirmed that slice extension is working well.
Due to the FABRIC team’s schedules, this outage might not have been communicated via the announcements. Apologies for the inconvenience.
June 27, 2025 at 5:56 pm in reply to: FABRIC NEWY and FABRIC LBNL – Network Maintenance on June 27 at 4:30 pm EST #8660
Maintenance is completed.
June 18, 2025 at 7:55 pm in reply to: FABRIC STAR – Network Maintenance on June 18th between 5-7pm EST #8625
Maintenance is completed.
June 12, 2025 at 6:40 pm in reply to: FABRIC WASH – Network Maintenance on June 12 between 5-7pm EST #8612
Maintenance completed. WASH node is online.
June 11, 2025 at 7:27 pm in reply to: FABRIC KANS – Network Maintenance on June 11 between 5-7pm EST #8604
Maintenance completed. KANS node is online.
Hello Sunjay,
I checked the 3 VMs. I could bring the VM node3d2 online manually, but the other two could not be recovered. I suggest re-creating the slice (or modifying it to re-create the VMs).
We also need to check on our side. Can you point out the notebook you used? I’m assuming it is one of the notebooks in the fabric-examples GitHub repo, but if it’s a customized notebook, please let me know and I will reach out to you via email to get it (or you can attach it to this thread if that’s fine for you).
I just tested access from a TACC VM and http://linux.mirrors.es.net/ubuntu was reachable. If you’re still having problems, we will need more specific details about your VM and your Docker image creation. Please let us know if you still need help with this.
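To repeat the same kind of check from your own VM, here is a minimal Python sketch using only the standard library. The function name `is_reachable` and the 5-second default timeout are illustrative choices, not part of any FABRIC tooling, and this is not necessarily how the test above was run.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP server answers at `url`, even with an error status."""
    # HEAD avoids downloading the body; only the response headers matter here.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example 403 or 404), so the network path works.
        return True
    except OSError:
        # DNS failure, refused connection, timeout, or no route to the host.
        return False
```

Calling `is_reachable("http://linux.mirrors.es.net/ubuntu")` on the VM and again inside the Docker build environment can help narrow down whether the problem is the VM's network or the container's.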
Power outage is resolved. PSC node is online, available for experiments.