FABRIC General Questions and Discussion: Lost SSH access to some nodes after node.os_reboot()
This topic has 2 replies, 2 voices, and was last updated 5 months ago by Sunjay Cauligi.
May 29, 2025 at 5:28 pm #8556
Hello, I was following the examples for NUMA tuning, which call node.os_reboot() after requesting CPU pinning and NUMA tuning.
Most of my nodes have come back up, but nodes node2d2, node2d3, and node3d2 are still inaccessible via SSH.
Slice name: ei-network-20250515140559
Slice ID: b94cdedd-c230-4059-a172-a1ff45fd85e8
May 29, 2025 at 8:44 pm #8559
Hello Sunjay,
I checked the 3 VMs. I was able to bring node3d2 back online manually, but I could not recover the other two. I suggest re-creating the slice (or modifying it to re-create the VMs).
We also need to investigate this on our side. Can you point me to the notebook you used? I'm assuming it is one of the notebooks in the fabric-examples GitHub repo, but if it's a customized notebook, please let me know and I will reach out to you via email to get it (or you can attach it to this thread if that's fine with you).
May 30, 2025 at 12:07 pm #8562
Hi Mert,
I was following the NUMA steps based on this example notebook: iperf3_optimized.ipynb
which really just boiled down to running the following:

    for node in slc.get_nodes():
        node.pin_cpu(component_name='einic')
    # one node failed to pin cpu here, investigated that with .get_cpu_info()
    for node in slc.get_nodes():
        node.numa_tune()
    # a couple nodes failed to pin memory here, investigated that with .get_numa_info()
    for node in slc.get_nodes():
        node.os_reboot()

The slice I ran this on is a long-running project slice that has been up for the last two weeks. Instead of a notebook, I use a wrapper library and Python scripts directly from my local machine to provision/manage slices. I can email you my library code and an example script of my typical usage if you would like to take a look.
This slice was provisioned with three nodes each at three different sites; each node itself is provisioned identically with a FABNETv4 (“internal”, with a 10.* IP address) NIC and a FABNETv4Ext (“public”, with a publicly routable IP address) NIC.
I install identical software and run identical code on each node as well.
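For anyone hitting the same problem, here is a rough sketch of a more cautious version of the three loops above. It reboots one node at a time and stops as soon as a node does not come back, so a bad reboot cannot take out several nodes at once. It only uses the node methods quoted in this thread (pin_cpu, numa_tune, os_reboot) plus get_name(). Everything else is an assumption for illustration: the names tune_and_reboot and wait_until are made up, is_reachable is a hypothetical callback you would supply yourself (for example, a trivial remote command through your usual SSH path), and the code assumes that a failed pin or tune surfaces as a Python exception, which may not match how the library actually reports it.

```python
import time
from typing import Callable, Dict, Iterable, List


def wait_until(check: Callable[[], bool], timeout_s: float, poll_s: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or roughly timeout_s has elapsed."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


def tune_and_reboot(nodes: Iterable, component_name: str,
                    is_reachable: Callable[[object], bool],
                    timeout_s: float = 600.0, poll_s: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, List[str]]:
    """Pin CPUs, tune NUMA, and reboot nodes one at a time.

    Returns a dict mapping each step to the names of the nodes that failed it.
    """
    failures: Dict[str, List[str]] = {"pin_cpu": [], "numa_tune": [], "os_reboot": []}
    for node in nodes:
        name = node.get_name()
        # Assumption: a failed pin or tune raises an exception.
        try:
            node.pin_cpu(component_name=component_name)
        except Exception:
            failures["pin_cpu"].append(name)
            continue  # do not reboot a node whose tuning did not apply
        try:
            node.numa_tune()
        except Exception:
            failures["numa_tune"].append(name)
            continue
        try:
            node.os_reboot()
            came_back = wait_until(lambda: is_reachable(node), timeout_s, poll_s, sleep)
        except Exception:
            came_back = False
        if not came_back:
            failures["os_reboot"].append(name)
            break  # stop here so the remaining nodes stay up
    return failures
```

With a slice object like the one in the post, usage would look something like failures = tune_and_reboot(slc.get_nodes(), 'einic', is_reachable=my_ssh_check), where my_ssh_check is whatever reachability test fits your setup. Nodes that fail pinning or tuning are skipped rather than rebooted, which leaves them available for inspection with the get_cpu_info() and get_numa_info() calls mentioned above.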