Lost SSH access to some nodes after node.os_reboot()
Hello, I was following the examples for NUMA tuning, which have a call to node.os_reboot() after requesting CPU pinning and NUMA tuning.
Most of my nodes have come back up, but nodes node2d2, node2d3, and node3d2 are still inaccessible via SSH.
Slice name: ei-network-20250515140559
Slice ID: b94cdedd-c230-4059-a172-a1ff45fd85e8
Hello Sunjay,
I checked the 3 VMs. I was able to bring node3d2 back online manually, but could not recover the other two. I suggest re-creating the slice (or modifying it to re-create those VMs).
We also need to investigate this on our side. Can you point me to the notebook you used? I'm assuming it is one of the notebooks in the fabric-examples GitHub repo, but if it's a customized notebook, please let me know and I will reach out to you via email to get it (or you can attach it to this thread if you are comfortable with that).
Hi Mert,
I was following the NUMA steps based on this example notebook: iperf3_optimized.ipynb, which really just boiled down to running the following:
for node in slc.get_nodes():
    node.pin_cpu(component_name='einic')
# one node failed to pin cpu here, investigated that with .get_cpu_info()
for node in slc.get_nodes():
    node.numa_tune()
# a couple nodes failed to pin memory here, investigated that with .get_numa_info()
for node in slc.get_nodes():
    node.os_reboot()
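For anyone repeating these steps, here is a rough sketch of a more cautious version of the same sequence: it records which nodes fail each step and reboots one node at a time, stopping at the first one that does not come back. It is not a diagnosis of what happened to this slice. The helper names (run_step, ssh_ok, reboot_and_verify) are made up for illustration, and it assumes that node.get_name() is available and that node.execute() raises when the node cannot be reached over SSH; please check both against the fablib version you are using.

```python
import time


def run_step(nodes, step, label):
    """Call step(node) on every node; return {node name: error} for those that raised."""
    failed = {}
    for node in nodes:
        try:
            step(node)
        except Exception as exc:  # keep going so one bad node does not hide the rest
            failed[node.get_name()] = exc
            print(f"{label} failed on {node.get_name()}: {exc}")
    return failed


def ssh_ok(node):
    """Assumption: node.execute() raises when the node is unreachable over SSH."""
    try:
        node.execute("true")
        return True
    except Exception:
        return False


def reboot_and_verify(nodes, is_up=ssh_ok, max_checks=40, interval=15, sleep=time.sleep):
    """Reboot nodes one at a time; stop at the first node that does not come back.

    Returns (names of recovered nodes, name of the stuck node or None).
    """
    recovered = []
    for node in nodes:
        node.os_reboot()
        for _ in range(max_checks):
            sleep(interval)
            if is_up(node):
                recovered.append(node.get_name())
                break
        else:
            # No check succeeded: leave the remaining nodes untouched.
            print(f"{node.get_name()} did not come back after reboot")
            return recovered, node.get_name()
    return recovered, None


# Possible usage with the slice object from the post above (not run here):
#   nodes = slc.get_nodes()
#   pin_failed = run_step(nodes, lambda n: n.pin_cpu(component_name='einic'), "pin_cpu")
#   numa_failed = run_step(nodes, lambda n: n.numa_tune(), "numa_tune")
#   recovered, stuck = reboot_and_verify(nodes)
```

Keeping the pin_cpu and numa_tune failures in dictionaries makes it easy to compare them afterwards with the node that got stuck, and rebooting sequentially limits the damage to a single VM if a reboot goes wrong.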
The slice I ran this on is a long-running project slice that has been up for the last two weeks. Instead of a notebook, I use a wrapper library and Python scripts directly from my local machine to provision/manage slices. I can email you my library code and an example script of my typical usage if you would like to take a look.
This slice was provisioned with three nodes at each of three different sites; every node is provisioned identically with a FABNETv4 ("internal", with a 10.* IP address) NIC and a FABNETv4Ext ("public", with a publicly routable IP address) NIC.
I install identical software and run identical code on each node as well.