error when attempting to numa_tune

This topic has 9 replies, 3 voices, and was last updated 5 months, 3 weeks ago by Komal Thareja.

November 6, 2023 at 2:47 pm #5994
Gregory Daues
Participant
I was making a first attempt to utilize this code

—
slice = fablib.get_slice(slice_name)

for node in slice.get_nodes():
    # Pin all vCPUs for the VM to the same NUMA node as the component
    node.pin_cpu(component_name=nic_name)

    # Pin memory for the VM to the same NUMA node as the components
    node.numa_tune()

    # Reboot the VM
    node.os_reboot()
—
from the jupyter example
https://github.com/fabric-testbed/jupyter-examples/blob/main/fabric_examples/complex_recipes/iPerf3/iperf3_optimized.ipynb

and I obtained the error message
Pinning Node: compute1-ATLA CPUs for component: Tr1.NIC1 to Numa Node: 1
Fail: Cannot numatune VM to Numa Nodes ['1']; requested memory 65536 exceeds available: 31825

My slice has nodes with 16 cores and 128 GB; any commentary on such a message?

Greg
November 6, 2023 at 3:07 pm #5995
Komal Thareja
Participant
Hello Greg,

In the current implementation, node.numa_tune() tries to pin the VM’s memory to the NUMA nodes of the components attached to the VM.

Looking at the sliver details, this sliver has 64G of memory allocated. The NUMA node for the component attached to this VM is node 1. In our topology we have 8 NUMA nodes per worker, and each is allocated 64G of memory. The error message above means that the requested memory (64G) is not available on NUMA node 1, and hence the VM’s memory cannot be pinned to that node.

sliver_id: '0764c99c-0e76-4aaa-94de-c291bd2b23f0', 'name': 'compute1-ATLA' 'capacities': '{ core: 16 , ram: 64 G, disk: 500 G}', 'capacity_allocations': '{ core: 16 , ram: 64 G, disk: 500 G}'
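As a rough illustration of the arithmetic behind that error, the sketch below compares a requested VM size with the free memory reported for a NUMA node. It assumes the numbers in the error message are in MB (65536 = 64 × 1024 matches the 64 G allocation); the function name `numa_shortfall_mb` is hypothetical and not part of fablib.

```python
MB_PER_GB = 1024  # assumption: the error message reports memory in MB


def numa_shortfall_mb(requested_gb: int, available_mb: int) -> int:
    """Return how many MB are missing on the NUMA node (0 if the request fits)."""
    requested_mb = requested_gb * MB_PER_GB
    return max(0, requested_mb - available_mb)


# Values taken from the error message in this thread.
print(numa_shortfall_mb(64, 31825))  # 65536 - 31825 = 33711
print(numa_shortfall_mb(32, 31825))  # 32768 - 31825 = 943
```

Under that assumption, even a 32 G VM would not have fit on that particular node at the time, since only 31825 was reported as available.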

Also, in the current version, the API does not allow pinning only a percentage of the memory to the NUMA node. We will work on adding that capability in the next release to serve memory requests better. Appreciate your feedback!

Thanks,

Komal
November 6, 2023 at 3:18 pm #5998
Gregory Daues
Participant
Hello Komal,

Yes, I thought I had 128 GB of memory, but indeed it is only 64 GB. Would it be more likely to succeed if the VMs had 128 GB of memory?

Greg
November 6, 2023 at 3:26 pm #5999
Gregory Daues
Participant
Or perhaps even 256 GB on the VM to ensure that 64 GB is free.

Greg
November 6, 2023 at 3:27 pm #6000
Komal Thareja
Participant
No, requesting less memory would improve the chances, as would deploying on a less heavily used site. I checked the portal, and GPN seems to be very sparsely used. Please consider requesting the VM there and trying with 32G of RAM.

Upper limit for a VM connected with only one component would map to a single Numa Node. Max limit on memory for a numa node is 64G so exceeding that limit would not work.

Adding more flexibility to this API would help alleviate this issue. Will definitely work on that and keep you updated once that is available.

Thanks,
Komal
November 6, 2023 at 3:30 pm #6001
Gregory Daues
Participant
OK, I misunderstood the ‘memory fit’. I will try a lower value such as 32 GB and make a new slice.

Greg
November 7, 2023 at 11:33 am #6040
yoursunny
Participant
Komal Thareja wrote:

Upper limit for a VM connected with only one component would map to a single Numa Node.

What happens if there are multiple components that are on distinct NUMA sockets?
Is it possible to specify how much RAM to pin to each NUMA socket?

Komal Thareja wrote:

Max limit on memory for a numa node is 64G so exceeding that limit would not work.

If we pin a CPU core or certain amount of RAM onto a NUMA socket, does it prevent other VMs from using the same CPU core or RAM capacity?
November 7, 2023 at 11:43 am #6041
Komal Thareja
Participant
@yoursunny

What happens if there are multiple components that are on distinct NUMA sockets?
If you have multiple components, we try to pin the memory for the VM to both the Numa Nodes.

Example: if your VM has a ConnectX-5 and a GPU on different sockets, invoking numa_tune would pin the memory to both sockets, provided that the combined available memory on both sockets >= the requested VM RAM.
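The combined-memory condition in that example can be sketched as follows. This is only an illustration of the rule as described in this thread, not the actual fablib or host-side implementation; the function name, the node IDs, and the free-memory figure for node "5" are hypothetical.

```python
from typing import Mapping, Sequence


def fits_on_numa_nodes(
    requested_mb: int,
    available_mb_by_node: Mapping[str, int],
    nodes: Sequence[str],
) -> bool:
    """True if the free memory summed over the given NUMA nodes covers the request."""
    combined_mb = sum(available_mb_by_node[node] for node in nodes)
    return combined_mb >= requested_mb


# Hypothetical free memory per NUMA node; 31825 is the figure from the error above.
available = {"1": 31825, "5": 40000}

# One component on node 1 only: the 64 G request does not fit.
print(fits_on_numa_nodes(65536, available, ["1"]))

# Components on nodes 1 and 5: the combined free memory covers the request.
print(fits_on_numa_nodes(65536, available, ["1", "5"]))
```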

Is it possible to specify how much RAM to pin to each NUMA socket?
In the current version, this is not supported. We may also be limited here by the underlying OS API, but we will explore ways to improve this.

If we pin a CPU core or certain amount of RAM onto a NUMA socket, does it prevent other VMs from using the same CPU core or RAM capacity?
Yes, if you have pinned CPUs/Memory to a specific NUMA socket, other VMs cannot use the same cores/memory on that socket.

For CPU pinning, you can explicitly specify how many cores to pin to a Numa Node.

Thanks,
Komal

November 7, 2023 at 4:13 pm #6062
Gregory Daues
Participant
I created a slice with 32 GB memory VMs at multiple sites, and I see that the pin and numa_tune succeeded at GPN, NCSA, EDC, RUTG, but failed at INDI, STAR, TACC (with errors of the type "requested memory 32768 exceeds available: 9323").
So I suppose this should just be viewed as an issue of the resources available at the site at the current time.

Greg
November 7, 2023 at 6:32 pm #6063
Komal Thareja
Participant
You are right, Greg. This depends entirely on how much memory is currently available on the NUMA node of the host where your VM is launched.
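When comparing sites, it can help to pull the two numbers out of the failure text. The sketch below parses messages of the form quoted in this thread; the helper name `parse_numa_failure` is hypothetical, and the wording of the message is assumed from the two examples above and may differ in other versions.

```python
import re
from typing import Optional, Tuple

# Pattern based only on the messages quoted in this thread.
_FAILURE_PATTERN = re.compile(r"requested memory (\d+) exceeds available: (\d+)")


def parse_numa_failure(message: str) -> Optional[Tuple[int, int]]:
    """Return (requested, available) from a numa_tune failure message, or None."""
    match = _FAILURE_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


msg = (
    "Fail: Cannot numatune VM to Numa Nodes ['1']; "
    "requested memory 65536 exceeds available: 31825"
)
print(parse_numa_failure(msg))  # (65536, 31825)
```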