pin_cpu & poa(operation="cpupin")
This topic has 1 reply, 2 voices, and was last updated 22 hours, 2 minutes ago by Komal Thareja.
October 31, 2025 at 9:58 am #9126
I’m poking around the CPU pinning feature and noticed some problems with the pin_cpu() and poa(operation="cpupin") APIs.
pin_cpu(cpu_range_to_pin=) syntax
In the Node.pin_cpu function, the cpu_range_to_pin parameter is documented as:
cpu_range_to_pin: range of the cpus to pin; example: 0-1 or 0
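Honoring that docstring means accepting both the "0-1" and the "0" forms. Below is a minimal Python sketch of such a parser. It assumes the range is inclusive on both ends (so "0-1" means CPUs 0 and 1). parse_cpu_range is a hypothetical helper name, not fablib's code:

```python
def parse_cpu_range(cpu_range_to_pin: str) -> list[int]:
    """Parse "N" or "N-M" (inclusive) into a list of CPU indices.

    Hypothetical helper: it follows the documented examples "0-1" and "0",
    not fablib's actual implementation.
    """
    text = cpu_range_to_pin.strip()
    if "-" in text:
        start_text, end_text = text.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        # A single CPU such as "0" is treated as the range 0-0.
        start = end = int(text)
    if start < 0 or end < start:
        raise ValueError(f"invalid CPU range: {cpu_range_to_pin!r}")
    return list(range(start, end + 1))


print(parse_cpu_range("0"))    # single CPU
print(parse_cpu_range("0-1"))  # two CPUs
```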
However, passing cpu_range_to_pin="0" raises ValueError: not enough values to unpack (expected 2, got 1) at this line:

    start, end = map(int, cpu_range_to_pin.split("-"))

pinned CPUs still in other VMs’ affinity lists
I created two slices, each with a node on the same worker host.
After successfully pinning two physical CPUs to vCPUs on node1, I checked the output of node2.get_cpu_info():

    {
      "atla-w2.fabric-testbed.net": {
        "pinned_cpus": ["117", "53"]
      },
      "instance-0000187f": [
        {
          "CPU": "116",
          "CPU Affinity": "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127",
          "CPU time": "4.6s",
          "State": "running",
          "VCPU": "0"
        }
      ]
    }

Notably, the other node’s CPU affinity list still includes the pinned CPUs (53 and 117).
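This overlap can also be detected programmatically. The Python sketch below makes two assumptions: that the layout of the sample output above holds generally (host keys map to an object with "pinned_cpus", instance keys map to a list of per-vCPU entries), and that "pinned_cpus" covers CPUs pinned by any VM on the worker, as the sample suggests. Both function names are hypothetical:

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a CPU list such as "0,1,2" or "0-3,8" into a set of ints."""
    cpus = set()
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            lo, hi = token.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(token))
    return cpus


def pinned_cpus_in_affinity(cpu_info: dict) -> dict[str, list[int]]:
    """Map "instance/vCPUn" to the pinned host CPUs still in its affinity list.

    Assumes host entries are dicts with "pinned_cpus" and instance entries
    are lists of per-vCPU dicts, as in the sample get_cpu_info() output.
    """
    pinned = set()
    for value in cpu_info.values():
        if isinstance(value, dict) and "pinned_cpus" in value:
            pinned.update(int(cpu) for cpu in value["pinned_cpus"])

    overlaps = {}
    for name, value in cpu_info.items():
        if not isinstance(value, list):
            continue  # skip host entries
        for vcpu in value:
            affinity = parse_cpu_list(vcpu.get("CPU Affinity", ""))
            common = sorted(affinity & pinned)
            if common:
                overlaps[f"{name}/vCPU{vcpu.get('VCPU')}"] = common
    return overlaps
```

Applied to the output above, it would flag vCPU 0 of instance-0000187f as still being allowed to run on CPUs 53 and 117.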
I’m hoping that the pinned CPUs can be reserved exclusively for node1 and removed from the CPU affinity lists of all other nodes.
This would let CPU-intensive workloads on node1 run without contention from other VMs, so their results are more accurate.
poa_cpupin with a CPU range
Normally, the pin_cpu function sends a POA command like this:

    node.poa(operation="cpupin", vcpu_cpu_map=[
        {"vcpu": "2", "cpu": "15"},
        {"vcpu": "3", "cpu": "79"},
    ])

I attempted to send a variation of this command:

    node.poa(operation="cpupin", vcpu_cpu_map=[
        {"vcpu": "2", "cpu": "14-15"},
        {"vcpu": "3", "cpu": "78-79"},
    ])

The latter command returns "SentToAuthority" instead of "Success".
Subsequently, further POA commands, including node.get_cpu_info(), return 500 errors.
When I adjust a QEMU process with the taskset command on my own server, I can set the affinity of a vCPU to multiple physical CPUs. If FABRIC cannot support that, the server side should have rejected the poa_cpupin command instead of letting the node fall into an error state.
poa_cpupin with in-use CPUs
I created two slices, each with a node on the same worker host.
Then, I sent POA commands pinning a vCPU of each node to the same physical CPU:

    node1.poa(operation="cpupin", vcpu_cpu_map=[{"vcpu": "2", "cpu": "15"}])
    node2.poa(operation="cpupin", vcpu_cpu_map=[{"vcpu": "2", "cpu": "15"}])

The first POA completed successfully, while the second returned "SentToAuthority".
Subsequently, further POA commands on the second node return 500 errors.
This error could happen even if the user only calls the high-level API node.pin_cpu(). When pin_cpu() calls run concurrently against two separate nodes, which could belong to different users, they could pick the same physical CPU cores and send conflicting POAs.
Again, I’d suggest that the server side reject a poa_cpupin command that causes a conflict, instead of letting the second node fall into an error state.
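Both suggestions, rejecting a CPU range and rejecting an in-use CPU, amount to validating a request before it reaches the hypervisor. Below is a minimal Python sketch of that check, assuming a hypothetical per-host registry. CpuPinRegistry and CpuPinConflict are illustrative names, not FABRIC's control-framework code. The sketch accepts only single CPU ids, and it performs the conflict check and the reservation under one lock, so that of two concurrent requests for the same CPU exactly one succeeds:

```python
import threading


class CpuPinConflict(Exception):
    """Raised when a cpupin request cannot be honored as given."""


class CpuPinRegistry:
    """Hypothetical per-host record of which instance owns each pinned CPU."""

    def __init__(self, num_cpus: int):
        self.num_cpus = num_cpus
        self._owner = {}  # physical CPU id -> instance name
        self._lock = threading.Lock()

    def pin(self, instance: str, vcpu_cpu_map: list[dict]) -> None:
        # Validate and reserve under one lock so concurrent requests
        # cannot both claim the same physical CPU.
        with self._lock:
            requested = []
            for entry in vcpu_cpu_map:
                cpu_text = str(entry["cpu"])
                # Only a single CPU id is accepted; "14-15" is rejected up
                # front instead of being forwarded to the hypervisor.
                if not cpu_text.isdecimal():
                    raise CpuPinConflict(f"cpu must be a single CPU id, got {cpu_text!r}")
                cpu = int(cpu_text)
                if cpu >= self.num_cpus:
                    raise CpuPinConflict(f"CPU {cpu} does not exist on this host")
                owner = self._owner.get(cpu)
                if owner is not None and owner != instance:
                    raise CpuPinConflict(f"CPU {cpu} is already pinned by {owner}")
                requested.append(cpu)
            # Commit only after every entry has passed validation.
            for cpu in requested:
                self._owner[cpu] = instance
```

Under this arrangement, a pin_cpu() call that loses the race would receive an explicit error and could retry with different cores, while its node stays usable.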
October 31, 2025 at 11:10 am #9131
Thank you, @yoursunny, for sharing these observations and the detailed steps to reproduce them. This appears to be a bug. I’ll work on addressing it and will update you once the patch is deployed.
Best,
Komal