1. Requesting exclusive/dedicated CPU core allocation for performance-sensitive exp

  • #9755

    Hello,

    I am running distributed GPU training experiments (PyTorch FSDP / NCCL all_reduce) on a FABRIC slice with two NVIDIA A30 GPUs at the PRIN site. Our main task is to compare FABRIC training times with those from our own simulator.

    For reproducible bandwidth measurements I need to eliminate CPU noise from hypervisor co-scheduling. I understand that GPU components use PCIe passthrough and are therefore exclusive to my slice. My question is:

    Is there a way to request a node where the physical CPU cores are also exclusively assigned to my VM (no hypervisor overcommit / no co-tenancy)? I did not find a dedicated_cpu or exclusive parameter in the FABlib API.

    Site: PRIN
    Node type: VM with 2× GPU_A30

    Thank you.

    #9756
    Hussam Nasir
    Participant

      There is a feature in FABlib that allows CPU/core pinning. That is as close as you can get to core exclusivity, I guess.
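
      One way to judge whether pinning is "close enough" is to watch steal time from inside the VM, since steal time counts intervals when a vCPU was runnable but the hypervisor ran something else. The sketch below is only an illustration and is not part of FABlib: the function names are made up for this example, and it assumes a Linux guest whose hypervisor actually reports steal time in /proc/stat (some configurations always show 0).

```python
import time


def read_cpu_times(stat_text):
    """Return (steal, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            steal = values[7] if len(values) > 7 else 0
            # guest and guest_nice are already counted inside user and nice,
            # so only the first eight fields are summed.
            total = sum(values[:8])
            return steal, total
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of elapsed CPU time that was stolen between two samples."""
    d_steal = after[0] - before[0]
    d_total = after[1] - before[1]
    return 100.0 * d_steal / d_total if d_total > 0 else 0.0


def sample_steal(interval_s=1.0, path="/proc/stat"):
    """Sample /proc/stat twice, interval_s apart, and return the steal percentage."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return steal_percent(before, after)
```

      Calling `sample_steal()` while a training run is in progress, once before and once after pinning, gives a rough indication of whether co-scheduling noise has gone down. A value that stays near zero is consistent with little contention but does not prove exclusivity.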

      #9771
      yoursunny
      Participant

        You can try the cpupin_common script that eliminates four of five layers of CPU interference:
        CPU isolation, all the way up

        The NIST-MQNS artifact contains a concrete example of using this script to perform a CPU-bound benchmark of a Python application.
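
        Independently of host-side pinning, the benchmark process can also be restricted to fixed vCPUs inside the guest so the guest scheduler does not migrate it mid-run. The following is a minimal sketch of that idea using only the Python standard library on Linux. It is not taken from the cpupin_common script or the NIST-MQNS artifact, it does not control where the hypervisor places the vCPUs, and `pin_to_cores` and `time_cpu_bound` are hypothetical helper names.

```python
import os
import time


def pin_to_cores(cores):
    """Restrict the current process to the given vCPU ids (Linux only).

    Returns the affinity set that is in effect afterwards.
    """
    os.sched_setaffinity(0, set(cores))  # pid 0 means the calling process
    return os.sched_getaffinity(0)


def time_cpu_bound(n=200_000, repeats=5):
    """Time a small CPU-bound loop several times and return the durations in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        total = 0
        for i in range(n):
            total += i * i
        samples.append(time.perf_counter() - start)
    return samples
```

        A typical use would be to call `pin_to_cores([2])` (the core id here is arbitrary) and then compare the spread of `time_cpu_bound()` samples with and without pinning. A narrower spread suggests less scheduling noise, though it says nothing about interference from other tenants on the host.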
