Create 8-node slice in Fabric

  • #3268
    Ertza
    Participant

      Hi, I’ve been trying to create an 8-node Fabric slice at MAX or TACC, but the most I can get is 6 nodes, and even that takes considerable difficulty and several retries. The resources page shows that the sites do have the resources available and that all 10 GPUs are free. Is there a constraint limiting a project to 6 nodes? Or is there some other way to create an 8-node slice?

      -----------  ------------------------------------
      Slice Name   ultima8
      Slice ID     d8933582-5702-4aee-ac54-0d63eb7ec74c
      Slice State  Closing
      Lease End    2022-10-07 17:46:15 +0000
      -----------  ------------------------------------
      
      Retry: 0, Time: 33 sec
      
      ID                                    Name    Site    Host                          Cores    RAM    Disk  Image              Management IP    State    Error
      ------------------------------------  ------  ------  --------------------------  -------  -----  ------  -----------------  ---------------  -------  ----------------------------------------------------------------------------------------
      9e1b80b3-8ff7-459a-a7cd-eab8f7a32a51  Node_0  TACC    tacc-w1.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
      4ab457f3-1cd0-4808-9c0a-7e8aad63a455  Node_1  TACC    tacc-w1.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
      b54cb2ff-6a76-4c60-b8ee-9ca4b728278d  Node_2  TACC    tacc-w2.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
      1e35ad5d-f3e7-4a0c-8172-e4d3116946aa  Node_3  TACC    tacc-w2.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
      6a706293-7c20-4c24-9830-76bf7cebc79e  Node_4  TACC    tacc-w2.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
      e2a3f34c-bafb-4165-85bc-9e937ceda9e7  Node_5  TACC                                                        default_ubuntu_20                   Closed   Insufficient resources : Component of type: RTX6000 not available in graph node: 8QPDZC3
      b4afdaf5-e133-4705-acdb-1ff564850718  Node_6  TACC                                                        default_ubuntu_20                   Closed   Insufficient resources : Component of type: RTX6000 not available in graph node: 8QPDZC3
      55316f2c-0e02-40f4-8439-cb36d5ddc700  Node_7  TACC                                                        default_ubuntu_20                   Closed   Insufficient resources : Component of type: RTX6000 not available in graph node: 8QPDZC3
      #3272
      Paul Ruth
      Keymaster

        There is no limit per project. I suspect you are hitting another resource limit. What error are you getting?

        You are asking for specific hardware (GPUs) and significant amounts of cores and RAM. I suspect there are not enough cores or RAM available on the hosts that have the GPUs (keep in mind that other users are using them too).

        Try reducing the cores and RAM and it will probably work. Or try another site.

        #3273
        Ertza
        Participant

          -----------  ------------------------------------
          Slice Name   ultima8
          Slice ID     d8933582-5702-4aee-ac54-0d63eb7ec74c
          Slice State  Closing
          Lease End    2022-10-07 17:46:15 +0000
          -----------  ------------------------------------

          Retry: 0, Time: 33 sec

          ID                                    Name    Site    Host                          Cores    RAM    Disk  Image              Management IP    State    Error
          ------------------------------------  ------  ------  --------------------------  -------  -----  ------  -----------------  ---------------  -------  ----------------------------------------------------------------------------------------
          9e1b80b3-8ff7-459a-a7cd-eab8f7a32a51  Node_0  TACC    tacc-w1.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
          4ab457f3-1cd0-4808-9c0a-7e8aad63a455  Node_1  TACC    tacc-w1.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
          b54cb2ff-6a76-4c60-b8ee-9ca4b728278d  Node_2  TACC    tacc-w2.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
          1e35ad5d-f3e7-4a0c-8172-e4d3116946aa  Node_3  TACC    tacc-w2.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
          6a706293-7c20-4c24-9830-76bf7cebc79e  Node_4  TACC    tacc-w2.fabric-testbed.net       16     64     100  default_ubuntu_20                   Closed   TicketReviewPolicy: Closing reservation due to failure in slice
          e2a3f34c-bafb-4165-85bc-9e937ceda9e7  Node_5  TACC                                                        default_ubuntu_20                   Closed   Insufficient resources : Component of type: RTX6000 not available in graph node: 8QPDZC3
          b4afdaf5-e133-4705-acdb-1ff564850718  Node_6  TACC                                                        default_ubuntu_20                   Closed   Insufficient resources : Component of type: RTX6000 not available in graph node: 8QPDZC3
          55316f2c-0e02-40f4-8439-cb36d5ddc700  Node_7  TACC                                                        default_ubuntu_20                   Closed   Insufficient resources : Component of type: RTX6000 not available in graph node: 8QPDZC3

          Time to stable 33 seconds
          Running post_boot_config … FAILURE: Slice is in Closing state; cannot do post boot config
          Time to post boot config 33 seconds

          #3274
          Ertza
          Participant

            I get the error above: RTX6000 not available in graph node: 8QPDZC3, although the resources page shows that it is. Anyway, I’ll try lowering the other resources, but I think the error here is mainly due to GPU availability.

            #3275
            Ilya Baldin
            Participant

              Our advertisements are a bit imperfect. There are typically 2 RTX6000 and 2 T4 GPUs per site, but for brevity we sometimes show 4 GPUs. This is something we’re working on. So if you are asking for more than 2 RTX6000s that may be the problem.
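
To make the arithmetic concrete: if a site really has only 2 RTX6000s, an 8-node slice with one RTX6000 per node cannot fit at a single site and would need to span about four sites. The sketch below is plain Python and does not call the FABRIC API. The function name `plan_gpu_nodes`, the one-GPU-per-node assumption, and the site names `SITE_C` and `SITE_D` are hypothetical placeholders; the cap of 2 per site is the "typical" figure quoted above, not a guaranteed value for any particular site.

```python
def plan_gpu_nodes(num_nodes, sites, gpus_per_site=2):
    """Map node names to sites, placing at most gpus_per_site one-GPU nodes per site."""
    capacity = len(sites) * gpus_per_site
    if num_nodes > capacity:
        raise ValueError(
            f"{num_nodes} GPU nodes requested, but {len(sites)} site(s) "
            f"at {gpus_per_site} GPUs each can hold only {capacity}"
        )
    # Fill each site up to its cap before moving on to the next one.
    return {f"Node_{i}": sites[i // gpus_per_site] for i in range(num_nodes)}


# TACC and MAX come from the thread; SITE_C and SITE_D are placeholders
# for whichever other sites currently have free RTX6000s.
plan = plan_gpu_nodes(8, ["TACC", "MAX", "SITE_C", "SITE_D"])
for name, site in plan.items():
    print(name, site)
```

Calling `plan_gpu_nodes(8, ["TACC"])` raises a `ValueError`, which mirrors the "Insufficient resources" failure in the log. The resulting mapping could then be used to choose the site for each node when the slice is built.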
