Create 8-node slice in Fabric
Tagged: slice creation
- This topic has 4 replies, 3 voices, and was last updated 2 years, 1 month ago by Ilya Baldin.
October 6, 2022 at 2:18 pm #3268
Hi, I've been trying to create an 8-node FABRIC slice at MAX or TACC, but the maximum I can get is 6 nodes, and even that only with considerable difficulty and several retries. This happens even though the resources page shows that the sites do have the resources available and that all 10 GPUs are free. Is there a constraint limiting slices to 6 nodes per project? Or is there some other way to create an 8-node slice?
October 6, 2022 at 3:52 pm #3272
There is no per-project limit. I suspect you are hitting some other resource limit. What error are you getting?
You are asking for specific hardware (GPUs) along with significant amounts of cores and RAM. I suspect there are not enough cores or RAM available on the hosts that have the GPUs (keep in mind that other users are using them too).
Try reducing the cores/RAM and it will probably work, or try another site.
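The reasoning in this reply can be sketched as a small placement model: a GPU node only fits on a host that has a free card of the requested model and, on that same host, enough free cores and RAM. This is a hypothetical illustration, not FABRIC's actual allocator; `Host`, `NodeRequest`, `fits` and `place` are invented names, and the capacities in the usage example are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical snapshot of one worker's free capacity (illustrative only)."""
    name: str
    free_cores: int
    free_ram_gb: int
    free_gpus: dict  # GPU model name -> number of unallocated cards


@dataclass
class NodeRequest:
    """Hypothetical request for one VM with a single GPU."""
    cores: int
    ram_gb: int
    gpu_model: str


def fits(host: Host, req: NodeRequest) -> bool:
    """A GPU node can only land on a host that has BOTH a free card of the
    requested model AND enough free cores/RAM on that same host."""
    return (
        host.free_gpus.get(req.gpu_model, 0) >= 1
        and host.free_cores >= req.cores
        and host.free_ram_gb >= req.ram_gb
    )


def place(hosts: list, requests: list) -> list:
    """Greedy first-fit placement; returns a host name (or None) per request.
    Capacity is consumed as each node is placed, so later requests see less."""
    placements = []
    for req in requests:
        chosen = None
        for host in hosts:
            if fits(host, req):
                host.free_cores -= req.cores
                host.free_ram_gb -= req.ram_gb
                host.free_gpus[req.gpu_model] -= 1
                chosen = host.name
                break
        placements.append(chosen)
    return placements
```

For example, with one hypothetical host holding 2 free RTX6000 cards but only 20 free cores and 80 GB of RAM, two 16-core/64 GB GPU nodes cannot both be placed (the second comes back as `None`), while two 8-core/32 GB nodes fit. That is the effect of "try reducing the cores/RAM".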
October 6, 2022 at 3:54 pm #3273
Slice Name   ultima8
Slice ID     d8933582-5702-4aee-ac54-0d63eb7ec74c
Slice State  Closing
Lease End    2022-10-07 17:46:15 +0000

Retry: 0, Time: 33 sec

ID                                    Name    Site  Host                        Cores  RAM  Disk  Image              Management IP  State   Error
9e1b80b3-8ff7-459a-a7cd-eab8f7a32a51  Node_0  TACC  tacc-w1.fabric-testbed.net  16     64   100   default_ubuntu_20                 Closed  TicketReviewPolicy: Closing reservation due to failure in slice
4ab457f3-1cd0-4808-9c0a-7e8aad63a455  Node_1  TACC  tacc-w1.fabric-testbed.net  16     64   100   default_ubuntu_20                 Closed  TicketReviewPolicy: Closing reservation due to failure in slice
b54cb2ff-6a76-4c60-b8ee-9ca4b728278d  Node_2  TACC  tacc-w2.fabric-testbed.net  16     64   100   default_ubuntu_20                 Closed  TicketReviewPolicy: Closing reservation due to failure in slice
1e35ad5d-f3e7-4a0c-8172-e4d3116946aa  Node_3  TACC  tacc-w2.fabric-testbed.net  16     64   100   default_ubuntu_20                 Closed  TicketReviewPolicy: Closing reservation due to failure in slice
6a706293-7c20-4c24-9830-76bf7cebc79e  Node_4  TACC  tacc-w2.fabric-testbed.net  16     64   100   default_ubuntu_20                 Closed  TicketReviewPolicy: Closing reservation due to failure in slice
e2a3f34c-bafb-4165-85bc-9e937ceda9e7  Node_5  TACC                                                default_ubuntu_20                 Closed  Insufficient resources : Component of type: RTX6000 not available in graph node: 8QPDZC3
b4afdaf5-e133-4705-acdb-1ff564850718  Node_6  TACC                                                default_ubuntu_20                 Closed  Insufficient resources : Component of type: RTX6000 not available in graph node: 8QPDZC3
55316f2c-0e02-40f4-8439-cb36d5ddc700  Node_7  TACC                                                default_ubuntu_20                 Closed  Insufficient resources : Component of type: RTX6000 not available in graph node: 8QPDZC3

Time to stable 33 seconds
Running post_boot_config … FAILURE: Slice is in Closing state; cannot do post boot config
Time to post boot config 33 seconds

October 6, 2022 at 3:56 pm #3274
I get the above error, basically "RTX6000 not available in graph node: 8QPDZC3", although the resources page shows that it is available. Anyway, I'll try lowering the other resources, but I think the error is mainly due to GPU availability here.

October 6, 2022 at 10:38 pm #3275
Our advertisements are a bit imperfect. There are typically 2 RTX6000 and 2 T4 GPUs per site, but for brevity we sometimes show 4 GPUs. This is something we're working on. So if you are asking for more than 2 RTX6000s, that may be the problem.
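If, as this reply says, a site typically has only 2 RTX6000 cards, a slice with 8 single-GPU nodes has to be spread over several sites. A minimal sketch of that arithmetic, assuming a fixed per-site capacity; `spread_gpu_nodes` is a hypothetical helper, the site names other than TACC and MAX are placeholders, and real availability also depends on other users' reservations.

```python
def spread_gpu_nodes(n_nodes: int, sites: list, per_site: int = 2) -> dict:
    """Split n_nodes single-GPU nodes across sites, filling each site up to
    `per_site` cards of the requested model before moving to the next.
    per_site=2 mirrors the "typically 2 RTX6000 per site" figure from the
    reply; treat it as an assumption, not a guaranteed inventory."""
    capacity = per_site * len(sites)
    if n_nodes > capacity:
        raise ValueError(
            f"need {n_nodes} GPUs but only {capacity} assumed available"
        )
    plan = {}
    remaining = n_nodes
    for site in sites:
        if remaining == 0:
            break
        take = min(per_site, remaining)
        plan[site] = take
        remaining -= take
    return plan
```

For example, `spread_gpu_nodes(8, ["TACC", "MAX", "SITE_C", "SITE_D"])` returns `{"TACC": 2, "MAX": 2, "SITE_C": 2, "SITE_D": 2}`, whereas asking for 8 GPU nodes across only TACC and MAX raises `ValueError`.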