Forum Replies Created
-
AuthorPosts
-
Ah ok. I’ll try later. I thought it may mean some kind of resource exhaustion.
I think this example creates multiple slices to test MTU between sites
I have this artifact notebook that shows how to do (2), but again, for your example I wouldn’t worry about this:
https://artifacts.fabric-testbed.net/artifacts/e1771f8d-ca7a-42fc-b6ec-542df83168a8
At least in my experience, you are not likely to succeed in getting a single slice this large in one shot. One of two approaches works better:
1. Build separate slices (if you are using FABNetv4 or FABNetv4Ext, it is easy to get them all to communicate with each other).
2. Build up a single slice by growing it via ‘modify’ (if a modify fails on a given site because it is out of resources, you move on to the next site to get more nodes).
I am not sure (2) is worth the trouble for what you are describing.
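For anyone who does want to try the "grow via modify" route, the control flow is roughly the loop below. This is only a sketch of the retry logic, not FABlib code: `request_node`, `InsufficientResources`, `make_fake_requester` and the per-site capacities are all hypothetical stand-ins, and in a real notebook the request step would be whatever adds one node to the slice and submits the modify.

```python
from typing import Callable, Dict, List


class InsufficientResources(Exception):
    """Hypothetical error: the site could not host one more node."""


def grow_slice(sites: List[str], target_nodes: int,
               request_node: Callable[[str], None]) -> List[str]:
    """Add nodes one at a time; when a site runs out, move on to the next.

    Returns the site of every node that was placed, in placement order.
    """
    placed: List[str] = []
    for site in sites:
        while len(placed) < target_nodes:
            try:
                request_node(site)  # assumed to raise when the site is full
            except InsufficientResources:
                break  # this site is exhausted, try the next one
            placed.append(site)
        if len(placed) >= target_nodes:
            break
    return placed


def make_fake_requester(capacity: Dict[str, int]) -> Callable[[str], None]:
    """Build a stand-in for the real request; capacities are made up."""
    remaining = dict(capacity)

    def request_node(site: str) -> None:
        if remaining.get(site, 0) <= 0:
            raise InsufficientResources(site)
        remaining[site] -= 1

    return request_node


if __name__ == "__main__":
    requester = make_fake_requester({"WASH": 2, "KANS": 0, "LOSA": 3})
    print(grow_slice(["WASH", "KANS", "LOSA"], 4, requester))
```

If the returned list is shorter than the target, every site was exhausted before the slice reached the requested size, which is the point at which separate slices are probably the easier option.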
- This reply was modified 1 month ago by Ilya Baldin.
I think this is operator error, apologies.
Komal,
Thank you. My recollection of events though is that I tried to renew it some days ago and it stayed in that ‘Configuring’ state, never updating to the new deadline. It’s possible I did it during another Kafka event.
That’s strange. KANS and LOSA are allocated to our project (EJFAT) and we don’t have anything running there.
The same problem is back. I tried WASH, KANS and LOSA and get a different ‘insufficient’ error each time (once it is disk, another time RAM, a third time Xilinx), but the result is the same: I can’t create a VM with a Xilinx FPGA at any of those sites.
It is possible this is happening because I’m requesting 32 GB of RAM on the VM with the FPGA, but I wanted to check, since the error suggests FPGAs aren’t available.
Had to delete the slice for unrelated reasons…
If you are trying to reach this server via the *management* interface, it may or may not work, depending on whether the site you are in uses IPv4 or (as most of them do) IPv6 for its management address. To do what you want, you need to use the dataplane with the FABNetv4Ext service. Please read more here
and also here
FABRIC doesn’t have A100s. As per this document, what is available are A30, A40, RTX6000 and T4 GPUs, all of which are suited to inference rather than training tasks.
Awesome! Thank you!
“No candidates nodes found to serve” means there aren’t enough resources at the site to serve your request. It could be that the workers don’t have enough free cores or RAM because the site is too busy.