Forum Replies Created
-
AuthorPosts
-
July 15, 2022 at 11:40 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2327
This might be a Windows issue. I’m going to have to have some other people look at it. Is there any way you could reproduce that graphml error and include a full stack trace? That might help us track this down.
Re: Code in a forum post. Click the “Text” tab next to the “Visual” tab at the top right of the box that you are typing in. Then click the “CODE” button and it will insert a `, then add your code, then click the “/CODE” button to insert another. Anything between the `s will be in the box that my code was in.
- This reply was modified 2 years, 4 months ago by Paul Ruth.
We are still working on tuning all the links and trying to figure out best practices for achieving very high bandwidths. There are no artificial limitations on that link and, in theory, 100G is possible. This is just going to require a bunch of tuning, both on the edge and probably in the core.
I know some of our students were looking at this and achieved ~100G between pairs of sites that are closer to each other. I’m not sure what the current best bandwidth achieved is on the longer spans, but I remember seeing them getting at least 30G for some tests. We would be interested in knowing about any successes you have with achieving higher bandwidths.
What tuning did you perform in your nodes?
In general, there are a lot of variables that can prevent bandwidths at these rates. You might reduce some of those variables by starting with a pair of sites that are close to each other (maybe UTAH/SALT) and using dedicated ConnectX-6 cards.
Your UDP test has a 98% loss. Given that the card is a 100G card, it can easily overwhelm an intermediary switch, which can result in huge packet losses like that. You might try a UDP test with lower bandwidths and slowly increase the bandwidth until your packet loss starts to grow. Then try different tuning parameters to see if you can get it higher.
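The ramp-up idea above could be scripted roughly like this. This is only a sketch, not something we have run on the testbed: `measure_loss` is a hypothetical callable that you would write yourself to run one UDP test at a given rate and return the loss percentage, the 1% loss threshold is an arbitrary assumption, and 10.0.0.2 is a placeholder address. The iperf3 flags used (`-c`, `-u`, `-b`, `-t`, `-J`) are the standard client, UDP, bitrate, duration and JSON output options.

```python
from typing import Callable, List, Optional


def iperf3_udp_command(server: str, rate_gbps: float, seconds: int = 10) -> List[str]:
    """Build an iperf3 UDP client command line for one target rate."""
    # -u selects UDP, -b sets the target bitrate, -t the duration, -J JSON output.
    return ["iperf3", "-c", server, "-u", "-b", f"{rate_gbps:g}G",
            "-t", str(seconds), "-J"]


def find_max_clean_rate(
    measure_loss: Callable[[float], float],
    rates_gbps: List[float],
    max_loss_percent: float = 1.0,
) -> Optional[float]:
    """Return the highest tested rate (Gbps) reached before loss exceeds the limit.

    measure_loss is a hypothetical callable: it should run one test at the
    given rate and return the observed packet loss as a percentage.
    """
    best = None
    for rate in sorted(rates_gbps):
        loss = measure_loss(rate)
        if loss > max_loss_percent:
            break  # loss started to grow, so stop ramping up
        best = rate
    return best


if __name__ == "__main__":
    # Stand-in for a real measurement: pretend loss jumps above 20 Gbps.
    def fake_loss(rate: float) -> float:
        return 0.1 if rate <= 20 else 98.0

    print(find_max_clean_rate(fake_loss, [1, 5, 10, 20, 40, 80]))
    print(" ".join(iperf3_udp_command("10.0.0.2", 10)))
```

Once you know the rate where loss starts to grow, change one tuning parameter at a time and rerun the same ramp so the results are comparable.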
I’m going to see if one of our students who was working on this can add more here…
July 15, 2022 at 10:19 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2317
It works for me but it didn’t work the first time I tried it. The error I got the first time might be your problem too.
The first time I ran it I got this:
pruth@pruth-laptop Desktop % python3 hello_edited.py
Name    CPUs    Cores    RAM (G)    Disk (G)       Basic (100 Gbps NIC)    ConnectX-6 (100 Gbps x2 NIC)    ConnectX-5 (25 Gbps x2 NIC)    P4510 (NVMe 1TB)    Tesla T4 (GPU)    RTX6000 (GPU)
------  ------  -------  ---------  -------------  ----------------------  ------------------------------  -----------------------------  ------------------  ----------------  ---------------
MICH    6       190/192  1530/1536  60590/60600    381/381                 0/2                             2/2                            10/10               2/2               3/3
UTAH    10      320/320  2560/2560  116400/116400  635/635                 2/2                             4/4                            16/16               4/4               5/5
TACC    10      238/320  2328/2560  115590/116400  632/635                 2/2                             4/4                            16/16               4/4               6/6
WASH    6       188/192  1520/1536  60580/60600    379/381                 2/2                             2/2                            10/10               2/2               3/3
NCSA    6       192/192  1536/1536  60600/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
DALL    6       190/192  1528/1536  60590/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
MAX     10      290/320  2452/2560  116190/116400  619/635                 1/2                             4/4                            16/16               4/4               6/6
MASS    4       120/128  992/1024   55700/55800    254/254                 1/2                             0/0                            6/6                 0/0               3/3
SALT    6       184/192  1504/1536  60500/60600    380/381                 2/2                             2/2                            10/10               2/2               3/3
STAR    12      368/384  3008/3072  121060/121200  757/762                 2/2                             6/6                            20/20               6/6               4/6
Exception: Failed to submit slice: Status.FAILURE, (500)
Reason: INTERNAL SERVER ERROR
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.6', 'Date': 'Fri, 15 Jul 2022 15:08:55 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '28', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'Slice MySlice already exists'})
HTTP response body: Slice MySlice already exists
Exception: 'NoneType' object has no attribute 'slice_name'
-----------------  --------------------------------------------------------------------------------------------------------------------
ID
Name               Node1
Cores
RAM
Disk
Image              default_rocky_8
Image Type         qcow2
Host
Site               UTAH
Management IP
Reservation State
Error Message
SSH Command        ssh -i /Users/pruth/work/fabric_config/slice-private-key -J pruth_0031379841@bastion-1.fabric-testbed.net rocky@None
-----------------  --------------------------------------------------------------------------------------------------------------------
Exception: node.execute: Management IP Invalid: None
Exception: Failed to delete slice: Status.INVALID_ARGUMENTS, Invalid arguments
pruth@pruth-laptop Desktop %
Notice the error in the middle that says “HTTP response body: Slice MySlice already exists”. This is because I already had a slice called “MySlice”. I deleted that slice and re-ran your script and it worked. This was the result:
pruth@pruth-laptop Desktop % python3 hello_edited.py
Name    CPUs    Cores    RAM (G)    Disk (G)       Basic (100 Gbps NIC)    ConnectX-6 (100 Gbps x2 NIC)    ConnectX-5 (25 Gbps x2 NIC)    P4510 (NVMe 1TB)    Tesla T4 (GPU)    RTX6000 (GPU)
------  ------  -------  ---------  -------------  ----------------------  ------------------------------  -----------------------------  ------------------  ----------------  ---------------
MICH    6       190/192  1530/1536  60590/60600    381/381                 0/2                             2/2                            10/10               2/2               3/3
UTAH    10      320/320  2560/2560  116400/116400  635/635                 2/2                             4/4                            16/16               4/4               5/5
TACC    10      238/320  2328/2560  115590/116400  632/635                 2/2                             4/4                            16/16               4/4               6/6
WASH    6       188/192  1520/1536  60580/60600    379/381                 2/2                             2/2                            10/10               2/2               3/3
NCSA    6       192/192  1536/1536  60600/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
DALL    6       190/192  1528/1536  60590/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
MAX     10      290/320  2452/2560  116190/116400  619/635                 1/2                             4/4                            16/16               4/4               6/6
MASS    4       120/128  992/1024   55700/55800    254/254                 1/2                             0/0                            6/6                 0/0               3/3
SALT    6       184/192  1504/1536  60500/60600    380/381                 2/2                             2/2                            10/10               2/2               3/3
STAR    12      368/384  3008/3072  121060/121200  757/762                 2/2                             6/6                            20/20               6/6               4/6
Waiting for slice ........... Slice state: StableOK
Waiting for ssh in slice .. ssh successful
Running post boot config ... Done!
---------------  ------------------------------------
Slice Name       MySlice
Slice ID         fba02fd7-423e-4309-9954-c3cbff38870a
Slice State      StableOK
Lease End (UTC)  2022-07-16 15:11:53 +0000
---------------  ------------------------------------
-----------------  ------------------------------------------------------------------------------------------------------------------------------------------------------
ID                 59eda82a-b9b7-4670-b830-40cff59e18cc
Name               Node1
Cores              2
RAM                8
Disk               10
Image              default_rocky_8
Image Type         qcow2
Host               dall-w3.fabric-testbed.net
Site               DALL
Management IP      2001:400:a100:3000:f816:3eff:fe7e:5477
Reservation State  Active
Error Message
SSH Command        ssh -i /Users/pruth/work/fabric_config/slice-private-key -J pruth_0031379841@bastion-1.fabric-testbed.net rocky@2001:400:a100:3000:f816:3eff:fe7e:5477
-----------------  ------------------------------------------------------------------------------------------------------------------------------------------------------
Hello, FABRIC from node 59eda82a-b9b7-4670-b830-40cff59e18cc-node1
pruth@pruth-laptop Desktop %
Is this your issue too?
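If the “Slice MySlice already exists” collision is what you are hitting, one way to avoid it is to give every run its own slice name. A minimal sketch, assuming you control the name passed to your script; `unique_slice_name` is a hypothetical helper, not part of fablib:

```python
import uuid


def unique_slice_name(prefix: str = "MySlice") -> str:
    """Append a short random suffix so repeated runs do not reuse a name."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


if __name__ == "__main__":
    print(unique_slice_name())
```

You would still want to delete old slices when you are done with them, since each one holds resources until its lease ends.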
- This reply was modified 2 years, 4 months ago by Paul Ruth.
July 15, 2022 at 9:59 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2310
I think I fixed it so you can attach .py and .txt files. Can you try again?
Look at the example notebook called “Bastion Keypair”. It sets up an ssh config file that is necessary for ssh’ing from the command line. You can add the path to your bastion key and your bastion user ID to this notebook. Then run the notebook and it will create the correct ssh config file.
This is an initial response to a quirk in command line ssh when jumping through a host with -J. For some reason you cannot pass the bastion host key on the command line. The only way to do this is to have the bastion private key in a keychain or in the ssh config file. SSHing from inside a notebook uses paramiko and does not need the ssh config file.
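For reference, the kind of entry the ssh config file needs looks roughly like what this sketch produces. It is not the notebook's actual code; `render_ssh_config` is a hypothetical helper and the user name and key path are placeholders. The keywords it writes (`Host`, `User`, `IdentityFile`, `IdentitiesOnly`) are standard OpenSSH config options.

```python
def render_ssh_config(bastion_host: str, bastion_user: str, bastion_key_path: str) -> str:
    """Return an ssh config stanza that tells ssh which key to use for the bastion."""
    lines = [
        f"Host {bastion_host}",
        f"    User {bastion_user}",
        # The bastion private key has to be named here (or be in a keychain),
        # because -i on the command line only applies to the final host.
        f"    IdentityFile {bastion_key_path}",
        "    IdentitiesOnly yes",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; substitute your own bastion user ID and key path.
    print(render_ssh_config(
        "bastion-1.fabric-testbed.net",
        "your_bastion_username",
        "/path/to/bastion-private-key",
    ))
```

With a stanza like that in your ssh config, the `ssh -i <slice key> -J <bastion> rocky@<management IP>` command printed by fablib should be able to authenticate to the bastion as well as the VM.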
Very soon we will release a new version of fablib that will streamline a bunch of config including this issue.
July 15, 2022 at 8:29 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2295
Can you send me the Python file you are using so I can try to recreate this issue?
July 15, 2022 at 8:22 am in reply to: Exception: error: User is not a member of project: trafficgen, refresh_token: .. #2291
You have the project ID set to the name of your project. It should be set to the GUID that can be found on the project’s page in the portal. For example, the project ID for the FABRIC Tutorials project is circled in the attached image.
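A quick way to catch this mistake before submitting anything is to check that the value you configured is shaped like a GUID. This is just a sketch using Python's standard `uuid` module; `looks_like_guid` is a hypothetical helper, and the all-zero GUID is a placeholder, not a real project ID. It only checks the format, not whether you are a member of that project.

```python
import uuid


def looks_like_guid(value: str) -> bool:
    """Return True if value is a canonical hyphenated GUID string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


if __name__ == "__main__":
    for candidate in ("trafficgen", "00000000-0000-0000-0000-000000000000"):
        print(candidate, looks_like_guid(candidate))
```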
Paul
- This reply was modified 2 years, 4 months ago by Paul Ruth.
July 15, 2022 at 8:10 am in reply to: fabric-fim conflicts with jupyter-client: python-dateutil-2.8.1 or 2.8.2? #2290
I created a note for the developers.
thanks,
Paul
July 14, 2022 at 2:15 pm in reply to: Management IP Invalid: None when running Python code in Jupyter #2285
Jupyter notebooks are just Python, but they allow you to run the code one cell at a time. Can you cut/paste the code from the cells of the “Hello, FABRIC” notebook into a .py script and run it? As long as your env vars and Python libraries are set up correctly it should work.
I think some of those debugging notebooks are old and maybe don’t work anymore.
You can use the 100G networks by just creating a WAN link that connects VMs using 100G NICs. Any of the regular networking notebooks should work for this. The only thing to think about is that, for now, dedicated quality of service guarantees are not available. However, very little bandwidth is currently being used and you should not be limited by other users.
That said, we have only begun testing most of the links and have not confirmed the bandwidth we can achieve. In theory, most of them should be able to get 100G but I suspect most of them will need some tuning. Please try this and let us know what you can achieve.
thanks,
Paul
- This reply was modified 2 years, 4 months ago by Paul Ruth.
July 14, 2022 at 8:33 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2280
Are you still having issues running your notebook?
July 12, 2022 at 12:50 pm in reply to: Management IP Invalid: None when running Python code in Jupyter #2268
Which tags do you need? Which project?
Paul
July 11, 2022 at 5:18 pm in reply to: Management IP Invalid: None when running Python code in Jupyter #2263
I’m not sure what the problem is. When I try the code you posted it works. I think this means it has something to do with your configuration. Are you able to run the “Hello, FABRIC” notebook? That one is, basically, a test that confirms the configuration is correct.
July 11, 2022 at 2:40 pm in reply to: Your project is lacking Component.SmartNIC tag to provision a VM with SmartNIC #2261
Which project are you working on? A FABRIC admin needs to set the tag.
thanks,
Paul
July 11, 2022 at 2:39 pm in reply to: Management IP Invalid: None when running Python code in Jupyter #2260
We are working on better error messages, but for now ‘Management IP Invalid: None’ is a bit of a generic failure message. It means that the VM didn’t get a Management IP assigned to it. In practice, this is the result of an uncaught VM failure, often related to errors in assigning IPs but sometimes other things.
It is difficult to say what is causing this specific error, but we seem to see this occasionally when a site is having issues starting VMs. You might try to resubmit the slice on a different site. In your case you are using a random site, so it may be as easy as retrying the same request. It would also be useful if you let us know which site you are seeing this on when it happens.
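If you want to automate the “try a different site” step, something like the sketch below might help. It is not fablib code: `submit` is a hypothetical callable you supply that builds and submits your slice on the given site and raises an exception on failure. The helper also keeps a list of which sites failed, so you can report them to us.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def submit_with_site_fallback(
    submit: Callable[[str], T],
    sites: Iterable[str],
) -> Tuple[str, T, List[Tuple[str, str]]]:
    """Try each site in order; return (site, result, failures) for the first success.

    failures is a list of (site, error message) pairs for the sites that
    were tried and failed before the successful one.
    """
    failures: List[Tuple[str, str]] = []
    for site in sites:
        try:
            result = submit(site)
        except Exception as err:  # remember the site that failed, then move on
            failures.append((site, str(err)))
            continue
        return site, result, failures
    raise RuntimeError(f"all sites failed: {failures}")


if __name__ == "__main__":
    # Stand-in for a real submission: pretend UTAH is having trouble.
    def fake_submit(site: str) -> str:
        if site == "UTAH":
            raise RuntimeError("Management IP Invalid: None")
        return f"slice on {site}"

    print(submit_with_site_fallback(fake_submit, ["UTAH", "DALL"]))
```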
Paul