Forum Replies Created
-
AuthorPosts
-
100G is quite difficult to achieve, especially across wide-area links. 56G
What size are the VMs that you are using? You probably need the biggest one available: 64 cores, 385G RAM.
Look at your dropped packets. Getting 100G probably requires zero dropped packets. I suspect the connection is not that clean yet. Also, 100G will be impossible if other users are using any bandwidth at all.
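One way to check for drops around a transfer is to snapshot the kernel's per-interface counters before and after. The sketch below assumes a Linux VM and reads /sys/class/net/<iface>/statistics; the interface name and the `root` parameter are illustrative assumptions, and drops counted only by the NIC hardware may not appear in these counters.

```python
from pathlib import Path


def read_drop_counters(iface, root="/sys/class/net"):
    """Return the kernel's rx/tx dropped-packet counters for one interface."""
    stats = Path(root) / iface / "statistics"
    return {
        name: int((stats / name).read_text().strip())
        for name in ("rx_dropped", "tx_dropped")
    }


def drops_during(before, after):
    """Difference between two snapshots taken around a transfer."""
    return {name: after[name] - before[name] for name in before}
```

Take one snapshot before starting the transfer and one after it finishes; any non-zero difference suggests the path is not yet clean enough for line rate.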
Also, I suspect we need to set CPU affinity to ensure streams run in the same NUMA domain as the 100G NIC. We haven’t looked into that yet.
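For anyone who wants to experiment with affinity in the meantime, here is a minimal, untested-on-FABRIC sketch of the general Linux technique: look up the NUMA node the NIC is attached to, then pin the current process to that node's CPUs. The interface name is an assumption, and a VM may not expose the host's NUMA topology at all, in which case the kernel reports node -1.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_nic_numa_node(iface):
    """Pin the current process to the CPUs on the NIC's NUMA node (Linux only)."""
    node = int(Path(f"/sys/class/net/{iface}/device/numa_node").read_text())
    if node < 0:
        # -1 means the kernel reports no NUMA affinity for this device.
        return None
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    cpus = parse_cpulist(cpulist)
    os.sched_setaffinity(0, cpus)  # 0 = the calling process
    return cpus
```

Calling `pin_to_nic_numa_node(...)` at the start of a sender or receiver script would keep that process on the NIC's node; whether this actually helps reach 100G here is still an open question.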
We will definitely have direct connections to Google Cloud (and AWS and MS Azure). This is still in development but we are excited to make this work.
Stay tuned…
August 10, 2022 at 2:35 pm in reply to: L2 network between sites often have nodes that cannot reach one another #2616
I suspect this was an intermittent issue with some underlying infrastructure, or possibly a link that was slow to be instantiated. If you wait a few minutes, the link may become active.
If you see this again can you respond to this forum thread and include your slice ID? If the right developers are available they can look at the underlying code/infrastructure and see if this is a bug.
Also, let us know if you figure out how to consistently recreate it.
Paul
- This reply was modified 2 years, 4 months ago by Paul Ruth.
I haven’t forgotten about this question. This is a bit complicated to explain in text, so my intention is to make an example notebook that shows you how to set this up.
To get you started…
The way to set this up is to create ssh keypairs for your VMs and put the public keys in each VM’s ~/.ssh/authorized_keys file.
By default, we only push your public slice/sliver ssh key to each VM and put it in the ~/.ssh/authorized_keys file. This way you can use your private ssh key to authorize access to your VM (although you need to jump through the bastion host on the way there).
What you are trying to do is to ssh from one VM (VM1) to another (VM2). This is possible; however, VM1 will need a private ssh key that matches a public ssh key in VM2’s ~/.ssh/authorized_keys file.
You have two main options:
Option 1: Since your public slice ssh key is already in the ~/.ssh/authorized_keys files, you can just copy your private slice ssh key to all of your VMs. Then you can ssh between nodes using that key. This is an easy and acceptable way to set up your keys, but it has one drawback: you might not want to distribute your private ssh key across your experiment.
Option 2: Create a new keypair and copy that keypair to all of your VMs. Put the public half of the keypair in ~/.ssh/authorized_keys on each VM. It is perfectly fine to use the same keypair across your whole experiment.
Here is an online resource I found that explains a bit about how to set up keypairs and ~/.ssh/authorized_keys files: https://www.digitalocean.com/community/tutorials/how-to-configure-ssh-key-based-authentication-on-a-linux-server
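As a rough sketch of the authorized_keys half of Option 2: the helper below builds a shell command that appends a public key to ~/.ssh/authorized_keys only if it is not already there, then runs it on each node. It assumes each node object offers an `execute(command)` method, as FABlib nodes do; the function names are made up for illustration, and this is not the promised example notebook.

```python
import shlex


def authorize_key_command(public_key):
    """Build a shell command that appends a public key to authorized_keys once."""
    key = shlex.quote(public_key.strip())
    return (
        "mkdir -p ~/.ssh && chmod 700 ~/.ssh && "
        "touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys && "
        # grep -x -F: match the whole line as a fixed string, so reruns add nothing
        f"(grep -qxF -- {key} ~/.ssh/authorized_keys || echo {key} >> ~/.ssh/authorized_keys)"
    )


def authorize_on_nodes(nodes, public_key):
    """Run the command on every node; assumes each node has execute(command)."""
    command = authorize_key_command(public_key)
    for node in nodes:
        node.execute(command)
```

The private half of the new keypair still has to be copied to every VM that will initiate connections, with permissions restricted to the owner (for example mode 600), before ssh between VMs will work.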
Soon I will release an example notebook that sets all this up for you. I will reply to this forum thread when it is ready.
August 5, 2022 at 10:24 am in reply to: When Creating a Slice, Sometimes Fails to Get NIC Components Correctly #2584
Xander,
It took a while to track this down but we found the bug that is causing this. A fix has been pushed to the production sites and we think you won’t see this anymore.
Keep trying these slices and please let us know if you see this error again.
Thanks for reporting this bug in the forums.
Paul
@ Manaus Creating connections between nodes with ssh is a common requirement for experiments. I think this is a question that other users might have as well.
I’m hoping other users add more tutorial requests to this forum topic. Can you re-ask your question about ssh’ing in a new forum topic so we can keep this one clean and other users can easily find the ssh discussion? I will answer it there.
Thanks.
August 4, 2022 at 1:08 pm in reply to: SSH Command Requires Bastion Key to be Added or Needs SSH Config File? #2579
@Brandon
The tricky part is that FABlib doesn’t use the ssh config file and does not know where it is or if it even exists. The reason the ssh config is needed is for the command line ssh, which does not let you pass the bastion key. Instead, you need the bastion key to be in some keychain or the ssh config file. FABlib uses paramiko which uses the bastion key directly.
August 4, 2022 at 10:51 am in reply to: SSH Command Requires Bastion Key to be Added or Needs SSH Config File? #2574
@Brandon I’ve been thinking about how to update that ‘ssh command’ attribute. The issue is that, as Hussam pointed out, it needs a ‘-F /path/to/ssh_config’ argument but FABlib doesn’t know where the ssh_config is located. I’m not sure what is best to put here. Maybe it could just say ‘-F /path/to/ssh_config’ and you would need to change the path.
What do you think?
@Manas Does the terminal window in JupyterHub fit your needs? In JupyterHub, try File->New->Terminal. You should be able to ssh to your VMs that way. You will need to set up your ssh config file correctly and use ‘ssh -F /path/to/the/ssh_config …’
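To make the ssh config part more concrete, here is a small sketch that renders a config file which sends every connection through a bastion host using the standard OpenSSH ProxyJump option. All of the values passed in (bastion hostname, usernames, key paths) are placeholders you would replace with your own; none of them are taken from FABRIC documentation.

```python
def render_ssh_config(bastion_host, bastion_user, bastion_key, vm_user, slice_key):
    """Return ssh_config text that reaches VMs by jumping through a bastion."""
    return "\n".join([
        f"Host {bastion_host}",
        f"    User {bastion_user}",
        f"    IdentityFile {bastion_key}",
        "    IdentitiesOnly yes",
        "",
        # Every other host is reached through the bastion.
        f"Host * !{bastion_host}",
        f"    ProxyJump {bastion_user}@{bastion_host}:22",
        f"    User {vm_user}",
        f"    IdentityFile {slice_key}",
        "    IdentitiesOnly yes",
        "",
    ])
```

Write the returned text to a file and pass that file to ssh with -F, followed by the VM's management IP. The bastion key authenticates the first hop and the slice/sliver key authenticates the hop to the VM, which is why both need to be reachable from wherever you run ssh.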
Also, this doc shows how to install FABlib on your local machine. There are some issues with some Python libraries on Windows that we haven’t yet figured out but installing this on a Mac or Linux machine works great. The main issues that we have heard about relate to conflicts between Anaconda and PyPI. https://learn.fabric-testbed.net/knowledge-base/install-the-python-api/
Thanks for the suggestions. I think you are right, we need a video about tips/tricks for JupyterHub and another that shows a simple install of FABlib on a laptop.
@Shivam The P4 BMv2 software switch example is probably already in your JupyterHub. If not, you can pull the updated examples from here: https://github.com/fabric-testbed/jupyter-examples
The specific example is here: https://github.com/fabric-testbed/jupyter-examples/tree/master/fabric_examples/complex_recipes/P4_bmv2
It is actually designed to support the P4Lang tutorials that you can find here: https://github.com/p4lang/tutorials
You should know that this is really a prototype example. I can help you get it running for what you need, but it will probably need some tweaking. Since you have expressed interest (and other users are probably interested too), I will revisit this example and make it easier to adapt.
Give it a shot and let me know about your progress. I can help modify the example to your needs.
Also, thanks for your feedback. We will look into the hardware DPDK support but that may take some time.
@Shivam: These are great topics. Are these topics you have figured out, or ones that you are still trying to figure out? We do have an early prototype of a BMv2 P4 software switch. It will probably need some tweaking to fit your needs but a start is there. Were you able to get that to work?
You should be able to get the slice in another notebook using its name (or ID). Create a slice in one notebook and then in another notebook do this:
slice = fablib.get_slice(name="MySliceName")
Then you should be able to get the nodes from the slice or whatever else you need to do.
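Once the second notebook has the slice object, listing its nodes might look like the sketch below. It assumes the slice offers `get_nodes()` and each node offers `get_name()` and `get_management_ip()`, as FABlib's slice and node objects do; the helper name itself is made up.

```python
def describe_nodes(slice_obj):
    """Return (name, management IP) pairs for every node in a slice object."""
    # In a notebook, slice_obj would come from:
    #   fablib.get_slice(name="MySliceName")
    return [
        (node.get_name(), node.get_management_ip())
        for node in slice_obj.get_nodes()
    ]
```

From there you can pick a node by name and run commands on it, just as in the notebook that created the slice.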
You should be aware that your token can only be used by one notebook at a time. You can switch back and forth between notebooks but you won’t be able to use the token simultaneously. However, if you want to run multiple notebooks simultaneously, you can manually get a second token and configure a second notebook to use the second token.
Note there is some inconsistency in our use of the terms slice/sliver keys. They are the same thing. They are the keys that FABRIC puts in the VMs in your slice.
Your message helps. I am now certain this is an issue with which slice/sliver keys you are using. It is an understandable mixup because the slice/sliver key handling is only half implemented and needs to be used carefully until the full implementation is complete.
Eventually users will create slice/sliver keys in the FABRIC portal and name them. Then the FABRIC API (FABlib) will require the user to specify a key (by name) to be used in the VMs of a slice. FABRIC will then put the public half of the key in the VMs and the user can access the VMs using the private half. This is similar to how AWS or OpenStack works.
Currently, the FABRIC API does not have the ability to use slice/sliver keys that are specified in the portal. As a workaround, the API uses a keypair that needs to be accessible wherever the API is run. In the case of our JupyterHub, there is a keypair in /home/fabric/work/fabric_config and when you load a fablib manager it will be configured to use those keys (or whichever keys are specified in the /home/fabric/work/fabric_config/fabric_rc file). You can check to see which keys you are using with the fablib.show_config() method.
The fix for your situation is to ignore the slice/sliver keys in the portal and copy the keys used by fablib from your Jupyter environment to your laptop. Then set up the ssh config file and keys exactly as you have been, except use the slice/sliver keys you copied from your Jupyter environment.
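If you want to find out which key files an rc file points at before copying them, a sketch like the following can help. It assumes the rc file sets its values with plain `export NAME=value` lines, and it simply keeps the variables whose names contain KEY; the variable names in the usage note are hypothetical, not the real fabric_rc names.

```python
import shlex


def exported_variables(rc_text):
    """Parse 'export NAME=value' lines from a shell rc file into a dict."""
    variables = {}
    for line in rc_text.splitlines():
        # comments=True drops anything after '#', quotes are removed by shlex
        tokens = shlex.split(line, comments=True)
        if len(tokens) == 2 and tokens[0] == "export" and "=" in tokens[1]:
            name, value = tokens[1].split("=", 1)
            variables[name] = value
    return variables


def key_files(rc_text):
    """Keep only the variables whose names mention KEY."""
    return {n: v for n, v in exported_variables(rc_text).items() if "KEY" in n}
```

For example, a line such as `export EXAMPLE_SLICE_PRIVATE_KEY_FILE=/some/path` would show up in the result of `key_files(...)`, telling you which file to copy to your laptop.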
Let me know if this works for you.
That all looks correct to me. The only thing I can’t check is if the keys are the correct ones.
The error suggests that it was able to get through the bastion host and was able to add the VM’s host key. This would indicate that the sliver key is the one that is incorrect.
Can you confirm the sliver key is correct? Can you use a Jupyter notebook to run a simple “execute” command on the VM (e.g., as in the Hello, FABRIC example)?
Also, are you trying to ssh to the VM from the same place you created the VM? If not, did you copy the sliver keys from the place you created the slice? For example, if you created the slice on our JupyterHub and are trying to ssh from your laptop, you will need to copy the keys from your Jupyter environment to your laptop.
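One way to confirm that the key on your laptop is the one the VM accepts is to compare the public key's base64 material against the VM's ~/.ssh/authorized_keys contents (which you can fetch with an execute call from a notebook). The sketch below is a simplified comparison: it ignores comments and handles simple option prefixes, but not quoted options containing spaces.

```python
def key_blob(line):
    """Return the base64 key material from an OpenSSH public-key line."""
    fields = line.split()
    # The blob is the field right after the key type (ssh-rsa, ssh-ed25519, ...).
    for i, field in enumerate(fields[:-1]):
        if field.startswith(("ssh-", "ecdsa-", "sk-")):
            return fields[i + 1]
    return None


def is_authorized(public_key_line, authorized_keys_text):
    """True if the public key appears in the given authorized_keys contents."""
    blob = key_blob(public_key_line)
    if blob is None:
        return False
    return any(
        key_blob(line) == blob
        for line in authorized_keys_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
```

If this returns False for the public half of the key you are using on your laptop, you are most likely using a different keypair from the one that created the slice.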
July 27, 2022 at 2:44 pm in reply to: When Creating a Slice, Sometimes Fails to Get NIC Components Correctly #2546
I understand. I was able to repeat the problem with the NICs but only when the RTX6000 is added. It has something to do with the GPU. I’m not sure why this happens this way but it has been reported to the developers.
Thanks for reporting this.