Forum Replies Created
There is an image called “default_ubuntu_22” that you can use.
However, I tried both Ubuntu images and there is something about their IPv6 configuration that isn’t working quite right. It looks like in Ubuntu the interfaces are not being brought up correctly with IPv6.
It will work if you add the following line to each of the “Configure NodeX” cells of the JupyterExample:
stdout, stderr = node1.execute(f'sudo ip -6 link set dev {node1_iface.get_os_interface()} up')
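If your slice has several nodes, the same workaround can be applied in a loop. As a minimal sketch, the helper below only builds the `ip -6 link set … up` command string for a given interface name; in a real slice you would get the names from fablib via `iface.get_os_interface()` and pass the result to each node’s `execute()` (the interface names here are hypothetical placeholders):

```python
# Build the IPv6 link-up command for a given OS interface name.
def ipv6_up_command(os_iface: str) -> str:
    return f"sudo ip -6 link set dev {os_iface} up"

# Example: the commands you would run on each node.
# "ens7" and "ens8" are placeholder names -- substitute the values
# returned by iface.get_os_interface() in your own slice.
commands = [ipv6_up_command(name) for name in ["ens7", "ens8"]]
for cmd in commands:
    print(cmd)
```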
That looks correct.
Are you able to open and read the file directly in the code?
It looks like you are running this on a Mac. Are you running it in a virtual environment on the Mac? In some cases a virtual environment will not be able to access files outside of the virtual environment.
Try opening and reading the file directly.
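As a quick sanity check, you can try reading the file with plain Python before handing it to any library. This is just a sketch: the path is a hypothetical placeholder, and the file is written first only so the example is self-contained; substitute the file you are actually trying to load.

```python
import os

# Placeholder path -- replace with the file you are trying to read.
path = "example.txt"

# Write a small file so this sketch runs on its own.
with open(path, "w") as f:
    f.write("hello from the file\n")

# The actual check: confirm the path exists and is readable.
if os.path.exists(path):
    with open(path) as f:
        contents = f.read()
    print(contents)
else:
    print(f"cannot find {path} -- check your working directory: {os.getcwd()}")
```

If the `else` branch fires, the problem is the path or the environment (e.g. a Mac virtual environment), not the library you were calling.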
For clarity, I say “slice/sliver key” because we are a bit inconsistent in our use of terms. “Slice key” and “sliver key” are often used to mean the same thing. This is really just the key that is in the VM (as opposed to the key that is in the bastion host).
The important thing to know is that the slice key you use in the portal and the slice key you use in the JupyterHub are not necessarily the same. The slice key that is pushed to the VM is the one that was used when you submitted the slice request. That is the slice key you will need to use to access it, regardless of where you access it from.
So, if you create a slice in the portal and want to access it from the JupyterHub, you will need to have that slice key in your JupyterHub. The reverse is also true.
You don’t need to have your keys match, you just need to know which key you used when you created the slice.
This has something to do with the ESnet IEP for your account and we need to escalate it to a proper ticket.
Can you create an account help ticket here: https://portal.fabric-testbed.net/help
Paul
This should work. Can you confirm the following?
- You replaced “username_0123456789” with your bastion user name from the portal.
- You replaced “~/.ssh/fabric_bastion” with the actual path to your bastion private key.
- Your bastion key is not more than 6 months old.
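Since bastion keys expire, a quick way to screen for the last item is to compare the key file’s modification time against a six-month cutoff. This is only a hedged sketch: the path is a placeholder (the file is created here just so the example runs on its own), and file age is a rough proxy, as the authoritative expiration is whatever the portal shows for your key.

```python
import os
import time

# Placeholder path -- replace with your actual bastion key path.
key_path = "fabric_bastion"

# Create a file so this sketch is self-contained.
with open(key_path, "w") as f:
    f.write("placeholder key material\n")

# Roughly six months, in seconds.
SIX_MONTHS_SECONDS = 182 * 24 * 60 * 60

# Compare the file's modification time against the cutoff.
age_seconds = time.time() - os.path.getmtime(key_path)
expired = age_seconds > SIX_MONTHS_SECONDS
print(f"key age: {age_seconds / 86400:.0f} days, possibly expired: {expired}")
```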
Paul
January 31, 2023 at 12:31 pm in reply to: Unable to allocate resources after the updates/maintenance. #3753
@Manas –
Can you try using the NVMe drives? They are 1 TB each and you can have multiple per VM. Like all the other components, you can only create VMs composed of components that are on the same physical host. So, just because a site has 10+ NVMe drives does not mean you can put them all on one VM. Two NVMe drives in a VM is possible on most sites. The other bonus of the NVMe drives is that they are very fast.
Also, you might try using large persistent volumes. These can be very large; they are mounted across a network, but within a site. You would need to pick a few sites where we can create the volumes. Then you can mount them with VMs on that site. The bonus with these volumes is that the data is persistent. So, if you shut down a slice and come back tomorrow or next week, the data will still be there.
Paul
We looked into this and this is not an issue with being banned.
From your error, you have made it through the bastion host but are failing authorization at the VM. This is likely caused by using the wrong VM username or the wrong key. Keep in mind, the key you use in the portal and the key you use in the JupyterHub are likely different. You can make them the same but you would need to manually do that. Are you sure you are using the correct slice/sliver key?
What are you using it for? Generally, the JupyterHub is a good place for code/script/docs (i.e. smaller things). Do you need space for large data sets? If so we can create a persistent storage volume in the testbed itself.
I think what you have will probably work once the ban is lifted. We did make a small change in order to load balance across the bastion hosts. There is now one bastion named “bastion.fabric-testbed.net”. You might try making ssh_config look something like this (although I think it would work the way you have it):
Host bastion.fabric-testbed.net
     User pruth_0031379841
     ForwardAgent yes
     Hostname %h
     IdentityFile /home/fabric/work/fabric_config/fabric_bastion_key
     IdentitiesOnly yes

Host * !bastion.fabric-testbed.net
     ProxyJump pruth_0031379841@bastion.fabric-testbed.net:22
Oh, actually this won’t work for you right now. There is still something wrong with your ssh setup but even if you correct it, you have triggered our security policy about failed ssh retries and your IP has been temporarily banned.
Are you able to try this from a different IP?
Ok, my next thought is that the ssh_config file might be wrong or not at the path you specified.
Can you confirm the ssh_config file is in the local dir and post the parts related to the fabric bastion host?
It looks like you filled your disk allocation on our JupyterHub. Do you have old files that you can clean up?
Duplicate.
Answer is here: https://learn.fabric-testbed.net/forums/topic/stdio-forwarding-failed-issue-2/
I think you need to use the private key in that ssh command rather than the public one.
Paul
Fraida,
Coincidentally, I ran into this issue recently when putting together an example that I intend to share with you in our meeting with Kate this week. I have a working prototype that looks like your example that uses a 5th VM to run a software OVS switch (https://witestlab.poly.edu/blog/basic-ethernet-switch-operation/).
There are actually a couple of issues going on here that I had to work around… and it’s super impressive that Yoursunny identified the trickiest part.
The main issue is the one that Yoursunny pointed out related to the Basic NICs being SRIOV virtual functions on a ConnectX-6.
You can think of the ConnectX-6 as a mini-switch that uses its physical port(s) as trunks between itself and the bigger dataplane switch. The mini-switch then has several access ports (i.e. SRIOV virtual functions) that are passed through to the various VMs. The traffic on each of these access ports is basically a “pseudo wire” going through the ConnectX-6 between the VM and the dataplane switch. The problem is that the ConnectX-6 “mini-switch” is also doing MAC learning on the “pseudo wires” and is filtering the traffic. I think this is an unforeseen problem with our SRIOV configuration and just needs to be changed in the future. We are working on this.
The effect this has on your example is that an OVS VM that is using 4 Basic NICs connected to 4 other hosts will not see traffic sent directly to one host from another. The ARP request will go through because it is a broadcast, but the ARP reply is filtered by the ConnectX-6 “mini-switch”. Without the ARPs, we don’t get very far.
My workaround is to use dedicated ConnectX-5s for the OVS switch VM (the hosts can use Basic NICs). The dedicated NICs are on access ports connected directly to the dataplane switch, so there is no “mini-switch” filtering packets in between. This isn’t a great solution because it limits the degree of your OVS switch and uses a much more scarce resource type. The better long-term solution is for us to turn off MAC learning on the ConnectX-6 “mini-switches”.
I can tell you more about this later this week when we talk with Kate.
Paul