Forum Replies Created
-
AuthorPosts
-
Duplicate.
Answer is here: https://learn.fabric-testbed.net/forums/topic/stdio-forwarding-failed-issue-2/
I think you need to use the private key in that ssh command rather than the public one.
Paul
Fraida,
Coincidentally, I ran into this issue recently when putting together an example that I intend to share with you in our meeting with Kate this week. I have a working prototype that looks like your example that uses a 5th VM to run a software OVS switch (https://witestlab.poly.edu/blog/basic-ethernet-switch-operation/).
There are actually a couple of issues going on here that I had to work around… and it's super impressive that Yoursunny identified the trickiest part.
The main issue is the one that Yoursunny pointed out related to the Basic NICs being SRIOV virtual functions on a ConnectX-6.
You can think of the ConnectX-6 as a mini-switch that uses its physical port(s) as trunks between itself and the bigger dataplane switch. The mini-switch then has several access ports (i.e. SRIOV virtual functions) that are passed through to the various VMs. The traffic on each of these access ports is basically a “pseudo wire” going through the ConnectX-6 between the VM and the dataplane switch. The problem is that the ConnectX-6 “mini-switch” is also doing MAC learning on the “pseudo wires” and is filtering the traffic. I think this is an unforeseen problem with our SRIOV configuration and just needs to be changed in the future. We are working on this.
The effect this has on your example is that an OVS VM that is using 4 Basic NICs connected to 4 other hosts will not see traffic sent directly to one host from another. The ARP request will go through because it is a broadcast, but the ARP reply is filtered by the ConnectX-6 “mini-switch”. Without the ARPs, we don’t get very far.
My workaround is to use dedicated ConnectX-5s for the OVS switch VM (the hosts can be Basic NICs). The dedicated NICs are on access ports connected directly to the dataplane switch, so there is no “mini-switch” filtering packets in between. This isn’t a great solution because it limits the degree of your OVS switch and uses a much more scarce resource type. The better long-term solution is for us to turn off MAC learning on the ConnectX-6 “mini-switches”.
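To make the filtering effect concrete, here is a toy model in Python. It is only a sketch of the observed behavior, not the ConnectX-6's actual logic: it assumes a virtual function hands a frame to its VM only when the frame is a broadcast or is addressed to that VF's own MAC. The MAC addresses and the function name are made up for illustration.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

def vf_delivers(vf_mac: str, dst_mac: str) -> bool:
    """Toy filter (an assumption, not the real NIC logic): the VF passes a
    frame to its VM only if it is a broadcast or addressed to the VF itself."""
    dst = dst_mac.lower()
    return dst == BROADCAST or dst == vf_mac.lower()

# Hypothetical addresses for the OVS VM's Basic NIC and for one of the hosts.
OVS_VF_MAC = "02:00:00:00:00:01"
HOST_A_MAC = "02:00:00:00:00:0a"

# The ARP request is a broadcast, so the OVS VM sees it.
arp_request_seen = vf_delivers(OVS_VF_MAC, BROADCAST)

# The ARP reply is unicast to host A's MAC, so the OVS VM's VF drops it.
arp_reply_seen = vf_delivers(OVS_VF_MAC, HOST_A_MAC)

print(arp_request_seen, arp_reply_seen)
```

In this model the switch VM never gets the chance to forward the unicast reply, which matches the symptom described above.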
I can tell you more about this later this week when we talk with Kate.
Paul
@Arash: Try again now.
This looks like what you might see if your fabric_rc file is missing the FABRIC_SSH_COMMAND_LINE env var.
If you are in the FABRIC JupyterHub, you should have the following line in your ~/work/fabric_config/fabric_rc:
export FABRIC_SSH_COMMAND_LINE="ssh -i {{ _self_.private_ssh_key_file }} -F /home/fabric/work/fabric_config/ssh_config {{ _self_.username }}@{{ _self_.management_ip }}"
This is actually a template of the command line that works in the FABRIC JupyterHub. Originally, FABlib printed a standard ssh command. However, no single ssh command works universally across all FABlib installations (e.g. personal laptops) and we were constantly being told the command was incorrect. The compromise (which I think is a good one) was to create a templated command that could be customized for any environment. In our JupyterHub we include the default configuration that works in that environment, but you can change it to work in your environment.
This template feature was added in one of the later 1.3.x versions. If you were using a version of FABlib older than that, you probably need to add the env var to your fabric_rc file. You are seeing this now because the default FABlib version was just updated to 1.4.0. There are some other 1.4.0 features you might also want, so I would recommend re-running the new “Configure Environment” notebook.
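As a rough illustration of how the template expands, the sketch below fills in the `{{ _self_.<name> }}` placeholders with a plain regular expression. This is not FABlib's own rendering code. The helper name `render_ssh_command` and the key path, username, and IP address are hypothetical values chosen for the example.

```python
import re

# The template from fabric_rc, split across lines only for readability.
TEMPLATE = (
    "ssh -i {{ _self_.private_ssh_key_file }} "
    "-F /home/fabric/work/fabric_config/ssh_config "
    "{{ _self_.username }}@{{ _self_.management_ip }}"
)

def render_ssh_command(template: str, values: dict) -> str:
    """Replace each {{ _self_.<name> }} placeholder with values[<name>]."""
    pattern = re.compile(r"\{\{\s*_self_\.(\w+)\s*\}\}")
    return pattern.sub(lambda m: str(values[m.group(1)]), template)

# Hypothetical node attributes, for illustration only.
example = render_ssh_command(TEMPLATE, {
    "private_ssh_key_file": "/home/fabric/work/fabric_config/slice_key",
    "username": "ubuntu",
    "management_ip": "2001:db8::10",
})
print(example)
```

If you run FABlib outside the JupyterHub, the parts of the template you would typically adjust are the fixed paths, such as the ssh_config location.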
January 30, 2023 at 9:50 am in reply to: Component names in add_component vs get_component_available #3690
That is a good observation. The code that gets available resources is a bit underdeveloped right now. What you are seeing is a discrepancy between the terms used by the user-facing FABlib API and the underlying model data structure. Longer-term, we will try to have users see only the FABlib API terms.
January 30, 2023 at 9:34 am in reply to: error in slice submit : `refresh_token` must not be `None` #3688
The most likely reasons for this are that either the token expired or your config is not pointing to a valid token.
If you are in the FABRIC JupyterHub, you can get a new token by logging out and then back in. If you still see this error, then you should confirm your configuration is correct. You can print your configuration with fablib.show_config(). The default location of the token in the FABRIC JupyterHub is: /home/fabric/.tokens.json
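A quick way to rule out a missing or empty token file is a small check like the one below. It is a sketch that assumes the token file is a JSON object with a `refresh_token` key (inferred from the error message, not confirmed from the FABlib source). The function name `check_token_file` is made up for this example, and the check cannot tell whether a token has expired.

```python
import json
from pathlib import Path

def check_token_file(path: str) -> str:
    """Return a short status string describing the token file at `path`."""
    token_path = Path(path)
    if not token_path.is_file():
        return "missing"
    try:
        data = json.loads(token_path.read_text())
    except json.JSONDecodeError:
        return "unreadable"
    # Assumption: the file is a JSON object holding a "refresh_token" entry.
    if not isinstance(data, dict) or not data.get("refresh_token"):
        return "no refresh_token"
    return "ok"

# Default token location in the FABRIC JupyterHub, as noted above.
status = check_token_file("/home/fabric/.tokens.json")
print(status)
```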
Thanks we will look into this.
We are still in the process of automating the generation of the documentation.
January 11, 2023 at 12:13 pm in reply to: Exception: [Errno 99] Cannot assign requested address #3592
Mina is correct, this is a current issue with bastion-1. They are working on a solution. If you switch to bastion-2 it will work.
January 11, 2023 at 12:09 pm in reply to: Exception: node.execute: Management IP Invalid: None #3591
Thushari,
This issue was fixed with FABLib 1.3.3. This is now the current default version in our JupyterHub. Please try again.
If you are using a local installation, you can update using pip:
pip install fabrictestbed-extensions==1.3.3
Paul
Ertza – We are able to get close to 100G on most links. Are you using iperf3? iperf3 is single threaded and the highest bandwidth you will see is ~25Gbps. You will need to create multiple processes and add their results.
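One way to add up the results is to run each iperf3 client with JSON output (`-J`) and sum the per-process throughput. The sketch below assumes TCP tests, where the JSON report carries the receiver-side rate under `end.sum_received.bits_per_second`, and one iperf3 server per port, since an iperf3 server handles one test at a time. The helper names and the server address are hypothetical.

```python
import json

def iperf3_command(server: str, port: int, seconds: int = 10) -> list:
    """Build the argument list for one iperf3 client process with JSON output."""
    return ["iperf3", "-c", server, "-p", str(port), "-t", str(seconds), "-J"]

def total_gbps(json_reports) -> float:
    """Sum receiver-side throughput over several iperf3 JSON reports (TCP)."""
    total_bps = 0.0
    for text in json_reports:
        report = json.loads(text)
        total_bps += report["end"]["sum_received"]["bits_per_second"]
    return total_bps / 1e9

# One client per server port; 192.0.2.10 is a placeholder address.
commands = [iperf3_command("192.0.2.10", 5201 + i) for i in range(4)]
for cmd in commands:
    print(" ".join(cmd))
```

Each command would be started in parallel (for example with `subprocess.Popen`), and the captured stdout of each process passed to `total_gbps`.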
@Don – I think I stumbled upon the problem you were having with missing subnets/gateways on your fabnet networks. I think the issue is that you have fabrictestbed==1.3.2 but fabrictestbed-extensions only works with fabrictestbed==1.3.1.
We are planning a release but are holding off until after the semester is over so that we don’t interfere with educational use cases. Somehow fabrictestbed==1.3.2 became the default on PyPI. So, if you manually installed both, you may have “upgraded” to fabrictestbed==1.3.2.
Try reverting to fabrictestbed==1.3.1 and see if this helps.
(let me know if it works or not)
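To see which versions are actually installed in your environment, a short check with the standard library is enough. This is a generic sketch rather than a FABlib feature, and the helper name `installed_version` is made up for the example.

```python
from importlib import metadata

def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name in ("fabrictestbed", "fabrictestbed-extensions"):
    print(name, installed_version(name) or "not installed")
```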
Yingqiang, can you post the contents of your fabric_rc file and your ssh_config file? Then post exactly what your command line looks like and the error that is returned?
@Yingqiang – Can you confirm that you are using the newest version of the jupyter-examples? They should be in your JupyterHub container in a folder called “jupyter-examples-rel1.3.3”.
You can set it in your fabric_rc file.
I had a similar issue when I ran multiple FABlib apps at the same time. What I did was change the exports in the fabric_rc to the following so that the log is in the same folder as the executable.
export FABRIC_LOG_FILE=fablib.log
export FABRIC_LOG_LEVEL=DEBUG
Right now some of the config code is not tested as well as it could be. We will get this smoothed out soon.
- This reply was modified 2 years, 1 month ago by Paul Ruth.