Forum Replies Created
-
AuthorPosts
-
@Yingqiang – Have you tried the FABRIC example notebooks that come pre-installed in your JupyterHub environment? They all use node.execute(). Try the Hello, FABRIC example. There are videos that walk you through some of them: https://www.youtube.com/playlist?list=PL64VqyRjOwSFaDlX-bk7KXAiiCF3FP4vv
- This reply was modified 2 years ago by Paul Ruth.
@Ertza – If you are using an L2Bridge at a site, then your VMs are directly connected to a VLAN on one of the Cisco switches. If you have 6 nodes, you can find the host each of them is on. The topology is simply all of those hosts connected directly to a single Cisco 5500 or 5700. The Cisco switch model depends on the site. Most sites have a 5700; a few have a 5500.
Do you need more info? Are you asking to know the model of the specific switch you are using?
- This reply was modified 2 years, 1 month ago by Paul Ruth.
@Yingqiang – It's hard to tell what is happening here. It looks like you are trying to ssh with an IPython magic command. I am not familiar with these yet. The error is "Host key verification failed". However, it looks like you are passing StrictHostChecking=no and UserKnownHostsFile=/dev/null. These parameters should instruct the system to skip host key verification. I suspect those parameters are not being passed correctly.
Are you able to ssh from a regular command line? Can you use the node.execute() command in fablib?
@Donald – Which site did your VM go to? I suspect your problem is that your VM landed on CLEM, FIU, GPN, or UCSD. These sites are not fully deployed yet and the networks will always fail.
Try adding avoid=['CLEM', 'FIU', 'GPN', 'UCSD'] to your calls to add_node or get_random_sites. You might also avoid STAR and MAX while we debug an issue with their dataplane switches.
October 19, 2022 at 4:05 pm in reply to: Management IP Invalid: None when running Python code in Jupyter #3330
This is likely a bug that happens when the testbed is busy and a bit slow. What happens is that the slice becomes "StableOK" before the management IP is set on the node. Usually this happens so fast that the management IP is ready when you need it, but occasionally there is enough of a delay to trigger this error.
There are a few ways to work around this.
One option is to wait a few seconds after the failure and then call fablib.get_slice("<slice_name>") again. It will pull a new copy of the slice information that will have the management IP. Depending on when you do this, you may need to re-call "post_boot_config" on the slice as well.
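The wait-and-refetch workaround can be sketched as a small retry loop. This is a minimal sketch, not part of fablib: wait_for_management_ip and its fetch_ip argument are hypothetical names, and fetch_ip is assumed to be a function you supply that re-reads the slice and returns the node's management IP (or None while it is still unset).

```python
import time


def wait_for_management_ip(fetch_ip, retries=5, delay=5.0, sleep=time.sleep):
    """Call fetch_ip() until it returns a usable address or retries run out.

    fetch_ip is a hypothetical caller-supplied function that re-reads the
    slice and returns the node's management IP (None if it is not set yet).
    """
    for attempt in range(retries):
        ip = fetch_ip()
        # Treat both a real None and the string "None" as "not ready yet".
        if ip and str(ip) != "None":
            return ip
        if attempt < retries - 1:
            sleep(delay)  # give the testbed time to publish the address
    raise TimeoutError(f"management IP still unset after {retries} attempts")
```

In a notebook, fetch_ip would wrap a fresh fablib.get_slice("<slice_name>") call followed by whatever lookup you already use to read the node's management IP. The retry count and delay are illustrative defaults, not values recommended by FABRIC.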
Another option is to install a new pre-release version of fablib which has a permanent fix for this. There are a bunch of bug fixes and some extra features too. Try:
pip install fabrictestbed-extensions==1.3.2rc3 --user
- This reply was modified 2 years, 1 month ago by Paul Ruth.
This is because that Nvidia site is not set up for IPv6, and FABRIC's CLEM site uses IPv6 addresses on its management network.
One reasonable workaround for this is to use the NAT64 services described here: https://nat64.net/
There is an example FABRIC Jupyter notebook in your JupyterHub container called “Accessing IPv4 Sites from IPv6 Nodes” that shows how to set this up on a FABRIC node.
Paul
Which FABRIC site is your node on?
There is no limit per project. I suspect you are hitting another resource limit. What error are you getting?
You are asking for specific hardware (GPUs) and significant amounts of cores/RAM. I suspect there are not enough cores or RAM available on the hosts that have the GPUs (keep in mind other users are using them too).
Try reducing the cores/ram and it will probably work. Or try another site.
You can make those env vars persistent in any way you can on your machine. The important part is that they are set before you run your FABlib application. If you are using JupyterLab on your machine, you will need to have the vars set before you start JupyterLab.
On the FABRIC JupyterHub, we skip this issue by having the application read the fabric_rc file and set the vars. You might want to do this as well. The easiest way to do this is to put all your FABRIC config files in the same place as on the JupyterHub (i.e. ${HOME}/work/fabric_config/). You can also set a different location when you create a FABlib manager.
I think you need to add your bastion key to the fabric_rc file. Refer to the "Configure Environment" example notebook. The second to last cell shows what needs to be set in the fabric_rc file.
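Reading an rc file and setting the vars from it could look roughly like the sketch below. It assumes the file consists of shell-style "export NAME=value" lines, which is an assumption about the format rather than something confirmed here, and load_rc_file is a hypothetical helper, not a fablib function.

```python
import os
import shlex


def load_rc_file(path, environ=os.environ):
    """Set environment variables from `export NAME=value` lines in path.

    Assumes a simple shell-style rc file; anything that is not an
    assignment (blank lines, comments, other commands) is skipped.
    """
    loaded = {}
    with open(os.path.expandvars(os.path.expanduser(path))) as rc:
        for line in rc:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            name, sep, value = line.partition("=")
            if not sep:
                continue  # not an assignment; skip it
            # shlex.split strips surrounding quotes much like a shell would
            parts = shlex.split(value)
            value = parts[0] if parts else ""
            # Expand references such as ${HOME} inside the value
            loaded[name.strip()] = os.path.expandvars(value)
    environ.update(loaded)
    return loaded
```

Call it before creating the FABlib manager (or before starting JupyterLab from the same process), so the variables are already in place when they are read.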
September 30, 2022 at 10:34 am in reply to: What if institution is not in the list on the CI Logon page? #3243
If the email addresses are different, then there will be two accounts. This would be the case if you used an NCSA account and then a non-NCSA Google/GSuite account. I don't think we can move an account to a different identity provider.
If the email addresses are the same, it might merge the accounts. This is the case if, for example, you used your NYU account and then tried to log in with Google/GSuite using your NYU credentials. That said, this will probably produce unpredictable results. I would not recommend it.
If you have a list of non-InCommon users, you should create a ticket here: https://fabric-testbed.atlassian.net/servicedesk/customer/portal/2/group/8/create/18
September 30, 2022 at 8:25 am in reply to: What if institution is not in the list on the CI Logon page? #3234
We can use Google accounts too. However, institutional accounts are strongly preferred and we will only approve Google accounts in situations where an institutional account is not possible.
If you have collaborators/students who need to use Google accounts, we need the project lead to provide a list of users/Google accounts that we should approve and we will approve them as they come in.
One other possibility is that an institution is not listed on the CILogon page but uses Google as an email provider. In this case, users can choose 'Google' from the dropdown and log in with their institutional ID. We can approve these accounts without extra information because the account is still tied to the institution.
Yeah, it looks like the version of 'ip' in Ubuntu 18 is not capable of producing JSON output.
A workaround would be to use this execute command:
stdout, stderr = node.execute("ip route list default | cut -d ' ' -f 5")
print(stdout)
In general, the fablib functions that get/set internal attributes of the VMs are just convenience wrappers around ssh commands. Most of these settings are determined by the OS or user configuration and are not controlled by FABRIC. Getting them can be tricky, so we are trying to add some helper functions, but there will always be some corner cases. In this case, we can add a check for successful return of a JSON object and fall back to parsing the string, but parsing stdout is always going to be fragile.
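The "check for JSON, then fall back to parsing the string" idea might look something like this. It is a sketch only, not the fablib implementation: default_route_device is a hypothetical name, and the "dst" and "dev" keys are assumed from the JSON that newer versions of 'ip' emit for routes.

```python
import json


def default_route_device(output):
    """Return the default route's device name from `ip` output, or None."""
    if not output:
        return None  # e.g. the ssh command failed and returned nothing
    try:
        routes = json.loads(output)
    except (TypeError, ValueError):
        routes = None  # not JSON; fall through to text parsing
    if isinstance(routes, list):
        for route in routes:
            # "dst" and "dev" are assumed key names in the JSON route list
            if isinstance(route, dict) and route.get("dst") == "default":
                return route.get("dev")
        return None  # valid JSON but no default route listed
    # Fallback: plain text such as "default via 10.0.0.1 dev ens3 proto dhcp"
    for line in output.splitlines():
        fields = line.split()
        if fields[:1] == ["default"] and "dev" in fields[:-1]:
            return fields[fields.index("dev") + 1]
    return None
```

Looking for the word after "dev" is slightly less brittle than cutting a fixed field number, but it is still string parsing and will have its own corner cases.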
Thanks for letting us know about this.
The JSON it is returning seems like it's one big string. Which image are you using?
I’d like to try that image and see if it is behaving in a way I didn’t expect.
The way this works is that it ssh’s to your node and gets the result of ‘ip addr route’ as json. It then digs through that json to find the name of the device.
Your error seems to be that the json that is returned is ‘None’. This likely means the ssh failed.
Can you do a "mynode.execute('hello, fabric')"? Does that fail too?