Potential fix on certain bash command in notebook
- This topic has 16 replies, 4 voices, and was last updated 2 years ago by Paul Ruth.
October 22, 2022 at 4:15 pm #3347
I have created a StableOK slice, but running that block of commands yields errors. Is there something I should check?
October 23, 2022 at 4:53 pm #3350
Similarly, I am working with the libraries on Linux/Mac. I had working code break as of Friday.
The submit emits errors:
WARNING:root:Failed to get layer: 'str' object has no attribute 'layer'
My attempt to set a route after submit:
(Pdb) network.get_subnet()
WARNING:root:Failed to get layer: 'str' object has no attribute 'layer'
I’m no expert, but the network.slice.slice is serialized java, and it looks like all the network provisioning was done on the server. Maybe on Friday there was a change on the FABRIC server side in how all this is serialized when it is sent to the client, and now it’s not de-serializing?
October 24, 2022 at 11:35 am #3352
@Donald – Which site did your VM go to? I suspect your problem is that your VM landed on CLEM, FIU, GPN, or UCSD. These sites are not fully deployed yet and the networks will always fail.
Try adding avoid=['CLEM', 'FIU', 'GPN', 'UCSD'] to your calls to add_node or get_random_sites. You might also avoid STAR and MAX while we debug an issue with their dataplane switches.
October 24, 2022 at 11:43 am #3353
@Yingqiang – It’s hard to tell what is happening here. It looks like you are trying to ssh with an IPython magic command. I am not familiar with these yet. The error is “Host key verification failed”. However, it looks like you are passing StrictHostChecking=no and UserKnownHostsFile=/dev/null. These parameters should instruct the system to skip host key verification. I suspect those parameters are not being passed correctly.
Are you able to ssh from a regular command line? Can you use the node.execute() command in fablib?
October 24, 2022 at 4:05 pm #3354
No, the test is
two nodes at NCSA and two at TACC, each with an L3 IPv4 network.
After the submit, I go to set up routes so that all nodes can talk to all other nodes.
When I get the network object to find the things I need for routing, the network.get_gateway() and network.get_subnet() calls give errors like
(Pdb) network.get_subnet()
WARNING:root:Failed to get layer: 'str' object has no attribute 'layer'
It broke on Friday, so I suspected a change during the downtime, but after Greg and I worked together today, I see the software still works at NERSC, so I am beginning to wonder about my (rickety) Python install on my Mac. So let me clean house a bit and see what I learn.
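Editor's sketch: the "all nodes talk to all other nodes" routing described above can be planned in plain Python once each network's subnet and gateway are known. This is only an illustration under stated assumptions. The network names, node names, and addresses below are hypothetical placeholders; in a real slice the values would come from network.get_subnet() and network.get_gateway(). The helper plan_routes is not part of fablib.

```python
import ipaddress

# Hypothetical per-site L3 networks. Real values would come from
# network.get_subnet() and network.get_gateway() after the slice is submitted.
networks = {
    "net_ncsa": {"subnet": "10.129.1.0/24", "gateway": "10.129.1.1"},
    "net_tacc": {"subnet": "10.130.1.0/24", "gateway": "10.130.1.1"},
}

# Hypothetical mapping of node name -> name of the network it is attached to.
node_to_network = {
    "ncsa1": "net_ncsa",
    "ncsa2": "net_ncsa",
    "tacc1": "net_tacc",
    "tacc2": "net_tacc",
}


def plan_routes(networks, node_to_network):
    """Return {node: [(remote_subnet, local_gateway), ...]}.

    Every node gets one route per remote subnet, each via the gateway
    of the network the node itself is attached to.
    """
    plan = {}
    for node, local_name in node_to_network.items():
        local = networks[local_name]
        gateway = ipaddress.ip_address(local["gateway"])
        local_subnet = ipaddress.ip_network(local["subnet"])
        if gateway not in local_subnet:
            raise ValueError(f"gateway {gateway} is outside {local_subnet}")
        routes = []
        for remote_name, remote in networks.items():
            if remote_name == local_name:
                continue  # no route needed for the directly attached subnet
            routes.append((ipaddress.ip_network(remote["subnet"]), gateway))
        plan[node] = routes
    return plan


route_plan = plan_routes(networks, node_to_network)
for node, routes in sorted(route_plan.items()):
    for subnet, gateway in routes:
        print(f"{node}: route {subnet} via {gateway}")
```

Parsing the values with ipaddress also fails early if a subnet or gateway comes back as something other than a valid address, which may help narrow down errors like the one quoted above.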
October 25, 2022 at 4:33 pm #3369
@Donald, for a controlled, stable environment, consider using Jupyter Hub. The added benefit is a reproducible experiment in a notebook. You can start with one of the notebooks we provide and modify it.
- This reply was modified 2 years, 1 month ago by Ilya Baldin.
October 25, 2022 at 5:17 pm #3371
I am not aware of how to use node.execute(). However, the ssh command produced by the successful creation of a slice does not work properly. @Paul Ruth
- This reply was modified 2 years, 1 month ago by Yingqiang Yuan.
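Editor's sketch: one way to rule out the "parameters are not being passed correctly" suspicion raised earlier in the thread is to build the ssh command as an argument list, so each option travels with its own -o flag and no shell quoting is involved. Note that OpenSSH spells the host-key option StrictHostKeyChecking. The user name, address, and key path below are hypothetical placeholders, and build_ssh_command is an illustrative helper, not a fablib function. Whether this is what went wrong in the notebook cannot be determined from the thread.

```python
import shlex


def build_ssh_command(user, host, key_path):
    """Build an ssh argument list that skips host key verification.

    Each ssh_config option is passed with its own "-o" flag.
    """
    return [
        "ssh",
        "-i", key_path,
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        f"{user}@{host}",
    ]


# Hypothetical values, for illustration only (192.0.2.10 is a documentation address).
cmd = build_ssh_command("ubuntu", "192.0.2.10", "/home/fabric/.ssh/slice_key")

# Print the command in a copy-and-paste safe form instead of running it.
print(shlex.join(cmd))
```

The list could be handed to subprocess.run(cmd) unchanged; printing it first makes it easy to compare with the command that works from a regular terminal.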
October 26, 2022 at 7:33 am #3374
@Yingqiang – Have you tried the FABRIC example notebooks that come pre-installed in your JupyterHub environment? They all use node.execute(). Try the Hello, FABRIC example.
There are videos that walk you through some: https://www.youtube.com/playlist?list=PL64VqyRjOwSFaDlX-bk7KXAiiCF3FP4vv
- This reply was modified 2 years, 1 month ago by Paul Ruth.
October 27, 2022 at 7:42 pm #3399
@Paul I have run through the Hello, FABRIC example and it yields the same error: I can create a slice but cannot ssh into the nodes. I suspect it has something to do with the keys, because the error message is “Permission denied (publickey,gssapi-keyex,gssapi-with-mic). kex_exchange_identification: Connection closed by remote host”. Regarding the key, I created it in August at the earliest, so it should not have expired.
October 28, 2022 at 8:04 am #3400
This definitely has something to do with your keys or the fabric_rc configuration file.
Can you confirm your bastion key is valid? See the troubleshooting section of this article: https://learn.fabric-testbed.net/knowledge-base/logging-into-fabric-vms/
Can you re-run the configure environment example notebook? This will recreate your fabric_rc file. Note that you will need to set the correct paths to where you uploaded your known good bastion key.
Let me know if either of these works.
October 28, 2022 at 2:33 pm #3402
@ilya @paul, so I moved to NERSC, where Greg Daues is working successfully (and where we’d work to tie our prototype to the larger system). The good news is it works (well, almost). If Greg and I land on the same cluster head node, we conflict in /tmp/fablib: one of us owns the directory and the other cannot access it. To keep working, I put a two-line change into fablib.py in my distribution so that it is sensitive to an environment variable (see below, with some surrounding lines for the change). I see there is a framework that tries to get FABRIC_LOG_FILE from a configuration setup, but FABRIC_LOG_FILE is not settable from the environment, AFAICT.
Is there a supportable way for me to change the logfile/data directory in the current code? If not, are my changes useful to you?
All in all, this is working out very well for us.
LOG_LEVELS = {
    'DEBUG': logging.DEBUG,
    'INFO': logging.INFO,
    'WARNING': logging.WARNING,
    'ERROR': logging.ERROR,
    'CRITICAL': logging.CRITICAL
}
default_fabric_rc = os.environ['HOME'] + '/work/fabric_config/fabric_rc'
default_log_level = 'DEBUG'
default_data_dir = os.environ.get('FABRIC_DATA_DIR', '/tmp/fablib')  # change, near line 526 of fablib.py
default_log_file = os.path.join(default_data_dir, 'fablib.log')  # change
fablib_object = None
ssh_thread_pool_executor = None
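Editor's sketch: the excerpt above keeps a shared default and lets FABRIC_DATA_DIR override it. A variation that might also avoid the two-users-on-one-head-node conflict without anyone setting a variable is to make the fallback directory per-user. This is an assumption-laden illustration, not fablib's behavior: resolve_data_dir and user_tag are hypothetical helpers, and only the FABRIC_DATA_DIR name and the fablib.log file name come from the post.

```python
import getpass
import os
import tempfile


def user_tag():
    """Return a short per-user string for use in a directory name."""
    try:
        return getpass.getuser()
    except (KeyError, OSError, ImportError):
        # No account name available (for example in some containers):
        # fall back to the numeric uid where the platform provides one.
        return str(os.getuid()) if hasattr(os, "getuid") else "unknown"


def resolve_data_dir(environ=None):
    """Pick the data directory: explicit override first, per-user default otherwise."""
    environ = os.environ if environ is None else environ
    override = environ.get("FABRIC_DATA_DIR")
    if override:
        return override
    return os.path.join(tempfile.gettempdir(), f"fablib-{user_tag()}")


data_dir = resolve_data_dir()
# Request owner-only permissions if the directory has to be created.
os.makedirs(data_dir, mode=0o700, exist_ok=True)
log_file = os.path.join(data_dir, "fablib.log")
print(log_file)
```

With this shape, two users on the same node get different directories by default, and the environment variable remains available for anyone who wants a specific location.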
October 28, 2022 at 4:04 pm #3404
You can set it in your fabric_rc file.
I had a similar issue when I ran multiple fablib apps at the same time. What I did was change the exports in the fabric_rc to the following, so that the log is in the same folder as the executable.
export FABRIC_LOG_FILE=fablib.log
export FABRIC_LOG_LEVEL=DEBUG
Right now some of the config code is not tested as well as it could be. We will get this smoothed out soon.
- This reply was modified 2 years ago by Paul Ruth.
October 28, 2022 at 6:29 pm #3406
@Paul I have tried both methods, and here are the issues that I am running into. I ran configure_environment, and when I try to ssh into a host, it reports “Warning: Identity file /home/fabric/.ssh/fabric_bastion not accessible: No such file or directory. yyuan76_0000024961@bastion-1.fabric-testbed.net: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).” I checked the link you recommended and configure_environment.ipynb, and I believe that neither of them involves creating files in the ~/.ssh directory. Should that be a concern?
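Editor's sketch: the warning quoted above says the identity file does not exist at the configured path. A quick local check along these lines can confirm whether a key file is present and whether its permissions are ones ssh will accept (OpenSSH refuses private keys that are accessible to group or others). The helper check_private_key is hypothetical; the path is the one from the warning and may differ in other setups.

```python
import os
import stat


def check_private_key(path):
    """Return a list of human-readable problems found with an SSH private key file."""
    path = os.path.expanduser(path)
    problems = []
    if not os.path.isfile(path):
        problems.append(f"{path}: no such file")
        return problems
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        # Any group/other permission bit is set; ssh expects something like 0600.
        problems.append(f"{path}: permissions {oct(mode)} are too open")
    if not os.access(path, os.R_OK):
        problems.append(f"{path}: not readable by the current user")
    return problems


# Path taken from the warning in the post above.
for problem in check_private_key("/home/fabric/.ssh/fabric_bastion"):
    print(problem)
```

An empty result only means the file exists with acceptable permissions; it does not show that the key is registered or unexpired on the bastion side.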
October 29, 2022 at 9:53 am #3407
@Yingqiang – Can you confirm that you are using the newest version of the jupyter-examples? They should be in your JupyterHub container in a folder called “jupyter-examples-rel1.3.3”.
October 29, 2022 at 6:55 pm #3411
@Paul Previously I was mainly following the instructions from the URL you listed: https://learn.fabric-testbed.net/knowledge-base/logging-into-fabric-vms/ . Also, I have been trying to figure out the difference between that link and configure_environment.ipynb in jupyter-examples-rel1.3.3. It would be great if you could offer me some insight into the difference between them. Meanwhile, I apologize for not mentioning that there was one time when I could actually ssh into the node. Are you sure that I absolutely need to delete all the key pairs and start from scratch?