Forum Replies Created
October 1, 2024 at 12:12 pm in reply to: getting ChannelException: ChannelException(2, ‘Connect failed’) Error #7591
@Tejas – The notebook is not attached. Could you please share it? You may have to change the file extension to .txt.
Thank you Kim! Could you also please open a terminal on JH and try SSH to the VM using the command displayed above?
ssh -i /home/fabric/work/fabric_config/slice_key -F /home/fabric/work/fabric_config/ssh_config rocky@137.222.230.26
Thanks,
Komal
Hi Kim,
Both of your VMs are accessible via SSH. I was able to verify this with the Nova SSH key, and I also confirmed that your SSH keys are pushed to the VMs.
Could you please share the output of the following? Please run the snippet below in a notebook cell before the configure step:
slice = fablib.get_slice(slice_name)
slice.show();
slice.list_nodes();
If this shows the management IPs for the VMs, please try running the configure cells again and let us know if it works.
Thanks,
Komal
September 27, 2024 at 11:16 am in reply to: getting ChannelException: ChannelException(2, ‘Connect failed’) Error #7570
Hi Tejas,
It looks like the SSH connection to the bastion host is failing. Could you please re-run the notebook jupyter-examples-rel1.7.*/configure_and_validate.ipynb and then retry your notebook? Please let us know if the issue persists.
Thanks,
Komal
Hi Sepideh,
Disk usage of your container is at 100%. The file output_file appears to be taking up the majority of the space. The /home/fabric/work directory (1 GB) in the JupyterHub environment serves as persistent storage for code, notebooks, scripts, and other materials related to configuring and running experiments, including the addition of extra Python modules. However, it is not designed to handle large datasets or output files. Please consider removing unneeded files or moving output_file elsewhere to avoid this error.
Additionally, if you need more disk space, I recommend setting up your own FABRIC environment on your laptop or machine to run your experiments. This approach will allow you to capture more data and reduce reliance on JupyterHub. Consider configuring a local Python environment for the FABRIC API as described here, and run the notebooks locally; a rough sketch of the local setup follows the disk-usage listing below.
fabric@spring:work-100%$ du -sh *
20K 5_clients_1_server.ipynb
60K fabric_config
228K hipft.ipynb
95M jupyter-examples-rel1.5.5
96M jupyter-examples-rel1.6.1
28K lost+found
686M output_file
82M rel1.7.0.tar.gz4q_e8dq5.tmp
0 rel1.7.0.tar.gzceav1uzw.tmp
0 rel1.7.0.tar.gzds9a6279.tmp
0 rel1.7.0.tar.gzgmqadnvv.tmp
0 rel1.7.0.tar.gziuc6xzxa.tmp
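As a minimal sketch of the local setup, assuming fabrictestbed-extensions has been installed on your machine (e.g., pip install fabrictestbed-extensions) and a local fabric_rc / environment has been configured as described in the linked guide:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

# With a local fabric_rc / environment configured per the linked guide,
# the default constructor picks up that configuration.
fablib = fablib_manager()
fablib.show_config()  # confirm credentials, bastion settings, and key paths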
Thanks,
Komal
Hi Ilya,
Yes, it’s possible to pin vCPUs to physical cores. The following APIs on the node class may be of interest (see the sketch after this list):
- node.get_cpu_info() provides information about the VM’s CPU in relation to the host.
- node.poa(operation="cpupin", vcpu_cpu_map=vcpu_cpu_map) pins specific vCPUs to physical cores. Here, vcpu_cpu_map maps each vCPU to the desired physical core.
For more details, please refer to the documentation here. Let us know if you have any questions or encounter any issues!
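As an illustration only (the slice name, node name, and core numbers are placeholders, and the exact structure expected for vcpu_cpu_map should be confirmed against the linked documentation), the flow looks roughly like this:
# assumes fablib = fablib_manager() has already been run in the notebook
slice = fablib.get_slice("MySlice")   # placeholder slice name
node = slice.get_node(name="node1")   # placeholder node name

# Inspect how the VM's vCPUs relate to the host's physical CPUs.
print(node.get_cpu_info())

# Pin vCPU 0 to host core 2 and vCPU 1 to host core 3 (example values only).
vcpu_cpu_map = [{"vcpu": "0", "cpu": "2"}, {"vcpu": "1", "cpu": "3"}]
node.poa(operation="cpupin", vcpu_cpu_map=vcpu_cpu_map)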
Thanks,
Komal
Hi Khawar,
Ubuntu 18.04 LTS reached the end of its standard support on May 31, 2023, and is no longer available on FABRIC. Thank you for bringing the list of images to our attention. We will update it to reflect this change.
Thanks,
Komal
September 2, 2024 at 9:45 am in reply to: Subject: Issues with SSH Access to Fabric Nodes for Slice IDs: 34c41dad-c9f3-431 #7506
Hi Yuanjun,
I suspect the bastion keys have expired. These keys exist only in your JupyterHub container and are not pushed to your VMs, so the sliver keys (i.e., the VM keys) should not be affected. The SSH error shown above indicates that login to the bastion server was denied, hence the suggestion.
The error observed in the multi-processing pool cleanup can be ignored. We will address it, but it should not impact regeneration of the keys. Could you please check whether you are able to SSH to your VMs, for example with a quick check like the one below?
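A minimal sketch (the slice name is a placeholder, and fablib is assumed to be already initialized in your notebook) that runs a trivial command on every node to confirm SSH works end to end through the bastion:
slice = fablib.get_slice("MySlice")   # placeholder slice name
for node in slice.get_nodes():
    stdout, stderr = node.execute("hostname")   # raises an SSH error if the VM is unreachable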
Bastion key expiry can also be verified from the portal via: Experiments -> Manage SSH Keys.
Thanks,
Komal
Hi,
Please refer to the site details to view the available GPU models. For instance, the STAR site offers RTX6000 and Tesla T4 GPUs. You can verify this information at [STAR site details](https://portal.fabric-testbed.net/sites/STAR), or query it from a notebook as sketched below.
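A minimal sketch, assuming fablib is already initialized in your notebook; the exact output format varies between fablib releases:
fablib.list_sites()        # overview of all sites and their resources
fablib.show_site("STAR")   # detailed view of one site, including GPU models and counts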
Thanks,
Komal
September 2, 2024 at 8:11 am in reply to: Subject: Issues with SSH Access to Fabric Nodes for Slice IDs: 34c41dad-c9f3-431 #7502
Hi Yuanjun,
I suspect your bastion keys have expired. Could you please re-run the notebook jupyter-examples-rel1.7.0/configure_and_validate.ipynb to regenerate them? Please let us know if you still run into errors.
Thanks,
Komal
Hi Tianrui,
Users do not have
sudo
access on the JupyterHub container. To install Python packages, please use the commandpip install <package name> --user
.Thanks,
Komal
August 29, 2024 at 12:56 pm in reply to: Unable to (consistently) reach FABNetv4Ext addresses from outside FABRIC #7483
Please consider checking the following examples on how to use these services:
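In rough outline (a hedged sketch only; the slice, node, network, and site names are placeholders, and the method names follow the public fablib example notebooks and may differ between releases), the workflow is to attach a node to a FABNetv4Ext network and then request that specific addresses be made publicly routable:
# assumes fablib = fablib_manager() has already been run in the notebook
slice = fablib.new_slice(name="MySlice")                                    # placeholder names throughout
node = slice.add_node(name="node1", site="STAR")
iface = node.add_component(model="NIC_Basic", name="nic1").get_interfaces()[0]
net = slice.add_l3network(name="net1", interfaces=[iface], type="IPv4Ext")  # FABNetv4Ext service
slice.submit()

# Once the slice is active, ask for one of the allocated addresses to be made
# publicly routable, then resubmit the slice.
slice = fablib.get_slice("MySlice")
net = slice.get_network(name="net1")
net.make_ip_publicly_routable(ipv4=[str(net.get_available_ips()[0])])
slice.submit()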
Feel free to reach out in case you run into issues or have queries.
Thanks,
Komal
August 29, 2024 at 12:25 pm in reply to: Unable to (consistently) reach FABNetv4Ext addresses from outside FABRIC #7480
Hi Sourya,
You will need permission for FabNetv*Ext services to enable public connections to your VMs. This request can be made by your Project Lead. For more details, please check here!
Thanks,
Komal
Hi Prateek,
This looks like a bug: a race condition is preventing updates to the Slice Graph Model. I will work on addressing it. For now, as a workaround, you can determine the IP addresses via Slice Commander using show commands. Refer to https://learn.fabric-testbed.net/knowledge-base/using-slicecommander-with-fabric/ for Slice Commander usage.
Alternatively, you can also get the sliver information via:
for s in slice.get_slivers():
    print(s._sliver)  # a JSON object that contains the needed information
Thanks,
Komal
Hi Prateek,
Could you please share your slice ID?
Thanks,
Komal