Forum Replies Created
It should work as long as your FABRIC_TOKEN_LOCATION env var is pointed at a valid token before you import the fablib library. By default, when you log into the JupyterHub, a new token will be put at ~/.tokens.json and FABRIC_TOKEN_LOCATION will point there.
You can grab another token from the portal and use it in another notebook by setting the FABRIC_TOKEN_LOCATION env var in the top cell of that notebook. Also, if you want to install the fablib library somewhere else (like your laptop), you will need to explicitly set the token location to wherever you put the token.
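For example, the top cell of the other notebook might look something like this (the token path is just an illustration; use wherever you actually saved the downloaded token):

import os

# Must be set before fablib is imported.
os.environ['FABRIC_TOKEN_LOCATION'] = os.environ['HOME'] + '/work/my_other_token.json'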
We tried to make this really transparent to the user but I think we will need to add an explicit call to set the token in a notebook to make it more clear.
The token should have been created when you authenticated with CILogon while logging into JupyterHub. By default, only one token is live in a JupyterLab session, so you can’t run multiple notebooks concurrently (there are workarounds, but by default it won’t work).
Try hitting the “restart kernel” button in the Jupyter notebook (or select Kernel -> Restart Kernel from the menu bar). This should re-read the token in the current notebook. If that doesn’t work, you might have a corrupted token and will need to log out and back in to re-authenticate and get a new token.
Try the JupyterHub tab/link. After it opens, click the “start_here” notebook in the “jupyter_examples” folder. Then click the “Hello, FABRIC” example notebook.
After you get that running, try all the more advanced networking-focused notebooks.
Paul
I think this is an error that is unnecessarily reported when fablib tries to set up a network interface that is not connected to a network. In your case, I suspect you are using only one of the two interfaces on a bare-metal NIC.
Since you are not actually using the extra interface, this exception should not affect the outcome of your experiment. Try simply ignoring it and see if your experiment works.
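As a minimal sketch of what I mean (assuming the exception surfaces from slice.submit(); substitute whichever fablib call raises it for you, where slice is your existing fablib Slice object):

try:
    slice.submit()
except Exception as e:
    # The unused interface is not part of the experiment, so note it and move on.
    print(f'Ignoring exception (likely the unused interface): {e}')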
We will fix this by suppressing this exception in the future. Let me know if your experiment doesn’t work because of this exception and we can find another short-term solution.
Paul
I can add tags. What permissions do you need?
Make sure you save the keys you used somewhere that is persistent.
Paul
One of the new features is that your slice is associated with a specific project. You will need to specify the project you want to use.
You probably just need to add a line in the first cell of your notebook that looks something like this:
import os
os.environ['FABRIC_PROJECT_ID'] = '<your_project_id>'
You can get the project id from the project tab in the portal.
Check out the new example notebooks as a reference.
Paul
I can’t really be sure but one guess might be that the keys you are using for the VM are no longer in your JupyterHub container.
During the maintenance, the default JupyterHub container software was updated. When you logged into JupyterHub, your container was rebuilt and only files in ~/work were preserved. The key you are using is /home/fabric/.ssh/id_rsa, which was likely recreated with your new container.
These keys are automatically created and are really just to get new users started easily. You may want to explicitly create a key pair and store it somewhere in ~/work.
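A sketch of one way to do that from a notebook (the key file name is just an example; the env var names are the ones used in the example notebooks):

import os
import subprocess

# Generate a key pair under ~/work, which survives container rebuilds.
key_path = os.environ['HOME'] + '/work/id_rsa_fabric'
subprocess.run(['ssh-keygen', '-t', 'rsa', '-f', key_path, '-N', ''], check=True)

# Point new slices at the persistent keys.
os.environ['FABRIC_SLICE_PRIVATE_KEY_FILE'] = key_path
os.environ['FABRIC_SLICE_PUBLIC_KEY_FILE'] = key_path + '.pub'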
The testbed is currently undergoing maintenance while we deploy many new features. We expect to complete the maintenance at about 5pm eastern today. See the announcement here: https://learn.fabric-testbed.net/forums/topic/fabric-software-release-1-2-update-june-6/
Your VMs are still there and accessible. However, the FABlib library gets the public IP addresses from the FABRIC services, which are in maintenance. If you know the IPs, you can still ssh to your nodes from a terminal window.
Paul
May 18, 2022 at 8:35 am in reply to: Cannot resolve hostnames e.g. “kafka.scimma.org” at NCSA and STAR #1771
Generally, the difference between the IPv4/IPv6 sites is limited to the management networks. The management networks are intended to allow access to the nodes (ssh) and a way to pull software/configuration (yum, apt, git, scp, etc.). They are not really intended/designed to support experiment traffic. However, we understand that many experiments need to peer with the public Internet and, for now, the management network is the only way to do that. The long-term solution is to use our FABnet networks, which will peer with the public Internet. Currently, FABnet works but does not yet peer with the public Internet; that is coming soon. You might check out FABnet to see if it supports what you are trying to do.
Let us know if you need any help with designing your experiment. We are still in our development phase and there are a lot of features still to come.
May 17, 2022 at 1:24 pm in reply to: Cannot resolve hostnames e.g. “kafka.scimma.org” at NCSA and STAR #1768
Yeah, there are limitations with IPv6 vs. IPv4.
Many sites are reachable via IPv6, but for the ones that are not, a temporary solution is to use NAT64, as described at https://nat64.net/
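A rough sketch of the NAT64 approach from fablib (replace <dns64_address> with a current resolver listed at https://nat64.net/; the slice/node names are examples, and the import path may differ with your fablib version):

from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
node = fablib.get_slice(name='MySlice').get_node(name='node1')

# Point the node's DNS at a DNS64 resolver so IPv4-only hostnames
# resolve to NAT64-translated IPv6 addresses.
node.execute("sudo sh -c 'echo nameserver <dns64_address> > /etc/resolv.conf'")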
The default storage in one of your VMs is a virtual block device attached to the VM and mapped to the VM’s host’s local disk. The host’s local physical disk is an SSD (i.e. not an NVMe drive). The performance of these virtual block devices is a bit difficult to assess because the virtualization is optimizing a lot of things for you.
Some things that will affect the performance of your local disk:
- The VM’s operating system will cache files in memory using any free memory allocated to your VM. Repeatedly accessing any file that is smaller than your VM’s free memory will likely be very fast because it will only read/write to memory. You can explicitly skip this cache: for example, the dd command accepts oflag=direct, which bypasses it (see the sketch after this list).
- The host’s hypervisor uses any free physical memory to cache virtual disk blocks used by the VMs it is hosting. Repeatedly accessing the same block of a disk in a VM will likely be very fast because it will only use blocks that the host hypervisor is caching in its memory. This cache is shared by all VMs on that host, and its availability depends on how the other VMs are using their disk blocks. There is no way for a user to skip this cache.
- The physical disk is an SSD. Even if you manage to write to the physical disk, it should be faster than a spinning disk.
- Currently, most hosts on FABRIC have a couple hundred gigabytes of memory and are underutilized. In most cases, accessing a local disk will only result in memory reads/writes. This may change as testbed utilization increases.
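Here is a minimal sketch of a direct-I/O write test run through fablib (the slice/node names are examples, and the import path may differ with your fablib version):

from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
node = fablib.get_slice(name='MySlice').get_node(name='node1')

# Write 1 GiB, bypassing the guest page cache with oflag=direct.
stdout, stderr = node.execute(
    'dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=1024 oflag=direct')
print(stdout, stderr)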
If you want to use an NVMe drive, you need to add an NVMe drive component to your VM (similar to how you add a NIC component). Check out the NVMe examples listed in the “start here” Jupyter example notebook.
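A sketch of adding the component (the model string 'NVME_P4510' and the slice/node/site names are taken from the example notebooks and may differ with your fablib version):

from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.new_slice(name='MySlice')
node = slice.add_node(name='node1', site='TACC')

# Attach an NVMe drive component, just like adding a NIC component.
node.add_component(model='NVME_P4510', name='nvme1')

slice.submit()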
Paul
Yes, the bigger and more complicated the slice, the more automation helps you stay resilient to problems external to, or within, your experiment.
Longer-term, FABRIC will have persistent storage for larger data sets. This is not available yet, but watch for it. This capability should make it easier to stage large data sets without relying on persistent VMs. Generally, relying on persistent VMs to store and serve large data sets can be unreliable.
Maybe I can help design the best way to store and serve your data. I have a few questions:
- How much data do you need to store?
- Where is the data currently stored (i.e. where do you need to copy it from?)
- How does your application consume/process the data? Does the app need all the data or just a subset? Does the app need the data locally or can it be served remotely?
I might have a few more questions but this will get us started.
thanks,
Paul
… and if you want to ssh/scp from your laptop to a slice you created on the JupyterHub, then you need to copy the slice/sliver keys to your laptop (or copy keys from your laptop to the JupyterHub and start a new slice with the env vars pointing at those keys).
Eventually, the sliver keys will be managed by the portal and will be automatically deployed to the VMs. For now, you need to specify a key pair in the API.
At the top of every example notebook there is this:
# Set the keypair FABRIC will install in your slice.
os.environ['FABRIC_SLICE_PRIVATE_KEY_FILE'] = os.environ['HOME'] + '/.ssh/id_rsa'
os.environ['FABRIC_SLICE_PUBLIC_KEY_FILE'] = os.environ['HOME'] + '/.ssh/id_rsa.pub'
In the JupyterHub the ~/.ssh/id_rsa key pair is created for you and used by default. If you set the API up somewhere else (e.g. your laptop), you will need to specify a path to a pair of keys (I’m just now noticing that it calls it “slice” key here and ‘sliver’ key on the portal).
When key management is setup on the portal you will only need to specify the name of the key pair to use.
For now, ssh/scp from a command line needs your bastion key configured in your ~/.ssh/config file and the “slice/sliver” key passed with -i on the command line.
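For example, a sketch (the Host alias, hostname, usernames, and key paths are placeholders; take the real values from the portal and your own setup). In ~/.ssh/config:

Host fabric-bastion
    HostName <bastion_hostname_from_portal>
    User <your_bastion_username>
    IdentityFile ~/.ssh/<your_bastion_key>

Then from a terminal:

ssh -i ~/.ssh/<your_slice_key> -J fabric-bastion <vm_username>@<node_ip>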