Forum Replies Created
It should work as long as your FABRIC_TOKEN_LOCATION env var is pointed at a valid token before you import the fablib library. By default, when you log into the JupyterHub, a new token will be put at ~/.tokens.json and FABRIC_TOKEN_LOCATION will point there.
You can grab another token from the portal and use it in another notebook by setting the FABRIC_TOKEN_LOCATION env var in the top cell of that notebook. Also, if you want to install the fablib library somewhere else (like your laptop), you will need to explicitly set the token location to wherever you put the token.
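For example, the top cell of the other notebook might look something like this (the token path is just an illustration; use wherever you actually saved the downloaded token):

import os

# Must be set before fablib is imported.
os.environ['FABRIC_TOKEN_LOCATION'] = os.environ['HOME'] + '/work/my_other_token.json'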
We tried to make this really transparent to the user but I think we will need to add an explicit call to set the token in a notebook to make it more clear.
The token should have been created when you authenticated with CILogon while logging into JupyterHub. By default, only one token is live in a JupyterLab session, so you can’t run multiple notebooks concurrently (there are workarounds, but by default it won’t work).
Try hitting the “restart kernel” button in the Jupyter notebook (or select Kernel -> Restart Kernel from the menu bar). This should re-read the token in the current notebook. If that doesn’t work, you might have a corrupted token and will need to log out and back in to re-authenticate and get a new token.
Try the JupyterHub tab/link. After it opens, click the “start_here” notebook in the “jupyter_examples” folder. Then click the “Hello, FABRIC” example notebook.
After you get that running, try all the more advanced networking-focused notebooks.
Paul
I think this is an error that is unnecessarily reported when fablib tries to set up a network interface that is not connected to a network. In your case, I suspect you are using only one of the two interfaces on a bare-metal NIC.
Since you are not actually using the extra interface, this exception should not affect the outcome of your experiment. Try simply ignoring it and see if your experiment works.
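As a minimal sketch of what I mean (assuming the exception surfaces from slice.submit(); substitute whichever fablib call raises it for you, where slice is your existing fablib Slice object):

try:
    slice.submit()
except Exception as e:
    # The unused interface is not part of the experiment, so note it and move on.
    print(f'Ignoring exception (likely the unused interface): {e}')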
We will fix this by suppressing this exception in the future. Let me know if your experiment doesn’t work because of this exception and we can find another short-term solution.
Paul
I can add tags. What permissions do you need?
Make sure you save the keys you used somewhere that is persistent.
Paul
One of the new features is that your slice is associated with a specific project. You will need to specify the project you want to use.
You probably just need to add a line in the first cell of your notebook that looks something like this:
import os
os.environ['FABRIC_PROJECT_ID'] = '<your_project_id>'
You can get the project id from the project tab in the portal.
Check out the new example notebooks as a reference.
Paul
I can’t really be sure but one guess might be that the keys you are using for the VM are no longer in your JupyterHub container.
During the maintenance, the default JupyterHub container software was updated. When you logged into JupyterHub, your container was rebuilt and only files in ~/work were preserved. The key you are using is /home/fabric/.ssh/id_rsa, which was likely recreated with your new container.
These keys are automatically created and are really just to get new users started easily. You may want to explicitly create a key pair and store it somewhere in ~/work.
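A sketch of one way to do that from a notebook (the key file name is just an example; the env var names are the ones used in the example notebooks):

import os
import subprocess

# Generate a key pair under ~/work, which survives container rebuilds.
key_path = os.environ['HOME'] + '/work/id_rsa_fabric'
subprocess.run(['ssh-keygen', '-t', 'rsa', '-f', key_path, '-N', ''], check=True)

# Point new slices at the persistent keys.
os.environ['FABRIC_SLICE_PRIVATE_KEY_FILE'] = key_path
os.environ['FABRIC_SLICE_PUBLIC_KEY_FILE'] = key_path + '.pub'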
The testbed is currently undergoing maintenance while we deploy many new features. We expect to complete the maintenance at about 5pm eastern today. See the announcement here: https://learn.fabric-testbed.net/forums/topic/fabric-software-release-1-2-update-june-6/
Your VMs are still there and accessible. However, the FABlib library gets the public IP addresses from the FABRIC services, which are in maintenance. If you know the IPs, you can still ssh to your nodes from a terminal window.
Paul
May 18, 2022 at 8:35 am in reply to: Cannot resolve hostnames e.g. “kafka.scimma.org” at NCSA and STAR #1771
Generally, the difference between the IPv4/IPv6 sites is limited to the management networks. The management networks are intended to allow access to the nodes (ssh) and a way to pull software/configuration (yum, apt, git, scp, etc.). They are not really intended/designed to support experiment traffic. However, we understand that many experiments need to peer with the public Internet and, for now, the management network is the only way to do that. The long-term solution is to use our FABnet networks, which will peer with the public Internet. Currently, FABnet works but does not yet peer with the public Internet; that is coming soon. You might check out FABnet to see if it supports what you are trying to do.
Let us know if you need any help with designing your experiment. We are still in our development phase and there are a lot of features still to come.
May 17, 2022 at 1:24 pm in reply to: Cannot resolve hostnames e.g. “kafka.scimma.org” at NCSA and STAR #1768
Yeah, there are limitations with IPv6 vs. IPv4.
Many sites are reachable via IPv6, but for the ones that are not, a temporary solution is to use NAT64, as described at https://nat64.net/
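A rough sketch of the NAT64 approach from fablib (replace <dns64_address> with a current resolver listed at https://nat64.net/; the slice/node names are examples, and the import path may differ with your fablib version):

from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
node = fablib.get_slice(name='MySlice').get_node(name='node1')

# Point the node's DNS at a DNS64 resolver so IPv4-only hostnames
# resolve to NAT64-translated IPv6 addresses.
node.execute("sudo sh -c 'echo nameserver <dns64_address> > /etc/resolv.conf'")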
The default storage in one of your VMs is a virtual block device attached to the VM and mapped to the VM’s host’s local disk. The host’s local physical disk is an SSD (i.e. not an NVMe drive). The performance of these virtual block devices is a bit difficult to assess because the virtualization is optimizing a lot of things for you.
Some things that will affect the performance of your local disk:
- The VM’s operating system will cache files in memory using any free memory allocated to your VM. Repeatedly accessing any file that is smaller than your VM’s free memory will likely be very fast because it will only read/write to memory. You can explicitly skip this cache: for example, the dd command accepts oflag=direct, which bypasses it (see the sketch after this list).
- The host’s hypervisor uses any free physical memory to cache virtual disk blocks used by the VMs it is hosting. Repeatedly accessing the same block of a disk in a VM will likely be very fast because it will only use blocks that the host hypervisor is caching in its memory. This cache is shared by all VMs on that host, and its availability depends on how the other VMs are using their disk blocks. There is no way for a user to skip this cache.
- The physical disk is an SSD. Even if you manage to write to the physical disk, it should be faster than a spinning disk.
- Currently, most hosts on FABRIC have a couple hundred gigabytes of memory and are underutilized. In most cases, accessing a local disk will only result in memory reads/writes. This may change as testbed utilization increases.
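Here is a minimal sketch of a direct-I/O write test run through fablib (the slice/node names are examples, and the import path may differ with your fablib version):

from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
node = fablib.get_slice(name='MySlice').get_node(name='node1')

# Write 1 GiB, bypassing the guest page cache with oflag=direct.
stdout, stderr = node.execute(
    'dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=1024 oflag=direct')
print(stdout, stderr)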
If you want to use an NVMe drive, you need to add an NVMe drive component to your VM (similar to how you add a NIC component). Check out the NVMe examples listed in the “start here” Jupyter example notebook.
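A sketch of adding the component (the model string 'NVME_P4510' and the slice/node/site names are taken from the example notebooks and may differ with your fablib version):

from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.new_slice(name='MySlice')
node = slice.add_node(name='node1', site='TACC')

# Attach an NVMe drive component, just like adding a NIC component.
node.add_component(model='NVME_P4510', name='nvme1')

slice.submit()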
Paul
Yes, the bigger and more complicated the slice, the more automation helps you stay resilient to problems external to, or within, your experiment.
Longer-term, FABRIC will have persistent storage for larger data sets. This is not available yet, but watch for it. This capability should make it easier to stage large data sets without relying on persistent VMs. Generally, relying on persistent VMs to store and serve large data sets can be unreliable.
Maybe I can help design the best way to store and serve your data. I have a few questions:
- How much data do you need to store?
- Where is the data currently stored (i.e. where do you need to copy it from?)
- How does your application consume/process the data? Does the app need all the data or just a subset? Does the app need the data locally or can it be served remotely?
I might have a few more questions but this will get us started.
thanks,
Paul
… and if you want to ssh/scp from your laptop to a slice you created on the JupyterHub, then you need to copy the slice/sliver keys to your laptop (or copy keys from your laptop to the JupyterHub and start a new slice with the env vars pointing at those keys).
Eventually, the sliver keys will be managed by the portal and will be automatically deployed to the VMs. For now, you need to specify a key pair in the API.
At the top of every example notebook there is this:
# Set the keypair FABRIC will install in your slice.
os.environ['FABRIC_SLICE_PRIVATE_KEY_FILE'] = os.environ['HOME'] + '/.ssh/id_rsa'
os.environ['FABRIC_SLICE_PUBLIC_KEY_FILE'] = os.environ['HOME'] + '/.ssh/id_rsa.pub'
In the JupyterHub the ~/.ssh/id_rsa key pair is created for you and used by default. If you set the API up somewhere else (e.g. your laptop), you will need to specify a path to a pair of keys (I’m just now noticing that it calls it “slice” key here and ‘sliver’ key on the portal).
When key management is setup on the portal you will only need to specify the name of the key pair to use.
For now, ssh/scp from a command line needs your bastion key configured in your ~/.ssh/config file and the “slice/sliver” key passed with -i on the command line.
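For example, a sketch (the Host alias, hostname, usernames, and key paths are placeholders; take the real values from the portal and your own setup). In ~/.ssh/config:

Host fabric-bastion
    HostName <bastion_hostname_from_portal>
    User <your_bastion_username>
    IdentityFile ~/.ssh/<your_bastion_key>

Then from a terminal:

ssh -i ~/.ssh/<your_slice_key> -J fabric-bastion <vm_username>@<node_ip>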