Policies around /home/fabric/work in JupyterHub
This topic has 8 replies, 2 voices, and was last updated 2 years, 6 months ago by Fraida Fund.
May 3, 2022 at 8:50 am #1731
Hi FABRIC team,
I understand that each user has a persistent /home/fabric/work directory in the JupyterHub environment. Can you clarify the policies surrounding this space?
Does each user have a quota? (The current space will fill up very fast if you are installing additional Python modules and/or transferring data off FABRIC nodes to do data analysis in Jupyter.)
Is this space considered “stable” or should we assume that anything we keep in /work could disappear without warning?
Thanks!
May 3, 2022 at 9:09 am #1733
Related question 😉
How do I copy a file off a FABRIC node if the file does not fit in /home/fabric/work?
May 3, 2022 at 9:28 am #1734
The /home/fabric/work dir in the JupyterHub environment is intended to provide persistent storage for code, notebooks, scripts, etc. related to configuring and running experiments. This includes adding extra Python modules. However, it is not really intended to support large data sets.
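To see how much of that space you are actually using, a quick check from a notebook terminal might look like this (a sketch: the path is the one described above, and it falls back to $HOME so it runs anywhere):

```shell
# Check remaining space in the persistent work directory before
# installing modules or transferring data. Falls back to $HOME so
# the sketch runs outside the JupyterHub too.
workdir=/home/fabric/work
[ -d "$workdir" ] || workdir="$HOME"
df -h "$workdir"                                   # free space on the volume
du -sh "$workdir"/* 2>/dev/null | sort -rh | head  # largest items first
```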
We are currently implementing a service for storing larger data sets in FABRIC. This service will look more like a persistent (or long-lived) storage sliver that can be attached and re-attached to many slices over time.
It is also important to note that the JupyterHub is currently hosted on a local machine at RENCI, but we are in the process of moving it to an existing production Kubernetes cluster managed by our IT staff. This should increase the reliability and performance of the whole system. We are still discussing the resource caps that will be possible on the new cluster, but the intent is to support code and very small data, not large data sets. Also, the transition to the production Kubernetes cluster will be tricky. I would recommend creating backups of any valuable data before then, and probably for a little while afterward until we discover and fix any bugs in the new setup.
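One simple way to follow that backup advice is to tar up the work directory to a dated archive (a sketch: the backup location and filename are placeholders, and it creates a small demo directory when /home/fabric/work is not present so the commands run anywhere):

```shell
# Back up the work directory to a dated tarball before the migration.
# Source path and backup destination are placeholders.
src=/home/fabric/work
if [ ! -d "$src" ]; then
    src=$(mktemp -d)                 # demo fallback so the sketch runs anywhere
    echo "example" > "$src/notebook.ipynb"
fi
backup="/tmp/fabric-work-backup-$(date +%Y%m%d).tar.gz"
tar czf "$backup" -C "$(dirname "$src")" "$(basename "$src")"
ls -lh "$backup"
```

Copy the resulting tarball somewhere outside the JupyterHub (your laptop, for example) until the migration settles down.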
When this is all finalized, the target is to have Jupyter environments with persistent storage quotas big enough to install any packages needed to run and manage experiments. Larger data sets should use the yet-to-be-deployed persistent storage.
May 3, 2022 at 9:31 am #1735
Thanks!
Will we be able to:
* access data in persistent storage directly from the Jupyter environment, without necessarily having an active slice?
* use something like fablib download_file to transfer a file that is too big for “work” directly from a FABRIC node to the persistent storage?
* access the persistent storage space directly (without having to copy stuff to “work”) to do data analysis in the same notebook as the rest of the experiment?
May 3, 2022 at 11:50 am #1736
RE: copying large files off VMs.
Option 1: Transfer through the bastion host
All VMs have a management network that you use to ssh to your VMs. You have probably noticed that we require users to jump through a bastion host to access their VMs (this increases security).
The easiest way to transfer data to/from a VM is with scp (or a similar tool). The trick is that you still need to jump through the bastion host. To do this you will need to configure the external machine (your laptop, etc.) to use the bastion host as a jump host for regular ssh.
SSH config info can be found here: https://learn.fabric-testbed.net/knowledge-base/logging-into-fabric-vms/
The main thing you need to take from that document is that you need to create (or add to) your ~/.ssh/config with config info about the bastion hosts. A basic config could look like this:
Host bastion-*.fabric-testbed.net
    User <bastion_username>
    IdentityFile /path/to/bastion/private/key
Then you should be able to ssh like this:
ssh -i /path/to/vm/private/key -J bastion-1.fabric-testbed.net <vm_username>@<vm_ip>
You should also be able to scp like this:
scp -i /path/to/vm/private/key -o "ProxyJump bastion-1.fabric-testbed.net" <vm_username>@<vm_ip>:file.txt .
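If you do this often, you can also add a host entry for the VM itself so a plain ssh or scp works without the extra flags. A sketch (the alias, username, IP address, and key path below are all placeholders; substitute your own):

```shell
# Append a host entry for a FABRIC VM to ~/.ssh/config so that
# "ssh myvm" and "scp myvm:file.txt ." jump through the bastion
# automatically. All values here are placeholders.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host myvm
    HostName 203.0.113.10
    User ubuntu
    IdentityFile /path/to/vm/private/key
    ProxyJump bastion-1.fabric-testbed.net
EOF
```

After that, `scp myvm:file.txt .` pulls a file through the bastion in one step.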
Note that if the VM IP is IPv6, the colons confuse scp's parsing, so you need to bracket the address like this (the escape characters are required on macOS but not in bash; using them works either way; I'm not sure how to do this on Windows):
scp -i /path/to/vm/private/key -o "ProxyJump bastion-1.fabric-testbed.net" ubuntu@\[2001:400:a100:3030:f816:3eff:fe93:a1a0\]:file.txt .
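One way to avoid remembering the bracket rule is a tiny shell check: any address containing a colon is treated as IPv6 and bracketed (a sketch, not part of any FABRIC tooling):

```shell
# Bracket IPv6 addresses for scp; leave IPv4 addresses alone.
vm_ip="2001:400:a100:3030:f816:3eff:fe93:a1a0"   # example address from above
case "$vm_ip" in
    *:*) target="[$vm_ip]" ;;   # IPv6: scp needs brackets
    *)   target="$vm_ip"   ;;   # IPv4: use as-is
esac
echo "ubuntu@${target}:file.txt"
```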
Option 2: Push directly from the VM
You can push data from a VM to a service on the Internet. The tricky part is that some VMs are connected to the IPv4 Internet using NAT, but most use IPv6. If you are using an IPv4 site, your transfer will need to work through a NAT. If you are on an IPv6 site, your transfer will need to target a service that supports IPv6.
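As a sketch of this option, the push itself could be an ordinary rsync to a server you control. The server name, user, and paths below are placeholders (and the destination must be IPv6-reachable if the VM only has IPv6 connectivity), so the command only prints unless DO_PUSH=1 is set:

```shell
# Hypothetical push of results from a VM to an external server.
# data.example.org, the user, and both paths are placeholders.
dest="user@data.example.org:/srv/incoming/"
cmd="rsync -avz /home/ubuntu/results/ $dest"
if [ "${DO_PUSH:-0}" = "1" ]; then
    $cmd                           # actually transfer
else
    echo "would run: $cmd"         # dry sketch with no real destination
fi
```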
Also, FABRIC is itself an IPv6 AS and will soon peer with the public IPv6 Internet. We also have a small IPv4 allocation. Eventually, it will be possible to connect experiment dataplanes to the public Internet (although security concerns will likely limit what you can do with this feature).
- This reply was modified 2 years, 6 months ago by Paul Ruth.
May 3, 2022 at 12:03 pm #1741
Thanks. I am somehow under the impression that right now, the “sliver key” in the FABRIC portal doesn’t do anything – is that right or did I misunderstand that? So for the moment, the only key installed in a sliver is one from the Jupyter environment, and I can’t ssh/scp from my laptop.
May 3, 2022 at 12:16 pm #1742
Eventually, the sliver keys will be managed by the portal and will be automatically deployed to the VMs. For now, you need to specify a key pair in the API.
At the top of every example notebook there is this:
# Set the keypair FABRIC will install in your slice.
os.environ['FABRIC_SLICE_PRIVATE_KEY_FILE'] = os.environ['HOME'] + '/.ssh/id_rsa'
os.environ['FABRIC_SLICE_PUBLIC_KEY_FILE'] = os.environ['HOME'] + '/.ssh/id_rsa.pub'
In the JupyterHub the ~/.ssh/id_rsa key pair is created for you and used by default. If you set the API up somewhere else (e.g. your laptop), you will need to specify a path to a pair of keys (I’m just now noticing that it calls it “slice” key here and ‘sliver’ key on the portal).
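For a laptop setup, generating a dedicated pair is one option (a sketch: the filename is a placeholder, and you would point the FABRIC_SLICE_PRIVATE_KEY_FILE / FABRIC_SLICE_PUBLIC_KEY_FILE variables at whatever you choose):

```shell
# Generate a dedicated slice key pair for use outside the JupyterHub.
# The filename is a placeholder; skip generation if it already exists.
keyfile="$HOME/.ssh/fabric_slice_key"
mkdir -p "$HOME/.ssh"
[ -f "$keyfile" ] || ssh-keygen -t rsa -b 3072 -f "$keyfile" -N "" -q
ls -l "$keyfile" "$keyfile.pub"
```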
When key management is set up on the portal, you will only need to specify the name of the key pair to use.
For now, ssh/scp from a command line needs your bastion key in the ~/.ssh/config file and the “slice/sliver” key following the -i on the command line.
- This reply was modified 2 years, 6 months ago by Paul Ruth.
May 3, 2022 at 12:59 pm #1747
… and if you want to ssh/scp from your laptop to a slice you created on the JupyterHub, then you need to copy the slice/sliver keys to your laptop (or copy keys from your laptop to the JupyterHub and start a new slice with the env vars using those keys).
May 3, 2022 at 1:36 pm #1748
Cool, thanks, that should work for now.