Forum Replies Created
-
AuthorPosts
-
Praveen,
NAS and persistent storage are the same thing. The portal expects the volume name to match that of the volume that was created for you. It fails because you do not have volumes ‘s1’ or ‘s2’ on whichever site you are using.
STAR and MAX are available again. They are under watch to see if the vendor bug shows up again, but experimenters should feel free to use them and report any problems they see.
Praveen,
The NAS is the persistent storage. There is no other option currently available. If you need persistent storage at more sites, please request it.
Praveen,
Persistent storage does work and is in use by a number of users. We will check what is going on with the portal provisioning. In the meantime, be sure to use the correct sites and volume names: we allocate persistent storage ahead of time on the specific sites you request via the ticket system. If you try to use a volume on another site where it hasn’t been allocated, the provisioning will fail.
Sean,
You are correct that your project needs the Net.PortMirror tag in order to access this service (the project owner needs to request it through the portal).
In general we need to understand specifically what your use case is. The PortMirror service is quite powerful in that it allows you to mirror traffic on any port into another port. Only the port you are mirroring *to* has to be in your slice (expected to be a dedicated 10/25G or 100G port); the port you are mirroring from can be any port on the switch within a given site. Before you start mirroring traffic belonging to others we need to understand the purpose and the scope (and also have you test port mirroring on your own slices first).
The port mirroring service is not yet well integrated into fablib. It is available as a lower-level library call like so (presumably myinterface is the interface of a dedicated card):
myinterface.get_slice().get_fim_topology().add_port_mirror_service(name=name, from_interface_name=port_name, to_interface=self.get_fim_interface())
We really do need to understand your use case before we proceed, though, to make sure you have the right tools.
My best suggestion is to check the output of the lspci command once you have a slice with the card to get the version of hardware and firmware, and then to look through the Mellanox documentation on their website.
These sites are in maintenance mode and should not be usable, i.e. they should produce errors when anyone not empowered to perform acceptance testing tries to use them. We are adding features for fablib to automatically avoid sites in maintenance in the future.
February 21, 2023 at 11:51 am in reply to: Unable to allocate resources after the updates/maintenance. #3875
Praveen (and the team), just to close the loop and post a version of my private reply:
Individual FABRIC sites are not as large as CloudLab. They typically have between 3 and 6 worker nodes. Each worker has 64 cores. If you ask for VMs of more than 32 cores, at most one VM can be accommodated by a worker node. For your storage requirements I suspect you should rely on persistent storage in some cases: not every worker’s internal storage is the same, so some combinations of core/RAM/disk are possible only on some workers, not all. We can create multiple persistent volumes for you on each site if required.
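To make the core arithmetic above concrete, here is a minimal sketch. It only counts cores, using the 64-cores-per-worker and 3-to-6-workers figures from the post; the function name is made up for illustration, and real placement also depends on RAM, disk and whatever the host itself reserves, none of which is modeled here.

```python
def max_vms_per_worker(worker_cores: int, vm_cores: int) -> int:
    """Number of VMs of vm_cores each that fit on one worker, by core count alone."""
    if vm_cores <= 0:
        raise ValueError("vm_cores must be positive")
    return worker_cores // vm_cores


WORKER_CORES = 64        # per-worker core count quoted in the post
SITE_WORKERS = (3, 6)    # typical smallest and largest site, per the post

for vm_cores in (16, 32, 33, 64):
    per_worker = max_vms_per_worker(WORKER_CORES, vm_cores)
    low, high = (n * per_worker for n in SITE_WORKERS)
    print(f"{vm_cores}-core VM: {per_worker} per worker, {low} to {high} per site")
```

Anything above 32 cores drops to one VM per worker, so a single site tops out at roughly 3 to 6 such VMs before other constraints are even considered.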
Another alternative is to use a combination of resources from FABRIC and other testbeds. Chameleon@Chicago is already reachable and we will be shortly adding access to Chameleon@TACC (a much larger installation) as well as CloudLab@Utah, Wisconsin and Clemson locations.
Yes! The challenge of updating images is that we should not remove or significantly change the images under existing labels, so some form of versioning is necessary with a history of versions going back for some predetermined period of time. This way if you created an experiment with image ubuntu_20_ver_1.0, that image is immutable for the duration of its lifetime (with the exception of mandatory security updates, which must be applied to preserve facility security).
This is exactly why we have so far not rolled out this feature as it requires some thinking and careful deployment.
Brandon,
This is an excellent point. We are discussing within the team both the question of keeping the existing images updated and allowing experimenters to provide their own images. There are of course many pitfalls with the latter, as we test the images to make sure they boot properly and remote debugging of boot issues is difficult. That said we have this in our sights.
At the very least we plan to get on a regular cadence with updating the images we host (we’ve just been too busy to do it) and potentially we will start allowing experimenters to supply their own images as well.
Just as a form of explanation – we host the Jupyter Hub in Google Cloud, which costs real $$s allocated to us from NSF via a project named CloudBank. We are still evaluating the true costs of running it in its current configuration (so we can more accurately project future costs). We may revise the amount of disk space and other resources each notebook server gets. However, we are constrained by the budget, and this is not a decision we will be making in the near term.
In general the Hub is not intended as a place to park or transfer large files.
We will open an internal ticket about it. The VFs are created on the worker node at boot and then given out by the Control Framework to the virtual machines and we need to check what options are set on them at creation time (typically they cannot be changed once created).
@yoursunny may be right and it may or may not be possible for us to change this behavior – we will report here once we know more. Thank you all for your feedback.
Fraida,
We do basically two types of changes:
1. Underlying control framework changes (which generally bring forward new features, but they aren’t available to experimenters until the second change type happens), which are installed on our infrastructure and may affect the look/feel of the portal.
2. FABlib changes to make CF features from above available to users – they generally get installed into a new version of a notebook container image. They affect how the notebooks are run (although we try to keep the changes backward compatible as much as we can).
The change we did last week was of Type 1 as it were, and thus wasn’t going to impact anything you were already doing. The problem you saw is likely a coincidence with another change of FABlib (type 2) that happened earlier. In the coming weeks Paul will be bringing updated version(s) of FABlib that support the features of CF 1.4 and there will be separate announcements about it.
- This reply was modified 1 year, 11 months ago by Ilya Baldin.
January 18, 2023 at 10:34 am in reply to: Is it possible to compile p4 program and do experiments with programmable switch #3618
We are working on it. We are port-constrained on our dataplane switches in a number of desirable locations. Once we are able to resolve those constraints we should be able to ship the switches out and add the necessary code support in the control framework to enable working with them.
I’m assuming you or your professor have signed the SLACA with Intel and have access to their compiler tools. This is not something we will be providing. We will provide P4 switches with a runtime that allows you to load the bytecode remotely, but compiling the code using Intel-licensed tools will be the user’s responsibility.
January 17, 2023 at 10:03 pm in reply to: Is it possible to compile p4 program and do experiments with programmable switch #3616
Also, I moved this topic to the General Questions and Discussion forum.