
Ilya Baldin

Forum Replies Created

Viewing 15 posts - 166 through 180 (of 285 total)
  • in reply to: Local NAS storage/VM #4062
    Ilya Baldin
    Participant

      Praveen,

      NAS and persistent storage are the same thing. The portal expects the volume name to match the name of a volume that was created for you. Provisioning fails because you do not have volumes ‘s1’ or ‘s2’ on whichever site you are using.

      in reply to: STAR and MAX unavailable [RESOLVED (we think)] #4055
      Ilya Baldin
      Participant

        STAR and MAX are available again. They are under watch to see whether the vendor bug shows up again, but experimenters should feel free to use them and report any problems they see.

        in reply to: Local NAS storage/VM #4054
        Ilya Baldin
        Participant

          Praveen,

          The NAS is the persistent storage. There is no other option currently available. If you need persistent storage at more sites, please request it.

          in reply to: Local NAS storage/VM #4045
          Ilya Baldin
          Participant

            Praveen,

            Persistent storage does work and is in use by a number of users. We will check what is going on with the portal provisioning. Just be sure to use the correct sites and volume names: we allocate persistent storage ahead of time on the specific sites you request via the ticket system. If you try to use a volume on another site where it hasn’t been allocated, the provisioning will fail.
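A minimal sketch of that rule, for checking a request before submitting it. The function name and the `allocated` mapping are hypothetical and not part of the portal or fablib; the mapping is something you would fill in by hand from the volumes created for you through the ticket system.

```python
from typing import Dict, Optional, Set


def check_volume_request(
    allocated: Dict[str, Set[str]], site: str, volume: str
) -> Optional[str]:
    """Return None when (site, volume) matches an allocation, else a reason.

    `allocated` maps a site name to the volume names created for you there.
    It is a hand-maintained, hypothetical structure, not a portal API.
    """
    if site not in allocated:
        return f"no persistent storage has been allocated at site {site!r}"
    if volume not in allocated[site]:
        known = ", ".join(sorted(allocated[site])) or "none"
        return f"volume {volume!r} does not exist at {site!r} (allocated: {known})"
    return None
```

A result of None means the site and volume name match something that was allocated; any returned string describes the mismatch that would make provisioning fail.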

            in reply to: Fabric Port Mirroring Service #3954
            Ilya Baldin
            Participant

              Sean,

              You are correct that your project needs the Net.PortMirror tag in order to access this service (the project owner needs to request it through the portal).

              In general we need to understand specifically what your use case is. The PortMirror service is quite powerful in that it allows you to mirror traffic on any port into another port. Only the port you are mirroring *to* has to be in your slice (it is expected to be a dedicated 10/25G or 100G port); the port you are mirroring from can be any port on the switch within a given site. Before you start mirroring traffic belonging to others, we need to understand the purpose and the scope (and also have you test port mirroring on your own slices first).

              The port mirroring service is not yet well integrated into fablib. It is available as a lower-level library call like so (presumably myinterface is the interface of a dedicated card):

              myinterface.get_slice().get_fim_topology().add_port_mirror_service(name=name, from_interface_name=port_name, to_interface=myinterface.get_fim_interface())

              We really do need to understand your use case before we proceed, though, to make sure you have the right tools.

              in reply to: Obtain the NICs’ CRC specification #3935
              Ilya Baldin
              Participant

                My best suggestion is to check the output of the lspci command once you have a slice with the card, to get the hardware and firmware versions, and then to look through the Mellanox documentation on their website.
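A small sketch of that first step, meant to be run inside a VM that has the card attached. The helper names are made up for illustration; the only external tool assumed is the standard lspci command.

```python
import subprocess
from typing import List


def mellanox_lines(lspci_output: str) -> List[str]:
    """Keep only the lspci lines that mention a Mellanox device."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "mellanox" in line.lower()
    ]


def read_lspci() -> str:
    """Run lspci on this machine and return its text output."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling mellanox_lines(read_lspci()) lists the matching devices along with their PCI addresses. Running lspci -vv -s with one of those addresses prints the verbose details for that device, which is the information to take to the vendor documentation.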

                in reply to: ALERT : SRI/RUTG Site in Acceptance Testing – AVOID #3879
                Ilya Baldin
                Participant

                  These sites are in maintenance mode and should not be usable, i.e. they should produce errors when anyone not authorized to perform acceptance testing tries to use them. We are adding features to fablib so that it automatically avoids sites in maintenance in the future.

                  in reply to: Unable to allocate resources after the updates/maintenance. #3875
                  Ilya Baldin
                  Participant

                    Praveen (and the team), just to close the loop and post a version of my private reply:

                    Individual FABRIC sites are not as large as CloudLab. They typically have between 3 and 6 worker nodes. Each worker has 64 cores. If you ask for VMs of more than 32 cores, at most one such VM can be accommodated by a worker node. For your storage requirements I suspect you should rely on persistent storage in some cases. Not every worker’s internal storage is the same, so some combinations of core/RAM/disk are possible only on some workers, not all. We can create multiple persistent volumes for you on each site if required.
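To make the core arithmetic concrete, here is a rough sketch that uses only the figures quoted above (64 cores per worker, 3 to 6 workers per site). It deliberately ignores RAM, disk and any cores reserved by the host, so treat the results as an upper bound and not as what the control framework will actually grant. The function names are illustrative.

```python
WORKER_CORES = 64  # cores per worker node, per the figure quoted above


def vms_per_worker(cores_per_vm: int, worker_cores: int = WORKER_CORES) -> int:
    """Upper bound on VMs of this size per worker, counting cores only."""
    if cores_per_vm <= 0:
        raise ValueError("cores_per_vm must be positive")
    return worker_cores // cores_per_vm


def site_upper_bound(cores_per_vm: int, workers: int) -> int:
    """Upper bound on VMs of this size across all workers at one site."""
    return workers * vms_per_worker(cores_per_vm)
```

With these numbers a 32-core VM fits twice on a worker, while a 33-core VM fits only once, so a three-worker site can hold at most three such VMs.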

                    Another alternative is to use a combination of resources from FABRIC and other testbeds. Chameleon@Chicago is already reachable, and we will shortly be adding access to Chameleon@TACC (a much larger installation) as well as the CloudLab locations at Utah, Wisconsin and Clemson.

                    in reply to: Updating the Default VM Images #3830
                    Ilya Baldin
                    Participant

                      Yes! The challenge of updating images is that we should not remove or significantly change the images under existing labels, so some form of versioning is necessary with a history of versions going back for some predetermined period of time. This way if you created an experiment with image ubuntu_20_ver_1.0, that image is immutable for the duration of its lifetime (with the exception of mandatory security updates, which must be applied to preserve facility security).

                      This is exactly why we have so far not rolled out this feature as it requires some thinking and careful deployment.

                      in reply to: Updating the Default VM Images #3824
                      Ilya Baldin
                      Participant

                        Brandon,

                        This is an excellent point. We are discussing within the team both the question of keeping the existing images updated and allowing experimenters to provide their own images. There are of course many pitfalls with the latter, as we test the images to make sure they boot properly and remote debugging of boot issues is difficult. That said we have this in our sights.

                        At the very least we plan to get on a regular cadence with updating the images we host (we’ve just been too busy to do it), and potentially we will start allowing experimenters to supply their own images as well.

                        in reply to: File save error and Load file error #3766
                        Ilya Baldin
                        Participant

                          By way of explanation: we host the Jupyter Hub in Google Cloud, which costs real dollars allocated to us from NSF via a project named CloudBank. We are still evaluating the true costs of running it in its current configuration (so we can more accurately project future costs). We may revise the amount of disk space and other resources each notebook server gets. However, we are constrained by the budget, and this is not a decision we will be making in the near term.

                          In general the Hub is not intended as a place to park or transfer large files.

                          in reply to: L2Bridge without MAC learning? #3701
                          Ilya Baldin
                          Participant

                            We will open an internal ticket about it. The VFs are created on the worker node at boot and then given out by the Control Framework to the virtual machines. We need to check what options are set on them at creation time (typically they cannot be changed once created).

                            @yoursunny may be right and it may or may not be possible for us to change this behavior – we will report here once we know more. Thank you all for your feedback.

                            in reply to: Broken get_ssh_command #3698
                            Ilya Baldin
                            Participant

                              Fraida,

                              We do basically two types of changes:

                              1. Underlying control framework (CF) changes, which are installed on our infrastructure and may affect the look and feel of the portal. They generally bring new features, but those aren’t available to experimenters until the second type of change happens.

                              2. FABlib changes to make CF features from above available to users – they generally get installed into a new version of a notebook container image. They affect how the notebooks are run (although we try to keep the changes backward compatible as much as we can).

                              The change we made last week was of type 1 and thus wasn’t going to impact anything you were already doing. The problem you saw most likely coincides with a separate FABlib change (type 2) that happened earlier. In the coming weeks Paul will be bringing updated version(s) of FABlib that support the features of CF 1.4, and there will be separate announcements about it.

                              • This reply was modified 1 year, 11 months ago by Ilya Baldin.
                              Ilya Baldin
                              Participant

                                We are working on it. We are port-constrained on our dataplane switches in a number of desirable locations. Once we are able to resolve those constraints we should be able to ship the switches out and add the necessary code support in the control framework to enable working with them.

                                I’m assuming you or your professor have signed the SLACA with Intel and have access to their compiler tools. This is not something we will be providing. We will be providing P4 switches with a runtime that allows you to load the bytecode remotely, but compiling the code using Intel-licensed tools will be the user’s responsibility.

                                Ilya Baldin
                                Participant

                                  Also, I moved this topic to the General Questions and Discussion forum.
