Paul Ruth

Forum Replies Created

Viewing 15 posts - 76 through 90 (of 273 total)
  • in reply to: Potential fix on certain bash command in notebook #3374
    Paul Ruth
    Keymaster

      @Yingqiang  –  Have you tried the FABRIC example notebooks that come pre-installed in your JupyterHub environment?  They all use node.execute().   Try the Hello, FABRIC example.

      There are videos that walk you through some: https://www.youtube.com/playlist?list=PL64VqyRjOwSFaDlX-bk7KXAiiCF3FP4vv

      in reply to: Get Physical Topology of Slice #3358
      Paul Ruth
      Keymaster

        @Ertza – If you are using an L2Bridge at a site, then your VMs are directly connected with a VLAN on one of the Cisco switches.  If you have 6 nodes, you can find the host each of them is on. The topology is simply all of those hosts connected directly to a single Cisco 5500 or 5700.  The Cisco switch model depends on the site: most sites have a 5700, and a few have a 5500.

        Do you need more info? Are you asking to know the model of the specific switch you are using?

        in reply to: Potential fix on certain bash command in notebook #3353
        Paul Ruth
        Keymaster

          @Yingqiang – It's hard to tell what is happening here.  It looks like you are trying to ssh with an IPython magic command. I am not familiar with these yet. The error is "Host key verification failed". However, it looks like you are passing StrictHostChecking=no and UserKnownHostsFile=/dev/null. These parameters should instruct the system to skip host key verification. I suspect those parameters are not being passed correctly.

          Are you able to ssh from a regular command line? Can you use the node.execute() command in fablib?

          in reply to: Potential fix on certain bash command in notebook #3352
          Paul Ruth
          Keymaster

            @Donald – Which site did your VM go to? I suspect your problem is that your VM landed on CLEM, FIU, GPN, or UCSD.  These sites are not fully deployed yet and the networks will always fail.

            Try adding avoid=['CLEM', 'FIU', 'GPN', 'UCSD'] to your calls to add_node or get_random_sites.   You might also avoid STAR and MAX while we debug an issue with their dataplane switches.
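
            A minimal sketch of the avoid list in use. It assumes `fablib` is a FablibManager-like object whose get_random_sites() accepts `count` and `avoid` keyword arguments; `pick_sites` and `AVOID_SITES` are hypothetical names used only for illustration.

```python
# Sites named in the post as not fully deployed; adjust as the testbed changes.
AVOID_SITES = ['CLEM', 'FIU', 'GPN', 'UCSD']


def pick_sites(fablib, count, extra_avoid=()):
    """Ask fablib for `count` random sites, skipping the avoid list.

    `fablib` is assumed to be a FablibManager-like object whose
    get_random_sites() accepts `count` and `avoid` keyword arguments.
    """
    # Build a new list so the module-level default is never mutated.
    avoid = AVOID_SITES + [s for s in extra_avoid if s not in AVOID_SITES]
    return fablib.get_random_sites(count=count, avoid=avoid)
```

            For example, pick_sites(fablib, 2, extra_avoid=['STAR', 'MAX']) would also skip the two sites with the dataplane switch issue.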

            Paul Ruth
            Keymaster

              This is likely a bug that happens when the testbed is busy and a bit slow.   What happens is that the slice becomes “StableOK” before the management IP is set on the node.   Usually this happens so fast that the management IP is ready when you need it but occasionally there is enough of a delay to trigger this error.

              There are a few ways to work around this.

              One option is to wait a few seconds after the failure and then call fablib.get_slice("<slice_name>") again. It will pull a new copy of the slice information that will have the management IP.   Depending on when you do this, you may need to re-call "post_boot_config" on the slice as well.
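
              The wait-and-refetch idea can be sketched roughly like this. It assumes a FablibManager-like `fablib` with get_slice(name=...), a slice with get_nodes(), and nodes with get_management_ip(); the function name, attempt count, and delay are hypothetical choices, not fablib defaults.

```python
import time


def refresh_until_management_ips(fablib, slice_name, attempts=10, delay=5.0):
    """Re-fetch the slice until every node reports a management IP.

    Assumes a FablibManager-like `fablib` with get_slice(name=...), a slice
    with get_nodes(), and nodes with get_management_ip(). Returns the fresh
    slice object, or raises TimeoutError if the IPs never appear.
    """
    for attempt in range(attempts):
        slice_obj = fablib.get_slice(name=slice_name)  # pulls a new copy
        if all(node.get_management_ip() for node in slice_obj.get_nodes()):
            return slice_obj
        if attempt < attempts - 1:
            time.sleep(delay)  # give the testbed a moment before retrying
    raise TimeoutError(f"no management IP on slice {slice_name!r} "
                       f"after {attempts} attempts")
```

              Once this returns, the fresh slice object is the one to call post_boot_config on if that step was skipped by the earlier failure.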

              Another option is to install a new pre-release version of fablib which has a permanent fix for this.  There are a bunch of bug fixes and some extra features too. Try:

              pip install fabrictestbed-extensions==1.3.2rc3 --user

               

              in reply to: Destination Host Unreachable From Node #3323
              Paul Ruth
              Keymaster

                This is because that Nvidia site is not set up for IPv6 and FABRIC’s CLEM site uses IPv6 addresses on its management network.

                One reasonable workaround for this is to use the NAT64 services described here: https://nat64.net/

                There is an example FABRIC Jupyter notebook in your JupyterHub container called  “Accessing IPv4 Sites from IPv6 Nodes” that shows how to set this up on a FABRIC node.

                Paul

                in reply to: Destination Host Unreachable From Node #3321
                Paul Ruth
                Keymaster

                  Which FABRIC site is your node on?

                  in reply to: Create 8-node slice in Fabric #3272
                  Paul Ruth
                  Keymaster

                    There is no limit per project.  I suspect you are hitting another resource limit.   What error are you getting?

                    You are asking for specific hardware (GPUs) and significant amounts of cores/RAM.  I suspect there are not enough cores or RAM available on the hosts that have the GPUs (keep in mind other users are using them too).

                    Try reducing the cores/ram and it will probably work.  Or try another site.

                    in reply to: Local FABRIC Setup on Mac #3271
                    Paul Ruth
                    Keymaster

                      You can make those env vars persistent in any way you can on your machine.  The important part is that they are set before you run your FABlib application. If you are using JupyterLab on your machine, you will need to have the vars set before you start JupyterLab.

                      On the FABRIC JupyterHub, we skip this issue by having the application read the fabric_rc file and set the vars.  You might want to do this as well.  The easiest way to do this is to put all your FABRIC config files in the same place as on the JupyterHub (i.e. ${HOME}/work/fabric_config/).  You can also set a different location when you create a FABlib manager.
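
                      If you want your own application to read the rc file in a similar way, here is a rough sketch. It assumes the file is a plain list of shell `export NAME=value` lines; `load_fabric_rc` is a hypothetical helper, not part of fablib, and it only expands environment variables such as ${HOME} rather than running the file through a shell.

```python
import os
import shlex


def load_fabric_rc(path, environ=os.environ):
    """Copy `export NAME=value` lines from an rc file into `environ`.

    Assumes the file is a simple list of shell `export` statements; blank
    lines, comments, and any other commands are skipped. Values get
    environment-variable expansion only (no command substitution).
    """
    loaded = {}
    with open(path) as rc_file:
        for line in rc_file:
            # shlex handles quoting and strips trailing `# comments`.
            tokens = shlex.split(line, comments=True)
            if len(tokens) != 2 or tokens[0] != 'export' or '=' not in tokens[1]:
                continue
            name, value = tokens[1].split('=', 1)
            loaded[name] = os.path.expandvars(value)
    environ.update(loaded)
    return loaded
```

                      Call it before you create the FABlib manager, for example with the path to the fabric_rc file in your config directory.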

                      in reply to: Local FABRIC Setup on Mac #3266
                      Paul Ruth
                      Keymaster

                        I think you need to add your bastion key to the fabric_rc file.   Refer to the “Configure Environment” example notebook.  The second to last cell shows what needs to be set in the fabric_rc file.

                        https://github.com/fabric-testbed/jupyter-examples/blob/master/fabric_examples/fablib_api/configure_environment/configure_environment.ipynb

                        in reply to: What if institution is not in the list on the CI Logon page? #3243
                        Paul Ruth
                        Keymaster

                          If the email addresses are different then there will be two accounts.  This would be the case if you used an NCSA account and then a non-NCSA Google/GSuite account.  I don’t think we can move an account to a different identity provider.

                          If email addresses are the same, it might merge the accounts.   This is the case if, for example, you used your NYU account and then tried to login with Google/GSuite using your NYU credentials.  That said, this will probably produce unpredictable results.  I would not recommend it.

                          If you have a list of non-InCommon users, you should create a ticket here: https://fabric-testbed.atlassian.net/servicedesk/customer/portal/2/group/8/create/18

                          in reply to: What if institution is not in the list on the CI Logon page? #3234
                          Paul Ruth
                          Keymaster

                            We can use Google accounts too. However, institutional accounts are strongly preferred and we will only approve Google accounts in situations where an institutional account is not possible.

                            If you have collaborators/students who need to use Google accounts, we need the project lead to provide a list of users/Google accounts that we should approve and we will approve them as they come in.

                            One other possibility is if an institution is not listed on the CILogon page but the institution uses Google as an email provider.  In this case, they can choose ‘Google’ from the dropdown and log in with their institutional ID.  We can approve these accounts without extra information because the account is still tied to the institution.

                             

                            in reply to: Error on get_management_os_interface #3231
                            Paul Ruth
                            Keymaster

                              Yeah, it looks like the version of ‘ip’ in Ubuntu 18 cannot produce JSON output.

                              A workaround would be to use this execute command:

                              stdout, stderr = node.execute("ip route list default | cut -d ' ' -f 5")
                              print(stdout)

                              In general, the fablib functions that get/set internal attributes of the VMs are just convenience wrappers around ssh commands. Most of these settings are determined by the OS or user configuration and are not controlled by FABRIC. Getting them can be tricky so we are trying to add some helper functions, but there will always be some corner cases.  In this case, we can add a check for successful return of a JSON object and fall back to parsing the string, but parsing stdout is always going to be fragile.
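
                              Until something like that lands, here is a rough sketch of the check-then-fall-back idea on the user side. It is not the fablib implementation: `default_route_device` is a hypothetical helper, and it only assumes that `node.execute(cmd)` returns a (stdout, stderr) pair as in the workaround above.

```python
import json


def default_route_device(node):
    """Return the device name of the node's default route, or None.

    `node` is assumed to offer execute(cmd) -> (stdout, stderr), as fablib
    nodes do. JSON output is tried first; plain-text parsing is the fallback.
    """
    stdout, _ = node.execute("ip -j route list default")
    try:
        routes = json.loads(stdout)
        if isinstance(routes, list) and routes and "dev" in routes[0]:
            return routes[0]["dev"]
    except (TypeError, ValueError):
        pass  # older iproute2 without JSON support, or output was not JSON

    # Fallback: "default via 10.0.0.1 dev ens3 proto dhcp ..." -> "ens3"
    stdout, _ = node.execute("ip route list default")
    fields = (stdout or "").split()
    if "dev" in fields[:-1]:
        return fields[fields.index("dev") + 1]
    return None
```

                              The text fallback looks for the word after "dev" rather than a fixed column, which is a little less fragile than cutting field 5 but still depends on the output format of ‘ip’.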

                              Thanks for letting us know about this.

                              in reply to: Error on get_management_os_interface #3227
                              Paul Ruth
                              Keymaster

                                The JSON it is returning seems like it's one big string.  Which image are you using?

                                I’d like to try that image and see if it is behaving in a way I didn’t expect.

                                in reply to: Error on get_management_os_interface #3223
                                Paul Ruth
                                Keymaster

                                  The way this works is that it ssh’s to your node and gets the result of ‘ip addr route’ as json.  It then digs through that json to find the name of the device.

                                  Your error seems to be that the json that is returned is ‘None’.  This likely means the ssh failed.

                                  Can you do a “mynode.execute(‘hello, fabric’)” ?  Does that fail too?
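
                                  A small sketch of that check, assuming `mynode.execute(cmd)` returns a (stdout, stderr) pair and raises an exception when the ssh connection fails (that failure behavior is an assumption here). `node_is_reachable` is a hypothetical helper name, and it runs an `echo` so that the remote command itself cannot be the thing that fails.

```python
def node_is_reachable(node, marker="hello, fabric"):
    """Return True if a trivial command round-trips through node.execute().

    `node` is assumed to offer execute(cmd) -> (stdout, stderr). Any
    exception from execute() is treated as "ssh is not working".
    """
    try:
        stdout, _ = node.execute(f"echo '{marker}'")
    except Exception as error:
        print(f"execute() failed: {error}")
        return False
    # The echoed marker should come back on stdout if ssh worked.
    return marker in (stdout or "")
```

                                  If this returns False, the problem is with reaching the node over ssh rather than with the JSON parsing.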
