Fraida Fund

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 32 total)
  • in reply to: Setting up Kubernetes cluster on FABRIC #7926
    Fraida Fund
    Participant

      Hi, you can use this example: https://github.com/teaching-on-testbeds/k8s

      I just tested it: the playbook failed on the first run but succeeded on the second attempt. When it succeeds, the recap should show failed=0 for every host, like this:

      PLAY RECAP *********************************************************************
      localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
      node-0                     : ok=715  changed=35   unreachable=0    failed=0    skipped=1252 rescued=0    ignored=1   
      node-1                     : ok=610  changed=26   unreachable=0    failed=0    skipped=1109 rescued=0    ignored=1   
      node-2                     : ok=502  changed=20   unreachable=0    failed=0    skipped=779  rescued=0    ignored=1   
      

      (I developed that example for teaching this material, in case you want to see how it is used: https://github.com/teaching-on-testbeds/k8s-ml)
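To check a recap like the one above from a script rather than by eye, here is a minimal sketch. The helper names `parse_play_recap` and `hosts_with_problems` are hypothetical (they are not part of Ansible or of the linked repository), and the sample text is copied from the recap in this post.

```python
import re

# One recap line looks like: "node-0   : ok=715  changed=35  ... ignored=1"
RECAP_LINE = re.compile(r"^\s*(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_play_recap(text):
    """Return {host: {counter_name: int}} for every host line found in text."""
    results = {}
    for line in text.splitlines():
        match = RECAP_LINE.match(line)
        if match:
            pairs = (pair.split("=") for pair in match["counters"].split())
            results[match["host"]] = {name: int(value) for name, value in pairs}
    return results


def hosts_with_problems(recap):
    """List hosts whose 'failed' or 'unreachable' counter is non-zero."""
    return [
        host
        for host, counters in recap.items()
        if counters.get("failed", 0) or counters.get("unreachable", 0)
    ]


sample = """
PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node-0                     : ok=715  changed=35   unreachable=0    failed=0    skipped=1252 rescued=0    ignored=1
node-1                     : ok=610  changed=26   unreachable=0    failed=0    skipped=1109 rescued=0    ignored=1
node-2                     : ok=502  changed=20   unreachable=0    failed=0    skipped=779  rescued=0    ignored=1
"""

recap = parse_play_recap(sample)
print(hosts_with_problems(recap))  # an empty list means the run was clean
```

If the list is not empty, re-running the playbook is the first thing to try, since the first run failed for me too.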

      in reply to: SSH Key authenticating error #7310
      Fraida Fund
      Participant

        Re:

        NOTE: Bastion host is only used as a jump box, we do not allow login to Bastion Nodes.

        The knowledge base says we can test the bastion host login using

        ssh -i ~/.ssh/fabric_bastion -C2T -D 14000 -M -N username_0123456789@bastion.fabric-testbed.net

        Is this no longer current guidance?

        in reply to: Cannot SSH to VMs on newy-w2.fabric-testbed.net #6176
        Fraida Fund
        Participant

          They’re back up. Thanks!

          in reply to: L2Bridge without MAC learning? #5332
          Fraida Fund
          Participant

            Thanks for keeping me informed!

            in reply to: Project member I did not specify is being added to my project #5256
            Fraida Fund
            Participant

              The project did not have any members before.

              I realized that the CSV file had an “Email” header at the top. It appears that the first existing FABRIC user with “email” in their email address was matched to this line. You can reproduce this with a file that contains only the text

              Email
              

              in it. Similarly, if I use this CSV file:

              nyu
              

              it tries to add the first FABRIC user with “nyu” in their email address.

              It seems to be matching on a partial string instead of the entire string. So I guess that if there were two users, “xx@email.org” and “axx@email.org”, and I uploaded a CSV with “xx@email.org”, it might match “axx@email.org” instead of “xx@email.org”.
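To make the suspected behavior concrete, here is a small sketch that contrasts substring matching with whole-string matching. It is only an illustration of my guess: the user list and both function names are hypothetical, and I have not seen how the portal actually implements the lookup.

```python
# Hypothetical user list, ordered so that the "wrong" user comes first.
users = ["axx@email.org", "xx@email.org"]


def first_partial_match(query, emails):
    """Return the first address that merely contains the query (suspected behavior)."""
    needle = query.strip().lower()
    return next((email for email in emails if needle in email.lower()), None)


def exact_match(query, emails):
    """Return the address only if it equals the query (expected behavior)."""
    needle = query.strip().lower()
    return next((email for email in emails if email.lower() == needle), None)


for row in ["Email", "xx@email.org"]:
    print(row, "->", first_partial_match(row, users), "|", exact_match(row, users))
```

With substring matching, both the “Email” header row and “xx@email.org” resolve to “axx@email.org”. With whole-string matching, the header row matches nobody and “xx@email.org” resolves to itself.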

              Fraida Fund
              Participant

                Perhaps the instructions at https://learn.fabric-testbed.net/knowledge-base/obtaining-and-using-fabric-api-tokens/#using-tokens-within-the-jupyter-hub can be updated. Currently, they say to generate a new token and upload it to the JH when you get a “Refresh Token: (invalid grant)” error. But at least for this instance of the error, that does not work (and my students say it has not worked for them either when they hit this error): whatever is not initialized properly still fails, even with a new token. It only works if the JH server is stopped and restarted from the Hub Control Panel.

                Fraida Fund
                Participant

                  Following up on this to share more info:

                  In Step 4 above, the first JH server I start after the timeout does have a new refresh token. The contents of .tokens.json show that the token is created when I start the JH server:

                  {
                      "refresh_token": "XXX",
                      "created_at": "2023-08-30 15:13:26"
                  }
                  

                  but when I try to use fablib I get that token error, and no ID token.

                  After I stop the JH server from the Hub Control Panel and start it again, it gets another new refresh token. .tokens.json then has:

                  {
                      "refresh_token": "XXX",
                      "created_at": "2023-08-30 15:16:47"
                  }
                  

                  and this one works. When attempting to use fablib, I get an ID token and no error.

                  It is not clear why the first refresh token does not work, even though it is new.
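For anyone comparing their own tokens, here is a minimal sketch of reading the `created_at` field from the two .tokens.json snapshots above. The function name `token_created_at` is hypothetical, and the JSON strings are copied from this post (with the token redacted) rather than read from disk.

```python
import json
from datetime import datetime


def token_created_at(tokens_json):
    """Parse the 'created_at' timestamp from the text of a .tokens.json file."""
    data = json.loads(tokens_json)
    return datetime.strptime(data["created_at"], "%Y-%m-%d %H:%M:%S")


# Snapshot after the first server start (token did not work).
first = '{"refresh_token": "XXX", "created_at": "2023-08-30 15:13:26"}'
# Snapshot after stopping and restarting the server (token worked).
second = '{"refresh_token": "XXX", "created_at": "2023-08-30 15:16:47"}'

gap = token_created_at(second) - token_created_at(first)
print(gap)
```

The two tokens were created only a few minutes apart, so the age of the first token does not look like the reason it was rejected.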

                  Fraida Fund
                  Participant

                    Thanks. The part that I consider a “bug” is that when I log in again and start a new server in Step 4, it does not get a new “good” token. Is that behavior expected?

                    in reply to: L2Bridge without MAC learning? #5115
                    Fraida Fund
                    Participant

                      Thanks, I appreciate the update!

                      Fraida Fund
                      Participant

                        (Related question: is that jupyter_startup.py anywhere in the fabric-testbed GitHub? I thought it should be in https://github.com/fabric-testbed/jupyternb-setup but that hasn’t been updated in a while, and neither branch matches what’s currently on the “default” server.)

                        in reply to: Adding large number of members to a project #4138
                        Fraida Fund
                        Participant

                          Thanks for following up! I look forward to trying this feature.

                          in reply to: L2Bridge without MAC learning? #4011
                          Fraida Fund
                          Participant

                            Hi! I wanted to follow up on this. This functionality is used in educational materials that I am working to transition ahead of the imminent retirement of InstaGENI, and I need to decide what platform to transition them to.

                            Is this issue expected to be fixable? If yes, is there a rough timeline? (Is it likely to be fixed before InstaGENI is retired?)

                            in reply to: Bandwidth on FABRIC links #3962
                            Fraida Fund
                            Participant

                              Thanks. Did I get this right?

                              • A dedicated ConnectX-6/5 has its full bandwidth within a site (even in a hypothetical situation where the site has high utilization).
                              • A dedicated ConnectX-6/5 is currently best effort between sites, but eventually we’ll be able to reserve bandwidth on these links between sites.
                              • Basic NICs have (and only ever will have) best effort, with a 780 Mbps minimum in the hypothetical where the site has high utilization.
                              in reply to: Bandwidth on FABRIC links #3960
                              Fraida Fund
                              Participant

                                Thanks! Could you clarify this point:

                                Basic NICs: The existing Basic NICs are implemented as SR-IOV virtual functions on a 100Gbps ConnectX-6.  The only limitation is that the bandwidth is shared with the other Basic NICs on that port.

                                Does this mean that 100 Gbps is divided among all of the Basic NICs on that port, and that the port may be shared by Basic NICs in my slice and also in other users’ slices? Hypothetically, if all 128 SR-IOV VFs on the port are used, then the bandwidth could max out at ~780 Mbps? (And I don’t have any visibility into how many SR-IOV VFs are on the port.)
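The ~780 Mbps figure comes from simple division, sketched below. This assumes the port bandwidth is split equally among active VFs, which is my simplification for the worst case and not a statement of how the NIC actually schedules traffic; the function name `fair_share_mbps` is hypothetical.

```python
PORT_GBPS = 100  # ConnectX-6 port speed, from the quoted description
MAX_VFS = 128    # SR-IOV virtual functions on the port, in the hypothetical above


def fair_share_mbps(port_gbps, active_vfs):
    """Per-VF bandwidth in Mbps if the port were divided equally (an assumption)."""
    if active_vfs < 1:
        raise ValueError("active_vfs must be at least 1")
    return port_gbps * 1000 / active_vfs


print(fair_share_mbps(PORT_GBPS, MAX_VFS))  # 781.25
```

So with every VF in use and all of them sending at once, each Basic NIC would get roughly 780 Mbps under this equal-split assumption; with fewer active VFs the share is proportionally higher.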

                                in reply to: Updating the Default VM Images #3829
                                Fraida Fund
                                Participant

                                  As an experimenter, I would prefer if images were not updated so that I could develop experiments against a hosted image, and I wouldn’t have to keep updating experiments to reflect the latest software versions.

                                  (Keeping the images stable gives me two choices: I could update to latest software versions, or I could choose not to. Updating the images means I have no choice.)

                                  This is especially a concern for e.g. education experiments, where we may also record video materials to go along with each experiment, and it’s very time intensive to prepare. I have a strong interest in those experiments staying stable.

                                  Maybe there could be one hosted image for each OS that is updated (e.g. “default_ubuntu_latest” is kept up to date, with an announcement so we know when it is updated), while some stable images are also kept (e.g. “default_ubuntu_20” is not updated except for security updates).
