
Paul Ruth

Forum Replies Created

Viewing 15 posts - 31 through 45 (of 273 total)
  • in reply to: KeyError When creating a Slice #4051
    Paul Ruth
    Keymaster

      That looks like a bug. We’ll fix it.

      One workaround is to use the NIC by adding it to a network.

      thanks,

      Paul

       

      in reply to: Local NAS storage/VM #4031
      Paul Ruth
      Keymaster

        Thanks. We will update that. It probably won’t be the default until the end of the semester so that we can keep everything stable for educational users.

        Paul Ruth
        Keymaster

          Dedicated NICs are either ConnectX-5 or ConnectX-6 (i.e., anything other than Basic NICs).

           

          in reply to: Network to GPU Hardware Support #4021
          Paul Ruth
          Keymaster

            I’m sure the hardware supports this.  The trick might be in how to handle it in a VM with devices using PCI passthrough.   I’m not sure anyone has tried this yet.

            Do you know how to do this on a non-virtualized machine?

            in reply to: L2Bridge without MAC learning? #4020
            Paul Ruth
            Keymaster

              Ezra and I looked at this a while back, and it seemed that the Mellanox cards were not handling these frames the way we expected given the config options we used. We'll need to revisit this. I'll get back to you about it.

              thanks,

              Paul

              in reply to: go does not work with node.execute API #4019
              Paul Ruth
              Keymaster

                Each of those execute calls creates a separate ssh session that runs the command. This means that each call runs in its own shell, and the result of any previous ‘export’ or ‘source’ calls will not be present in the environment of a new call.

                In this case you are putting everything in the .bashrc file, so I would think that the new call would source the .bashrc file when the shell is created, but maybe there is some other piece of the environment missing.
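To see the effect outside FABRIC, here is a small local analogy. It is an assumption that each `subprocess.run(["bash", "-c", ...])` call behaves like one `execute()` ssh session; this sketch does not use FABlib. A variable exported in one call is gone in the next, while commands chained in a single call share one shell:

```python
import subprocess


def run_in_new_shell(command: str) -> str:
    """Run `command` in a fresh, non-interactive bash process and return its stdout."""
    result = subprocess.run(["bash", "-c", command],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Two separate calls: the export dies with the first shell.
run_in_new_shell("export FABRIC_DEMO_VAR=hello")
separate = run_in_new_shell('echo "FABRIC_DEMO_VAR=${FABRIC_DEMO_VAR:-unset}"')

# One combined call: both commands run in the same shell, so the export is visible.
combined = run_in_new_shell('export FABRIC_DEMO_VAR=hello ; echo "FABRIC_DEMO_VAR=${FABRIC_DEMO_VAR:-unset}"')

print(separate)  # FABRIC_DEMO_VAR=unset
print(combined)  # FABRIC_DEMO_VAR=hello
```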

                Try putting that all in one call, something like this:

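```python
# Run every step in ONE execute() call so all commands share the same shell.
node.execute('curl -O -L "https://golang.org/dl/go1.19.5.linux-amd64.tar.gz" ; '
             'tar -xf "go1.19.5.linux-amd64.tar.gz" ; '
             'sudo mv -v go /usr/local ; '
             # Single quotes keep $HOME, $PATH and $GOPATH unexpanded when written to ~/.bashrc.
             "echo 'export GOPATH=$HOME/go' >> ~/.bashrc ; "
             "echo 'export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin' >> ~/.bashrc ; "
             # Export directly too: ~/.bashrc may return early in a non-interactive shell.
             'export GOPATH=$HOME/go ; '
             'export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin ; '
             'go install github.com/named-data/YaNFD/cmd/yanfd@latest')
```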
                
                
                in reply to: Error in exchanging ospf protocol routes #4003
                Paul Ruth
                Keymaster

                  This is great. Note that I do have FRR/OSPF working without issue using shared/basic NICs. I’m using the Rocky image and the FRR Docker image. I’m not sure what is preventing your config from working.

                  The FRR/OSPF notebooks I am putting together are based on a yet-to-be-released version of FABlib.  I’ll release the example when the new FABlib is released.

                  Paul

                  in reply to: Deploying a containerized application #4002
                  Paul Ruth
                  Keymaster

                    Are you asking to expose services to the public Internet? In general, we don’t want to expose internal FABRIC slices to the Internet. Our experience running testbeds has taught us that it’s too easy for slices to become compromised.

                    This is why we have the bastion host and require you to use ssh tunnels or a proxy to intentionally expose internals. If a reverse proxy works for you, you can keep using it. If you do need to have a service exposed more generally, we can support that too; however, we will need to know more about what you want to do and how you plan to keep it secure.
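For reference, a sketch of the kind of ssh tunnel meant here, built as an argument list in Python. It uses OpenSSH's `-J` (jump through the bastion) and `-L` (local port forward) options. The user names, hosts, port 5000, and the helper name are placeholders, and it assumes your bastion and slice keys are already set up in `~/.ssh/config`:

```python
import shlex


def ssh_tunnel_command(bastion_user: str, bastion_host: str,
                       vm_user: str, vm_host: str,
                       local_port: int, remote_port: int) -> list[str]:
    """Build an OpenSSH command that forwards localhost:local_port on your
    machine to remote_port on the VM, hopping through the bastion host."""
    return [
        "ssh",
        "-N",                                           # no remote command, just hold the tunnel
        "-J", f"{bastion_user}@{bastion_host}",         # jump through the bastion (ProxyJump)
        "-L", f"{local_port}:localhost:{remote_port}",  # 'localhost' is resolved on the VM side
        f"{vm_user}@{vm_host}",
    ]


# Placeholder names and addresses; substitute your own.
cmd = ssh_tunnel_command("alice_0001", "bastion.example.net", "rocky", "192.0.2.10", 5000, 5000)
print(shlex.join(cmd))
# ssh -N -J alice_0001@bastion.example.net -L 5000:localhost:5000 rocky@192.0.2.10
```

With the tunnel up, a service listening on port 5000 in the VM is reachable at `http://localhost:5000` on your own machine without being exposed publicly.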

                     

                    Paul Ruth
                    Keymaster

                      I think this might be an issue we have with Basic NICs. The physical NICs sometimes filter legal L2 frames that should be allowed through. We are working on a solution, but it might be a firmware issue with the NICs.

                      Can you try this with dedicated NICs?

                       

                      in reply to: Deploying a containerized application #3987
                      Paul Ruth
                      Keymaster

                        You should be able to install just about any software you want in a FABRIC VM.   I don’t know anything about Flask, but you should be able to set it up.

                        Can you describe what you have tried and where you are stuck?

                        in reply to: Error in exchanging ospf protocol routes #3986
                        Paul Ruth
                        Keymaster

                          Sorry for the delayed response.

                          The short answer is that there is no filtering on any of the network services, and I have reliably set up OSPF using Rocky VMs.

                          Are you saying that the VM receives the OSPF packets but does not send a response?  If so, this seems like an issue with the OSPF config in the VM.
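One way to check this, as a sketch: capture OSPF traffic (IP protocol 89) on the experiment interface and see whether the VM's own hellos appear alongside the neighbor's. The interface name `eth1` and the helper name are placeholders, and running it through `node.execute` assumes a FABlib node object:

```python
def ospf_capture_command(interface: str, count: int = 20) -> str:
    """Build a tcpdump command that prints OSPF packets (IP protocol 89) seen on `interface`."""
    # -n: no name resolution, -i: interface to watch, -c: stop after `count` packets
    return f"sudo tcpdump -n -i {interface} -c {count} ip proto 89"


cmd = ospf_capture_command("eth1")  # placeholder interface name
print(cmd)  # sudo tcpdump -n -i eth1 -c 20 ip proto 89
# On FABRIC this would be run on the VM, e.g. node.execute(cmd)
```

OSPF hellos are sent to the multicast address 224.0.0.5. If only the neighbor's source address shows up, the local OSPF daemon is probably not sending on that interface; if both show up but no adjacency forms, the mismatch is more likely in the OSPF parameters (area, hello/dead intervals, network type).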

                          Can you share the notebook you are using?

                          Paul

                          in reply to: Bandwidth on FABRIC links #3963
                          Paul Ruth
                          Keymaster

                            Yes, that is correct.

                            It might be useful to add that with the SR-IOV NICs the virtual switching in the host is done in hardware. In most other cloud-ish systems the multi-tenant switching is done in software. These software switches use a lot of CPU, which causes performance interference between computation and networking (even between experiments). The SR-IOV solution should isolate performance a bit better.

                            Also, it should be rare for there to be enough other active network-heavy experiments that you would see less than 10 or 20Gbps on the Basic NICs (I’m interested in testing this when we have more users). I think you could get a lot of good experiments done by using tc to rate limit your own traffic to some modest amount (10-20 Gbps). I suspect a lot of the network contention would come from your own experiment. Then, when you are ready, we could move you to a dedicated NIC and you could try higher bandwidths.
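A minimal sketch of such a cap, assuming the Linux `tc` token bucket filter (`tbf`) qdisc on the experiment interface. The interface name and helper name are placeholders, and sizing the burst as one timer tick of traffic (rate / HZ, with HZ assumed to be 250) is a common rule of thumb rather than a FABRIC recommendation:

```python
def tbf_rate_limit_command(interface: str, rate_gbit: float, hz: int = 250) -> str:
    """Build a tc command that caps egress traffic on `interface` at `rate_gbit` Gbit/s."""
    # Assumed rule of thumb: the bucket holds at least one kernel tick of traffic, in bytes.
    burst_bytes = int(rate_gbit * 1e9 / 8 / hz)
    # 'replace' installs the qdisc, or swaps out an existing root qdisc on the interface.
    return (f"sudo tc qdisc replace dev {interface} root tbf "
            f"rate {rate_gbit:g}gbit burst {burst_bytes} latency 50ms")


print(tbf_rate_limit_command("eth1", 10))
# sudo tc qdisc replace dev eth1 root tbf rate 10gbit burst 5000000 latency 50ms
```

The cap can be removed later with `sudo tc qdisc del dev eth1 root`. At tens of Gbps a single tbf queue may itself become a bottleneck, so it is worth confirming the achieved rate with your own measurements.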

                            in reply to: Bandwidth on FABRIC links #3961
                            Paul Ruth
                            Keymaster

                              The Basic NICs/VFs are all best effort through the NIC itself, with a cap of 100Gbps shared between all VFs on a physical host. In the worst case, if all of them were busy at once, each would max out at ~780Mbps, but that is extremely unlikely. In practice you will likely see 10s of Gbps. Currently, your max bandwidth will likely be very close to 100Gbps.
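The ~780Mbps figure matches an even split of one 100Gbps port across about 128 simultaneously busy VFs: 100,000 Mbps / 128 ≈ 781 Mbps. The VF count of 128 is an inference from that number, not a documented FABRIC setting, and the helper name is hypothetical:

```python
def worst_case_share_mbps(port_gbps: float, active_vfs: int) -> float:
    """Per-VF bandwidth if `active_vfs` VFs all transmit at once and split the port evenly."""
    return port_gbps * 1000 / active_vfs


# Assumption: about 128 VFs share one 100 Gbps port (inferred from ~780 Mbps).
print(round(worst_case_share_mbps(100, 128)))  # 781
# With only a handful of busy neighbors the share is far larger:
print(worst_case_share_mbps(100, 4))  # 25000.0
```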

                              If you want dedicated bandwidth, you will need to reserve dedicated NICs.   These NICs are dedicated to your experiment and are limited only by their hardware.  When QoS reservations are available they will be on the WAN links between the sites. Most of these links will be 100Gbps that can be divided and allocated to individual experiments.  The “super core” links will be 1200Gbps and can be divided and allocated as well.   One of the main uses of the “super core” will be to allocate many dedicated 100Gbps QoS links to individual experiments.

                              in reply to: Bandwidth on FABRIC links #3956
                              Paul Ruth
                              Keymaster

                                Bandwidth QoS provisioning of WAN links is still being developed.  Stay tuned.

                                • Basic NICs: The existing Basic NICs are implemented as SR-IOV virtual functions on a 100Gbps ConnectX-6.  The only limitation is that the bandwidth is shared with the other Basic NICs on that port.
                                • ConnectX-6/5s: The dedicated ConnectX-6s come with 2 100Gbps ports, while the ConnectX-5s have 2 25Gbps ports. The dedicated ConnectX-6/5s are fully dedicated to a single VM and have full bandwidth to the switch.
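The port counts and speeds in that list can be captured in a small lookup, sketched here. The model strings are the names used in FABlib examples as far as I know (verify them against the FABlib docs for your version), the Basic NIC is left out because its 100Gbps is shared, and `smallest_dedicated_nic` is a hypothetical helper that picks the slowest dedicated model that still meets a per-port requirement:

```python
# Assumed FABlib model names; values are (number of ports, Gbps per port).
DEDICATED_NICS = {
    "NIC_ConnectX_5": (2, 25),
    "NIC_ConnectX_6": (2, 100),
}


def dedicated_capacity_gbps(model: str) -> int:
    """Aggregate line rate across all ports of one dedicated NIC."""
    ports, per_port_gbps = DEDICATED_NICS[model]
    return ports * per_port_gbps


def smallest_dedicated_nic(per_port_gbps_needed: float) -> str:
    """Hypothetical helper: slowest dedicated model whose per-port speed meets the need."""
    for model, (_, per_port) in sorted(DEDICATED_NICS.items(), key=lambda kv: kv[1][1]):
        if per_port >= per_port_gbps_needed:
            return model
    raise ValueError(f"no dedicated NIC offers {per_port_gbps_needed} Gbps per port")


print(dedicated_capacity_gbps("NIC_ConnectX_6"))  # 200
print(smallest_dedicated_nic(40))  # NIC_ConnectX_6
```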

                                Currently, there is little competition for bandwidth and you can see very nearly the full bandwidth in most cases (even with Basic NICs).  This is especially true for connections that stay within a site.

                                WAN links vary in performance. Eventually, nearly all of them will be on 100+ Gbps L1 connections owned by FABRIC; however, they are still being deployed as fast as we can. In the meantime, many of the links are AL2S or other L2 services while we wait for the real links to be deployed. You will likely see lower bandwidth on these links.

                                Also, there are some quirks we are trying to work out where some SR-IOV NICs occasionally get only 25-30Gbps. It seems like they are being left in a weird state by a previous experiment. We are trying to figure out how to detect and reset these cases.

                                Generally, you should expect at least 25Gbps and will often get close to 100Gbps. Note that to get these speeds you will need a bit more memory and more cores than the default. Also, the app will need to be multi-threaded, and many tools like iperf3 are single-threaded even if you use ‘-P’ (https://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf/multi-stream-iperf3/).
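A sketch of the multi-process approach described at that fasterdata link: several independent iperf3 server/client pairs on consecutive ports, with the receiver-side results summed. The port numbers, the server address, and the helper names are illustrative assumptions. (iperf3 3.16 and later service `-P` streams with separate threads, so a single run may suffice there; the multi-process approach also works with older versions.)

```python
import json

BASE_PORT = 5201  # iperf3's default port; each extra pair uses the next port


def iperf3_server_commands(n: int) -> list[str]:
    """One daemonized iperf3 server per port (-s server, -D daemon, -p port)."""
    return [f"iperf3 -s -D -p {BASE_PORT + i}" for i in range(n)]


def iperf3_client_commands(server: str, n: int, seconds: int = 30) -> list[str]:
    """One iperf3 client per server port, each producing a JSON report (-J)."""
    return [f"iperf3 -c {server} -p {BASE_PORT + i} -t {seconds} -J" for i in range(n)]


def total_gbps(json_reports: list[str]) -> float:
    """Sum the receiver-side throughput from the clients' JSON reports."""
    bps = sum(json.loads(r)["end"]["sum_received"]["bits_per_second"] for r in json_reports)
    return bps / 1e9


for cmd in iperf3_client_commands("10.0.0.2", 4):  # placeholder server address
    print(cmd)
```

The clients need to run at the same time, for example each started with `&` in one shell (or one execute() call) followed by `wait`, with each JSON report redirected to its own file before summing.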

                                 

                                • This reply was modified 1 year, 8 months ago by Paul Ruth.
                                in reply to: P4_bmv2 example is not working when executed on fabrictestbed. #3880
                                Paul Ruth
                                Keymaster

                                  There are some updates coming to fablib to better support Docker containers. When these arrive, we will revisit all the “complex” examples and get them working in a much simpler way.

                                  For now, I can assure you there is nothing in FABRIC that prevents these examples from working.  They just need to be updated.  It is possible for someone with P4 BMV2 experience to get them working on their own.

                                  I’ll keep this forum updated on any progress.

                                   

                                   
