Paul Ruth

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 274 total)
  • in reply to: GPU node is not available #7960
    Paul Ruth
    Keymaster

      Yuanjun,

      I moved this topic to the “General Questions” forum. The “Announcements” forum is used by the FABRIC team to make announcements about new features, outages, etc.

      thanks,

      Paul

      Paul Ruth
      Keymaster

        I added the Chameleon-TACC facility port permission to your project.   Can you try again?

        Note that we are at the SC24 conference this week and may be slow to respond. Also, there are some very large demos running this week.

        in reply to: Request for Project extension #7231
        Paul Ruth
        Keymaster

          Nirmala,

          I will email you.

          Paul

          in reply to: OpenVSwitch link under Complex Recipes doesn’t go anywhere #6648
          Paul Ruth
          Keymaster

            Violet,

            Is this for a research project or a classroom? If it’s a classroom, how many students are in the class? How many ports do their OVS switches need?

            Keep in mind that using smart NICs for this will require a full NIC for every 2 ports on the OVS switch.  So a 4-port OVS switch will need two full smart NICs on the same host.
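
The rule of thumb above amounts to a ceiling division, with each full smart NIC assumed to contribute two ports. A minimal sketch of that arithmetic (the function and constant names are illustrative, not part of fablib):

```python
import math

PORTS_PER_SMART_NIC = 2  # assumption taken from the rule of thumb above


def smart_nics_needed(ovs_ports: int) -> int:
    """Return how many full smart NICs an OVS switch with `ovs_ports` ports needs."""
    if ovs_ports < 0:
        raise ValueError("port count must be non-negative")
    return math.ceil(ovs_ports / PORTS_PER_SMART_NIC)


for ports in (2, 3, 4):
    print(ports, "ports ->", smart_nics_needed(ports), "smart NIC(s)")
```

Multiplying the result by the number of students gives a rough idea of how many smart NICs a class would tie up at once.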

            Paul

            in reply to: Difference between throughput after maintenance #6546
            Paul Ruth
            Keymaster

              There is nothing in particular that is different about LOSA.

              What jumps out at me about these results is that they are at least an order of magnitude too low.  With dedicated ConnectX-5 cards you should be seeing nearly 25 Gbps.  I suspect that your test case is too small. Your 100 MB test probably doesn’t get out of the TCP ramp-up phase of the connection.  You should try transferring several hundred GB… or better yet, run the tests for a set amount of time (at least 1 min).  You should also use much larger VMs, set the MTUs to 9000, and consider adjusting your buffer sizes.
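
To see why a 100 MB transfer is likely too small, it helps to compare it with the bandwidth-delay product of the path. A rough sketch, assuming a 25 Gbps link and a 30 ms round-trip time (the RTT is a placeholder, not a measured LOSA-DALL value):

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_seconds / 8


def transfer_seconds(size_bytes: float, bandwidth_bps: float) -> float:
    """Time to move `size_bytes` at full line rate, ignoring TCP ramp-up."""
    return size_bytes * 8 / bandwidth_bps


LINK_BPS = 25e9       # nominal ConnectX-5 rate mentioned above
ASSUMED_RTT = 0.030   # hypothetical round-trip time in seconds
TEST_SIZE = 100e6     # the 100 MB test

print(f"BDP: {bdp_bytes(LINK_BPS, ASSUMED_RTT) / 1e6:.1f} MB")
print(f"100 MB at line rate: {transfer_seconds(TEST_SIZE, LINK_BPS) * 1e3:.0f} ms")
```

Under those assumptions the whole transfer is about one bandwidth-delay product and would last roughly one round trip at line rate, so the connection is still ramping up when the test ends.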

              Try running the example iPerf3 notebook but manually set the sites to LOSA and DALL. You should see much higher bandwidths. Then tweak that test, in small steps, with your desired configuration and see what causes the bandwidth to drop.

              I think your tests are really testing the performance capabilities of the VMs, buffers, etc. but not the network.

              Also, if you really want repeatability, you will need to use the NUMA pinning examples. Without explicitly choosing the NUMA domain for your cores, you will get random physical cores that may result in much lower performance.

              For reference, here is the output of the example iPerf3 notebook using LOSA and DALL. Note that you can get nearly 100 Gbps if you increase the VM size and pin the cores to the correct NUMA domain:

              
              <pre>Connecting to host 10.137.3.2, port 5201
              [  5] local 10.133.130.2 port 56288 connected to 10.137.3.2 port 5201
              [  7] local 10.133.130.2 port 56294 connected to 10.137.3.2 port 5201
              [  9] local 10.133.130.2 port 56310 connected to 10.137.3.2 port 5201
              [ 11] local 10.133.130.2 port 56318 connected to 10.137.3.2 port 5201
              [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
              [  5]   0.00-10.01  sec  14.3 GBytes  12.3 Gbits/sec  11207   52.6 MBytes       (omitted)
              [  7]   0.00-10.01  sec  15.4 GBytes  13.2 Gbits/sec  12714   63.5 MBytes       (omitted)
              [  9]   0.00-10.01  sec  15.6 GBytes  13.4 Gbits/sec  11597   64.3 MBytes       (omitted)
              [ 11]   0.00-10.01  sec  20.3 GBytes  17.4 Gbits/sec  31095    201 MBytes       (omitted)
              [SUM]   0.00-10.01  sec  65.5 GBytes  56.2 Gbits/sec  66613             (omitted)
              - - - - - - - - - - - - - - - - - - - - - - - - -
              [  5]   0.00-10.01  sec  11.4 GBytes  9.77 Gbits/sec  2531   84.9 MBytes       
              [  7]   0.00-10.01  sec  15.7 GBytes  13.4 Gbits/sec  3213    123 MBytes       
              [  9]   0.00-10.01  sec  17.7 GBytes  15.2 Gbits/sec  3833    143 MBytes       
              [ 11]   0.00-10.01  sec  18.4 GBytes  15.8 Gbits/sec  3280    145 MBytes       
              [SUM]   0.00-10.01  sec  63.2 GBytes  54.2 Gbits/sec  12857             
              - - - - - - - - - - - - - - - - - - - - - - - - -
              [  5]  10.01-20.01  sec  11.4 GBytes  9.79 Gbits/sec    0   89.5 MBytes       
              [  7]  10.01-20.01  sec  16.4 GBytes  14.1 Gbits/sec    0    124 MBytes       
              [  9]  10.01-20.01  sec  18.7 GBytes  16.1 Gbits/sec    0    144 MBytes       
              [ 11]  10.01-20.01  sec  18.7 GBytes  16.0 Gbits/sec    0    142 MBytes       
              [SUM]  10.01-20.01  sec  65.2 GBytes  56.0 Gbits/sec    0             
              - - - - - - - - - - - - - - - - - - - - - - - - -
              [  5]  20.01-30.00  sec  11.0 GBytes  9.43 Gbits/sec  3639   86.7 MBytes       
              [  7]  20.01-30.00  sec  15.7 GBytes  13.5 Gbits/sec  5665    124 MBytes       
              [  9]  20.01-30.00  sec  17.9 GBytes  15.4 Gbits/sec  6044    139 MBytes       
              [ 11]  20.01-30.00  sec  17.6 GBytes  15.1 Gbits/sec  6159    139 MBytes       
              [SUM]  20.01-30.00  sec  62.1 GBytes  53.4 Gbits/sec  21507             
              - - - - - - - - - - - - - - - - - - - - - - - - -
              [ ID] Interval           Transfer     Bitrate         Retr
              [  5]   0.00-30.00  sec  33.8 GBytes  9.66 Gbits/sec  6170             sender
              [  5]   0.00-30.05  sec  33.6 GBytes  9.61 Gbits/sec                  receiver
              [  7]   0.00-30.00  sec  47.7 GBytes  13.7 Gbits/sec  8878             sender
              [  7]   0.00-30.05  sec  48.0 GBytes  13.7 Gbits/sec                  receiver
              [  9]   0.00-30.00  sec  54.3 GBytes  15.5 Gbits/sec  9877             sender
              [  9]   0.00-30.05  sec  54.5 GBytes  15.6 Gbits/sec                  receiver
              [ 11]   0.00-30.00  sec  54.7 GBytes  15.7 Gbits/sec  9439             sender
              [ 11]   0.00-30.05  sec  54.6 GBytes  15.6 Gbits/sec                  receiver
              [SUM]   0.00-30.00  sec   190 GBytes  54.5 Gbits/sec  34364             sender
              [SUM]   0.00-30.05  sec   191 GBytes  54.5 Gbits/sec                  receiver
              </pre>
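
The summary lines can be sanity-checked by hand. iperf3 reports transfer sizes in binary units (1 GByte = 2^30 bytes) and bitrates in decimal units (1 Gbit = 10^9 bits). A short sketch of the conversion, with an illustrative function name:

```python
def gbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert an iperf3 transfer (binary GBytes) and duration to decimal Gbits/sec."""
    return gbytes * 2**30 * 8 / seconds / 1e9


# Sender [SUM] line above: 190 GBytes in 30.00 seconds
print(f"{gbits_per_sec(190, 30.00):.1f} Gbits/sec")
```

The result is about 54.4 Gbits/sec, which agrees with the reported 54.5 Gbits/sec once you allow for the 190 GBytes figure itself being rounded.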
              

               

              • This reply was modified 10 months, 2 weeks ago by Paul Ruth.
              in reply to: Difference between throughput after maintenance #6360
              Paul Ruth
              Keymaster

                One more question… were you able to replicate this before?  By replicate I mean run it once in one slice, then delete that slice and run it again in a new slice.

                I think the main issue here is a combination of VMs that are too small (memory/cores) to achieve good bandwidth and the fact that you are not pinning cores to NUMA domains. Without pinning you are unlikely to get repeatable performance. The issue is that if your VM cores are not in the same NUMA domain as your NIC, you will get worse performance.  This is especially true for the router nodes.  When you create a slice, your virtual cores will float between the available physical cores. Since there are other users on the host, you will not know anything about the placement of your virtual cores.

                I suggest using much larger VMs and pinning the cores to the appropriate NUMA domains.

                One more thing, which version of iPerf3 are you using?  The iPerf3 that is available in most Linux repos is single-threaded. I recommend using the new version supported by ESnet (https://github.com/esnet/iperf).
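
A timed, multi-stream run of the kind recommended in this thread might be assembled as follows. This is only a sketch: the helper name, the 60-second duration, and the stream count are assumptions, and the command is printed rather than executed so it can be pasted on a node where a recent ESnet iperf3 build is installed.

```python
import shlex


def iperf3_client_command(server: str, seconds: int = 60, streams: int = 4) -> list:
    """Build an iperf3 client command for a timed, multi-stream test."""
    return [
        "iperf3",
        "-c", server,        # connect to the iperf3 server at this address
        "-t", str(seconds),  # run for a fixed time instead of a fixed size
        "-P", str(streams),  # number of parallel streams
    ]


# Placeholder address: replace with the data-plane IP of your own server node.
cmd = iperf3_client_command("10.137.3.2")
print(shlex.join(cmd))
```

Running for a fixed time rather than a fixed size keeps the measurement from being dominated by the start of the connection.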

                in reply to: Difference between throughput after maintenance #6345
                Paul Ruth
                Keymaster

                  Edgard,

                  I don’t think there would be anything that would limit you to bandwidth that low.  All but a few sites should support 100 Gbps (a few can only provide 10 Gbps).  I would expect much higher bandwidth than you are seeing. Even using multiple software routers, I would expect tens of Gbps.

                  What NIC types are you using?

                  What VM size are you using?

                  How are you forwarding traffic in your routers?

                  Are you tuning the TCP/IP configuration of your nodes (congestion control algorithm, MTU, buffer sizes, etc)?

                  Also, are you pinning your nodes to the NIC’s NUMA domain? See the NUMA pinning example.

                  Paul

                  • This reply was modified 11 months, 1 week ago by Paul Ruth.
                  in reply to: How can I run experiments? #6313
                  Paul Ruth
                  Keymaster

                    Hares,

                    If you are a professor or researcher who is eligible to be a project lead, you will need to request authorization to be a project lead. Then you can create a project and add your students or colleagues.

                    If you are a student or otherwise not eligible to be a project lead, you will need to talk to a professor at your university and have them create an account and lead a project with you as a member.  Students are not able to use FABRIC without the supervision of a professor or other senior researcher.

                    Paul

                    Paul Ruth
                    Keymaster

                      Recently deployed features require fablib to be upgraded.

                      See the post in the Announcements forum: https://learn.fabric-testbed.net/forums/topic/fabric-testbed-is-open-and-ready-for-use/

                      You should only need to run the following pip command:

                      pip install fabrictestbed-extensions==1.5.6

                      Note that you might want to upgrade all the way to the latest fablib version, which is 1.6.0
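
To confirm which fablib version a Jupyter environment actually has, one option is to query the installed package metadata. A minimal sketch using only the Python standard library; the helper names are illustrative and not part of fablib:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "fabrictestbed-extensions"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text: str) -> tuple:
    """Turn '1.5.6' into (1, 5, 6); suffixes such as 'rc1' are ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def needs_upgrade(installed, minimum: str = "1.5.6") -> bool:
    """True when nothing is installed or the installed version is older than `minimum`."""
    return installed is None or version_tuple(installed) < version_tuple(minimum)


current = installed_version()
print("installed:", current, "| upgrade needed:", needs_upgrade(current))
```

If an upgrade is reported as needed, the pip command above applies; restarting the notebook kernel afterwards is usually required for the new version to be picked up.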

                      Paul

                      • This reply was modified 11 months, 2 weeks ago by Paul Ruth.
                      in reply to: Exposing Ports to the Outside World #6306
                      Paul Ruth
                      Keymaster

                        The answer depends on what you are trying to do.  Generally, FABRIC is a secure sandbox that allows students and researchers to freely experiment with very disruptive and, potentially, vulnerable software architectures in a secure way.  If you are trying to connect your laptop or other server that you control to nodes in your slice, you will need to use a secure mechanism, for example ssh tunnels.  There is an example Jupyter notebook that describes how to create ssh tunnels through the FABRIC bastion host.  Another powerful way to do this is to use a personal VPN such as Tailscale.

                        If you are trying to expose a port to the whole of the Internet then we will only allow that in extremely rare circumstances where an alternative solution is not otherwise possible. In addition, these capabilities would require the user to deploy, maintain, and monitor the security of the experiments at a level similar to a production data center.  This is the capability enabled by the IPv4Ext and IPv6Ext services.

                        For starters, I would recommend becoming familiar with ssh tunnels. They are fairly simple to deploy.
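
As a rough illustration of what such a tunnel looks like, the sketch below builds an ssh command that forwards a local port to a port on a slice node by jumping through a bastion host. Every host name, user name, and port here is a placeholder, the helper is hypothetical, and key or config-file options are omitted; the example notebook remains the reference for the exact FABRIC setup.

```python
import shlex


def ssh_tunnel_command(local_port, remote_port, node_user, node_addr, bastion):
    """Build an ssh command that forwards a local port to a port on a slice node."""
    return [
        "ssh",
        "-N",                                           # tunnel only, no remote command
        "-L", f"{local_port}:localhost:{remote_port}",  # local port -> port on the node
        "-J", bastion,                                  # jump through the bastion host
        f"{node_user}@{node_addr}",
    ]


# All names below are placeholders, not real FABRIC hosts or accounts.
cmd = ssh_tunnel_command(8080, 80, "ubuntu", "node.example.org", "me@bastion.example.org")
print(shlex.join(cmd))
```

With a tunnel like this running, a service listening on port 80 of the node would be reachable at localhost:8080 on the laptop without exposing anything to the wider Internet.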

                        Let us know if you have any additional questions,

                        Paul

                        • This reply was modified 11 months, 2 weeks ago by Paul Ruth.
                        Paul Ruth
                        Keymaster

                          Emmanuel,

                          This just means you need extra permissions added to your project.

                          Which project are you using? Who is the project lead?

                          The project lead will need to request extra permissions as described here.

                          Paul

                          in reply to: Unable to add nodes to slice #6292
                          Paul Ruth
                          Keymaster

                            Vaiden,

                            The following line from your example returns a list of site names. This is true even if it returns only one site in the list.

                            site_5 = fablib.get_random_sites(count=1, filter_function=lambda x: x['ptp_capable'] is True, avoid=(avoid_sites))

                            When you pass it to add_node in the following line, you are passing a list as the site argument. That argument needs to be a string.

                            node5 = slice_modified.add_node(name=node5_name, site=site_5, cores=16, ram=32, disk=75, image='default_ubuntu_22')
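
One way to apply this is to take the first element of the returned list before passing it to add_node. The sketch below uses a plain list as a stand-in for the value returned by get_random_sites so that it runs without a FABRIC account; pick_single_site is a hypothetical helper, not a fablib function.

```python
def pick_single_site(sites):
    """Return one site name from a list of site names (or pass a string through)."""
    if isinstance(sites, str):
        return sites
    if not sites:
        raise ValueError("no site matched the filter")
    return sites[0]


# Stand-in for the list returned by fablib.get_random_sites(count=1, ...)
site_list = ["STAR"]
site_5 = pick_single_site(site_list)
print(site_5)

# site_5 is now a string and can be passed as the site argument of add_node.
```

Checking for an empty list first also gives a clearer error when the filter excludes every site.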

                            Paul

                            in reply to: Not able to create slice #6287
                            Paul Ruth
                            Keymaster

                              I think I see the issue now.

                              In your example you have something like:

                              from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
                              fablib = fablib_manager()
                              slice = fablib.new_slice(name='MySlice')
                              slice.show()

                              The problem is that when you call slice.show() the slice does not yet exist.  You need to create the slice first.  Add some nodes/networks and submit it, something like the following:

                              from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
                              fablib = fablib_manager()
                              slice = fablib.new_slice(name='MySlice')
                              node = slice.add_node(name='node1', site='STAR', image="default_rocky_8" )
                              slice.submit()
                              slice.show()

                              This should work.  That said, I think you should be able to slice.show() a slice that is only half built and has not been submitted yet.  This is a bug that we will look into.

                              Let me know if this works for you.

                              • This reply was modified 11 months, 2 weeks ago by Paul Ruth.
                              in reply to: Not able to create slice #6277
                              Paul Ruth
                              Keymaster

                                Can you try adding back in the following line and report the output?

                                fablib.show_config()

                                • This reply was modified 11 months, 2 weeks ago by Paul Ruth.
                                in reply to: How can I run experiments? #6268
                                Paul Ruth
                                Keymaster

                                  Hares,

                                  Thank you for your interest in FABRIC.  Before you can start using FABRIC you must be added to an active project. Projects are typically led by a professor or researcher who has been authorized to be a project lead.

                                  More information about project permissions can be found here: https://learn.fabric-testbed.net/knowledge-base/fabric-user-roles-and-project-permissions/

                                  If you are a professor or researcher who is eligible to be a project lead, you will need to request authorization to be a project lead. Then you can create a project and add your students or colleagues. If you are a student or otherwise not eligible to be a project lead, you will need to find a professor to lead your project.

                                  Paul

                                  • This reply was modified 11 months, 2 weeks ago by Paul Ruth.