Forum Replies Created
-
AuthorPosts
-
November 20, 2024 at 7:45 am in reply to: Chameleon Facility Ports: FABnetv4 (Layer 3) internal server error #7823
I added the Chameleon-TACC facility port permission to your project. Can you try again?
Note that we are at the SC24 conference this week and may be slow to respond. Also, there are some very large demos running this week.
Nirmala,
I will email you.
Paul
February 29, 2024 at 4:42 pm in reply to: OpenVSwitch link under Complex Recipes doesn’t go anywhere #6648
Violet,
Is this for a research project or a classroom? If it’s a classroom, how many students are in the class? How many ports do their OVS switches need?
Keep in mind that using smart NICs for this requires one full NIC for every 2 ports on the OVS switch. So a 4-port OVS switch will need two full smart NICs on the same host.
Paul
There is nothing in particular that is different about LOSA.
What jumps out at me about these results is that they are at least an order of magnitude too low. With dedicated ConnectX-5 cards you should be seeing nearly 25 Gbps. I suspect that your test case is too small. Your 100 MB test probably doesn’t get out of the TCP ramp-up phase of the connection. You should try transferring several hundred GB… or better yet, run the tests for a set amount of time (at least 1 min). You should also use much larger VMs, set the MTUs to 9000, and consider adjusting your buffer sizes.
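A rough back-of-the-envelope check shows why a 100 MB transfer is too small. This is only a sketch: the 25 Gbps rate comes from the paragraph above, but the 30 ms round-trip time is an assumed placeholder, not a measured value for any pair of sites.

```python
def transfer_time_ms(size_bytes: int, rate_bps: int) -> float:
    """Ideal time to move size_bytes at a constant rate_bps, in milliseconds."""
    return size_bytes * 8 * 1000 / rate_bps


def bdp_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return rate_bps * rtt_ms // 1000 // 8


RATE_BPS = 25_000_000_000  # 25 Gbps, the rate mentioned above
ASSUMED_RTT_MS = 30        # hypothetical RTT; measure your own with ping

# Ideal duration of a 100 MB transfer at line rate, in milliseconds.
print(transfer_time_ms(100_000_000, RATE_BPS))

# Bytes in flight needed to keep a 25 Gbps path full at the assumed RTT.
print(bdp_bytes(RATE_BPS, ASSUMED_RTT_MS))
```

At line rate the whole 100 MB would move in about 32 ms, which is roughly one round trip under the assumed RTT, so the connection never leaves its ramp-up phase. The bandwidth-delay product (about 94 MB under these assumptions) is a reasonable starting point when choosing socket buffer sizes.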
Try running the example iPerf3 notebook but manually set the sites to LOSA and DALL. You should see much higher bandwidths. Then tweak that test, in small steps, with your desired configuration and see what causes the bandwidth to drop.
I think your tests are really testing the performance capabilities of the VMs, buffers, etc. but not the network.
Also, if you really want repeatability, you will need to use the NUMA pinning examples. Without explicitly choosing the NUMA domain for your cores, you will get random physical cores that may result in much lower performance.
For reference, here is the output of the example iPerf3 notebook using LOSA and DALL. Note that you can get nearly 100 Gbps if you increase the VM size and pin the cores to the correct NUMA domain:
<pre>Connecting to host 10.137.3.2, port 5201
[  5] local 10.133.130.2 port 56288 connected to 10.137.3.2 port 5201
[  7] local 10.133.130.2 port 56294 connected to 10.137.3.2 port 5201
[  9] local 10.133.130.2 port 56310 connected to 10.137.3.2 port 5201
[ 11] local 10.133.130.2 port 56318 connected to 10.137.3.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr   Cwnd
[  5]   0.00-10.01  sec  14.3 GBytes  12.3 Gbits/sec  11207  52.6 MBytes  (omitted)
[  7]   0.00-10.01  sec  15.4 GBytes  13.2 Gbits/sec  12714  63.5 MBytes  (omitted)
[  9]   0.00-10.01  sec  15.6 GBytes  13.4 Gbits/sec  11597  64.3 MBytes  (omitted)
[ 11]   0.00-10.01  sec  20.3 GBytes  17.4 Gbits/sec  31095   201 MBytes  (omitted)
[SUM]   0.00-10.01  sec  65.5 GBytes  56.2 Gbits/sec  66613               (omitted)
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   0.00-10.01  sec  11.4 GBytes  9.77 Gbits/sec   2531  84.9 MBytes
[  7]   0.00-10.01  sec  15.7 GBytes  13.4 Gbits/sec   3213   123 MBytes
[  9]   0.00-10.01  sec  17.7 GBytes  15.2 Gbits/sec   3833   143 MBytes
[ 11]   0.00-10.01  sec  18.4 GBytes  15.8 Gbits/sec   3280   145 MBytes
[SUM]   0.00-10.01  sec  63.2 GBytes  54.2 Gbits/sec  12857
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  10.01-20.01  sec  11.4 GBytes  9.79 Gbits/sec      0  89.5 MBytes
[  7]  10.01-20.01  sec  16.4 GBytes  14.1 Gbits/sec      0   124 MBytes
[  9]  10.01-20.01  sec  18.7 GBytes  16.1 Gbits/sec      0   144 MBytes
[ 11]  10.01-20.01  sec  18.7 GBytes  16.0 Gbits/sec      0   142 MBytes
[SUM]  10.01-20.01  sec  65.2 GBytes  56.0 Gbits/sec      0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  20.01-30.00  sec  11.0 GBytes  9.43 Gbits/sec   3639  86.7 MBytes
[  7]  20.01-30.00  sec  15.7 GBytes  13.5 Gbits/sec   5665   124 MBytes
[  9]  20.01-30.00  sec  17.9 GBytes  15.4 Gbits/sec   6044   139 MBytes
[ 11]  20.01-30.00  sec  17.6 GBytes  15.1 Gbits/sec   6159   139 MBytes
[SUM]  20.01-30.00  sec  62.1 GBytes  53.4 Gbits/sec  21507
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  33.8 GBytes  9.66 Gbits/sec   6170  sender
[  5]   0.00-30.05  sec  33.6 GBytes  9.61 Gbits/sec         receiver
[  7]   0.00-30.00  sec  47.7 GBytes  13.7 Gbits/sec   8878  sender
[  7]   0.00-30.05  sec  48.0 GBytes  13.7 Gbits/sec         receiver
[  9]   0.00-30.00  sec  54.3 GBytes  15.5 Gbits/sec   9877  sender
[  9]   0.00-30.05  sec  54.5 GBytes  15.6 Gbits/sec         receiver
[ 11]   0.00-30.00  sec  54.7 GBytes  15.7 Gbits/sec   9439  sender
[ 11]   0.00-30.05  sec  54.6 GBytes  15.6 Gbits/sec         receiver
[SUM]   0.00-30.00  sec   190 GBytes  54.5 Gbits/sec  34364  sender
[SUM]   0.00-30.05  sec   191 GBytes  54.5 Gbits/sec         receiver
</pre>
- This reply was modified 9 months, 2 weeks ago by Paul Ruth.
One more question… were you able to replicate this before? By replicate I mean run it once in one slice, then delete that slice and run it again in a new slice.
I think the main issue here is a combination of VMs that are too small (memory/cores) to achieve good bandwidth and the fact that you are not pinning cores to NUMA domains. Without pinning you are unlikely to get repeatable performance. The issue is that if your VM cores are not in the same NUMA domain as your NIC, you will get worse performance. This is especially true for the router nodes. When you create a slice, your virtual cores will float between the available physical cores. Since there are other users on the host, you will not know anything about the placement of your virtual cores.
I suggest using much larger VMs and pinning the cores to the appropriate NUMA domains.
One more thing: which version of iPerf3 are you using? The iPerf3 that is available in most Linux repos is single threaded. I recommend using the new version supported by ESnet (https://github.com/esnet/iperf).
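A quick way to check which iPerf3 you have is to parse the first line of `iperf3 --version`. The sketch below assumes the version line looks like `iperf 3.16 (cJSON ...)` and assumes that multi-threading arrived in release 3.16; check the ESnet release notes if in doubt. The helper names are made up for illustration.

```python
import re
import subprocess
from typing import Tuple


def parse_iperf3_version(version_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the text printed by `iperf3 --version`."""
    match = re.search(r"iperf\s+(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find an iperf version string")
    return int(match.group(1)), int(match.group(2))


def is_multithreaded(version_output: str) -> bool:
    # Assumption: multi-threaded operation starts with iperf 3.16.
    return parse_iperf3_version(version_output) >= (3, 16)


def installed_iperf3_version_output() -> str:
    """Run the local iperf3 binary; requires iperf3 to be on PATH."""
    result = subprocess.run(
        ["iperf3", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example with a hard-coded version line (no iperf3 binary needed here).
print(is_multithreaded("iperf 3.9 (cJSON 1.7.13)"))
```

On a node, `is_multithreaded(installed_iperf3_version_output())` would run the same check against the installed binary.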
Edgard,
I don’t think there would be anything that would limit you to bandwidth that low. All but a few sites should support 100 Gbps (a few can only provide 10 Gbps). I would expect much higher bandwidth than you are seeing. Even using multiple software routers, I would expect 10s of Gbps.
What NIC types are you using?
What VM size are you using?
How are you forwarding traffic in your routers?
Are you tuning the TCP/IP configuration of your nodes (congestion control algorithm, MTU, buffer sizes, etc)?
Also, are you pinning your nodes to the NIC’s NUMA domain? NUMA pinning example.
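For anyone unsure what the tuning question above refers to, here is a minimal sketch that generates candidate sysctl settings for buffer sizes and congestion control. The sysctl keys are standard Linux ones, but the buffer size, the minimum/default values, and the choice of algorithm are assumptions to adapt, not recommended values. MTU is set separately on the interface and is not covered here.

```python
from typing import Dict


def tcp_tuning_sysctls(
    max_buffer_bytes: int, congestion_control: str = "cubic"
) -> Dict[str, str]:
    """Build sysctl key/value pairs for larger TCP buffers (illustrative only)."""
    return {
        "net.core.rmem_max": str(max_buffer_bytes),
        "net.core.wmem_max": str(max_buffer_bytes),
        # min, default, max; the min and default here are assumed placeholders
        "net.ipv4.tcp_rmem": f"4096 87380 {max_buffer_bytes}",
        "net.ipv4.tcp_wmem": f"4096 65536 {max_buffer_bytes}",
        "net.ipv4.tcp_congestion_control": congestion_control,
    }


def as_sysctl_conf(settings: Dict[str, str]) -> str:
    """Render the settings in /etc/sysctl.d style, one 'key = value' per line."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


# 128 MiB is a hypothetical maximum; size it from your own bandwidth and RTT.
print(as_sysctl_conf(tcp_tuning_sysctls(128 * 1024 * 1024)))
```

The printed lines could be placed in a file under /etc/sysctl.d on each node and loaded with `sysctl --system`; test one change at a time so you can see which setting matters.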
Paul
- This reply was modified 10 months, 1 week ago by Paul Ruth.
Hares,
If you are a professor or researcher who is eligible to be a project lead, you will need to request authorization to be a project lead. Then you can create a project and add your students or colleagues.
If you are a student or otherwise not eligible to be a project lead, you will need to talk to a professor at your university and have them create an account and lead a project with you as a member. Students are not able to use FABRIC without the supervision of a professor or other senior researcher.
Paul
January 11, 2024 at 8:04 pm in reply to: Problem to run basic fabric_examples in JupyterLab since 3.1.2024 #6309
Recently deployed features require fablib to be upgraded.
See the post in the Announcements forum: https://learn.fabric-testbed.net/forums/topic/fabric-testbed-is-open-and-ready-for-use/
You should only need to run the following pip command:
pip install fabrictestbed-extensions==1.5.6
Note that you might want to upgrade all the way to the latest fablib version, which is 1.6.0.
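If you are not sure which version your notebook environment is using, a small check like the following may help. It is only a sketch: the helper names are invented for illustration, and the comparison handles plain dotted versions such as 1.5.6 only (no pre-release suffixes).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version like '1.5.6' into (1, 5, 6)."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed: str, minimum: str = "1.5.6") -> bool:
    """True if the installed version is older than the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


def installed_fablib_version() -> Optional[str]:
    """Return the installed fabrictestbed-extensions version, or None."""
    try:
        return version("fabrictestbed-extensions")
    except PackageNotFoundError:
        return None


found = installed_fablib_version()
if found is None:
    print("fabrictestbed-extensions is not installed in this environment")
else:
    print(found, "needs upgrade:", needs_upgrade(found))
```

Run it in the same kernel as your notebooks, since a pip install in a different environment will not affect them.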
Paul
- This reply was modified 10 months, 2 weeks ago by Paul Ruth.
The answer depends on what you are trying to do. Generally, FABRIC is a secure sandbox that allows students and researchers to freely experiment with very disruptive and potentially vulnerable software architectures in a secure way. If you are trying to connect your laptop or another server that you control to nodes in your slice, you will need to use a secure mechanism, for example ssh tunnels. There is an example Jupyter notebook that describes how to create ssh tunnels through the FABRIC bastion host. Another powerful way to do this is to use a personal VPN such as Tailscale.
If you are trying to expose a port to the whole of the Internet, we will only allow that in extremely rare circumstances where an alternative solution is not otherwise possible. In addition, these capabilities would require the user to deploy, maintain, and monitor the security of the experiments at a level similar to a production data center. This is the capability enabled by the IPv4Ext and IPv6Ext services.
For starters, I would recommend becoming familiar with ssh tunnels. They are fairly simple to deploy.
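As a rough illustration of what such a tunnel looks like, the sketch below builds an `ssh` command that forwards a local port to a port on a node. It assumes you already have an ssh config file that routes connections through the bastion; every path, username, address, and port shown is a hypothetical placeholder, and the example notebook remains the authoritative reference.

```python
import shlex
from typing import List


def tunnel_command(
    local_port: int,
    remote_port: int,
    node_user: str,
    node_ip: str,
    ssh_config: str,
    key_path: str,
) -> List[str]:
    """Build an ssh local port forward (-L) that opens no remote shell (-N)."""
    return [
        "ssh",
        "-F", ssh_config,   # config file assumed to define the bastion jump
        "-i", key_path,     # private key for the node
        "-N",               # forward ports only, do not run a command
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{node_user}@{node_ip}",
    ]


# All values below are placeholders for illustration.
cmd = tunnel_command(
    local_port=8888,
    remote_port=8888,
    node_user="ubuntu",
    node_ip="2001:db8::10",
    ssh_config="/path/to/ssh_config",
    key_path="/path/to/slice_key",
)
print(shlex.join(cmd))
```

With the tunnel running, a service listening on port 8888 of the node would be reachable at localhost:8888 on your laptop, without exposing anything to the public Internet.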
Let us know if you have any additional questions,
Paul
- This reply was modified 10 months, 2 weeks ago by Paul Ruth.
January 11, 2024 at 6:13 pm in reply to: Frrouting, policy validation issues—VM.NolimitCpu or VM.nolimit tag issues. #6302
Emmanuel,
This just means you need extra permissions added to your project.
Which project are you using? Who is the project lead?
The project lead will need to request extra permissions as described here.
Paul
Vaiden,
The following line from your example returns a list of site names. This is true even if it returns only one site in the list.
site_5 = fablib.get_random_sites(count=1, filter_function=lambda x: x['ptp_capable'] is True, avoid=(avoid_sites))
When you pass it to add_node in the following line, you are passing a list as the site argument. That argument needs to be a string.
node5 = slice_modified.add_node(name=node5_name, site=site_5, cores=16, ram=32, disk=75, image='default_ubuntu_22')
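One way to handle this is to take the single element out of the list before using it as a site. The sketch below uses a hard-coded list as a stand-in for the value returned by get_random_sites, and `single_site` is a hypothetical helper, not part of fablib.

```python
from typing import List


def single_site(sites: List[str]) -> str:
    """Return the only site name in a one-element list, or fail loudly."""
    if len(sites) != 1:
        raise ValueError(f"expected exactly one site, got {len(sites)}")
    return sites[0]


# Stand-in for the list that get_random_sites(count=1, ...) returns.
sites = ["STAR"]

site_5 = single_site(sites)  # now a plain string, suitable for site=
print(site_5)
```

Simply indexing with `[0]` works too; the helper just makes it obvious when the list is empty or has more entries than expected.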
Paul
I think I see the issue now.
In your example you have something like:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
fablib = fablib_manager()
slice = fablib.new_slice(name='MySlice')
slice.show()
The problem is that when you are calling slice.show() the slice does not yet exist. You need to create the slice first. Add some nodes/networks, something like the following:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
fablib = fablib_manager()
slice = fablib.new_slice(name='MySlice')
node = slice.add_node(name='node1', site='STAR', image='default_rocky_8')
slice.submit()
slice.show()
This should work. That said, I think you should be able to slice.show() a slice that is only half built and has not been submitted yet. This is a bug that we will look into.
Let me know if this works for you.
- This reply was modified 10 months, 2 weeks ago by Paul Ruth.
Can you try adding back in the following line and report the output?
fablib.show_config()
- This reply was modified 10 months, 2 weeks ago by Paul Ruth.
Hares,
Thank you for your interest in FABRIC. Before you can start using FABRIC you must be added to an active project. Projects are typically led by a professor or researcher who has been authorized to be a project lead.
More information about project permissions can be found here: https://learn.fabric-testbed.net/knowledge-base/fabric-user-roles-and-project-permissions/
If you are a professor or researcher who is eligible to be a project lead, you will need to request authorization to be a project lead. Then you can create a project and add your students or colleagues. If you are a student or otherwise not eligible to be a project lead, you will need to find a professor to lead your project.
Paul
- This reply was modified 10 months, 2 weeks ago by Paul Ruth.
One thing to check… Does your notebook contain an unmodified version of this line?
fablib = fablib_manager(fabric_rc="/path/to/fabric_rc")
If it does, you need to modify the path to point to your actual fabric_rc file.
Paul