Forum Replies Created
BTW, @Paul, does FABRIC have network usage plots over time? That way we could see how busy the links are.
– Chengyi
Hi Brandon,
Thank you for your help! With your settings and choice of sites, I can also get about 90 Gbps, cheers! My ConnectX-6 setup reaches that number too!
I will test again tomorrow morning, since I suspect no one is using FABRIC this late at night 🙂
Anyway, thank you for your help. Have a good night~
Best,
Chengyi
Thank you, Rice, for the test! It was me who occupied two NICs between SALT and UTAH. I tested this morning with 32 parallel streams on ConnectX-6 NICs and achieved 56 Gbps. I don't want to release my reservation, so would you mind sharing your notebooks with me so I can test with your settings on my side? Thank you!
I'm facing the same problem. It seems FABRIC is updating the format of the environment configuration, now with a fabric_config directory.
Hi Paul,
After a few tests, I still cannot reach 30 Gbps. (The most I can get between UTAH and SALT with a ConnectX-6 NIC is 20 Gbps.) Would you share your settings so that I can duplicate your setup and compare results? For example: how many cores and how much RAM you use, how many parallel streams you start for the iperf3 test, between which two sites you run the test, which NIC you use, and how the network is set up?
Thank you so much for your help.
Thank you for your advice, Paul! I mostly follow the instructions at https://srcc.stanford.edu/100g-network-adapter-tuning and https://fasterdata.es.net/host-tuning/linux/ for TCP, and https://fasterdata.es.net/host-tuning/linux/udp-tuning/ for UDP. Specifically:
TCP:
/etc/sysctl.conf:
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_mtu_probing = 1
net.core.default_qdisc = fq
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_no_metrics_save = 1

$ ethtool -K <eth1> lro on
$ ifconfig <eth1> txqueuelen 20000
$ systemctl stop irqbalance

UDP:
$ iperf3 -s
$ iperf3 -l8972 -u -w4m -b0 -A 4,4 -c 192.168.1.1 -t 60

Later I can try i) nodes that are closer together and ii) ConnectX-6 cards to see if that gives better results.
I will also follow up on your advice about UDP tuning and try to find a good bandwidth setting.
Looking forward to useful examples of making full use of the FABRIC link capacities.
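Before comparing throughput numbers, it is worth confirming that the /etc/sysctl.conf settings listed above are actually active; they only take effect at boot or after running `sysctl -p`. A minimal sketch, assuming a Linux VM where each key is exposed as a file under /proc/sys; `sysctl_path`, `read_sysctl` and `mismatches` are hypothetical helper names.

```python
from pathlib import Path

# The TCP settings from the /etc/sysctl.conf listed above.
EXPECTED = {
    "net.core.rmem_max": "268435456",
    "net.core.wmem_max": "268435456",
    "net.ipv4.tcp_rmem": "4096 87380 134217728",
    "net.ipv4.tcp_wmem": "4096 65536 134217728",
    "net.ipv4.tcp_congestion_control": "bbr",
    "net.ipv4.tcp_mtu_probing": "1",
    "net.core.default_qdisc": "fq",
    "net.core.netdev_max_backlog": "250000",
    "net.ipv4.tcp_no_metrics_save": "1",
}


def sysctl_path(key: str) -> Path:
    """Map a dotted sysctl key to its /proc/sys file (fine for keys without dots in names)."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_sysctl(key: str) -> str:
    # Multi-value entries such as tcp_rmem are tab-separated; normalise to single spaces.
    return " ".join(sysctl_path(key).read_text().split())


def mismatches(expected: dict[str, str], read=read_sysctl) -> dict[str, tuple[str, str]]:
    """Return {key: (expected, actual)} for every setting that differs or cannot be read."""
    diff = {}
    for key, want in expected.items():
        try:
            have = read(key)
        except OSError:
            have = "<missing>"
        if have != " ".join(want.split()):
            diff[key] = (want, have)
    return diff


if __name__ == "__main__":
    diff = mismatches(EXPECTED)
    for key, (want, have) in diff.items():
        print(f"{key}: expected {want!r}, got {have!r}")
    if not diff:
        print("all listed settings are active")
```

The reader is passed in as a parameter so the comparison logic can be exercised without touching /proc.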
BTW, my VM configuration:
Cores  RAM (GB)  Disk (GB)
8      32        100

With this tuning, between STAR and SALT (which I assume are connected by a 100 Gbps link) and with the Basic 100G NIC, I currently get this with TCP:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  86.8 GBytes  12.4 Gbits/sec  806071  sender
[  5]   0.00-60.04  sec  86.8 GBytes  12.4 Gbits/sec          receiver

This is not as good as I expected. With UDP, I get:
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-60.00  sec  39.7 GBytes  5.68 Gbits/sec  0.000 ms  0/4746870 (0%)          sender
[  5]   0.00-60.03  sec   438 MBytes  61.3 Mbits/sec  0.003 ms  2483897/2535131 (98%)  receiver

That is not much either.
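The posted figures can be cross-checked by hand, which helps rule out a unit mix-up before tuning further. A worked check, assuming iperf3's usual convention of binary units for transfer sizes (GBytes = 2^30 bytes, MBytes = 2^20 bytes) and decimal units for bitrates, plus a 9000-byte MTU (which is what `-l8972` implies); the variable names are illustrative only.

```python
# Worked check of the iperf3 numbers posted above.
GIB = 2**30  # iperf3 "GBytes"
MIB = 2**20  # iperf3 "MBytes"


def gbits_per_s(nbytes: float, seconds: float) -> float:
    """Bytes moved over an interval, expressed in decimal Gbit/s as iperf3 prints bitrates."""
    return nbytes * 8 / seconds / 1e9


# TCP sender: 86.8 GBytes in 60.00 s
tcp_rate = gbits_per_s(86.8 * GIB, 60.00)

# UDP: -l8972 is the payload per datagram. Assuming a 9000-byte MTU,
# 9000 - 20 (IPv4 header) - 8 (UDP header) = 8972 bytes fills exactly one packet.
PAYLOAD = 9000 - 20 - 8
sent = 4746870
udp_sent_bytes = sent * PAYLOAD
udp_sent_rate = gbits_per_s(udp_sent_bytes, 60.00)

# UDP receiver: lost/total datagrams as reported over 60.03 s
lost, total = 2483897, 2535131
received = total - lost
loss_pct = 100 * lost / total
udp_recv_rate = gbits_per_s(received * PAYLOAD, 60.03)

print(f"TCP: {tcp_rate:.1f} Gbit/s")
print(f"UDP sent: {udp_sent_bytes / GIB:.1f} GBytes, {udp_sent_rate:.2f} Gbit/s")
print(f"UDP received: {received} datagrams, {received * PAYLOAD / MIB:.0f} MBytes, "
      f"{udp_recv_rate * 1000:.1f} Mbit/s, {loss_pct:.0f}% lost")
```

These conversions reproduce every posted figure, including the receiver's 438 MBytes and 61.3 Mbit/s from the 51,234 datagrams that arrived, so the low UDP number reflects the reported loss rather than a reporting-unit mismatch.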
Is there any way I could increase the bandwidth? Thank you so much!
June 22, 2022 at 1:27 pm in reply to: Slides delete without notice before the extended lease end date #2181
Thank you for your help, Komal!
Thank you for the explanation. I tried assigning IP addresses and pinging between nodes, and everything works as normal.
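The manual IP assignment and ping check described above can also be scripted. A minimal sketch, assuming each node's dataplane NIC appears as `eth1` (the real name varies per VM) and an arbitrary private subnet; `address_plan` and `setup_commands` are hypothetical helpers that only generate the shell commands to run on each node over SSH.

```python
import ipaddress


def address_plan(subnet: str, nodes: list[str]) -> dict[str, str]:
    """Give each node the next host address from `subnet`, in CIDR form."""
    net = ipaddress.ip_network(subnet)
    hosts = net.hosts()
    plan = {}
    for node in nodes:
        addr = next(hosts, None)
        if addr is None:
            raise ValueError(f"{subnet} has too few addresses for {len(nodes)} nodes")
        plan[node] = f"{addr}/{net.prefixlen}"
    return plan


def setup_commands(plan: dict[str, str], dev: str) -> dict[str, list[str]]:
    """Per-node commands: bring the interface up, add its address, ping every peer."""
    cmds = {}
    for node, cidr in plan.items():
        peers = [c.split("/")[0] for n, c in plan.items() if n != node]
        cmds[node] = [
            f"sudo ip link set dev {dev} up",
            f"sudo ip addr add {cidr} dev {dev}",
            *[f"ping -c 3 {peer}" for peer in peers],
        ]
    return cmds


if __name__ == "__main__":
    plan = address_plan("192.168.1.0/24", ["h1", "h2", "h3"])
    for node, lines in setup_commands(plan, "eth1").items():
        print(node, lines)
```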
Thank you for your help. I will just ignore this exception.
When I try to reserve an L2 network, I still face the same problem.
Here is my code:
# Assumes slice, site_1, site_2, the h*_name values, net_1_name,
# host_cores, host_ram, host_disk and image are defined earlier in the notebook.

# Add host node h1
h1 = slice.add_node(name=h1_name, site=site_1)
h1.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
h1.set_image(image)
h1_iface = h1.add_component(model='NIC_ConnectX_5', name='h1_nic').get_interfaces()[0]

# Add host node h2
h2 = slice.add_node(name=h2_name, site=site_1)
h2.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
h2.set_image(image)
h2_iface = h2.add_component(model='NIC_ConnectX_5', name='h2_nic').get_interfaces()[0]

# Add host node h3 (both ports of its ConnectX-5 are kept)
h3 = slice.add_node(name=h3_name, site=site_2)
h3.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
h3.set_image(image)
[h3_iface, h3_iface_pub] = h3.add_component(model='NIC_ConnectX_5', name='h3_nic').get_interfaces()

# Add host node h4
h4 = slice.add_node(name=h4_name, site=site_2)
h4.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
h4.set_image(image)
h4_iface = h4.add_component(model='NIC_ConnectX_5', name='h4_nic').get_interfaces()[0]

# Add host node h5
h5 = slice.add_node(name=h5_name, site=site_2)
h5.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
h5.set_image(image)
h5_iface = h5.add_component(model='NIC_ConnectX_5', name='h5_nic').get_interfaces()[0]

# Add control plane network
host_net1 = slice.add_l2network(name=net_1_name, interfaces=[h1_iface, h2_iface, h3_iface, h4_iface, h5_iface])

And I got the same error as in the first post.
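The five near-identical node blocks above can be collapsed into a loop, which makes it easier to vary the number of hosts while narrowing down the error. A sketch that assumes only the fablib calls already used in the post (`add_node`, `set_capacities`, `set_image`, `add_component`, `get_interfaces`, `add_l2network`); `add_hosts` is a hypothetical helper, and it does not by itself fix the error.

```python
def add_hosts(slc, specs, cores, ram, disk, image, nic_model="NIC_ConnectX_5"):
    """Add one VM per (node_name, site) pair, each with one NIC named h1_nic, h2_nic, ...

    Returns the first port of each NIC, in the same order as `specs`.
    """
    ifaces = []
    for i, (name, site) in enumerate(specs, start=1):
        node = slc.add_node(name=name, site=site)
        node.set_capacities(cores=cores, ram=ram, disk=disk)
        node.set_image(image)
        nic = node.add_component(model=nic_model, name=f"h{i}_nic")
        ifaces.append(nic.get_interfaces()[0])
    return ifaces


# Usage with the names from the post:
# ifaces = add_hosts(slice,
#                    [(h1_name, site_1), (h2_name, site_1),
#                     (h3_name, site_2), (h4_name, site_2), (h5_name, site_2)],
#                    host_cores, host_ram, host_disk, image)
# host_net1 = slice.add_l2network(name=net_1_name, interfaces=ifaces)
```

Unlike the original, this returns only port 0 of h3's NIC; keep the explicit h3 block if the second port (`h3_iface_pub`) is needed.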
Hi Paul,
Thank you for your reply. I emailed you our requests. Please check.
Best,
Chengyi
I'm seeing the same thing with my earlier nodes. It seems none of the previously created nodes are accessible, so I decided to recreate them all 🙁
Hey Paul,
I'm facing a similar problem when I try to SSH into my nodes from the JupyterHub terminal.
ssh -i /home/fabric/.ssh/id_rsa -J cqy78_0038438951@bastion-1.fabric-testbed.net ubuntu@2001:400:a100:3010:f816:3eff:feb6:2d59
The authenticity of host 'bastion-1.fabric-testbed.net (152.54.15.12)' can't be established.
ECDSA key fingerprint is SHA256:AIRhefx5rhgEfSSoO8NIc6g+ohFQuSU0yn0i7qGUkY8.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'bastion-1.fabric-testbed.net,152.54.15.12' (ECDSA) to the list of known hosts.
The authenticity of host '2001:400:a100:3010:f816:3eff:feb6:2d59 (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is SHA256:XhcWRQ69Qw3tX0QEFpoBbrF7vI0SAvBZs3i+acbcSzI.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '2001:400:a100:3010:f816:3eff:feb6:2d59' (ECDSA) to the list of known hosts.
ubuntu@2001:400:a100:3010:f816:3eff:feb6:2d59: Permission denied (publickey).

I ran the bastion_setup notebook (which modified the .ssh/config file in my JupyterHub) and I still get the "Permission denied" error. Is there anything I missed, or is it because of the maintenance?
Thank you,
Chengyi
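On the "Permission denied (publickey)" error in general: with `ssh -J`, options such as `-i` given on the command line apply to the destination host, not to the jump host, so the bastion key and the key for the VMs are easiest to keep apart in an SSH config file. A sketch that renders such a file, assuming hypothetical key paths and the `ubuntu` login seen above; it is not necessarily what the bastion_setup notebook writes, nor necessarily what resolved this issue. Because the second stanza matches every non-bastion host, write it to its own file and use it via `ssh -F <file> <vm-address>`.

```python
def fabric_ssh_config(bastion_user: str, bastion_key: str, slice_key: str,
                      bastion_host: str = "bastion-1.fabric-testbed.net",
                      vm_user: str = "ubuntu") -> str:
    """Render an ssh_config snippet: one key for the bastion, another for the VMs behind it."""
    return "\n".join([
        # The jump host authenticates with the bastion key only.
        f"Host {bastion_host}",
        f"    User {bastion_user}",
        f"    IdentityFile {bastion_key}",
        "    IdentitiesOnly yes",
        "",
        # Every other host (e.g. a VM's IPv6 address) uses the slice key and jumps via the bastion.
        f"Host * !{bastion_host}",
        f"    User {vm_user}",
        f"    IdentityFile {slice_key}",
        "    IdentitiesOnly yes",
        f"    ProxyJump {bastion_host}",
        "",
    ])


if __name__ == "__main__":
    # Hypothetical paths; substitute your own bastion username and key files.
    print(fabric_ssh_config("<bastion_username>", "~/.ssh/bastion_key", "~/.ssh/slice_key"))
```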
NVM, Komal helped me figure it out. Thank you for your help, Paul.