Chengyi Qu

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 16 total)
  • in reply to: Any one get lucky with ~100Gbps bandwidth? #2635
    Chengyi Qu
    Participant

      BTW, @Paul. Does FABRIC have network usage plots by time? That way we can see how busy the links are.

      – Chengyi

      in reply to: Any one get lucky with ~100Gbps bandwidth? #2631
      Chengyi Qu
      Participant

        Hi Brandon,

        Thank you for your help! With your settings and choice of sites, I can also get around 90 Gbps. Cheers! My ConnectX-6 NIC reaches that number too!

        I will test it again tomorrow morning and see, since I suspect no one is using FABRIC this late at night 🙂

        Anyway, thank you for your help. Have a good night~

        Best,

        Chengyi

        in reply to: Any one get lucky with ~100Gbps bandwidth? #2629
        Chengyi Qu
        Participant

          Thank you for the test, Rice! It was me who occupied the two NICs between SALT and UTAH. I tested this morning with 32 parallel streams on ConnectX-6 NICs and achieved 56 Gbps. I don’t want to release my reservation, so would you mind sharing your notebooks so that I can test with your settings on my side? Thank you!

           

          in reply to: User is not a member of project: #2389
          Chengyi Qu
          Participant

            I’m facing the same problem. It seems FABRIC is changing the format of the environment configuration to use a fabric_config directory.

            in reply to: get_physical_os_interface()[‘ifname’] failed #2379
            Chengyi Qu
            Participant

              Hi Paul,

              After a few tests, I still cannot reach 30 Gbps. (I get at most 20 Gbps between UTAH and SALT with a ConnectX-6 NIC.) Would you mind sharing your settings so that I can reproduce your results? For example: how many cores and how much RAM you use, how many parallel streams you start for the iperf3 test, which two sites you test between, which NIC you use, and how the network is set up.

              Thank you so much for your help.

              in reply to: get_physical_os_interface()[‘ifname’] failed #2323
              Chengyi Qu
              Participant

                Thank you for your advice, Paul! I am basically following the instructions here: https://srcc.stanford.edu/100g-network-adapter-tuning and here: https://fasterdata.es.net/host-tuning/linux/ for TCP, and https://fasterdata.es.net/host-tuning/linux/udp-tuning/ for UDP. Specifically:

                ------- TCP

                /etc/sysctl.conf
                net.core.rmem_max = 268435456
                net.core.wmem_max = 268435456
                net.ipv4.tcp_rmem = 4096 87380 134217728
                net.ipv4.tcp_wmem = 4096 65536 134217728
                net.ipv4.tcp_congestion_control=bbr
                net.ipv4.tcp_mtu_probing=1
                net.core.default_qdisc = fq
                net.core.netdev_max_backlog = 250000
                net.ipv4.tcp_no_metrics_save = 1

                $ ethtool -K <eth1> lro on
                $ ifconfig <eth1> txqueuelen 20000
                $ systemctl stop irqbalance

                ------- UDP

                $ iperf3 -s
                $ iperf3 -l8972 -u -w4m -b0 -A 4,4 -c 192.168.1.1 -t 60

                Later I can try i) nodes that are closer together and ii) ConnectX-6 cards to see whether the results improve.

                I will also follow your advice on UDP tuning and try to reach a good bandwidth.

                I look forward to useful examples of how to fully use the FABRIC network link capacity.
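One way to sanity-check the TCP buffer limits listed above is to compare them with the bandwidth-delay product (BDP) of the path. The sketch below is only an illustration: the 10 ms round-trip time is a hypothetical placeholder, not a measured value for any FABRIC link, and the function names are made up for this example. The real window is also somewhat smaller than the socket buffer, so the second function gives an upper bound.

```python
# Rough check of TCP buffer limits against the bandwidth-delay product (BDP).
# The RTT used in the example is a hypothetical value; measure the real one with ping.

TCP_MAX_BUFFER = 134217728  # third field of net.ipv4.tcp_rmem / tcp_wmem above (128 MiB)


def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bytes that must be in flight to keep a link of this speed full."""
    return int(bandwidth_gbps * 1e9 / 8 * rtt_ms / 1e3)


def max_stream_gbps(buffer_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream throughput for a given buffer and RTT."""
    return buffer_bytes * 8 / (rtt_ms / 1e3) / 1e9


if __name__ == "__main__":
    rtt_ms = 10.0  # assumption for illustration only
    needed = bdp_bytes(100, rtt_ms)
    print(f"BDP at 100 Gbps, {rtt_ms} ms RTT: {needed} bytes")
    print(f"Buffer limit covers it: {needed <= TCP_MAX_BUFFER}")
    print(f"Single-stream ceiling: {max_stream_gbps(TCP_MAX_BUFFER, rtt_ms):.1f} Gbps")
```

If the measured RTT makes the BDP larger than the buffer limit, a single stream cannot fill the link, and either larger limits or more parallel streams would be needed.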

                in reply to: get_physical_os_interface()[‘ifname’] failed #2303
                Chengyi Qu
                Participant

                  BTW…

                  Cores RAM Disk
                  8 32 100

                  in reply to: get_physical_os_interface()[‘ifname’] failed #2300
                  Chengyi Qu
                  Participant

                    After following the tuning instructions, this is what I currently get between STAR and SALT (which I assume are connected by a 100 Gbps link) with a Basic 100G NIC, using TCP:

                    [ ID] Interval Transfer Bitrate Retr
                    [ 5] 0.00-60.00 sec 86.8 GBytes 12.4 Gbits/sec 806071 sender
                    [ 5] 0.00-60.04 sec 86.8 GBytes 12.4 Gbits/sec receiver

                    That is not as good as I expected. With the UDP test, I get:

                    [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
                    [ 5] 0.00-60.00 sec 39.7 GBytes 5.68 Gbits/sec 0.000 ms 0/4746870 (0%) sender
                    [ 5] 0.00-60.03 sec 438 MBytes 61.3 Mbits/sec 0.003 ms 2483897/2535131 (98%) receiver

                    That is also not much.

                    Is there any way I could increase the bandwidth? Thank you so much!
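As a cross-check of the iperf3 summaries above, the reported bitrates and loss can be recomputed from the transfer sizes and datagram counts. This is a small sketch, not part of the original test; it assumes iperf3's "GBytes" and "MBytes" are binary units (2^30 and 2^20 bytes), which is consistent with the numbers shown, and the function names are made up for this example.

```python
# Recompute the iperf3 summary figures quoted above from the raw numbers.

def gbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert an iperf3 'GBytes' transfer (2**30 bytes each) to Gbit/s."""
    return gbytes * 2**30 * 8 / seconds / 1e9


def loss_percent(lost: int, total: int) -> float:
    """Percentage of datagrams reported lost by the receiver."""
    return 100.0 * lost / total


tcp_gbps = gbits_per_sec(86.8, 60.0)         # TCP sender line
udp_sender_gbps = gbits_per_sec(39.7, 60.0)  # UDP sender line
udp_loss = loss_percent(2483897, 2535131)    # UDP receiver line

# Datagrams that arrived, times the 8972-byte payload from "-l8972",
# should match the receiver's transfer size in MBytes (2**20 bytes each).
delivered_mib = (2535131 - 2483897) * 8972 / 2**20

if __name__ == "__main__":
    print(f"TCP: {tcp_gbps:.1f} Gbit/s")
    print(f"UDP sender: {udp_sender_gbps:.2f} Gbit/s")
    print(f"UDP loss: {udp_loss:.0f}%")
    print(f"UDP delivered: {delivered_mib:.0f} MiB")
```

The recomputed values agree with the report (12.4 Gbit/s, 5.68 Gbit/s, 98% loss, 438 MBytes), so the low UDP receive rate reflects datagrams that never arrived rather than a reporting quirk.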

                    Chengyi Qu
                    Participant

                      Thank you for your help, Komal!

                      in reply to: Create L3 network error #2026
                      Chengyi Qu
                      Participant

                        Thank you for the explanation. I tried assigning IP addresses and pinging between the nodes, and everything works as normal.

                        Thank you for your help. I will just ignore this exception.

                        in reply to: Create L3 network error #2024
                        Chengyi Qu
                        Participant

                          When I try to reserve an L2 network, I still face the same problem.

                          Here is my code:

                          # Add host node h1
                          h1 = slice.add_node(name=h1_name, site=site_1)
                          h1.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h1.set_image(image)
                          h1_iface = h1.add_component(model='NIC_ConnectX_5', name="h1_nic").get_interfaces()[0]

                          # Add host node h2
                          h2 = slice.add_node(name=h2_name, site=site_1)
                          h2.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h2.set_image(image)
                          h2_iface = h2.add_component(model='NIC_ConnectX_5', name="h2_nic").get_interfaces()[0]

                          # Add host node h3
                          h3 = slice.add_node(name=h3_name, site=site_2)
                          h3.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h3.set_image(image)
                          [h3_iface, h3_iface_pub] = h3.add_component(model='NIC_ConnectX_5', name="h3_nic").get_interfaces()

                          # Add host node h4
                          h4 = slice.add_node(name=h4_name, site=site_2)
                          h4.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h4.set_image(image)
                          h4_iface = h4.add_component(model='NIC_ConnectX_5', name="h4_nic").get_interfaces()[0]

                          # Add host node h5
                          h5 = slice.add_node(name=h5_name, site=site_2)
                          h5.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h5.set_image(image)
                          h5_iface = h5.add_component(model='NIC_ConnectX_5', name="h5_nic").get_interfaces()[0]

                          # Add control panel networks
                          host_net1 = slice.add_l2network(name=net_1_name, interfaces=[h1_iface, h2_iface, h3_iface, h4_iface, h5_iface])

                           

                          And I got the same error as in the first post.
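The five node blocks above repeat the same calls, so they could be folded into a helper. The sketch below is only a tidier way to write the same request and is not expected to change the error. `add_host` and `add_hosts_on_l2` are hypothetical helper names, not fablib functions; they use only the calls already shown in the post, and the NIC name is derived from the node name, which differs slightly from the hard-coded "h1_nic" style above.

```python
# Hypothetical helpers that wrap the calls used in the post above.
# slice_obj is expected to be the same slice object used in the original code.

def add_host(slice_obj, name, site, cores, ram, disk, image,
             nic_model="NIC_ConnectX_5"):
    """Add one node with one NIC and return the NIC's first interface."""
    node = slice_obj.add_node(name=name, site=site)
    node.set_capacities(cores=cores, ram=ram, disk=disk)
    node.set_image(image)
    nic = node.add_component(model=nic_model, name=f"{name}_nic")
    return nic.get_interfaces()[0]


def add_hosts_on_l2(slice_obj, hosts, net_name, cores, ram, disk, image):
    """Add every (name, site) pair in hosts and put them on one L2 network."""
    ifaces = [
        add_host(slice_obj, name, site, cores, ram, disk, image)
        for name, site in hosts
    ]
    return slice_obj.add_l2network(name=net_name, interfaces=ifaces)
```

With the variables from the post, the call would take a list such as [(h1_name, site_1), (h2_name, site_1), (h3_name, site_2), (h4_name, site_2), (h5_name, site_2)] together with net_1_name, host_cores, host_ram, host_disk and image.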

                          in reply to: Who should we ask for adding more permission tags? #2022
                          Chengyi Qu
                          Participant

                            Hi Paul,

                            Thank you for your reply. I emailed you our requests. Please check.

                            Best,

                            Chengyi

                            in reply to: SSH to the Fabric nodes: Permission denied (publickey) #2018
                            Chengyi Qu
                            Participant

                              I’m also facing the same situation with my previous nodes. It seems that all the nodes created earlier are no longer accessible, so I decided to recreate them all 🙁

                              in reply to: SSH to the Fabric nodes: Permission denied (publickey) #2011
                              Chengyi Qu
                              Participant

                                Hey Paul,

                                I’m facing similar problems when I want to ssh into my nodes via jupyterhub terminal.

                                ssh -i /home/fabric/.ssh/id_rsa -J cqy78_0038438951@bastion-1.fabric-testbed.net ubuntu@2001:400:a100:3010:f816:3eff:feb6:2d59
                                The authenticity of host 'bastion-1.fabric-testbed.net (152.54.15.12)' can't be established.
                                ECDSA key fingerprint is SHA256:AIRhefx5rhgEfSSoO8NIc6g+ohFQuSU0yn0i7qGUkY8.
                                Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
                                Warning: Permanently added 'bastion-1.fabric-testbed.net,152.54.15.12' (ECDSA) to the list of known hosts.
                                The authenticity of host '2001:400:a100:3010:f816:3eff:feb6:2d59 (<no hostip for proxy command>)' can't be established.
                                ECDSA key fingerprint is SHA256:XhcWRQ69Qw3tX0QEFpoBbrF7vI0SAvBZs3i+acbcSzI.
                                Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
                                Warning: Permanently added '2001:400:a100:3010:f816:3eff:feb6:2d59' (ECDSA) to the list of known hosts.
                                ubuntu@2001:400:a100:3010:f816:3eff:feb6:2d59: Permission denied (publickey).

                                I ran the bastion_setup notebook (which modified the .ssh/config file in my JupyterHub) and still got the “Permission denied” error. Did I miss something, or is it because of the maintenance?

                                Thank you,

                                Chengyi

                                in reply to: Renew slides time doesn’t work #1643
                                Chengyi Qu
                                Participant

                                  NVM, Komal helped me figure it out. Thank you for the help, Paul.
