Chengyi Qu

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 16 total)
  • in reply to: Any one get lucky with ~100Gbps bandwidth? #2635
    Chengyi Qu
    Participant

      BTW, @Paul, does FABRIC have plots of network usage over time? That way we could see how busy the links are.

      – Chengyi

      in reply to: Any one get lucky with ~100Gbps bandwidth? #2631
      Chengyi Qu
      Participant

        Hi Brandon,

        Thank you for your help! With your settings and choice of sites, I can also get around 90 Gbps, cheers! My ConnectX-6 reaches that number too!

        I will test again tomorrow morning to compare, since I suspect no one is using FABRIC this late at night 🙂

        Anyway, thank you for your help. Have a good night~

        Best,

        Chengyi

        in reply to: Any one get lucky with ~100Gbps bandwidth? #2629
        Chengyi Qu
        Participant

          Thank you, Rice, for the test! It was me who had occupied the two NICs between SALT and UTAH. I tested this morning with 32 parallel streams on ConnectX-6 NICs and achieved 56 Gbps. I don’t want to release my reservation, so would you mind sharing your notebooks with me so that I can test with your settings on my side? Thank you!

           

          in reply to: User is not a member of project: #2389
          Chengyi Qu
          Participant

            I am facing the same problem. It seems that FABRIC is changing the format of the environment configuration to use a fabric_config directory.

            in reply to: get_physical_os_interface()[‘ifname’] failed #2379
            Chengyi Qu
            Participant

              Hi Paul,

              After a few tests, I still cannot reach 30 Gbps. (I can get at most 20 Gbps between UTAH and SALT with a ConnectX-6 NIC.) Would you mind sharing your settings so that I can replicate them and compare the results? For example: how many cores and how much RAM you use for the test, how many parallel streams you start for the iperf3 test, which two sites you test between, which NIC you use, and how the network is set up?

              Thank you so much for your help..

              in reply to: get_physical_os_interface()[‘ifname’] failed #2323
              Chengyi Qu
              Participant

                Thank you for your advice, Paul! I basically followed the instructions here: https://srcc.stanford.edu/100g-network-adapter-tuning and here: https://fasterdata.es.net/host-tuning/linux/ for TCP, and https://fasterdata.es.net/host-tuning/linux/udp-tuning/ for UDP. Specifically:

                ——-TCP

                /etc/sysctl.conf
                net.core.rmem_max = 268435456
                net.core.wmem_max = 268435456
                net.ipv4.tcp_rmem = 4096 87380 134217728
                net.ipv4.tcp_wmem = 4096 65536 134217728
                net.ipv4.tcp_congestion_control = bbr
                net.ipv4.tcp_mtu_probing = 1
                net.core.default_qdisc = fq
                net.core.netdev_max_backlog = 250000
                net.ipv4.tcp_no_metrics_save = 1

                $ ethtool -K <eth1> lro on
                $ ifconfig <eth1> txqueuelen 20000
                $ systemctl stop irqbalance

                ———–UDP

                $ iperf3 -s
                $ iperf3 -l8972 -u -w4m -b0 -A 4,4 -c 192.168.1.1 -t 60

                Later I can try i) nodes that are closer together and ii) ConnectX-6 cards to see whether the results improve.

                I will also follow up on your advice about UDP tuning and try to find a good bandwidth.

                Looking forward to useful examples of making full use of the FABRIC link capacities.
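For what it's worth, here is a rough sketch of how the TCP buffer maximum above relates to the bandwidth-delay product (BDP). A single TCP stream generally needs a window of about bandwidth × RTT to fill a link. The RTT values below are hypothetical placeholders, not measurements from FABRIC, and `bdp_bytes` is just an illustrative helper name.

```python
def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product in bytes: (bits per second * seconds) / 8."""
    return round(bandwidth_gbps * 1e9 * rtt_ms / 1e3 / 8)


# Maximum from the net.ipv4.tcp_rmem / tcp_wmem lines above (128 MiB).
TCP_MAX_BUFFER = 134217728

# Hypothetical round-trip times; replace with a value measured by ping.
for rtt_ms in (1, 10, 50):
    needed = bdp_bytes(100, rtt_ms)
    print(
        f"RTT {rtt_ms:>2} ms: BDP = {needed / 2**20:7.1f} MiB, "
        f"fits in tcp_rmem max: {needed <= TCP_MAX_BUFFER}"
    )
```

If the measured RTT between two sites puts the BDP above the buffer maximum, one stream alone probably cannot fill the link, which is one reason parallel streams help.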

                in reply to: get_physical_os_interface()[‘ifname’] failed #2303
                Chengyi Qu
                Participant

                  BTW…

                  Cores RAM Disk
                  8 32 100

                  in reply to: get_physical_os_interface()[‘ifname’] failed #2300
                  Chengyi Qu
                  Participant

                    After following the tuning instructions, between the STAR and SALT sites (which I assume are connected by a 100 Gbps link) and with a Basic 100G NIC, this is what I currently achieve with TCP:

                    [ ID] Interval Transfer Bitrate Retr
                    [ 5] 0.00-60.00 sec 86.8 GBytes 12.4 Gbits/sec 806071 sender
                    [ 5] 0.00-60.04 sec 86.8 GBytes 12.4 Gbits/sec receiver

                    This is well below what I expected. With the UDP test, I get:

                    [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
                    [ 5] 0.00-60.00 sec 39.7 GBytes 5.68 Gbits/sec 0.000 ms 0/4746870 (0%) sender
                    [ 5] 0.00-60.03 sec 438 MBytes 61.3 Mbits/sec 0.003 ms 2483897/2535131 (98%) receiver

                    This is not much either, and the receiver reports 98% of the datagrams as lost.

                    Is there any way I could increase the bandwidth? Thank you so much!
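As a quick sanity check on the numbers above, this small sketch recomputes the bitrates and the UDP loss from the iperf3 summary lines. It assumes iperf3 reports transfer sizes in binary units (1 GByte = 2**30 bytes) and bitrates in decimal units (1 Gbit = 10**9 bits), which matches the figures shown. The function names are only illustrative.

```python
def bitrate_gbps(gbytes: float, seconds: float) -> float:
    """Convert an iperf3 transfer size (binary GBytes) into decimal Gbits/sec."""
    return gbytes * 2**30 * 8 / seconds / 1e9


def loss_percent(lost: int, total: int) -> float:
    """Share of datagrams the receiver counted as lost, in percent."""
    return 100.0 * lost / total


tcp_rate = bitrate_gbps(86.8, 60.0)           # TCP sender line
udp_rate = bitrate_gbps(39.7, 60.0)           # UDP sender line
udp_loss = loss_percent(2483897, 2535131)     # UDP receiver line

print(f"TCP: {tcp_rate:.1f} Gbits/sec")
print(f"UDP sender: {udp_rate:.2f} Gbits/sec")
print(f"UDP loss at receiver: {udp_loss:.0f}%")
```

The UDP sender pushes several Gbits/sec but almost none of it arrives, so the bottleneck in that test looks like drops somewhere along the path or at the receiving host rather than the sending rate.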

                    Chengyi Qu
                    Participant

                      Thank you for your help, Komal!

                      in reply to: Create L3 network error #2026
                      Chengyi Qu
                      Participant

                        Thank you for the explanation. I tried assigning IP addresses and pinging between nodes, and everything works as normal.

                        Thank you for your help. I will just ignore this exception.

                        in reply to: Create L3 network error #2024
                        Chengyi Qu
                        Participant

                          When I try to reserve an L2 network, I still face the same problem.

                          Here is my code:

                          # Add host node h1
                          h1 = slice.add_node(name=h1_name, site=site_1)
                          h1.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h1.set_image(image)
                          h1_iface = h1.add_component(model='NIC_ConnectX_5', name="h1_nic").get_interfaces()[0]

                          # Add host node h2
                          h2 = slice.add_node(name=h2_name, site=site_1)
                          h2.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h2.set_image(image)
                          h2_iface = h2.add_component(model='NIC_ConnectX_5', name="h2_nic").get_interfaces()[0]

                          # Add host node h3
                          h3 = slice.add_node(name=h3_name, site=site_2)
                          h3.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h3.set_image(image)
                          [h3_iface, h3_iface_pub] = h3.add_component(model='NIC_ConnectX_5', name="h3_nic").get_interfaces()

                          # Add host node h4
                          h4 = slice.add_node(name=h4_name, site=site_2)
                          h4.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h4.set_image(image)
                          h4_iface = h4.add_component(model='NIC_ConnectX_5', name="h4_nic").get_interfaces()[0]

                          # Add host node h5
                          h5 = slice.add_node(name=h5_name, site=site_2)
                          h5.set_capacities(cores=host_cores, ram=host_ram, disk=host_disk)
                          h5.set_image(image)
                          h5_iface = h5.add_component(model='NIC_ConnectX_5', name="h5_nic").get_interfaces()[0]

                          # Add control panel networks
                          host_net1 = slice.add_l2network(name=net_1_name, interfaces=[h1_iface, h2_iface, h3_iface, h4_iface, h5_iface])

                           

                          And I got the same error as in the first post.
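In case it makes the reproduction easier to read, the five near-identical node blocks could be folded into a loop. This is only a sketch that reuses the same calls as the code above (`add_node`, `set_capacities`, `set_image`, `add_component`, `get_interfaces`). The helper name `add_hosts` and the `<node name>_nic` naming are my own assumptions, and it does not change the request that triggers the error.

```python
def add_hosts(slice_obj, specs, image, cores, ram, disk, nic_model="NIC_ConnectX_5"):
    """Add one node per (name, site) pair; return the first interface of each NIC.

    slice_obj is the slice object used above; specs is a list of
    (node_name, site_name) tuples.
    """
    ifaces = []
    for name, site in specs:
        node = slice_obj.add_node(name=name, site=site)
        node.set_capacities(cores=cores, ram=ram, disk=disk)
        node.set_image(image)
        # Component name derived from the node name (an assumption of this sketch).
        nic = node.add_component(model=nic_model, name=f"{name}_nic")
        ifaces.append(nic.get_interfaces()[0])
    return ifaces
```

The returned list could then be passed as `interfaces=` to `slice.add_l2network(name=net_1_name, ...)`, as in the last line of the code above.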

                          in reply to: Who should we ask for adding more permission tags? #2022
                          Chengyi Qu
                          Participant

                            Hi Paul,

                            Thank you for your reply. I emailed you our requests. Please check.

                            Best,

                            Chengyi

                            in reply to: SSH to the Fabric nodes: Permission denied (publickey) #2018
                            Chengyi Qu
                            Participant

                              I’m facing the same situation with my existing nodes. It seems that none of the nodes created earlier are accessible, so I decided to recreate them all 🙁

                              in reply to: SSH to the Fabric nodes: Permission denied (publickey) #2011
                              Chengyi Qu
                              Participant

                                Hey Paul,

                                I’m facing a similar problem when I try to SSH into my nodes from the JupyterHub terminal.

                                ssh -i /home/fabric/.ssh/id_rsa -J cqy78_0038438951@bastion-1.fabric-testbed.net ubuntu@2001:400:a100:3010:f816:3eff:feb6:2d59
                                The authenticity of host ‘bastion-1.fabric-testbed.net (152.54.15.12)’ can’t be established.
                                ECDSA key fingerprint is SHA256:AIRhefx5rhgEfSSoO8NIc6g+ohFQuSU0yn0i7qGUkY8.
                                Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
                                Warning: Permanently added ‘bastion-1.fabric-testbed.net,152.54.15.12’ (ECDSA) to the list of known hosts.
                                The authenticity of host ‘2001:400:a100:3010:f816:3eff:feb6:2d59 (<no hostip for proxy command>)’ can’t be established.
                                ECDSA key fingerprint is SHA256:XhcWRQ69Qw3tX0QEFpoBbrF7vI0SAvBZs3i+acbcSzI.
                                Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
                                Warning: Permanently added ‘2001:400:a100:3010:f816:3eff:feb6:2d59’ (ECDSA) to the list of known hosts.
                                ubuntu@2001:400:a100:3010:f816:3eff:feb6:2d59: Permission denied (publickey).

                                I ran the bastion_setup notebook (which modified the .ssh/config file in my JupyterHub) and still got the ‘Permission denied’ error. Is there anything I missed, or is it because of the maintenance?

                                Thank you,

                                Chengyi

                                in reply to: Renew slides time doesn’t work #1643
                                Chengyi Qu
                                Participant

                                  NVM, Komal helped me figure it out. Thank you, Paul, for your help.

                                FABRIC invites nominations for four awards recognizing innovative uses of FABRIC resources—Best Published Paper, Best FABRIC Matrix, Best FABRIC Experiment, and Best Classroom Use of FABRIC — submissions due by **Monday, February 24 at 11:59 PM ET**, and winners announced at KNIT10. [>>>Submit Form](https://docs.google.com/forms/d/e/1FAIpQLSeTp3i2iDhB7bHgN8ryMxZci8ya87yjeQd7_JMZImUodNinVA/viewform)

                                KNIT10 Call for Demos Now Open! Submit your demo by **February 24**. [>>>Submit Demo](https://docs.google.com/forms/d/e/1FAIpQLScRIWqHliNP3DFWBCnalYN_fBXJXVM0PpP9YWWJdSebC95TvA/viewform)