Paul Ruth

Forum Replies Created
  • in reply to: Questions about node ports and bandwidth between sites #516
    Paul Ruth
    Keymaster

      The best examples will be in the Jupyterhub environment. We have pre-installed a suite of example notebooks from the following github repo. The notebooks are currently in development and will improve over time. You may need to do a git pull on the repo to get the newest example notebooks.

      https://github.com/fabric-testbed/jupyter-examples

      Specifically, the examples in this folder will be useful:

      https://github.com/fabric-testbed/jupyter-examples/tree/master/fabric_examples/basic_examples

      These notebooks are very new and may be incomplete. I will try to update the notebooks today with the most current information.

      More generally, there are three types of layer 2 “Network Services” on FABRIC (layer 3 services are still in development); a rough sketch of requesting each type follows the list.

      1. L2Bridge: These bridges are like a local network switch/bridge that connects any number of nodes within a single site.  These local bridges are directly connected to the nodes, so your bandwidth will be limited by the maximum bandwidth of the NICs that you are using (i.e. ConnectX_6 NICs will provide 100Gbps). This bridge is not programmable and only performs simple MAC learning. The key limitation of these bridges is that they can only connect nodes within a single FABRIC site.
      2. L2P2P (Peer-to-peer):  These are peer-to-peer layer 2 circuits that connect exactly 2 nodes. These nodes must be on different FABRIC sites.  These circuits will have user-specified QoS. QoS is currently in development and is not yet available.  Once QoS is available, users will be able to request dedicated bandwidths on these circuits.
      3. L2S2S (Site-to-site):  Site-to-site is a hybrid of L2Bridge and L2P2P with some limitations. With S2S a user gets a pair of L2Bridges on different FABRIC sites that have a circuit connecting them.  Any number of nodes on either of the sites can be connected to the single S2S Network Service.  All nodes connected to the S2S service are on the same L2 network and can use the same layer 3 subnet. One big limitation of S2S is that the wide-area circuit will be best effort and will not have any guaranteed QoS.  If you want guaranteed QoS, you will need to use L2P2P and set up your own routing or switching on each end.
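
      Here is that sketch, using the same slice editor calls as the tagged example later in this post. It is not taken from the example notebooks: the import path, the ExperimentTopology/add_node usage, the site names, the dual-port NIC assumption, and the ServiceType.L2Bridge/ServiceType.L2STS names are written from memory, so please verify them against the basic_examples notebooks (ServiceType.L2PTP is the one that appears in the tagged example below).

      # Sketch only -- verify names and enum values against the notebooks
      from fabrictestbed.slice_editor import ExperimentTopology, ComponentModelType, ServiceType

      t = ExperimentTopology()
      n1 = t.add_node(name='n1', site='UKY')
      n2 = t.add_node(name='n2', site='LBNL')
      n3 = t.add_node(name='n3', site='LBNL')

      # One SmartNIC per node; assuming dual-port NICs, each node gets two dataplane interfaces
      n1.add_component(model_type=ComponentModelType.SmartNIC_ConnectX_6, name='n1-nic1')
      n2.add_component(model_type=ComponentModelType.SmartNIC_ConnectX_6, name='n2-nic1')
      n3.add_component(model_type=ComponentModelType.SmartNIC_ConnectX_6, name='n3-nic1')

      # 1. L2Bridge: any number of interfaces, all within a single site (here, LBNL)
      t.add_network_service(name='bridge1', nstype=ServiceType.L2Bridge,
                            interfaces=[n2.interface_list[0], n3.interface_list[0]])

      # 2. L2P2P: exactly two interfaces, on different sites
      t.add_network_service(name='ptp1', nstype=ServiceType.L2PTP,
                            interfaces=[n1.interface_list[0], n2.interface_list[1]])

      # 3. L2S2S: bridges on two sites joined by a best-effort wide-area circuit
      t.add_network_service(name='sts1', nstype=ServiceType.L2STS,
                            interfaces=[n1.interface_list[1], n3.interface_list[1]])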

       

      Although the notebooks include examples of how to configure and use the data plane networks, the context below might be helpful.

      When you add an interface to a network service you have a couple of options: the interface can be tagged or untagged with VLANs.  By default, adding an interface to a network service will result in an untagged interface.

      • untagged interfaces: These behave like an access port. In other words, any FABRIC-assigned VLAN tags will be stripped from the layer 2 traffic before it is passed to the user’s node.  The user’s node will not need to process VLAN tags.

       

      • tagged interfaces: These interfaces behave like a trunk port. VLAN tags are left on the layer 2 traffic that enters the node. In this case, the user must process the VLAN tags within the node (a rough in-node configuration sketch follows the tagged example below). FABRIC allows the user to specify the VLAN tag that should be on the traffic as it enters the node. There are separate namespaces for these tags per interface, so users are free to use any VLAN tag they wish and there will be no conflict with other users.

      Tagged example that applies VLAN tag 200:

      
      # Assumes t is an ExperimentTopology and n1, n2 are nodes already added to it
      # (as in the sketch above)
      n1.add_component(model_type=ComponentModelType.SmartNIC_ConnectX_6, name='n1-nic1')
      n2.add_component(model_type=ComponentModelType.SmartNIC_ConnectX_5, name='n2-nic1')
      
      # Use the first port of each NIC
      n1_iface = n1.interface_list[0]
      n2_iface = n2.interface_list[0]
      
      # Point-to-point layer 2 circuit between the two interfaces
      t.add_network_service(name='ptp1', nstype=ServiceType.L2PTP,
                            interfaces=[n1_iface, n2_iface])
      
      # Request VLAN tag 200 on n1's interface
      if_labels = n1_iface.get_property(pname="labels")
      if_labels.vlan = "200"
      n1_iface.set_properties(labels=if_labels)
      
      # Request the same VLAN tag on n2's interface
      if_labels = n2_iface.get_property(pname="labels")
      if_labels.vlan = "200"
      n2_iface.set_properties(labels=if_labels)
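
      After the slice is active, a tagged interface still needs to be configured inside the node itself. Here is a rough sketch of that step (not from the notebooks), run from the JupyterHub environment with paramiko; the management IP, the 'centos' username, the key path, the dataplane interface name eth1, and the subnet are placeholders/assumptions for illustration.

      import paramiko
      
      ssh = paramiko.SSHClient()
      ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
      # Placeholders: your node's management IP, image username, and sliver key
      ssh.connect('<node_management_ip>', username='centos', key_filename='<path_to_sliver_key>')
      
      commands = [
          'sudo modprobe 8021q',                                        # VLAN kernel module
          'sudo ip link add link eth1 name eth1.200 type vlan id 200',  # VLAN 200 sub-interface on the dataplane NIC
          'sudo ip link set dev eth1.200 up',
          'sudo ip addr add 192.168.1.10/24 dev eth1.200',              # example layer 3 address on the circuit
      ]
      for cmd in commands:
          stdin, stdout, stderr = ssh.exec_command(cmd)
          print(stdout.read().decode(), stderr.read().decode())
      
      ssh.close()

      The other node gets the same configuration with a different address in the same subnet (e.g. 192.168.1.11/24).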

      Let me know if this helps.

      in reply to: Questions about node ports and bandwidth between sites #478
      Paul Ruth
      Keymaster

        Update:

        I tried this myself and was able to get ~6Gbps but only after tuning as suggested by the ESnet site.  I did this with VMs at UKY and LBNL. Both VMs were bigger than the default (32 cores, 64G ram… this is probably bigger than necessary).

        I also found that jumbo frames are not yet possible between these sites. We are working on making this possible soon.

        in reply to: Questions about node ports and bandwidth between sites #477
        Paul Ruth
        Keymaster

          Generally, eth0 is used as a management interface. This is the network that you use when you ssh to the node from the Internet.  You should avoid using this network for experiments.

          The interfaces numbered eth1 (or higher) will be the ones associated with the network component(s) that you have added to your node. These are the ones you should use for experiments.

          Re: the slow performance of the experimental network.  Our initial deployment does not yet use the dedicated L1 circuits; we will move to those as they become available. Instead it currently uses Internet2 AL2S.  However, even with AL2S you should be able to get much higher bandwidth.  I would expect you could get over 10Gbps (maybe even as much as 100Gbps).  There are a couple of possible issues:

          1. Our network deployment needs to be configured/tuned correctly. This is such low bandwidth that I suspect something in the path is dropping packets.  What slice configuration did you use? I assume you have one node at UKY and one at LBNL, is this true? Also, which components did you include on the nodes?
          2. Your end hosts need to be tuned for high-latency, high-bandwidth data transfers.  From the ifconfig info I can see that your nodes are not using jumbo frames.  There are probably some other tuning optimizations you can make (a rough sketch of that tuning follows this list). ESnet has a great resource for learning about this:  https://fasterdata.es.net/host-tuning/linux/
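
          As a starting point, something like the following (run inside each VM) applies the kind of TCP buffer tuning ESnet recommends. The values here are representative and written from memory, so check the fasterdata page above for the current recommendations.

          import subprocess
          
          # Representative ESnet-style TCP tuning for high-latency, high-bandwidth paths.
          # Verify current values at https://fasterdata.es.net/host-tuning/linux/
          settings = {
              'net.core.rmem_max': '67108864',             # max receive socket buffer
              'net.core.wmem_max': '67108864',             # max send socket buffer
              'net.ipv4.tcp_rmem': '4096 87380 33554432',  # TCP receive autotuning min/default/max
              'net.ipv4.tcp_wmem': '4096 65536 33554432',  # TCP send autotuning min/default/max
              'net.ipv4.tcp_mtu_probing': '1',             # helps while jumbo frames are unavailable
          }
          
          for key, value in settings.items():
              subprocess.run(['sudo', 'sysctl', '-w', f'{key}={value}'], check=True)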

          Please let us know which components you are using in the VMs. I would like to try this myself and see if I have the same issues.

          Paul

           

           

          in reply to: Map/GUI Notebooks #417
          Paul Ruth
          Keymaster

            Can you post the current version of the notebook ipynb file?
