Rasman Mubtasim Swargo

Forum Replies Created

Viewing 8 posts - 1 through 8 (of 8 total)
  • in reply to: Performance Drop on ConnectX-6 After Release 1.9 #9047

    It worked after manually doing the steps you described. Thanks.

    in reply to: Performance Drop on ConnectX-6 After Release 1.9 #9046

    I have booked a slice (d6065a22-c893-425f-b12f-3bc0fe4d2481) with nodes at NEWY and CERN, whose link is listed as 320 Gbps. This time slice submission did not get stuck and everything went smoothly, but I am still getting only around 3 Gbps.

    Could you please have a look?
    I saw that there is another link listed for (NEWY, CERN) at 8 Gbps. Can you guide me on how to pick sites so that I get the fastest network speed?

    (‘NEWY’, ‘CERN’) link:local-port+cern-data-sw:FourHundredGigE0/0/0/26.3733:remote-port+newy-data-sw:FourHundredGigE0/0/0/60.3733 320 N/A L2
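A minimal sketch of picking the highest-capacity link per site pair, assuming the link listing has already been parsed into rows. The `Link` type, the helper name, and the placeholder link names are hypothetical; only the 320 and 8 Gbps capacities come from this post.

```python
from typing import Dict, List, NamedTuple, Tuple


class Link(NamedTuple):
    sites: Tuple[str, str]   # the two endpoint sites
    name: str                # link identifier (placeholder values below)
    capacity_gbps: float     # listed capacity, not measured throughput


def best_link_per_pair(links: List[Link]) -> Dict[Tuple[str, str], Link]:
    """Return the highest-capacity link for each unordered site pair."""
    best: Dict[Tuple[str, str], Link] = {}
    for link in links:
        key = tuple(sorted(link.sites))  # ('NEWY', 'CERN') == ('CERN', 'NEWY')
        if key not in best or link.capacity_gbps > best[key].capacity_gbps:
            best[key] = link
    return best


# Hypothetical rows: capacities are from this post, link names are placeholders.
links = [
    Link(("NEWY", "CERN"), "link-a (placeholder)", 320),
    Link(("NEWY", "CERN"), "link-b (placeholder)", 8),
]

best = best_link_per_pair(links)
for pair, link in best.items():
    print(pair, link.name, link.capacity_gbps)
```

Listed capacity is only an upper bound: the slice above used the 320 Gbps pair and still measured about 3 Gbps, so host tuning and NIC configuration matter as well.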

     

    in reply to: Performance Drop on ConnectX-6 After Release 1.9 #9041

    The snippet you provided fails at the ‘make’ step on both nodes:

    ubuntu@Node-GATECH:~/iperf-3.18$ make
    Making all in src
    make[1]: Entering directory '/home/ubuntu/iperf-3.18/src'
    make all-am
    make[2]: Entering directory '/home/ubuntu/iperf-3.18/src'
    CC iperf3-main.o
    main.c:212:1: fatal error: opening dependency file .deps/iperf3-main.Tpo: Permission denied
    212 | }
    | ^
    compilation terminated.
    make[2]: *** [Makefile:974: iperf3-main.o] Error 1
    make[2]: Leaving directory '/home/ubuntu/iperf-3.18/src'
    make[1]: *** [Makefile:733: all] Error 2
    make[1]: Leaving directory '/home/ubuntu/iperf-3.18/src'
    make: *** [Makefile:404: all-recursive] Error 1
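The "Permission denied" on `.deps/iperf3-main.Tpo` suggests the build tree is not writable by the `ubuntu` user. One possible cause, which is an assumption here and not confirmed in this thread, is that an earlier step created those directories as root. A minimal sketch for listing the directories the current user cannot write to (the helper name is hypothetical; the path is the one from the log above):

```python
import os


def unwritable_dirs(root):
    """Return directories under root (inclusive) the current user cannot write to."""
    blocked = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if not os.access(dirpath, os.W_OK):
            blocked.append(dirpath)
    return blocked


# Build directory taken from the make output above.
build_dir = os.path.expanduser("~/iperf-3.18")
for path in unwritable_dirs(build_dir):
    print("not writable:", path)
```

If `src/.deps` shows up in the output, the ownership of the tree needs to be fixed before `make` can write its dependency files.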

    in reply to: Performance Drop on ConnectX-6 After Release 1.9 #9039

    Slice ID: 25c5b6c2-f0f8-4cc9-b4e1-cad570231aca

    One thing I forgot to mention: the execution often gets stuck in the slice submission cell. Post-boot config usually finishes on one node but hangs on the other. Here it stops at this point, and the FIU node never prints its ‘Done!’ message:

    Time to StableOK 246 seconds
    Running post_boot_config ... 
    Running post boot config threads ...
    Post boot config Node-GATECH, Done! (16 sec)
    

    Here’s the code:

    from ipaddress import ip_address, IPv4Address, IPv6Address, IPv4Network, IPv6Network

    # Assumes `fablib` was created in an earlier notebook cell, e.g.:
    # from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
    # fablib = fablib_manager()

    sites = ['GATECH', 'FIU']
    print(f"Sites: {sites}")

    node1_name = 'Node1'
    node2_name = 'Node2'
    cores = 8
    ram = 64
    disk = 1000
    image = 'default_ubuntu_20'

    slice_name = 'iPerf3-tuned-nic-x6-64gb-1tb-GF-2'
    nic_name = 'nic1'
    model_name = 'NIC_ConnectX_6'
    network_name = 'net1'

    subnet = IPv4Network("192.168.1.0/24")
    available_ips = list(subnet)[1:]  # skip the network address

    # Create the slice and one L2 network shared by both sites
    slice = fablib.new_slice(name=slice_name)
    net1 = slice.add_l2network(name=network_name, subnet=subnet)

    # Add one node per site, each with a ConnectX-6 NIC and an NVMe drive
    for s in sites:
        node1 = slice.add_node(name=f"Node-{s}", cores=cores, ram=ram, disk=disk, site=s, image=image)

        iface1 = node1.add_component(model=model_name, name=nic_name).get_interfaces()[0]
        node1.add_component(model='NVME_P4510', name='nvme1')
        iface1.set_mode('auto')
        net1.add_interface(iface1)
        net1.set_bandwidth(50)

        # Upload the tuning scripts and run them after boot
        node1.add_post_boot_upload_directory('node_tools', '.')
        node1.add_post_boot_execute('sudo node_tools/host_tune.sh')
        # node1.add_post_boot_execute('node_tools/enable_docker.sh {{ _self_.image }} ')
        # node1.add_post_boot_execute('docker pull fabrictestbed/slice-vm-ubuntu20-network-tools:0.0.1 ')

    # Submit the slice request
    slice.submit()
    
    
    

    I have to stop the execution and move on to the next cell. I’ll report here what I get after running the ESnet iperf3. Let me know if you need anything else to investigate this issue.
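To see at a glance which node never reported completion, the post-boot output can be checked against the expected node names. This is a minimal sketch, assuming the log lines keep the "Post boot config <node>, Done!" shape shown above; the helper name is hypothetical, and the node names follow the `Node-{site}` pattern from the code.

```python
import re


def nodes_missing_done(log_text, expected_nodes):
    """Return the expected nodes with no 'Post boot config <node>, Done!' line."""
    done = set(re.findall(r"Post boot config (\S+), Done!", log_text))
    return [name for name in expected_nodes if name not in done]


# Output copied from the stuck slice submission cell.
log_text = """Time to StableOK 246 seconds
Running post_boot_config ...
Running post boot config threads ...
Post boot config Node-GATECH, Done! (16 sec)
"""

missing = nodes_missing_done(log_text, ["Node-GATECH", "Node-FIU"])
print(missing)  # ['Node-FIU']
```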

    It’s working now.

    Thanks,

    Swargo

    Hi Rasman,

    I was able to run iperf3 optimized notebook without issues. I am unable to access your notebook. It says Page Not Found.

    Could you please share your slice ID?

    Thanks,

    Komal

    I have tried again, this time without any modifications to the original notebook, but I still got the same error.

    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Source:  Node-MASS to Dest: Node-TACC
    iperf3: error - unable to connect to server: No route to host
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Source:  Node-TACC to Dest: Node-MASS
    iperf3: error - unable to connect to server: No route to host
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

     

    Slice ID: 919b590f-87aa-41e8-bb22-58acf4c79d4c

    Hi Komal,
    Here is the slice ID: f05dedc0-468f-406e-a566-8041d507ad60

    I am trying to transfer some files from MASS to TACC, but it fails with ‘No route to host’.

     

    Here is the notebook that I used: https://github.com/swargo98/LLM-based-Data-Movement-Optimizer/blob/main/iperf3_optimized_w_error.ipynb

    Sorry, I could not upload the notebook as the file type is not supported.

    Here is the github link of the notebook: https://github.com/swargo98/LLM-based-Data-Movement-Optimizer/blob/main/iperf3_optimized_w_error.ipynb
