Testbed-wide maintenance


Posted by Komal Thareja (Participant), post #4538

      Dear experimenters,

      FABRIC is open for business again. Much of the maintenance is complete. A number of sites will be showing in maintenance for now:

      • ATLA, LOSA, NEWY, PSC, SEAT, SRI – these sites do not have dataplane connections yet. We are working to provision them and will make the sites available as soon as that happens.
      • WASH, SALT, DALL, UTAH, TACC, STAR, MAX, MASS, MICH, NCSA – these older sites have undergone a significant upgrade. However, we discovered a last-minute problem with detaching persistent storage, which will be addressed in the next few days. We expect to add these sites back next week.
      • The following sites are immediately available:
        • CERN, CLEM, GATECH, GPN, INDI and UCSD
      • We are observing some traffic issues with FabNetV4Ext services and are investigating them. We will post an update once this is resolved. FabNetV6Ext works without issues.

      Please read the release notes for Release 1.5 for details. Some of the highlights include:

      • Long-lived slices – with the proper project permissions, it will now be possible to keep slices for up to 6 months without renewal
      • Resource availability – we have modified how we count available Cores, RAM and disk in the sites and you should see a significant increase in available capacities
      • NUMA and CPU pinning – it is now possible to pin vCPUs to specific host CPUs so that they share a NUMA domain with attached devices, and to tune the memory allocated to the VMs to be on a specific NUMA node
      • In Jupyter Hub you will see multiple options for containers with different versions of examples and FABlib/MFlib
      • MTU has been adjusted on all data plane switches. It is set to 9100 to allow 9k jumbo frames in the data plane, with the exception of the following links:
        • UTAH-GPN: 9000
        • DALL-TACC: 9062
        • LBNL-RENC: 9000
        • UTAH-UCSD: 9060
        • WASH-MASS: 9018
      • Performance tuning of various components now requires users to explicitly fetch the latest slice topology before performing any operations on VMs, by invoking slice=fablib.get_slice(slice_name)
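
The per-link MTU values listed above determine the largest frame that can cross a multi-hop path: the usable MTU is the minimum over the links traversed. The sketch below is only an illustration of that arithmetic and is not part of FABlib. The helper names (`link_mtu`, `path_mtu`, `max_ping_payload`) are hypothetical, and the example path is made up for illustration; the actual route between two sites depends on the testbed topology, which this post does not describe.

```python
# Minimal sketch: effective MTU along a path of FABRIC sites.
# Assumption: every link not listed as an exception uses the default of 9100.
DEFAULT_MTU = 9100

# Exceptions taken from the announcement; links are treated as undirected.
LINK_MTU = {
    frozenset({"UTAH", "GPN"}): 9000,
    frozenset({"DALL", "TACC"}): 9062,
    frozenset({"LBNL", "RENC"}): 9000,
    frozenset({"UTAH", "UCSD"}): 9060,
    frozenset({"WASH", "MASS"}): 9018,
}


def link_mtu(site_a, site_b):
    """Return the MTU of the link between two sites (hypothetical helper)."""
    return LINK_MTU.get(frozenset({site_a, site_b}), DEFAULT_MTU)


def path_mtu(sites):
    """Return the smallest link MTU along an ordered list of sites."""
    if len(sites) < 2:
        raise ValueError("a path needs at least two sites")
    return min(link_mtu(a, b) for a, b in zip(sites, sites[1:]))


def max_ping_payload(mtu, ipv6=False):
    """Largest ICMP echo payload that fits in one unfragmented packet.

    Subtracts the IP header (20 bytes for IPv4 without options,
    40 bytes for IPv6) and the 8-byte ICMP header.
    """
    ip_header = 40 if ipv6 else 20
    return mtu - ip_header - 8


if __name__ == "__main__":
    # Hypothetical path, chosen only to exercise two of the listed exceptions.
    mtu = path_mtu(["UCSD", "UTAH", "GPN"])
    print(mtu, max_ping_payload(mtu), max_ping_payload(mtu, ipv6=True))
```

A payload computed this way can be used with a "do not fragment" ping to check that jumbo frames pass end to end. The interface MTU inside the VMs also has to be raised accordingly, which this sketch does not cover.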

      Thanks,

      Komal
