1. Slice resubmit fails with already configured error.


  • #7223
    Nishanth Shyamkumar
    Participant

      Hi,

      In my port mirroring experiment, if slice submission fails to get sufficient resources, my program reduces its requirements and resubmits.
      To do that, I delete the failed slice (which is in either the StableOK or Closing state) and retry after a wait period.

      However, I get the following error on some sites:

      failed lease update- all units failed priming: Exception during create for unit: 4b99e670-f7e7-406f-9b25-e99bbbf293f9 Playbook has failed tasks: NSO commit returned JSON-RPC error: type: rpc.method.failed, code: -32000, message: Method failed, data: message: External error in the NED implementation for device max-data-sw: Tue Jul  2 19:09:36.967 UTC

       Failed to commit one or more configuration items during a pseudo-atomic operation. All changes made have been reverted.
        SEMANTIC ERRORS: This configuration was rejected by
       the system due to semantic errors. The individual
       errors with each failed configuration command can be
       found below.

      monitor-session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 ethernet
       destination interface TwentyFiveGigE0/0/0/16/2
       Destination interface TwentyFiveGigE0/0/0/16/2 for session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 is already configured for use with session mon_MAX_MAX_node0_pm-e80ed1e2-e96b-4724-885a-04c1a7173bc4

      end, internal: jsonrpc_tx_commit357#all units failed priming: Exception during create for unit: 4b99e670-f7e7-406f-9b25-e99bbbf293f9 Playbook has failed tasks: NSO commit returned JSON-RPC error: type: rpc.method.failed, code: -32000, message: Method failed, data: message: External error in the NED implementation for device max-data-sw: Tue Jul  2 19:09:36.967 UTC

       Failed to commit one or more configuration items during a pseudo-atomic operation. All changes made have been reverted.
        SEMANTIC ERRORS: This configuration was rejected by
       the system due to semantic errors. The individual
       errors with each failed configuration command can be
       found below.

      monitor-session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 ethernet
       destination interface TwentyFiveGigE0/0/0/16/2
       Destination interface TwentyFiveGigE0/0/0/16/2 for session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 is already configured for use with session mon_MAX_MAX_node0_pm-e80ed1e2-e96b-4724-885a-04c1a7173bc4

      end, internal: jsonrpc_tx_commit357#

      It complains that the destination interface of the mirror session is already configured for use with another session. Is there a way to ensure that my slice deletion succeeds before I resubmit?
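
When reporting a conflict like this, it can help to pull the interface and the two session names out of the error text. The following is a minimal sketch, assuming the message keeps the "Destination interface ... is already configured for use with session ..." wording quoted in this post; `find_mirror_conflict` is a hypothetical helper, not part of fablib.

```python
import re

# The pattern follows the wording of the switch error quoted in this thread.
# Assumption: other devices or software versions may phrase it differently.
CONFLICT_RE = re.compile(
    r"Destination interface (?P<interface>\S+) for session (?P<new_session>\S+) "
    r"is already configured for use with session (?P<old_session>[\w.-]+)"
)

def find_mirror_conflict(error_text):
    """Return the interface and both session names as a dict, or None if no conflict is reported."""
    match = CONFLICT_RE.search(error_text)
    return match.groupdict() if match else None
```

In the error quoted here, the session that already holds the interface carries a different UUID (e80ed1e2-...) from the unit that failed (4b99e670-...), which suggests it belongs to a different reservation than the one being created.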

      #7224
      Komal Thareja
      Participant

        Hi Nishanth,

        It appears that a network service has leaked. In a distributed system like our testbed, encountering some leaked resources is not unusual. We plan to deploy updates in the coming week to address this issue. In the meantime, I recommend introducing a delay between deletion and recreation, as the resources are distributed across the testbed.
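
One way to implement such a delay is to poll until the old slice is no longer reported and only then resubmit. The sketch below is an illustration under stated assumptions: `slice_exists` is a hypothetical callable supplied by the caller (for example, a wrapper around whatever slice lookup your fablib version provides), and the timeout and interval values are arbitrary.

```python
import time

def wait_until_gone(slice_exists, timeout=600.0, interval=30.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll slice_exists() until it returns False.

    Returns True if the slice disappeared before the timeout, False otherwise.
    slice_exists is a caller-supplied (hypothetical) check, not a fablib call.
    """
    deadline = clock() + timeout
    while True:
        if not slice_exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Note that a slice disappearing from the listing does not by itself prove that every network service on the switches has been released, so keeping an additional fixed wait after this check is still a reasonable precaution.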

        For now, I have cleaned up the leaked services, so provisioning should work.

        Also, if possible, could you please share your notebook or a code snippet that might help reproduce this state? It would be very helpful for debugging and addressing this issue. I appreciate your help with this!

        Best regards,

        Komal

        #7243
        Nishanth Shyamkumar
        Participant

          Hi Komal,

          Here is a code snippet. It's a bit complex since there are some design mechanisms at play, but the essential part is this:
          There is a while loop that attempts to set up the slice and request port mirror resources by invoking setup_slice(). If it fails, the failed slice is deleted and, in the next attempt, the number of VMs requested is reduced and slice creation is requested again.

          import random

          def setup_slice(port_reduce_count=0):
              # ...
              # A block of code that checks for available SmartNICs and assigns a VM to each.
              # Splits the total available switch ports on one site into N groups, where N is the number of VMs.
              # Specifies other resources such as CPU and RAM.
              # ...
              pmnet = {}
              num_pmservices = {}     # Track mirrored port count per VM
              listener_pmservice_name = {}
              ports_mirrored = {}     # Track mirrored port count per site
              random.seed(None, 2)
              for listener_site in listener_sites:
                  pmnet[listener_site] = []
                  # Keep track of ports mirrored on each site, within the port list
                  ports_mirrored[listener_site] = 0
                  j = 0
                  max_active_ports = port_count[listener_site]
                  for listener_node in listener_nodes[listener_site]:
                      k = 0
                      listener_interface_idx = 0
                      listener_pmservice_name[listener_node] = []
                      node_name = listener_node_name[listener_site][j]
                      # Each node (VM) monitors an assigned fraction of the total available ports.
                      avail_port_node_maxcnt = len(mod_port_list[listener_site][node_name])
                      for listener_interface in listener_interfaces[node_name]:
                          # print(f'listener_interface = {listener_interface}')
                          if listener_interface_idx % 2 == 0:
                              # First listener interface of the NIC randomizes within the first half
                              random_index = random.randint(0, int(avail_port_node_maxcnt / 2 - 1))
                          else:
                              # Second listener interface randomizes within the second half
                              random_index = random.randint(int(avail_port_node_maxcnt / 2), avail_port_node_maxcnt - 1)
                          listener_interface_idx += 1
                          if ports_mirrored[listener_site] < max_active_ports:
                              mirror_port = mod_port_list[listener_site][node_name][random_index]
                              listener_pmservice_name[listener_node].append(f'{listener_site}_{node_name}_pmservice{ports_mirrored[listener_site]}')
                              pmnet[listener_site].append(pmslice.add_port_mirror_service(
                                  name=listener_pmservice_name[listener_node][k],
                                  mirror_interface_name=mirror_port,
                                  receive_interface=listener_interface,
                                  mirror_direction=listener_direction[listener_site]))
                              with open(startup_log_file, "a") as slog:
                                  slog.write(f"{listener_site}# mirror interface name: {mirror_port} mirrored to {listener_interface}\n")
                              ports_mirrored[listener_site] += 1
                              k += 1
                          else:
                              with open(startup_log_file, "a") as slog:
                                  slog.write("No more ports available for mirroring\n")
                              break
                      j += 1
                      num_pmservices[listener_node] = k
          import time

          # Submit slice request
          port_reduce_count = 0
          retry = 0
          while retry != 1:
              try:
                  setup_slice(port_reduce_count)
                  pmslice.submit(progress=True, wait_timeout=2400, wait_interval=120)
                  if pmslice.get_state() == "StableError":
                      raise Exception("Slice state is StableError")
                  retry = 1
              except Exception as e:
                  # Delete the failed slice, then wait before resubmitting
                  if pmslice.get_state() == "StableError":
                      fablib.delete_slice(listener_slice_name)
                  else:
                      pmslice.delete()
                  time.sleep(120)
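
As posted, the loop keeps retrying with the same `port_reduce_count`, and it has no upper bound on attempts. The following is a minimal sketch of a retry loop that shrinks the request after each failure and stops at a floor. The names `submit_with_reduction`, `try_submit`, and `cleanup` are hypothetical: `try_submit(count)` stands in for building and submitting the slice with `count` VMs, and `cleanup()` stands in for deleting the failed slice.

```python
import time

def submit_with_reduction(try_submit, cleanup, start_count, min_count=1,
                          delay=120.0, sleep=time.sleep):
    """Call try_submit(count) with a shrinking count until it succeeds.

    Returns the count that succeeded, or None if every attempt down to
    min_count failed. try_submit and cleanup are caller-supplied callables.
    """
    count = start_count
    while count >= min_count:
        try:
            try_submit(count)
            return count
        except Exception:
            cleanup()        # delete the failed slice
            sleep(delay)     # give the testbed time to release resources
            count -= 1       # ask for less on the next attempt
    return None
```

Passing the submit and delete steps in as callables keeps the retry policy separate from the fablib calls, so the policy can be exercised with stubs before it is run against the testbed.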

           

           

           

           

        FABRIC invites nominations for four awards recognizing innovative uses of FABRIC resources—Best Published Paper, Best FABRIC Matrix, Best FABRIC Experiment, and Best Classroom Use of FABRIC — submissions due by **Monday, February 24 at 11:59 PM ET**, and winners announced at KNIT10. [>>>Submit Form](https://docs.google.com/forms/d/e/1FAIpQLSeTp3i2iDhB7bHgN8ryMxZci8ya87yjeQd7_JMZImUodNinVA/viewform)

        KNIT10 Call for Demos Now Open! Submit your demo by **February 24**. [>>>Submit Demo](https://docs.google.com/forms/d/e/1FAIpQLScRIWqHliNP3DFWBCnalYN_fBXJXVM0PpP9YWWJdSebC95TvA/viewform)