1. Slice resubmit fails with already configured error.


  • #7223
    Nishanth Shyamkumar
    Participant

      Hi,

      In my port mirroring experiment, if slice submission fails to obtain sufficient resources, my program reduces the requirements and resubmits.
      To do that, I delete the failed slice (which is in either the StableOK or Closing state) and retry after a wait period.

      However I get the following error on some sites:

      failed lease update- all units failed priming: Exception during create for unit: 4b99e670-f7e7-406f-9b25-e99bbbf293f9 Playbook has failed tasks: NSO commit returned JSON-RPC error: type: rpc.method.failed, code: -32000, message: Method failed, data: message: External error in the NED implementation for device max-data-sw: Tue Jul  2 19:09:36.967 UTC

       Failed to commit one or more configuration items during a pseudo-atomic operation. All changes made have been reverted.
        SEMANTIC ERRORS: This configuration was rejected by
       the system due to semantic errors. The individual
       errors with each failed configuration command can be
       found below.


      monitor-session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 ethernet
       destination interface TwentyFiveGigE0/0/0/16/2
       Destination interface TwentyFiveGigE0/0/0/16/2 for session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 is already configured for use with session mon_MAX_MAX_node0_pm-e80ed1e2-e96b-4724-885a-04c1a7173bc4

      end, internal: jsonrpc_tx_commit357#all units failed priming: Exception during create for unit: 4b99e670-f7e7-406f-9b25-e99bbbf293f9 Playbook has failed tasks: NSO commit returned JSON-RPC error: type: rpc.method.failed, code: -32000, message: Method failed, data: message: External error in the NED implementation for device max-data-sw: Tue Jul  2 19:09:36.967 UTC

       Failed to commit one or more configuration items during a pseudo-atomic operation. All changes made have been reverted.
        SEMANTIC ERRORS: This configuration was rejected by
       the system due to semantic errors. The individual
       errors with each failed configuration command can be
       found below.


      monitor-session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 ethernet
       destination interface TwentyFiveGigE0/0/0/16/2
       Destination interface TwentyFiveGigE0/0/0/16/2 for session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 is already configured for use with session mon_MAX_MAX_node0_pm-e80ed1e2-e96b-4724-885a-04c1a7173bc4

      end, internal: jsonrpc_tx_commit357#

      It complains that the mirrored port is already configured in another session. Is there a way to ensure that my slice deletion has completed before I resubmit?
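One way a retry script could tell this kind of conflict apart from a plain shortage of resources is to look for the conflict sentence in the error text. The following is a minimal sketch, assuming the wording of the message quoted above; `CONFLICT_RE` and `find_mirror_conflict` are illustrative names, not part of FABlib, and the message format may differ on other devices or software versions.

```python
import re

# Assumption: session names end in a UUID, as in the message quoted in this post.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Hypothetical pattern built from the quoted error text only.
CONFLICT_RE = re.compile(
    r"Destination interface (?P<interface>\S+) "
    r"for session (?P<new_session>\S+) "
    r"is already configured for use with session (?P<old_session>\S+?" + UUID + r")"
)


def find_mirror_conflict(error_text):
    """Return the interface and the two session names, or None if no conflict is reported."""
    match = CONFLICT_RE.search(error_text)
    return match.groupdict() if match else None
```

If `find_mirror_conflict` returns a result, the failure points at a session left over from an earlier slice rather than at the size of the request, so reducing the request further is unlikely to help.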

      #7224
      Komal Thareja
      Participant

        Hi Nishanth,

        It appears that a network service has leaked. In a distributed system like our testbed, encountering some leaked resources is not unusual. We plan to deploy updates in the coming week to address this issue. In the meantime, I recommend introducing a delay between deletion and recreation, as the resources are distributed across the testbed.

        For now, I have cleaned up the leaked services, so provisioning should work.

        Also, if possible, could you please share the notebook or code snippet that might help reproduce this state? It would be very helpful for debugging and addressing this issue. I appreciate your help with this!

        Best regards,

        Komal
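The suggested delay between deletion and recreation could be implemented as a bounded poll instead of a single fixed sleep. The following is a minimal sketch; `wait_until` is a hypothetical helper, not a FABlib function, and the `sleep` and `clock` parameters exist only so it can be exercised without real waiting.

```python
import time


def wait_until(predicate, timeout=600.0, interval=30.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or `timeout` seconds have elapsed.

    Returns True if the predicate became true, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage with FABlib. Treating a failed lookup as "slice is gone"
# is an assumption, and slice_is_gone is an illustrative name:
#
# def slice_is_gone():
#     try:
#         fablib.get_slice(name=listener_slice_name)
#     except Exception:
#         return True
#     return False
#
# wait_until(slice_is_gone, timeout=600, interval=30)
```

A check like this only reflects what the client can see. Since the leak described above was on the testbed side, a slice can disappear from the listing while its network service is still configured on the switch, so an additional fixed delay may still be needed.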

        #7243
        Nishanth Shyamkumar
        Participant

          Hi Komal,

          Here is a code snippet. It's a bit complex, since there are some design mechanisms at play here. However, the essential part is:
          There is a while loop that attempts to set up the slice and request port mirror resources by invoking setup_slice(). If that fails, the failed slice is deleted, and on the next attempt the number of VMs requested is reduced and the slice creation is requested again.

          import random
          import time

          def setup_slice(port_reduce_count):
              # …
              # A block of code that checks for available SmartNICs and assigns a VM to each.
              # Splits the total available switch ports on one site into N groups, where N is the number of VMs.
              # Specifies other resources such as CPU, RAM, etc.
              # …
              pmnet = {}
              num_pmservices = {}     # Track mirrored port count per VM
              listener_pmservice_name = {}
              ports_mirrored = {}     # Track mirrored port count per site
              random.seed(None, 2)
              for listener_site in listener_sites:
                  pmnet[listener_site] = []
                  # Keep track of the ports mirrored on each site, within the port list
                  ports_mirrored[listener_site] = 0
                  j = 0
                  max_active_ports = port_count[listener_site]
                  for listener_node in listener_nodes[listener_site]:
                      k = 0
                      listener_interface_idx = 0
                      listener_pmservice_name[listener_node] = []
                      node_name = listener_node_name[listener_site][j]
                      # Each node (VM) monitors an assigned fraction of the total available ports.
                      avail_port_node_maxcnt = len(mod_port_list[listener_site][node_name])
                      for listener_interface in listener_interfaces[node_name]:
                          # print(f'listener_interface = {listener_interface}')
                          if listener_interface_idx % 2 == 0:
                              # The first listener interface of a NIC randomizes within the first half
                              random_index = random.randint(0, int(avail_port_node_maxcnt / 2 - 1))
                          else:
                              # The second listener interface randomizes within the second half
                              random_index = random.randint(int(avail_port_node_maxcnt / 2), avail_port_node_maxcnt - 1)
                          listener_interface_idx += 1
                          if ports_mirrored[listener_site] < max_active_ports:
                              listener_pmservice_name[listener_node].append(f'{listener_site}_{node_name}_pmservice{ports_mirrored[listener_site]}')
                              pmnet[listener_site].append(pmslice.add_port_mirror_service(name=listener_pmservice_name[listener_node][k],
                                                    mirror_interface_name=mod_port_list[listener_site][node_name][random_index],
                                                    receive_interface=listener_interface,
                                                    mirror_direction=listener_direction[listener_site]))
                              with open(startup_log_file, "a") as slog:
                                  slog.write(f"{listener_site}# mirror interface name: {mod_port_list[listener_site][node_name][random_index]} mirrored to {listener_interface}\n")
                              ports_mirrored[listener_site] += 1
                              k += 1
                          else:
                              with open(startup_log_file, "a") as slog:
                                  slog.write("No more ports available for mirroring\n")
                              break
                      j += 1
                      num_pmservices[listener_node] = k
          # Submit slice request
          # (The update of port_reduce_count between attempts is not shown in this snippet.)
          port_reduce_count = 0
          retry = 0
          while retry != 1:
              try:
                  setup_slice(port_reduce_count)
                  pmslice.submit(progress=True, wait_timeout=2400, wait_interval=120)
                  if pmslice.get_state() == "StableError":
                      raise Exception("Slice state is StableError")
                  retry = 1
              except Exception as e:
                  print(f"Slice submission failed: {e}")
                  # Delete the failed slice, then wait before the next attempt
                  if pmslice.get_state() == "StableError":
                      fablib.delete_slice(listener_slice_name)
                  else:
                      pmslice.delete()
                  time.sleep(120)
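The retry pattern described in this post (submit, delete on failure, wait, then ask for less) can be reduced to a small generic loop. The following is a minimal sketch under stated assumptions: `submit_with_reduction`, `try_submit` and `cleanup` are hypothetical names, and the lower bound `minimum` is an addition that keeps the loop from retrying forever; it is not part of the snippet above.

```python
import time


def submit_with_reduction(try_submit, cleanup, start, minimum=1, step=1,
                          wait_seconds=120, sleep=time.sleep):
    """Call try_submit(count) with a shrinking count until it succeeds.

    try_submit is expected to raise an exception on failure.
    Returns the count that succeeded, or None if every attempt down to
    `minimum` failed.
    """
    count = start
    while count >= minimum:
        try:
            try_submit(count)
            return count
        except Exception as exc:
            print(f"Attempt with {count} failed: {exc}")
            cleanup()            # e.g. delete the failed slice
            sleep(wait_seconds)  # give the testbed time to release resources
            count -= step
    return None
```

Here `try_submit` would wrap the slice setup and submission, raising when the slice ends in an error state, and `cleanup` would wrap the slice deletion. Passing both in as callables keeps the loop independent of the slice-building details.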
