Slice resubmit fails with already configured error.
- This topic has 2 replies, 2 voices, and was last updated 5 months, 1 week ago by Nishanth Shyamkumar.
July 9, 2024 at 2:26 pm #7223
Hi,
In my port mirroring experiment, if I fail to get sufficient resources during slice submission, my program reduces the requirements and resubmits itself.
To that effect, I delete the current failed slice (which is in either the StableOK or Closing state) and retry after a wait period. However, I get the following error on some sites:
failed lease update- all units failed priming: Exception during create for unit: 4b99e670-f7e7-406f-9b25-e99bbbf293f9 Playbook has failed tasks: NSO commit returned JSON-RPC error: type: rpc.method.failed, code: -32000, message: Method failed, data: message: External error in the NED implementation for device max-data-sw: Tue Jul 2 19:09:36.967 UTC
Failed to commit one or more configuration items during a pseudo-atomic operation. All changes made have been reverted.
SEMANTIC ERRORS: This configuration was rejected by the system due to semantic errors. The individual errors with each failed configuration command can be found below.
monitor-session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 ethernet
 destination interface TwentyFiveGigE0/0/0/16/2
 Destination interface TwentyFiveGigE0/0/0/16/2 for session mon_MAX_MAX_node2_pm-4b99e670-f7e7-406f-9b25-e99bbbf293f9 is already configured for use with session mon_MAX_MAX_node0_pm-e80ed1e2-e96b-4724-885a-04c1a7173bc4
end, internal: jsonrpc_tx_commit357
It complains that the mirrored port is already configured in another session. Is there a way to ensure that my slice deletion succeeds first, before I try my resubmit?
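A resubmitting program could recognise this specific failure by pulling the interface and the two session names out of the error text. The following is only a minimal sketch: the pattern is derived from the single error message quoted above, and `MirrorConflict` and `find_mirror_conflict` are hypothetical names, not part of any FABRIC library.

```python
import re
from typing import NamedTuple, Optional


class MirrorConflict(NamedTuple):
    """Fields extracted from an 'already configured' error (hypothetical type)."""
    interface: str
    new_session: str
    existing_session: str


# Pattern based only on the wording of the error quoted in this thread;
# other devices or software versions may phrase it differently.
_CONFLICT_RE = re.compile(
    r"Destination interface (\S+) for session (\S+) "
    r"is already configured for use with session (\S+)"
)


def find_mirror_conflict(error_text: str) -> Optional[MirrorConflict]:
    """Return the conflicting interface and sessions, or None if no match."""
    match = _CONFLICT_RE.search(error_text)
    return MirrorConflict(*match.groups()) if match else None
```

If a conflict is found, the program could wait longer before retrying or pick a different port to mirror, since the existing session belongs to a resource that has not been released yet.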
July 9, 2024 at 3:19 pm #7224
Hi Nishanth,
It appears that a network service has leaked. In a distributed system like our testbed, encountering some leaked resources is not unusual. We plan to deploy updates in the coming week to address this issue. In the meantime, I recommend introducing a delay between deletion and recreation, as the resources are distributed across the testbed.
For now, I have cleaned up the leaked services, so provisioning should work.
Also, if possible, could you please share your notebook or a code snippet that might help reproduce this state? It would be super helpful for debugging and addressing this issue. I appreciate your help with this!
Best regards,
Komal
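The suggested delay between deletion and recreation could be implemented as a poll rather than a fixed sleep. This is a rough sketch under stated assumptions: `wait_until_deleted` and `GONE_STATES` are hypothetical names, `get_state` is any caller-supplied function that returns the slice's current state string (or `None` once the slice is no longer listed), and treating "Dead" as the final state is an assumption to check against the testbed documentation.

```python
import time
from typing import Callable, Optional

# Assumption: the state a slice reports once deletion has finished.
GONE_STATES = ("Dead",)


def wait_until_deleted(
    get_state: Callable[[], Optional[str]],
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_state() until the slice is gone; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state is None or state in GONE_STATES:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Because a leaked network service can outlive the slice itself, a short extra pause after this function returns `True` may still be prudent before resubmitting.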
July 12, 2024 at 3:59 pm #7243
Hi Komal,
Here is a code snippet. It's a bit complex since there are some design mechanisms at play here. However, the essential part is:
There is a while loop that attempts to set up the slice and request port mirror resources by invoking setup_slice(). If it fails, the failed slice is deleted, and on the next attempt the number of VMs requested is reduced and slice creation is requested again.

    def setup_slice(port_reduce_count):
        # …
        # A block of code that checks for available smartNICS and assigns a VM for each.
        # Splits total available switch ports on 1 site into N groups, where N is the number of VMs.
        # Specify other resources like CPU, RAM etc.
        # …
        pmnet = {}
        num_pmservices = {}  # Track mirrored port count per VM
        listener_pmservice_name = {}
        ports_mirrored = {}  # Track mirrored port count per site
        random.seed(None, 2)
        for listener_site in listener_sites:
            pmnet[listener_site] = []
            # To keep track of ports mirrored on each site, within the port list
            ports_mirrored[listener_site] = 0
            j = 0
            max_active_ports = port_count[listener_site]
            for listener_node in listener_nodes[listener_site]:
                k = 0
                listener_interface_idx = 0
                listener_pmservice_name[listener_node] = []
                node_name = listener_node_name[listener_site][j]
                avail_port_node_maxcnt = len(mod_port_list[listener_site][node_name])
                # Each node (VM) monitors an assigned fraction of the total available ports.
                for listener_interface in listener_interfaces[node_name]:
                    #print(f'listener_interface = {listener_interface}')
                    if listener_interface_idx % 2 == 0:
                        # first listener interface of NIC randomizes within the first half
                        random_index = random.randint(0, int(avail_port_node_maxcnt / 2 - 1))
                    else:
                        # second listener interface randomizes within the second half
                        random_index = random.randint(int(avail_port_node_maxcnt / 2), avail_port_node_maxcnt - 1)
                    listener_interface_idx += 1
                    if ports_mirrored[listener_site] < max_active_ports:
                        listener_pmservice_name[listener_node].append(
                            f'{listener_site}_{node_name}_pmservice{ports_mirrored[listener_site]}')
                        pmnet[listener_site].append(pmslice.add_port_mirror_service(
                            name=listener_pmservice_name[listener_node][k],
                            mirror_interface_name=mod_port_list[listener_site][node_name][random_index],
                            receive_interface=listener_interface,
                            mirror_direction=listener_direction[listener_site]))
                        # The with block closes the log file on exit.
                        with open(startup_log_file, "a") as slog:
                            slog.write(f"{listener_site}# mirror interface name: {mod_port_list[listener_site][node_name][random_index]} mirrored to {listener_interface}\n")
                        ports_mirrored[listener_site] = ports_mirrored[listener_site] + 1
                        k = k + 1
                    else:
                        with open(startup_log_file, "a") as slog:
                            slog.write("No more ports available for mirroring\n")
                        break
                j = j + 1
                num_pmservices[listener_node] = k

    # Submit Slice Request
    port_reduce_count = 0
    retry = 0
    while retry != 1:
        try:
            setup_slice(port_reduce_count)
            pmslice.submit(progress=True, wait_timeout=2400, wait_interval=120)
            if pmslice.get_state() == "StableError":
                raise Exception("Slice state is StableError")
            retry = 1
        except Exception as e:
            # Delete the failed slice, then wait before the next attempt.
            if pmslice.get_state() == "StableError":
                fablib.delete_slice(listener_slice_name)
            else:
                pmslice.delete()
            time.sleep(120)
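The retry-with-reduced-requirements pattern described above can also be written with a bounded number of attempts, so the loop cannot spin forever if every attempt fails. This is only an illustrative sketch and not the poster's code: `submit_with_reduction`, `try_submit`, and `cleanup` are hypothetical names, with `try_submit(reduce_count)` standing in for building and submitting the slice and `cleanup()` standing in for deleting the failed slice.

```python
import time
from typing import Callable, Optional


def submit_with_reduction(
    try_submit: Callable[[int], bool],
    cleanup: Callable[[], None],
    max_reduce: int,
    wait_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Retry with a growing reduction count.

    Returns the reduction count that succeeded, or None if every
    attempt from 0 to max_reduce failed.
    """
    for reduce_count in range(max_reduce + 1):
        try:
            if try_submit(reduce_count):
                return reduce_count
        except Exception:
            # Treat any exception as a failed attempt and fall through.
            pass
        cleanup()  # delete the failed slice before the next attempt
        if reduce_count < max_reduce:
            sleep(wait_s)  # give the testbed time to release resources
    return None
```

Passing `cleanup` and `sleep` in as parameters keeps the loop easy to test without touching the testbed, and makes it straightforward to swap the fixed wait for a poll on the slice state.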