Forum Replies Created
Hi,
A follow-up question on this:
1) Does this mean that HC always holds the correct value of that counter?
2) What happens to a non-HC counter when it exceeds 32 bits? Does it get set to 2^32 - 1, or does it overflow so that we see the remainder (true value % 2^32) in this field?
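To spell out the two possibilities I have in mind (a quick Python illustration; the 32-bit width is my assumption based on the field size):
[code]
true_value = 5_000_000_000                 # example value larger than 2**32 - 1

saturated = min(true_value, 2**32 - 1)     # possibility 1: the counter pegs at 4294967295
wrapped = true_value % 2**32               # possibility 2: the counter wraps, field shows the remainder
print(saturated, wrapped)                  # 4294967295 705032704
[/code]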
Thanks Komal, I tested it and it is working without any issues after the update.
Thanks for the info. Is there some way to get the maintenance status of a site through an API, or must the user just keep track of it through forum updates?
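Ideally I am hoping for something along the lines of the sketch below; fablib’s list_sites() does exist, but whether its output includes a maintenance state is exactly what I do not know:
[code]
# Wishful sketch: whether list_sites() exposes a maintenance/state column is my assumption, not a documented fact.
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
fablib.list_sites()   # hoping for something like an Active / PreMaint / Maint state per site
[/code]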
Hi Komal,
Here is a code snippet; it’s a bit complex since there are some design mechanisms at play here. However, the essential part is:
There is a while loop that attempts to set up the slice and request port mirror resources by invoking setup_slice(). If it fails, the failed slice is deleted and, on the next attempt, the number of VMs requested is reduced and the slice creation is requested again.
[code]
def setup_slice(port_reduce_count):
    # ... A block of code that checks for available SmartNICs and assigns a VM for each.
    # Splits the total available switch ports on 1 site into N groups, where N is the number of VMs.
    # Specifies other resources like CPU, RAM, etc. ...

    pmnet = {}
    num_pmservices = {}              # Track mirrored port count per VM
    listener_pmservice_name = {}
    ports_mirrored = {}              # Track mirrored port count per site
    random.seed(None, 2)

    for listener_site in listener_sites:
        pmnet[listener_site] = []
        # To keep track of ports mirrored on each site, within the port list
        ports_mirrored[listener_site] = 0
        j = 0
        max_active_ports = port_count[listener_site]

        for listener_node in listener_nodes[listener_site]:
            k = 0
            listener_interface_idx = 0
            listener_pmservice_name[listener_node] = []
            node_name = listener_node_name[listener_site][j]
            avail_port_node_maxcnt = len(mod_port_list[listener_site][node_name])

            # Each node (VM) monitors an assigned fraction of the total available ports.
            for listener_interface in listener_interfaces[node_name]:
                # print(f'listener_interface = {listener_interface}')
                if (listener_interface_idx % 2 == 0):
                    # first listener interface of the NIC randomizes within the first half
                    random_index = random.randint(0, int(avail_port_node_maxcnt / 2 - 1))
                else:
                    # second listener interface randomizes within the second half
                    random_index = random.randint(int(avail_port_node_maxcnt / 2), avail_port_node_maxcnt - 1)
                listener_interface_idx += 1

                if ports_mirrored[listener_site] < max_active_ports:
                    listener_pmservice_name[listener_node].append(
                        f'{listener_site}_{node_name}_pmservice{ports_mirrored[listener_site]}')
                    pmnet[listener_site].append(
                        pmslice.add_port_mirror_service(
                            name=listener_pmservice_name[listener_node][k],
                            mirror_interface_name=mod_port_list[listener_site][node_name][random_index],
                            receive_interface=listener_interface,
                            mirror_direction=listener_direction[listener_site]))
                    with open(startup_log_file, "a") as slog:
                        slog.write(f"{listener_site}# mirror interface name: {mod_port_list[listener_site][node_name][random_index]} mirrored to {listener_interface}\n")
                    ports_mirrored[listener_site] = ports_mirrored[listener_site] + 1
                    k = k + 1
                else:
                    with open(startup_log_file, "a") as slog:
                        slog.write(f"No more ports available for mirroring\n")
                    break

            j = j + 1
            num_pmservices[listener_node] = k


# Submit Slice Request
port_reduce_count = 0
retry = 0
while (retry != 1):
    try:
        setup_slice(port_reduce_count)
        pmslice.submit(progress=True, wait_timeout=2400, wait_interval=120)
        if pmslice.get_state() == "StableError":
            raise Exception("Slice state is StableError")
        retry = 1
    except Exception as e:
        if pmslice.get_state() == "StableError":
            fablib.delete_slice(listener_slice_name)
        else:
            pmslice.delete()
        time.sleep(120)
[/code]
Hi Komal,
I tried this and it still does not work. Here are the fabric packages in my environment:
[code]
pip list | grep fab
fabric-credmgr-client          1.6.1
fabric_fim                     1.6.1
fabric_fss_utils               1.5.1
fabric-orchestrator-client     1.6.1
fabrictestbed                  1.6.9
fabrictestbed-extensions       1.6.5
[/code]
The fabrictestbed package is at 1.6.9, yet slice_manager.py, and specifically __load_tokens, still has the refresh token Exception check.
Hi Komal,
Looking at the source code, the required change in slice_manager.py is not present on the main branch. It is available in the other branches: adv-res, llt, and 1.7.
Should I use one of these branches to use the long lived tokens?
Essentially:
pip install git+https://github.com/fabric-testbed/fabrictestbed@1.7
Hi Komal,
I am using fablib from within a Python program. Can you let me know which branch of fabrictestbed-extensions I should use to get this updated change? Is it the main branch, or branch 1.7?
pip install git+https://github.com/fabric-testbed/fabrictestbed-extensions@main
Thanks, so it does indeed stand for disk space.
When I look at the graphical stats on the Fabric Portal, it mentions that TACC has 103263/107463 GB free (that may not be the latest info, but I don’t think it varies by much). How can I ask Fabric to assign my VM to an underlying server that has enough hard disk space?
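To make the question concrete, something along these lines is what I have in mind (a minimal sketch; the worker host name is hypothetical, and whether disk=/host= is the intended way to do this is exactly my question):
[code]
# Minimal sketch: request a large disk and, possibly, pin the VM to a specific worker.
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()
slice = fablib.new_slice(name="disk-test")

# Request 500 GB of disk; does the allocator then pick a worker with enough free space?
node = slice.add_node(name="node1", site="TACC", cores=4, ram=16, disk=500)

# Or is pinning to a specific worker the expected approach? (hypothetical host name)
# node = slice.add_node(name="node1", site="TACC", disk=500, host="tacc-w1.fabric-testbed.net")

slice.submit()
[/code]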
“These 4 sites will be placed in pre-maintenance mode several days in advance so that no new experiments can be created after the indicated date. We apologize for any inconvenience this may cause.”
Which date is the “indicated date” mentioned here? Is it the date that these sites go into pre-maintenance, or is it Jan 1st? In other words, can I create new slivers on these sites until Jan 1st?
Thanks Ilya. This solution worked. It requires a bit of manual intervention, but is still mostly scriptable. It can probably be fully scripted as well using the Fablib APIs, and I will look into that when time permits.
Thanks Paul, I looked into the source code for this, and I saw that the ‘main’ branch actually includes a change that propagates the wait_timeout parameter to the self.wait() function call. However, it’s available only on the ‘beyond bleeding’ framework, while I was testing on the ‘bleeding’ framework.
I am still sticking with the ‘bleeding’ framework for now, because the code is structured in such a way that if I set progress=True (which is the default), then the wait_timeout propagates to self.wait(). I was testing earlier with progress=False, since I didn’t want the overhead of the GUI representation of the data, but I will have to use it for now at least.
I tested with progress=True and wait_timeout=2400 and it works for now. The slice submission takes between 1000 and 1500 seconds to complete, but it does succeed in the end.
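For reference, the exact call that is working for me is the one below (wait_interval=120 is just the polling interval I happened to pick):
[code]
# progress=True is needed (on the 'bleeding' framework) for wait_timeout to propagate to self.wait()
pmslice.submit(progress=True, wait_timeout=2400, wait_interval=120)
[/code]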
Thanks for the information.
Xi, following your guidance, I double-checked the ports, and for AMST it’s using 2 VLANs on the same port, while my code was treating it as 2 different uplink ports, which is where the duplication happened. I fixed it on my side and now AMST is being provisioned successfully.
I was unable to recreate the above issue in SEAT just now. If I see the issue in SEAT again, I will update this thread. Thanks for the help.
Thanks, I agree with you Ilya. I have seen some of the APIs in the link you shared; they can delete files, shut down the kernel, etc., which is a huge security risk.
It’s probably more pragmatic to download using the right-click option via the UI.
Thanks Ilya for the very useful answer. I was able to download my compressed file with a GET request via curl using this method.
Just some additional information for others:
1) To access the Hub Panel in a Jupyter notebook, click on ‘File’ -> ‘Hub Control Panel’.
2) From there, click on ‘Token’ -> ‘Request New API token’ and save the generated token, which can be used in the curl request as a header (-H).
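For scripting, the curl request translates into roughly the Python below (the hub URL, username, and file path are placeholders; the essential part is the Authorization: token header carrying the API token from step 2):
[code]
# Placeholders for hub URL, username, and path; only the Authorization header format is the real point here.
import requests

token = "<YOUR_API_TOKEN>"
url = "https://<jupyterhub-host>/user/<username>/files/work/results.tar.gz"

resp = requests.get(url, headers={"Authorization": f"token {token}"}, allow_redirects=True)
resp.raise_for_status()
with open("results.tar.gz", "wb") as f:
    f.write(resp.content)
[/code]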
I still have to look into generating tokens without manual intervention; that will close the automation loop. If there are any updates, I will post them here.