Nishanth Shyamkumar

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 19 total)
  • Author
    Posts
  • in reply to: Infrastructure-metrics queries #7888
    Nishanth Shyamkumar
    Participant

      Hi,

      A follow-up question on this:
      1) Does this mean that the HC counter always holds the correct value?
      2) What happens to a non-HC counter when its value exceeds 32 bits? Does it get pinned at 2^32 - 1, or does it overflow so that we see the remainder (true value % 2^32) in this field?
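
To make question 2 concrete, here is a small sketch of the two behaviours I have in mind. The function names are mine and purely illustrative; which of the two the non-HC fields actually follow is exactly what I am asking. The last helper shows how a reader of the counter could recover the increase between two samples if the counter does wrap, assuming at most one wrap between the samples.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter can hold 0 .. 2^32 - 1


def wrapped_value(true_value: int) -> int:
    """Value shown if the counter overflows and rolls over (remainder)."""
    return true_value % COUNTER32_MODULUS


def saturated_value(true_value: int) -> int:
    """Value shown if the counter instead sticks at its maximum."""
    return min(true_value, COUNTER32_MODULUS - 1)


def delta_across_wrap(previous: int, current: int) -> int:
    """Increase between two samples, assuming at most one wrap in between."""
    return (current - previous) % COUNTER32_MODULUS


true_value = 2 ** 32 + 1500
print(wrapped_value(true_value))           # 1500
print(saturated_value(true_value))         # 4294967295
print(delta_across_wrap(4294967000, 204))  # 500
```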

      in reply to: How to use long-lived tokens in experiments #7283
      Nishanth Shyamkumar
      Participant

        Thanks Komal, I tested it and it is working without any issues after the update.

        in reply to: No candidate nodes found error #7244
        Nishanth Shyamkumar
        Participant

          Thanks for the info. Is there a way to get the maintenance status of a site through an API, or must users keep track of it through forum updates?

          in reply to: Slice resubmit fails with already configured error. #7243
          Nishanth Shyamkumar
          Participant

            Hi Komal,

            Here is a code snippet. It’s a bit complex, since there are some design mechanisms at play here. However, the essential part is this:
            There is a while loop that attempts to set up the slice and request port mirror resources by invoking setup_slice(). If it fails, the failed slice is deleted, and in the next attempt the number of VMs requested is reduced and slice creation is requested once again.

            import random
            import time

            def setup_slice(port_reduce_count):
                # …
                # A block of code that checks for available SmartNICs and assigns a VM for each.
                # Splits the total available switch ports on one site into N groups, where N is the number of VMs.
                # Specifies other resources like CPU, RAM etc.
                # …
                pmnet = {}
                num_pmservices = {}     # Track mirrored port count per VM
                listener_pmservice_name = {}
                ports_mirrored = {}     # Track mirrored port count per site
                random.seed(None, 2)
                for listener_site in listener_sites:
                    pmnet[listener_site] = []
                    # To keep track of ports mirrored on each site, within the port list
                    ports_mirrored[listener_site] = 0
                    j = 0
                    max_active_ports = port_count[listener_site]
                    for listener_node in listener_nodes[listener_site]:
                        k = 0
                        listener_interface_idx = 0
                        listener_pmservice_name[listener_node] = []
                        node_name = listener_node_name[listener_site][j]
                        # Each node (VM) monitors an assigned fraction of the total available ports.
                        avail_port_node_maxcnt = len(mod_port_list[listener_site][node_name])
                        for listener_interface in listener_interfaces[node_name]:
                            #print(f'listener_interface = {listener_interface}')
                            if listener_interface_idx % 2 == 0:
                                # First listener interface of the NIC randomizes within the first half
                                random_index = random.randint(0, int(avail_port_node_maxcnt / 2 - 1))
                            else:
                                # Second listener interface randomizes within the second half
                                random_index = random.randint(int(avail_port_node_maxcnt / 2), avail_port_node_maxcnt - 1)
                            listener_interface_idx += 1
                            if ports_mirrored[listener_site] < max_active_ports:
                                listener_pmservice_name[listener_node].append(f'{listener_site}_{node_name}_pmservice{ports_mirrored[listener_site]}')
                                pmnet[listener_site].append(pmslice.add_port_mirror_service(name=listener_pmservice_name[listener_node][k],
                                                      mirror_interface_name=mod_port_list[listener_site][node_name][random_index],
                                                      receive_interface=listener_interface,
                                                      mirror_direction=listener_direction[listener_site]))
                                # The with block closes the log file on exit
                                with open(startup_log_file, "a") as slog:
                                    slog.write(f"{listener_site}# mirror interface name: {mod_port_list[listener_site][node_name][random_index]} mirrored to {listener_interface}\n")
                                ports_mirrored[listener_site] = ports_mirrored[listener_site] + 1
                                k = k + 1
                            else:
                                with open(startup_log_file, "a") as slog:
                                    slog.write("No more ports available for mirroring\n")
                                break
                        j = j + 1
                        num_pmservices[listener_node] = k

            # Submit Slice Request
            port_reduce_count = 0
            retry = 0
            while retry != 1:
                try:
                    setup_slice(port_reduce_count)
                    pmslice.submit(progress=True, wait_timeout=2400, wait_interval=120)
                    if pmslice.get_state() == "StableError":
                        raise Exception("Slice state is StableError")
                    retry = 1
                except Exception as e:
                    # Delete the failed slice before the next attempt
                    if pmslice.get_state() == "StableError":
                        fablib.delete_slice(listener_slice_name)
                    else:
                        pmslice.delete()
                    time.sleep(120)

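
Stripped of the FABRIC specifics, the retry logic described above has roughly the following shape. This is only a sketch: submit_with_backoff, fake_submit and the cleanup callback are hypothetical stand-ins I made up for illustration, not fablib calls, and the real code waits 120 seconds between attempts rather than 0.

```python
import time
from typing import Callable


def submit_with_backoff(build_and_submit: Callable[[int], None],
                        cleanup: Callable[[], None],
                        max_attempts: int = 5,
                        pause_seconds: float = 0.0) -> int:
    """Retry a submission, asking for fewer resources after each failure.

    Returns the reduction count of the attempt that succeeded.
    """
    for reduce_count in range(max_attempts):
        try:
            build_and_submit(reduce_count)
            return reduce_count
        except Exception as exc:
            print(f"attempt {reduce_count} failed: {exc}")
            cleanup()                  # delete the failed request first
            time.sleep(pause_seconds)  # give the backend time to settle
    raise RuntimeError(f"gave up after {max_attempts} attempts")


# Hypothetical stand-in for the slice request: fails until the
# request has been reduced twice.
def fake_submit(reduce_count: int) -> None:
    if reduce_count < 2:
        raise Exception("Slice state is StableError")


cleanups = []
result = submit_with_backoff(fake_submit, lambda: cleanups.append("deleted"))
print(result, cleanups)
```

Bounding the number of attempts avoids looping forever when the site simply has no resources left.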

            in reply to: How to use long-lived tokens in experiments #7207
            Nishanth Shyamkumar
            Participant

              Hi Komal,

              I tried this and it still does not work. Here are the fabric packages in my environment:

              [code]

              pip list | grep fab
              fabric-credmgr-client 1.6.1
              fabric_fim 1.6.1
              fabric_fss_utils 1.5.1
              fabric-orchestrator-client 1.6.1
              fabrictestbed 1.6.9
              fabrictestbed-extensions 1.6.5

              [/code]

              fabrictestbed is at 1.6.9, yet slice_manager.py, and specifically __load_tokens, still has the refresh token Exception check.

              in reply to: How to use long-lived tokens in experiments #7125
              Nishanth Shyamkumar
              Participant

                Hi Komal,

                Looking at the source code, the required change in slice_manager.py is not present on the main branch. It is available in other branches: adv-res, llt and 1.7.
                Should I use one of these branches to use the long-lived tokens?
                Essentially:
                pip install git+https://github.com/fabric-testbed/fabrictestbed@1.7

                in reply to: How to use long-lived tokens in experiments #7093
                Nishanth Shyamkumar
                Participant

                  Hi Komal,

                  I am using fablib from within a Python program. Can you let me know which branch of fabrictestbed-extensions I should use to get this change? Is it the main branch, or branch 1.7?

                  pip install git+https://github.com/fabric-testbed/fabrictestbed-extensions@main

                  in reply to: TACC always failing with insufficient resources:Disk# #7059
                  Nishanth Shyamkumar
                  Participant

                    Thanks, so it does indeed stand for disk space.

                    When I look at the graphical stats on the Fabric Portal, it says that TACC has 103263/107463 GB free (it may not be the latest info, but I don’t think it varies by much). How can I ask Fabric to place my VM on an underlying server that has enough hard disk space?

                    in reply to: Multi-day FABRIC maintenance (January 1-5, 2024) #6223
                    Nishanth Shyamkumar
                    Participant

                      “These 4 sites will be placed in pre-maintenance mode several days in advance so that no new experiments can be created after the indicated date. We apologize for any inconvenience this may cause.”

                      Which date is the “indicated date” mentioned here? Is it the date on which these sites go into pre-maintenance, or is it Jan 1st? In other words, can I create new slivers on these sites until Jan 1st?

                      in reply to: Lack of space in Server Filesystem #6220
                      Nishanth Shyamkumar
                      Participant

                        Thanks Ilya. This solution worked. It requires a slight bit of manual intervention, but is still mostly scriptable. It can possibly be fully scripted as well using the Fablib APIs, and I will look into that when time permits.

                        in reply to: Timeout while creating slice #6209
                        Nishanth Shyamkumar
                        Participant

                          Thanks Paul. I looked into the source code for this and saw that the ‘main’ branch includes a change that propagates the wait_timeout parameter to the self.wait() function call. However, that change is only available in ‘beyond bleeding’, while I was testing on the ‘bleeding’ framework.

                          I am sticking with the ‘bleeding’ framework for now, because its code is structured in such a way that wait_timeout does propagate to self.wait() when progress=True (which is the default). I was testing earlier with progress=False, since I didn’t want the overhead of the GUI representation of the data, but I have to use progress=True for now at least.

                          I tested with progress=True and wait_timeout=2400 and it works for now. The slice submission takes between 1000 and 1500 seconds to complete, but it does succeed in the end.

                          in reply to: Port mirroring issue for Bundle-Ether ports #6206
                          Nishanth Shyamkumar
                          Participant

                            Thanks for the information.

                            in reply to: pmservice issue for multiple uplink ports #6169
                            Nishanth Shyamkumar
                            Participant

                              Xi, following your guidance, I double-checked the ports. For AMST, two VLANs are in use on the same port, and my code was treating them as two different uplink ports, which is where the duplication happened. I fixed it on my side and now AMST is being provisioned successfully.

                              I was unable to recreate the above issue in SEAT at the moment. If I see this issue in SEAT again, I will post an update on this thread. Thanks for the help.

                              in reply to: Download file to local system from Jupyter notebook #6154
                              Nishanth Shyamkumar
                              Participant

                                Thanks, I agree with you, Ilya. I have seen some of the APIs in the link you shared, and they can delete files, shut down the kernel etc., which is a huge security risk.
                                It’s probably more pragmatic to download using the right-click option via the UI.

                                in reply to: Download file to local system from Jupyter notebook #6152
                                Nishanth Shyamkumar
                                Participant

                                  Thanks Ilya for the very useful answer. I was able to download my compressed file with a GET request via curl using this method.

                                  Just some additional information for others:
                                  1) To access the Hub Panel in Jupyter notebook, click on ‘File’ -> ‘Hub Control Panel’.

                                  2) From here, click on ‘Token’ -> ‘Request New API token’ and save the generated token, which can be used in the curl request as a header (-H).

                                  I still have to look into generating tokens without manual intervention, which would close the automation loop. If there are any updates I will post them here.
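
For anyone who prefers Python over curl, the request can be sketched as below. This is an illustration under assumptions, not necessarily the exact call I used: it assumes the hub exposes the standard Jupyter Server contents endpoint under /user/<username>/api/contents/ and accepts the token in an "Authorization: token ..." header. The hub URL, username, path and token shown are placeholders, and build_download_request is a made-up helper name. The sketch only builds the request; pass it to urllib.request.urlopen() to actually send it.

```python
import urllib.request


def build_download_request(hub_url: str, username: str,
                           file_path: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a file on the hub.

    The URL layout is an assumption based on the standard Jupyter
    Server REST API; adjust it to match your deployment.
    """
    url = (f"{hub_url.rstrip('/')}/user/{username}"
           f"/api/contents/{file_path.lstrip('/')}")
    # Same header that would be passed to curl with -H
    headers = {"Authorization": f"token {api_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values, for illustration only
req = build_download_request("https://hub.example.org/", "alice",
                             "/work/data.tar.gz", "abc123")
print(req.get_method(), req.full_url)
```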
