
Komal Thareja

Forum Replies Created

Viewing 15 posts - 286 through 300 (of 416 total)
  • in reply to: Maintenance on Network AM – 12/11/2023 (3:30pm-4:30pm EST) #6182
    Komal Thareja
    Participant

      Maintenance has been completed!

      in reply to: Unable to reserve slice #6177
      Komal Thareja
      Participant

        Hi Kriti,


There was an issue on new-y2, the worker where your VMs were being provisioned; it had some leaked VMs. We rebooted the worker node, and your slices should now work on NEWY. We will also check STAR and WASH.

        Thanks,

        Komal

        in reply to: Fabric Portal Jupytur Not Finding File #6138
        Komal Thareja
        Participant

          Hello,

          Could you please check if the file exists at the specified path using the command: ls /home/fabric/work/re_vit/notebooks/animal-blur-canine-551628.jpg ?

          Thanks,
          Komal

          in reply to: Maintenance on Network AM – 11/12/2023 (3:00pm-4:00pm EST) #6089
          Komal Thareja
          Participant

            The maintenance is complete!

            in reply to: error when attempting to numa_tune #6063
            Komal Thareja
            Participant

You are right, Greg; this depends entirely on how much memory is available on the NUMA node of the host where your VM is launched at that time.

              in reply to: error when attempting to numa_tune #6041
              Komal Thareja
              Participant

                @yoursunny

                What happens if there are multiple components that are on distinct NUMA sockets?
If you have multiple components, we try to pin the VM's memory to all of the NUMA nodes they belong to.

Example: if your VM has a ConnectX-5 and a GPU on different sockets, invoking numa_tune would pin the memory to both sockets, provided the combined available memory on both sockets is >= the requested VM RAM.

                Is it possible to specify how much RAM to pin to each NUMA socket?
In the current version, this is not supported. We may also be limited here by the underlying OS API, but we will explore improving this.

                If we pin a CPU core or certain amount of RAM onto a NUMA socket, does it prevent other VMs from using the same CPU core or RAM capacity?
Yes, if you have pinned CPUs/memory to a specific NUMA socket, other VMs cannot use the same cores/memory on that socket.

For CPU pinning, you can explicitly specify how many cores to pin to a NUMA node.
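A minimal sketch of this workflow, assuming the standard fablib notebook setup (the slice, node, and component model names below are illustrative, not from the original post):

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()

# Build a VM with two components that may land on different NUMA sockets.
slice = fablib.new_slice(name="numa-demo")
node = slice.add_node(name="node1", cores=16, ram=32, site="ATLA", image="docker_rocky_8")
node.add_component(model="SmartNIC_ConnectX_5", name="nic1")
node.add_component(model="GPU_RTX6000", name="gpu1")
slice.submit()

# Once the slice is Active, numa_tune() attempts to pin the VM's memory to the
# NUMA nodes hosting nic1 and gpu1; it succeeds only if their combined free
# memory is at least the requested VM RAM (32G here).
node = slice.get_node(name="node1")
node.numa_tune()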

                Thanks,
                Komal

                in reply to: error when attempting to numa_tune #6000
                Komal Thareja
                Participant

No; requesting less memory, or deploying on a relatively lightly used site, would give you a better chance of success. I checked on the portal and GPN seems to be very sparsely used. Please consider requesting the VM there and trying with 32G of RAM.

The upper limit for a VM connected to only one component maps to a single NUMA node. The maximum memory for a NUMA node is 64G, so exceeding that limit will not work.
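As a minimal sketch (assuming the standard fablib setup; the slice and node names are placeholders), the request might look like:

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()

# 32G fits well within the 64G available on a single NUMA node of a worker.
slice = fablib.new_slice(name="gpn-demo")
node1 = slice.add_node(name="Node1", cores=8, ram=32, site="GPN", image="docker_rocky_8")
slice.submit()

# Pinning is more likely to succeed with 32G than with a full 64G request.
node1 = slice.get_node(name="Node1")
node1.numa_tune()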

Adding more flexibility to this API would help alleviate this issue. We will definitely work on that and keep you posted once it is available.

                  Thanks,
                  Komal

                  in reply to: problems reserving slice resource #5997
                  Komal Thareja
                  Participant

                    Hello Nirmala,

                    Could you please share your Slice ID to help debug this issue?
Also, could you please provide more details about what you mean by the reservation not completing? Does the reservation stay in the Ticketed state, time out, or become Active?

                    Thanks,
                    Komal

                    in reply to: error when executing code in slice #5996
                    Komal Thareja
                    Participant

                      Hello Kriti,

                      Could you please share your Slice ID to help debug this issue?

                      Thanks,
                      Komal

                      in reply to: error when attempting to numa_tune #5995
                      Komal Thareja
                      Participant

                        Hello Greg,

In the current implementation, node.numa_tune() tries to pin a VM's memory to the NUMA nodes belonging to the components attached to that VM.

Looking at the sliver details, this sliver has 64G of memory allocated. The NUMA node for the component attached to this VM is node 1. In our topology we have 8 NUMA nodes per worker, each allocated 64G of memory. The error message above implies that the requested memory in this case (64G) is not available on NUMA node 1, and hence the VM's memory cannot be pinned to it.


'sliver_id': '0764c99c-0e76-4aaa-94de-c291bd2b23f0',
'name': 'compute1-ATLA',
'capacities': '{ core: 16 , ram: 64 G, disk: 500 G}',
'capacity_allocations': '{ core: 16 , ram: 64 G, disk: 500 G}'

Also, in the current version the API doesn't allow pinning only a percentage of the memory to the NUMA node. We will work on adding that capability in the next release to serve such memory requests better. We appreciate your feedback!
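Until that capability exists, one defensive pattern (a sketch only; the slice name is a placeholder and the broad exception handling is an assumption about how the failure surfaces) is to treat pinning as best-effort:

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
slice = fablib.get_slice(name="MySlice")  # placeholder slice name
node = slice.get_node(name="compute1-ATLA")

try:
    # Attempt to pin the VM's memory to the NUMA node(s) of its components.
    node.numa_tune()
except Exception as e:
    # Pinning only a fraction of the memory is not supported in this version,
    # so fall back to running without NUMA memory pinning.
    print(f"numa_tune failed, continuing without memory pinning: {e}")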

                        Thanks,

                        Komal

                        Komal Thareja
                        Participant

The topology update is complete and the maintenance has been lifted.

                          Thanks,

                          Komal

                          in reply to: Few nodes in the cluster not getting management IP #5897
                          Komal Thareja
                          Participant

                            Thank you for reporting this issue Manas!

Node2 (STAR) and Node7 (UCSD) are in the Closed state and hence do not have a management IP.

                            Both these nodes failed to provision with the error: Last ticket update: Redeem/Ticket timeout.

We are currently investigating and will keep you posted with our findings.

Also, in the meanwhile, could you please share your notebook? We are trying to see whether we can reproduce this consistently with it; we haven't been able to recreate the problem so far.

                            Appreciate your help in making the testbed better!

                            Thanks,

                            Komal

                            in reply to: Issues with MASS #5859
                            Komal Thareja
                            Participant

                              Good morning Bruce,

Thank you for sharing your observations. VM provisioning on the worker to which your VM was allocated (mass-w1.fabric-testbed.net) is not working. We are working to resolve it. In the meanwhile, please consider creating a slice on a different site or a different worker.

A node can be requested on a specific worker by passing in the host field as below:

                              node1 = slice.add_node(name="Node1", cores=16, ram=32, site="MASS", image='docker_rocky_8', host="mass-w2.fabric-testbed.net")

                              Thanks,

                              Komal

                              in reply to: FABNetv4/FABNetv6 gateway is not IP address #5426
                              Komal Thareja
                              Participant

This looks like a bug and we will address it. In the meanwhile, I have modified your script to make it run with the latest fablib and get past this issue. Sharing it here; please note the changes around setting the interface mode to auto, which lets fablib configure the IP addresses, and also running post_boot_config(), which ensures instantiated is set.
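For reference, a minimal sketch of those two changes (assuming the usual fablib calls; the slice name is a placeholder and not taken from the original script):

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
slice = fablib.get_slice(name="MySlice")  # placeholder slice name

# Set every interface to 'auto' so fablib assigns addresses from the
# FABNetv4/FABNetv6 subnets instead of leaving the mode manual.
for node in slice.get_nodes():
    for iface in node.get_interfaces():
        iface.set_mode("auto")

# Run post boot configuration so the interfaces are configured and the
# 'instantiated' flag is set in the fablib user data.
slice.post_boot_config()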


                                in reply to: FABNetv4/FABNetv6 gateway is not IP address #5422
                                Komal Thareja
                                Participant

                                  Thank you for sharing your observations! I created a Fabnet slice via JH and was not able to reproduce this problem.

Could you please share the graphml file generated by the following code?


                                  slice = fablib.get_slice(slice_id="7e6a41e5-ab21-4cc5-9582-4c37e54dc2d8")
                                  slice.get_fim_topology().serialize(file_name="fabnet-slice.graphml")

                                  I can confirm that the slivers for your slices do have Network and Gateway assigned. Enclosing snapshot for slice: 7e6a41e5-ab21-4cc5-9582-4c37e54dc2d8


                                  Reservation ID: 4c9b702b-1346-4fe5-b61e-f5cb7790e75f Slice ID: 7e6a41e5-ab21-4cc5-9582-4c37e54dc2d8
                                  Resource Type: FABNetv4 Notices: Reservation 4c9b702b-1346-4fe5-b61e-f5cb7790e75f (Slice mtu@AMST(7e6a41e5-ab21-4cc5-9582-4c37e54dc2d8) Graph Id:98452967-6246-4517-a030-7d76d7044d05 Owner:shijunxiao@arizona.edu) is in state (Active,None_)
                                  Start: 2023-09-22 19:56:48 +0000 End: 2023-09-23 19:56:47 +0000 Requested End: 2023-09-23 19:56:47 +0000
                                  Units: 1 State: Active Pending State: None_
                                  Predecessors
                                  1f8599bc-68cb-450a-9c49-962d2f5a5b4f
                                  Sliver: {'node_id': 'be2e2e72-5bdd-4301-98aa-bb9e3fe23a56', 'gateway': 'IPv4 subnet: 10.145.7.0/24 GW: 10.145.7.1', 'layer': 'L3', 'name': 'net4', 'node_map': "('bbf6a0a7-8981-4613-b797-0960e7e8ea9d', 'node+amst-data-sw:ip+192.168.42.3-ipv4-ns')", 'reservation_info': '{"error_message": "", "reservation_id": "4c9b702b-1346-4fe5-b61e-f5cb7790e75f", "reservation_state": "Active"}', 'site': 'AMST', 'type': 'FABNetv4', 'user_data': '{"fablib_data": {"instantiated": "False", "mode": "manual"}}'}
                                  IFS: {'node_id': '115b71f3-5369-490a-a6cc-2d16db3cc8f0', 'capacities': '{ unit: 1 }', 'label_allocations': '{ bdf: 0000:e2:0d.1, mac: 0E:4F:18:21:9F:35, ipv4: 10.145.7.2, vlan: 2103, local_name: HundredGigE0/0/0/9, device_name: amst-data-sw}', 'labels': '{ bdf: 0000:e2:0d.1, mac: 0E:4F:18:21:9F:35, ipv4: 10.145.7.2, vlan: 2103, local_name: HundredGigE0/0/0/9, device_name: amst-data-sw}', 'name': 'node-node-nic0-p1', 'node_map': "('bbf6a0a7-8981-4613-b797-0960e7e8ea9d', 'port+amst-data-sw:HundredGigE0/0/0/9')", 'type': 'ServicePort'}

                                  Reservation ID: d88639a0-3062-43d7-83ed-ccbac797ef29 Slice ID: 7e6a41e5-ab21-4cc5-9582-4c37e54dc2d8
                                  Resource Type: FABNetv6 Notices: Reservation d88639a0-3062-43d7-83ed-ccbac797ef29 (Slice mtu@AMST(7e6a41e5-ab21-4cc5-9582-4c37e54dc2d8) Graph Id:98452967-6246-4517-a030-7d76d7044d05 Owner:shijunxiao@arizona.edu) is in state (Active,None_)
                                  Start: 2023-09-22 19:56:48 +0000 End: 2023-09-23 19:56:47 +0000 Requested End: 2023-09-23 19:56:47 +0000
                                  Units: 1 State: Active Pending State: None_
                                  Predecessors
                                  1f8599bc-68cb-450a-9c49-962d2f5a5b4f
                                  Sliver: {'node_id': '8a23d2b8-af6e-4a60-a34a-a9c913b21f30', 'gateway': 'IPv6: 2602:fcfb:1f:2::/64 GW: 2602:fcfb:1f:2::1', 'layer': 'L3', 'name': 'net6', 'node_map': "('bbf6a0a7-8981-4613-b797-0960e7e8ea9d', 'node+amst-data-sw:ip+192.168.42.3-ipv6-ns')", 'reservation_info': '{"error_message": "", "reservation_id": "d88639a0-3062-43d7-83ed-ccbac797ef29", "reservation_state": "Active"}', 'site': 'AMST', 'type': 'FABNetv6', 'user_data': '{"fablib_data": {"instantiated": "False", "mode": "manual"}}'}
                                  IFS: {'node_id': 'b9f78791-6ef0-4e81-9fee-b081a9485676', 'capacities': '{ unit: 1 }', 'label_allocations': '{ bdf: 0000:e2:08.6, mac: 0A:B1:A5:3F:0F:02, ipv6: 2602:fcfb:1f:2::2, vlan: 2068, local_name: HundredGigE0/0/0/9, device_name: amst-data-sw}', 'labels': '{ bdf: 0000:e2:08.6, mac: 0A:B1:A5:3F:0F:02, ipv6: 2602:fcfb:1f:2::2, vlan: 2068, local_name: HundredGigE0/0/0/9, device_name: amst-data-sw}', 'name': 'node-node-nic1-p1', 'node_map': "('bbf6a0a7-8981-4613-b797-0960e7e8ea9d', 'port+amst-data-sw:HundredGigE0/0/0/9')", 'type': 'ServicePort'}

                                  Thanks,

                                  Komal
