Komal Thareja

Forum Replies Created

Viewing 15 posts - 301 through 315 (of 372 total)
  • in reply to: Problems creating new slices #4305
    Komal Thareja
    Participant

      Hi Nagmat,

      I don’t see any failures for this slice either. It seems like this slice was deleted on 2023-05-16 17:31:42 UTC.

      However, I do see that another slice you created yesterday failed with an “Insufficient resources” error, which would explain the notebook behavior.

      
      Reservation ID: fb7939b5-8f8b-4517-a975-3da648a190c3
      Slice ID: 55c64491-4edc-44cc-885f-264f9fa4cbc2
      Resource Type: VM
      Notices: Reservation fb7939b5-8f8b-4517-a975-3da648a190c3 (Slice basic_nagm01(55c64491-4edc-44cc-885f-264f9fa4cbc2) Graph Id:348882a5-575c-438e-a3aa-18c1c933794e Owner:nagmat@nevada.unr.edu) is in state (Closed,None_) (Last ticket update: Insufficient resources : Component of type: ConnectX-5 not available in graph node: 8QTDZC3)
      Start: 2023-05-22 19:28:49 +0000
      End: 2023-05-23 19:28:47 +0000
      Requested End: 2023-05-23 19:28:47 +0000
      

      Thanks,
      Komal

      in reply to: Problems creating new slices #4303
      Komal Thareja
      Participant

        Hi Nagmat,

        Could you please share the slice ID for the slice(s), if possible? You can also check in the Portal under Experiment->Slices; please select the “Include Dead/Closing” option. Any errors that occurred in those slices should be displayed when you view the slice on the Portal.

        Thanks,
        Komal

        Komal Thareja
        Participant

          Maintenance was completed!

          Komal Thareja
          Participant

            The maintenance has been completed.

            in reply to: Management IP invalid while uploading file. #4248
            Komal Thareja
            Participant

              Apologies, I missed the screenshot. I see that you are trying to upload to the node without doing a get on it.

              Could you please try adding the following statements before the upload to ensure that the node object has all the information from the slice?

              
              s1 = slice.get_node(name="s1")  
              s3 = slice.get_node(name="s3")
              

              The above statements should be added after the cell which does a get_slice().
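The order of operations described above can be sketched as a small helper. This is only a sketch: `upload_to_nodes` is a hypothetical name and not part of fablib. It assumes the slice object came from an earlier get_slice() call and that each node object accepts upload_file(local_path, remote_path), as fablib node objects do.

```python
def upload_to_nodes(slice_obj, node_names, local_path, remote_path):
    """Fetch each node from the slice first, then upload the file to it.

    Hypothetical helper for illustration; not part of fablib.
    """
    uploaded = []
    for name in node_names:
        # Do the "get" first so the node object carries the current
        # information from the slice before the upload is attempted.
        node = slice_obj.get_node(name=name)
        node.upload_file(local_path, remote_path)
        uploaded.append(name)
    return uploaded
```

In the notebook this would be called after the get_slice() cell, for example as upload_to_nodes(slice, ["s1", "s3"], "local.txt", "remote.txt"), where the two file names are placeholders.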

              Also, regarding the failed VM h2: unfortunately, it failed again because of disk unavailability. s2 and h2 are both requested on MICH, and each VM requests 500G of disk. Based on the current allocation at MICH, only one of them can be provisioned.

              This is because of a known issue: a discrepancy in disk availability between the software and the infrastructure. We are working on plans to resolve this; until then, please request VMs with smaller disks if possible.
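To illustrate why only one of the two 500G requests succeeds, here is a minimal arithmetic sketch. It is not the orchestrator's actual allocation logic; `place_vms_by_disk` is a hypothetical helper, and the 600G of free disk is a made-up figure, since the real free capacity at MICH is not stated in this thread.

```python
def place_vms_by_disk(available_disk_gb, requested_disks_gb):
    """Place requests in order against one pool of free disk.

    Returns (placed, failed) as lists of request indices.
    Hypothetical helper for illustration only.
    """
    placed, failed = [], []
    remaining = available_disk_gb
    for index, disk_gb in enumerate(requested_disks_gb):
        if disk_gb <= remaining:
            remaining -= disk_gb
            placed.append(index)
        else:
            failed.append(index)
    return placed, failed


# Assumed 600G free: the first 500G request fits, the second does not.
print(place_vms_by_disk(600, [500, 500]))  # ([0], [1])
# With smaller disks (100G each), both requests fit in the same space.
print(place_vms_by_disk(600, [100, 100]))  # ([0, 1], [])
```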

              Thanks,
              Komal

              in reply to: Management IP invalid while uploading file. #4246
              Komal Thareja
              Participant

                Hi Nagmat,

                You have 4 active slices:

                
                Slice Name: mri_ded3 Slice ID: d54fd501-ca14-49ce-b217-50c593bd0927 Project ID: 527832fc-c273-4254-b988-16e5c2923bf9 Project Name: in-network caching
                
                Slice Name: mri1 Slice ID: 6ce8648d-26a9-47b8-8349-4e3e724085a0 Project ID: 527832fc-c273-4254-b988-16e5c2923bf9 Project Name: in-network caching
                
                Slice Name: mri3 Slice ID: fa527279-b9bb-4dac-bb68-7a46b63c8ad1 Project ID: 527832fc-c273-4254-b988-16e5c2923bf9 Project Name: in-network caching
                
                Slice Name: Nagm_P4Test01 Slice ID: 140247fe-cb45-47da-a797-dca5592487dd Project ID: 527832fc-c273-4254-b988-16e5c2923bf9 Project Name: in-network caching
                

                The last slice, Nagm_P4Test01, has two VM slivers on FIU, and these slivers are in the ActiveTicketed state, which means a Renew is pending. We have been observing network issues with FIU and have been working to resolve them since yesterday. FIU was moved to maintenance as well. I do see management IPs for these VMs, but it is possible that they are not accessible because of the network issue.

                All other slices have all their VMs and networks in the Active state. If you are observing failures with these slices, please do a slice = fablib.get_slice(slice_name) and then try to upload the file.
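A quick way to spot nodes in the situation described above is to list every node that is not in the Active state or has no management IP before attempting an upload. The sketch below is illustrative: `nodes_not_ready` is a hypothetical helper, and it assumes the slice and node objects provide get_nodes(), get_name(), get_reservation_state() and get_management_ip() as fablib objects do, with the state reported as a string such as "Active" or "ActiveTicketed".

```python
def nodes_not_ready(slice_obj):
    """Return (name, state, management_ip) for nodes that may reject uploads.

    Hypothetical helper; assumes the state is a string such as "Active".
    """
    problems = []
    for node in slice_obj.get_nodes():
        state = node.get_reservation_state()
        management_ip = node.get_management_ip()
        # Flag anything that is not fully Active or lacks a management IP.
        if state != "Active" or not management_ip:
            problems.append((node.get_name(), state, management_ip))
    return problems
```

It would be called right after slice = fablib.get_slice(slice_name). An empty result does not guarantee the nodes are reachable, since a site-level network issue can still block access to a node that reports Active.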

                Thanks,
                Komal

                in reply to: Management IP invalid while uploading file. #4245
                Komal Thareja
                Participant

                  Thank you Paul!

                  @Nagmat,
                  Could you please share the slice id for your slice?

                  Thanks,
                  Komal

                  Komal Thareja
                  Participant

                    Hi Nagmat,

                    I looked at your slice and the error reported above. Your slice is requesting more than one VM on FIU, and they are all being allocated to fiu-w3.fabric-testbed.net. All the VMs are of the flavor fabric.c8.m16.d500. The current disk availability on fiu-w3.fabric-testbed.net can only accommodate one such VM, so all the others fail.

                    We do have a known discrepancy in disk availability between the software and the infrastructure, and we plan to address it soon. Could you please use a different site instead of FIU for now?

                    Appreciate your feedback!

                    Thanks,
                    Komal

                    Komal Thareja
                    Participant

                      Maintenance is complete.

                      in reply to: Slice stuck in “Configuring” since February #4125
                      Komal Thareja
                      Participant

                        Thank you for sharing this! I will clean this up from the backend and work on fixing this bug. Appreciate your feedback.

                        Komal Thareja
                        Participant

                          Maintenance has been lifted, testbed is open for use.

                          Komal Thareja
                          Participant

                            Maintenance is complete. Testbed has been opened for use.

                            Thanks,
                            Komal

                            Komal Thareja
                            Participant

                              Dear Experimenters,

                              The testbed is in a brief maintenance. We are running some performance tests and will inform you as soon as the testbed is back online.

                              Thanks,
                              Komal

                              Komal Thareja
                              Participant

                                I can confirm from the logs that it’s the same issue on MASS and UCSD as well. Also, I noticed the slice request was for a 1000G disk, but it gets mapped to a flavor with 2000G because we don’t have a 1000G flavor. None of the sites/workers mentioned have that much disk available, and hence the provisioning fails. Unfortunately, the error message returned by OpenStack is not specific and not very helpful.
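The mapping described above, where a 1000G request ends up on a 2000G flavor, behaves like rounding up to the next available flavor disk size. The sketch below shows that rule under stated assumptions: `disk_after_flavor_mapping` is a hypothetical helper, the round-up rule is inferred from the behavior described in this post rather than taken from the actual orchestrator code, and the size list contains only the two sizes mentioned in this thread (500G and 2000G), not the full flavor catalogue.

```python
def disk_after_flavor_mapping(requested_gb, flavor_disks_gb):
    """Return the smallest flavor disk size that covers the request.

    Hypothetical helper; the round-up rule is an assumption.
    """
    candidates = [size for size in sorted(flavor_disks_gb) if size >= requested_gb]
    if not candidates:
        raise ValueError(f"no flavor offers at least {requested_gb}G of disk")
    return candidates[0]


# Only the disk sizes mentioned in this thread; the real catalogue may differ.
KNOWN_FLAVOR_DISKS_GB = [500, 2000]

# A 1000G request has no exact match, so it lands on the 2000G flavor.
print(disk_after_flavor_mapping(1000, KNOWN_FLAVOR_DISKS_GB))  # 2000
```

Under this rule, the worker must have 2000G free even though only 1000G was requested, which is why the request fails on workers that have less than that available.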

                                Thanks again for letting us know. We will work on fixing the mismatch and address this.

                                Thanks,
                                Komal

                                Komal Thareja
                                Participant

                                  Hi Nagmat,

                                  Thank you for sharing your observation! The requested slice asks for a VM with a 2000GB disk on the FIU rack (fiu-w4.fabric-testbed.net). However, that much disk isn’t available, and hence VM provisioning fails. We have a known issue: a mismatch between the disk availability seen by the software and the actual infrastructure. We are working to address that. For now, I would request that you create the slice with a smaller disk.

                                  Also, was this issue only observed on the FIU rack? If not, could you please share the other sites where you observed this error?

                                  Thanks,
                                  Komal
