Komal Thareja

Forum Replies Created

Viewing 15 posts - 166 through 180 (of 402 total)
  • in reply to: Slices not showing up on portal #7077
    Komal Thareja
    Participant

      Thanks,

      Komal

       

      in reply to: Slices not showing up on portal #7076
      Komal Thareja
      Participant

        Hi Nirmala,

        Are you a member of multiple projects? If so, could you please try the following and see if this helps?

        From the portal, go to Experiments -> Projects and Slices, choose the specific project, and click on Slices under that project.

        Please let us know if this is still an issue.

        Alternatively, you could renew the slices from Jupyter Hub with one of the following options:

        Option A: Slice commander

        • Open a terminal, type slice-commander
        • type ls to list your slices
        • cd to your slice and then type renew <days>

        Option B: Notebook

        • List your slices using notebook Start Here -> List All Slices (available under Managing Slices)
        • Renew your slices using notebook Start Here -> Extending a Slice Reservation (available under Managing Slices)
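For anyone scripting the renewal rather than using the notebook, the new end date is typically passed as a timestamp string. A minimal sketch, assuming the `YYYY-MM-DD HH:MM:SS +0000` format that appears in reservation details on this forum. The helper name `renewal_end_date` is hypothetical, and the fablib calls in the trailing comment are an assumption about the API, not something confirmed in this thread.

```python
from datetime import datetime, timedelta, timezone


def renewal_end_date(days, now=None):
    """Return a UTC timestamp string `days` days after `now` (default: current time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now + timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S %z")


# Assumed usage with fablib (not executed here):
#   slice = fablib.get_slice(name="MySlice")
#   slice.renew(renewal_end_date(7))
print(renewal_end_date(7))
```

Keeping the date arithmetic in one small function makes it easy to check the string before it is sent with a renewal request.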

         

        Thanks,

        Komal

         

        in reply to: Links not showing up on ip link command #7073
        Komal Thareja
        Participant

          Could you please share your slice ID? The ID you shared earlier is your project ID.

          Thanks,

          Komal

          in reply to: Links not showing up on ip link command #7071
          Komal Thareja
          Participant

            Hi Garegin,

            I suspect you are using an Ubuntu image for your VMs. Please note that on Ubuntu, the interfaces are not up by default.

            Please install net-tools using the following command:

            apt install net-tools

            You can then verify the interfaces via the command: ifconfig -a
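The same check can be done without net-tools by reading the brief output of `ip -br link`. A minimal sketch, assuming the usual column layout of that command (name, state, MAC address, flags); the function name `interfaces_not_up` and the sample interface names and MAC addresses are made up for illustration.

```python
def interfaces_not_up(ip_br_link_output):
    """Return names of interfaces whose flag list does not contain UP."""
    not_up = []
    for line in ip_br_link_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        # The flags are the last field, e.g. <BROADCAST,MULTICAST,UP,LOWER_UP>
        flags = fields[-1].strip("<>").split(",")
        if "UP" not in flags:
            not_up.append(name)
    return not_up


# Illustrative sample only; paste the real output of `ip -br link` here.
sample = """\
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
enp3s0           UP             fa:16:3e:00:00:01 <BROADCAST,MULTICAST,UP,LOWER_UP>
enp7s0           DOWN           02:00:00:00:00:02 <BROADCAST,MULTICAST>
"""
print(interfaces_not_up(sample))
```

An interface reported by this helper can usually be brought up with `ip link set dev <name> up`.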

            Thanks,

            Komal

            in reply to: Slice creation fails #7065
            Komal Thareja
            Participant

              Hi Laura,

              Please check your slice on the Portal. I can confirm that all the resources requested by your slice are provisioned and are in the Active state. I suspect /home/fabric/work/fabric_config/ssh_config is not set up correctly.

              Could you please check if you see any errors in /tmp/fablib/fablib.log?
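A quick way to scan the log is sketched below. It assumes that problems show up as lines containing `ERROR` or `Traceback`; the actual log format is not described in this thread, so the markers and the helper name `find_error_lines` are assumptions to adjust as needed.

```python
from pathlib import Path


def find_error_lines(lines, markers=("ERROR", "Traceback")):
    """Yield (line_number, text) for every line containing one of the markers."""
    for number, text in enumerate(lines, start=1):
        if any(marker in text for marker in markers):
            yield number, text.rstrip("\n")


log_path = Path("/tmp/fablib/fablib.log")
if log_path.exists():
    # errors="replace" avoids failing on any undecodable bytes in the log
    with log_path.open(errors="replace") as handle:
        for number, text in find_error_lines(handle):
            print(f"{number}: {text}")
```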

              Also, please try the following steps:

              • Remove the file /home/fabric/work/fabric_config/ssh_config
              • Run the notebook jupyter-examples-1.6.1/configure_and_validate.ipynb
              • Recreate your slice

              Thanks,

              Komal

              in reply to: Persist JupyterLab settings #7063
              Komal Thareja
              Participant

                Hi Sunjay,

                As you correctly pointed out, only the contents of the work directory persist across container restarts. For a customized experience, I would recommend setting up a local Jupyter environment as described here: https://learn.fabric-testbed.net/knowledge-base/install-the-python-api/#install-jupyter-in-the-virtual-environment

                We are also working on providing containerized access to Jupyter, where users can launch the container on their desktop or laptop. This should be available soon and would enable customization of the user environment.

                Thanks,

                Komal

                in reply to: TACC always failing with insufficient resources:Disk# #7060
                Komal Thareja
                Participant

                  The Portal currently displays the overall disk usage for the TACC site, which is the combined disk space across all the hosts at TACC. The Control Framework determines the possible candidate hosts for your VM on the basis of the resources requested.

                  Your slice is requesting a VM with a CX5, which at TACC is only available on tacc-w4. The Control Framework therefore tries to allocate the VM on tacc-w4, but the allocation fails because that host has no disk space available. I would recommend using a site other than TACC.
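The gap between the site-level total and per-host availability can be illustrated with a small sketch. This is not the Control Framework's actual placement logic; the `Host` class, the `candidate_hosts` function, the host name `tacc-w1`, and all disk numbers are hypothetical and chosen only to show the effect.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_disk_gb: int
    components: set = field(default_factory=set)


def candidate_hosts(hosts, disk_gb, component=None):
    """Names of hosts that offer the component (if requested) and enough free disk."""
    return [
        host.name
        for host in hosts
        if (component is None or component in host.components)
        and host.free_disk_gb >= disk_gb
    ]


# Hypothetical numbers: plenty of disk at the site, almost none on the CX5 host.
hosts = [
    Host("tacc-w1", free_disk_gb=500),
    Host("tacc-w4", free_disk_gb=5, components={"ConnectX-5"}),
]
site_free_gb = sum(host.free_disk_gb for host in hosts)

print(site_free_gb)                               # site-wide total looks healthy
print(candidate_hosts(hosts, 10))                 # a plain VM still fits somewhere
print(candidate_hosts(hosts, 10, "ConnectX-5"))   # the CX5 request has no candidate
```

With these made-up numbers the site reports 505 GB free, yet a 10 GB VM that needs a ConnectX-5 has no candidate host, which mirrors the failure described above.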

                  Also, we are working on improving the Resource Usage display to show per-host information.

                  Thanks,

                  Komal

                   

                  in reply to: TACC always failing with insufficient resources:Disk# #7058
                  Komal Thareja
                  Participant

                    Hi Nishanth,

                    The error Insufficient resources : ['disk'] means that there is not enough disk available on the host on which your VM is being requested. Looking at your slice, the following VM, which requests a ConnectX-5, is being rejected because it maps to tacc-w4. There is not enough disk available on tacc-w4 to accommodate your VM, hence the failure.


                    Reservation ID: 478b2a91-5a02-4cf0-9bcd-de04c3b873ea Slice ID: 30f9fb42-37be-420f-899e-082a41bfb735
                    Resource Type: VM Notices: Reservation 478b2a91-5a02-4cf0-9bcd-de04c3b873ea (Slice Traffic Listening Demo TACC(30f9fb42-37be-420f-899e-082a41bfb735) Graph Id:a58b7bc7-55d6-42e9-b457-5a8a32ebebc9 Owner:nshyamkumar@iit.edu) is in state (Closed,None_) (Last ticket update: Insufficient resources : ['disk'])
                    Start: 2024-06-05 17:55:24 +0000 End: 2024-06-06 17:55:23 +0000 Requested End: 2024-06-06 17:55:23 +0000
                    Units: 1 State: Closed Pending State: None_
                    Predecessors
                    Sliver: {'node_id': '9a579143-79b2-44fb-bacb-e6a5db4da3bf', 'capacities': '{ core: 2 , ram: 8 G, disk: 1 G}', 'capacity_hints': '{ instance_type: fabric.c2.m8.d10}', 'image_ref': 'default_ubuntu_20', 'image_type': 'qcow2', 'name': 'TACC_node4', 'reservation_info': '{"reservation_id": "478b2a91-5a02-4cf0-9bcd-de04c3b873ea", "reservation_state": "Closed"}', 'site': 'TACC', 'type': 'VM', 'user_data': '{"fablib_data": {"instantiated": "False", "run_update_commands": "False", "post_boot_commands": [], "post_update_commands": []}}'}
                    Component: {'node_id': '670d117f-19ac-477b-bff7-36ac4e90107a', 'details': 'Mellanox ConnectX-5 Dual Port 10/25GbE', 'model': 'ConnectX-5', 'name': 'TACC_node4-pmnic_2', 'type': 'SmartNIC', 'user_data': '{}'}
                    NS: {'node_id': 'adeede90-a808-45a6-8e1e-8c8de7a4ee6e', 'layer': 'L2', 'name': 'TACC_node4-TACC_node4-pmnic_2-l2ovs', 'site': 'TACC', 'type': 'OVS'}
                    IFS: {'node_id': '2f9a52b4-3108-48f2-b0f9-e0ccd7716cdc', 'capacities': '{ bw: 25 Gbps, unit: 1 }', 'labels': '{ local_name: p1}', 'name': 'TACC_node4-pmnic_2-p1', 'type': 'DedicatedPort', 'user_data': '{"fablib_data": {"mode": "config"}}'}
                    IFS: {'node_id': 'b6c42c3e-a570-4ed1-b633-607e90777f34', 'capacities': '{ bw: 25 Gbps, unit: 1 }', 'labels': '{ local_name: p2}', 'name': 'TACC_node4-pmnic_2-p2', 'type': 'DedicatedPort', 'user_data': '{"fablib_data": {"mode": "config"}}'}

                    Thanks,

                    Komal

                    in reply to: How to use long-lived tokens in experiments #7057
                    Komal Thareja
                    Participant

                      Hi Nishanth,

                      This issue has been fixed for a while now, but the fix is only available in the Beyond Bleeding Edge container.

                      Could you please use that? The fix should be available on PyPI with the next release.

                      Thanks,

                      Komal

                      in reply to: Do we have UEFI firmware boot mode option for nodes? #7039
                      Komal Thareja
                      Participant

                        Hi Acheme,

                        We investigated the possibility of enabling UEFI mode for users but encountered issues where GPUs do not function in that mode. Consequently, we have opted to maintain updated firmware to mitigate these errors for users. Could you please rerun your experiment and inform us if the error persists? I am available to collaborate with you on upgrading the firmware and addressing the issue.

                        Thanks,

                        Komal

                        Komal Thareja
                        Participant

                          The updated network model has been deployed and the maintenance is complete.

                          in reply to: How can we restore our files from deleted nodes #7023
                          Komal Thareja
                          Participant

                            Hi Emmanuel,

                            It is not possible to recover a deleted slice. Apologies, we may not be able to recover your data. You should, however, be able to request renewal of an expired project.

                            Thanks,

                            Komal

                            Komal Thareja
                            Participant

                              To clarify, requesting two VMs is acceptable. However, requesting VMs with both GPUs and SmartNICs in the mentioned slice is invalid because no single host has both SmartNICs and GPUs available.

                              Komal Thareja
                              Participant

                                Hello Khawar,

                                Your slice is requesting 2 VMs:

                                • n1 – with an RTX-6000 GPU and a dedicated CX-6 NIC
                                • n2 – with two RTX-6000 GPUs

                                This is an unsupported configuration. On UTAH, we have two hosts, each with 3 GPUs, but neither of them has a dedicated CX-6.

                                Also, I checked, and all 6 RTX-6000 GPUs are in use. Please note that the resource usage displayed on the portal may be outdated by 30 minutes.

                                We have ongoing work to let users identify such invalid slice configurations using the fablib API. This should be available soon with the upcoming Release 1.7. We also plan to provide host-level resource usage details to users in 1.7, which may help with this too. Hope this helps!

                                Thanks,

                                Komal

                                in reply to: Assigned addresses lost in reserved slices #7008
                                Komal Thareja
                                Participant

                                  @Nirmala – Maintenance has been completed.

                                FABRIC invites nominations for four awards recognizing innovative uses of FABRIC resources—Best Published Paper, Best FABRIC Matrix, Best FABRIC Experiment, and Best Classroom Use of FABRIC — submissions due by **Monday, February 24 at 11:59 PM ET**, and winners announced at KNIT10. [>>>Submit Form](https://docs.google.com/forms/d/e/1FAIpQLSeTp3i2iDhB7bHgN8ryMxZci8ya87yjeQd7_JMZImUodNinVA/viewform)

                                KNIT10 Call for Demos Now Open! Submit your demo by **February 24**. [>>>Submit Demo](https://docs.google.com/forms/d/e/1FAIpQLScRIWqHliNP3DFWBCnalYN_fBXJXVM0PpP9YWWJdSebC95TvA/viewform)