1. issue with creating chameleon server using the notebook chameleon_facility_port


Viewing 15 posts - 1 through 15 (of 15 total)
  • Author
    Posts
  • #6592
    Sanjana Das
    Participant

      I wanted to run an experiment spanning Chameleon and FABRIC, and I was just running the chameleon_facility_port_fabnetv4 notebook, but I am getting an error while trying to create the Chameleon server. The error is as follows:

      ---------------------------------------------------------------------------
      ResourceFailure                           Traceback (most recent call last)
      Cell In[6], line 19
           17 for server in servers:
           18     print(f'Waiting for server: {server.name}')
      ---> 19     chi.server.wait_for_active(server.id)
           20 print('Done!')
      
      File /opt/conda/lib/python3.10/site-packages/chi/server.py:514, in wait_for_active(server_id, timeout)
          512 compute = connection().compute
          513 server = compute.get_server(server_id)
      --> 514 return compute.wait_for_server(server, wait=timeout)
      
      File /opt/conda/lib/python3.10/site-packages/openstack/compute/v2/_proxy.py:2510, in Proxy.wait_for_server(self, server, status, failures, interval, wait, callback)
         2482 """Wait for a server to be in a particular status.
         2483 
         2484 :param server: The :class:~openstack.compute.v2.server.Server to wait
         (...)
         2507     status attribute.
         2508 """
         2509 failures = ['ERROR'] if failures is None else failures
      -> 2510 return resource.wait_for_status(
         2511     self,
         2512     server,
         2513     status,
         2514     failures,
         2515     interval,
         2516     wait,
         2517     callback=callback,
         2518 )
      
      File /opt/conda/lib/python3.10/site-packages/openstack/resource.py:2409, in wait_for_status(session, resource, status, failures, interval, wait, attribute, callback)
         2407     return resource
         2408 elif normalized_status in failures:
      -> 2409     raise exceptions.ResourceFailure(
         2410         "{name} transitioned to failure state {status}".format(
         2411             name=name, status=new_status
         2412         )
         2413     )
         2415 LOG.debug(
         2416     'Still waiting for resource %s to reach state %s, '
         2417     'current state is %s',
         (...)
         2420     new_status,
         2421 )
         2423 if callback:
      
      ResourceFailure: Server:343a728a-3f4f-4edf-a8c5-9e8e9d7f9b9c transitioned to failure state ERROR
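
The traceback shows `wait_for_status` raising `ResourceFailure` as soon as the server status appears in the `failures` list (`['ERROR']` by default). Below is a minimal, library-free sketch of that polling logic. The `get_status` callable and the local `ResourceFailure` class are stand-ins introduced for illustration; the real SDK queries the compute API instead.

```python
import time


class ResourceFailure(Exception):
    """Stand-in for openstack.exceptions.ResourceFailure (illustration only)."""


def wait_for_status(get_status, status="ACTIVE", failures=("ERROR",),
                    interval=0.0, wait=5.0):
    """Poll get_status() until it returns `status`.

    get_status is a hypothetical callable returning the current status string.
    Raises ResourceFailure on a failure state and TimeoutError after `wait` seconds.
    """
    deadline = time.monotonic() + wait
    while True:
        current = get_status().upper()
        if current == status:
            return current
        if current in failures:
            raise ResourceFailure(f"transitioned to failure state {current}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for {status}")
        time.sleep(interval)
```

Because the ERROR state is reported by the server side, a longer timeout in the notebook would not change the outcome; the cause has to be looked up on the Chameleon side.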
      #6594
      Sanjana Das
      Participant

        I am still having this issue, so I manually created a server in Chameleon, got its IP, and went on to run the notebook, but at the final step the FABRIC node was not able to ping the Chameleon server.

         

        I also have another question. From the FABRIC portal, I know that we can create a Chameleon facility port and connect it to a FABRIC slice, but can we replicate this exact experiment (creating one or more nodes in Chameleon and in FABRIC and connecting them) using the FABRIC portal?

        #6597
        Komal Thareja
        Participant

          @Sanjana – The Chameleon team would be better equipped to help you with the failure observed while creating the node on Chameleon. The Jupyter notebook referred to in your post uses the Chameleon Python API.

          The FABRIC portal doesn’t provide support for provisioning resources on Chameleon. You would have to use the Chameleon portal to use their graphical interface.

          Thanks,

          Komal

          #6601
          Komal Thareja
          Participant

            Just realized you also had a problem with network reachability between the Chameleon and FABRIC nodes. Could you please share your slice ID for FABRIC?

            Also, please check that the interface and routes are set up correctly on the Chameleon node.

            #6605
            Sanjana Das
            Participant

              Hello. There are two slices being created in that notebook, so I will send both of them! The first one is “tacc_stitch” with the ID 2ff552d5-2965-4c1f-bac5-7b4221869cc5 and the second one is “MyFabricNodes” with the ID 5f9a95b6-b933-4582-a650-5ac28af8ef9e. In the Chameleon facility port l3fabnetv4 notebook, the node from “MyFabricNodes” is pinging the Chameleon server with the IP 10.130.164.13.

              I think the interfaces and routes are correctly set up since I did not change anything in the notebook. Moreover, I was successfully able to create the server from the notebook itself so that issue was also resolved.

              #6619
              Sanjana Das
              Participant

                Hello, I just realized the lease of those slices ended. I created new ones and their slice ids are: f43ae1fb-459f-4947-b41d-2c859ed81ffc and ae103574-56cf-4859-a75b-54308ff94570

                [Screenshot: Screenshot-2024-02-27-at-10.35.17 PM]

                Moreover, this is the output of pinging the chameleon server:

                #6622
                Komal Thareja
                Participant

                  Hi Sanjana,

                  I am able to reproduce this issue on MASS, but I was able to get this to work on other sites like SEAT and PSC. Could you please use a different site, such as SEAT or PSC, while we investigate this issue? I will keep you updated with the findings for MASS.

                  Thank you for sharing your observations and helping us make the testbed better.

                  Thanks,

                  Komal

                  #6630
                  Sanjana Das
                  Participant

                    Hi,

                    I am still facing the same issue on the other sites as well. Do we need to explicitly add routes on the nodes at those sites so that they can reach the fabnetv4 subnet created at TACC?

                    #6631
                    Komal Thareja
                    Participant

                      MASS is working as well. We checked your FABRIC nodes: the FabNet services seem to be connected properly and we can ping the gateway. The FABRIC VMs in your slice can ping each other too.

                      I am not sure how your Chameleon server is set up.

                      You should see routes and an interface setup similar to the following on your Chameleon node:

                      
                      cc@kthare10-fabric-stitch-server-1:~$ ip route list
                      default via 10.130.163.2 dev eno1np0 proto dhcp src 10.130.163.10 metric 100
                      10.128.0.0/10 via 10.130.163.1 dev eno1np0 proto dhcp src 10.130.163.10 metric 100
                      10.130.163.0/24 dev eno1np0 proto kernel scope link src 10.130.163.10
                      169.254.169.254 via 10.130.163.3 dev eno1np0 proto dhcp src 10.130.163.10 metric 100
                      cc@kthare10-fabric-stitch-server-1:~$
                      cc@kthare10-fabric-stitch-server-1:~$
                      cc@kthare10-fabric-stitch-server-1:~$
                      cc@kthare10-fabric-stitch-server-1:~$ ifconfig
                      eno1np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
                      inet 10.130.163.10 netmask 255.255.255.0 broadcast 10.130.163.255
                      inet6 fe80::be97:e1ff:fec4:8e0 prefixlen 64 scopeid 0x20<link>
                      ether bc:97:e1:c4:08:e0 txqueuelen 1000 (Ethernet)
                      RX packets 4937 bytes 1058216 (1.0 MB)
                      RX errors 0 dropped 0 overruns 0 frame 0
                      TX packets 4804 bytes 410390 (410.3 KB)
                      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
                      

                      P.S: I did execute the cell indicated as “(Optionally) Add a Router and Attach it to the Subnet”.
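
One way to check a Chameleon node against the route table above is to look for a non-default route that covers the FABNetv4 range. The sketch below does this with the standard `ipaddress` module. It assumes the range is 10.128.0.0/10, as shown in the output above, and that every line starts with `default` or a destination address; `parse_routes` and `fabnet_gateway` are hypothetical helper names.

```python
import ipaddress

# FABNetv4 range as it appears in the route table above (assumed to match
# fablib.FABNETV4_SUBNET).
FABNETV4 = ipaddress.ip_network("10.128.0.0/10")


def parse_routes(ip_route_output):
    """Return (destination_network, gateway_or_None) pairs from `ip route list` text."""
    routes = []
    for line in ip_route_output.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        dest = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        gateway = fields[fields.index("via") + 1] if "via" in fields else None
        routes.append((ipaddress.ip_network(dest), gateway))
    return routes


def fabnet_gateway(ip_route_output):
    """Return the next hop of the non-default route covering FABNetv4, or None."""
    for network, gateway in parse_routes(ip_route_output):
        if network.prefixlen > 0 and FABNETV4.subnet_of(network):
            return gateway
    return None


sample = """\
default via 10.130.163.2 dev eno1np0 proto dhcp src 10.130.163.10 metric 100
10.128.0.0/10 via 10.130.163.1 dev eno1np0 proto dhcp src 10.130.163.10 metric 100
10.130.163.0/24 dev eno1np0 proto kernel scope link src 10.130.163.10
169.254.169.254 via 10.130.163.3 dev eno1np0 proto dhcp src 10.130.163.10 metric 100
"""
print(fabnet_gateway(sample))
```

If the function returns `None` for a node's route table, traffic to FABRIC addresses would follow the default route instead of the FABRIC gateway, which is one possible reason for a failed ping.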

                       

                      #6637
                      Sanjana Das
                      Participant

                        Hi,

                        Thanks for that! Would it be possible for you to share the code snippet showing how you set up the server? This is how I did it, and I can see it up and running in the Chameleon GUI. I can also see that it was successfully assigned an IP address from the fabnetv4 pool. Also, I pinged the FABRIC gateway (10.30.162.1) from a node at PSC and that worked, but the ping failed for the Chameleon gateway (10.130.162.2).

                         

                        Create the lease

                        import json
                        from datetime import datetime, timedelta

                        import chi.lease
                        from dateutil import tz

                        BLAZAR_TIME_FORMAT = '%Y-%m-%d %H:%M'

                        # Set start/end date for lease
                        # Start one minute into future to avoid Blazar thinking lease is in past
                        # due to rounding to closest minute.
                        start_date = (datetime.now(tz=tz.tzutc()) + timedelta(minutes=1)).strftime(BLAZAR_TIME_FORMAT)
                        end_date = (datetime.now(tz=tz.tzutc()) + timedelta(days=1)).strftime(BLAZAR_TIME_FORMAT)

                        # Build list of reservations: one network reservation and one host reservation
                        reservation_list = [
                            {
                                "resource_type": "network",
                                "network_name": chameleon_network_name,
                                "network_properties": "",
                                "resource_properties": json.dumps(
                                    ["==", "$stitch_provider", "fabric"]
                                ),
                            },
                            {
                                "resource_type": "physical:host",
                                "resource_properties": '["==", "$node_type", "compute_skylake"]',
                                "hypervisor_properties": "",
                                "min": 1,
                                "max": 1,
                            },
                        ]

                        chameleon_lease = chi.lease.create_lease(chameleon_lease_name,
                                                                 reservations=reservation_list,
                                                                 start_date=start_date,
                                                                 end_date=end_date)

                        # Print the lease info
                        chameleon_network_reservation_id = [reservation for reservation in chameleon_lease['reservations'] if reservation['resource_type'] == 'network'][0]['id']
                        print(f"chameleon_network_reservation_id: {chameleon_network_reservation_id}")
                        chameleon_server_reservation_id = [reservation for reservation in chameleon_lease['reservations'] if reservation['resource_type'] == 'physical:host'][0]['id']
                        print(f"chameleon_node_reservation_id: {chameleon_server_reservation_id}")
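
The date handling and the reservation lookup in the lease step can be factored into two small helpers. This is only a sketch: `lease_window` and `reservation_id` are hypothetical names, it uses the standard library `timezone.utc` in place of `dateutil`, and the example lease is a made-up dict with only the fields used in the post, not real Blazar output.

```python
from datetime import datetime, timedelta, timezone

BLAZAR_TIME_FORMAT = "%Y-%m-%d %H:%M"


def lease_window(now=None, days=1):
    """Return (start, end) strings; start is one minute ahead, as in the post."""
    now = now or datetime.now(timezone.utc)
    start = now + timedelta(minutes=1)
    end = now + timedelta(days=days)
    return start.strftime(BLAZAR_TIME_FORMAT), end.strftime(BLAZAR_TIME_FORMAT)


def reservation_id(lease, resource_type):
    """Return the id of the first reservation of the given type in a lease dict."""
    for reservation in lease["reservations"]:
        if reservation["resource_type"] == resource_type:
            return reservation["id"]
    raise LookupError(f"lease has no {resource_type!r} reservation")


# Hypothetical lease dict with made-up ids, shaped like the fields used above.
example_lease = {
    "reservations": [
        {"resource_type": "network", "id": "net-0001"},
        {"resource_type": "physical:host", "id": "host-0001"},
    ]
}
print(reservation_id(example_lease, "physical:host"))
```

Raising `LookupError` when a reservation type is missing gives a clearer message than the `IndexError` that indexing `[0]` on an empty list would produce.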

                        Configure chameleon network and routes

                        # Create the subnet on the reserved Chameleon network
                        chameleon_subnet = chi.network.create_subnet(chameleon_subnet_name, chameleon_network_id,
                                                                     cidr=str(subnet),
                                                                     allocation_pool_start=chameleon_allocation_pool_start,
                                                                     allocation_pool_end=chameleon_allocation_pool_end,
                                                                     gateway_ip=chameleon_gateway_ip)

                        # Add a host route that sends FABNetv4 traffic to the FABRIC gateway
                        chi.neutron().update_subnet(subnet=chameleon_subnet['id'],
                                                    body={
                                                        "subnet": {
                                                            "host_routes": [
                                                                {
                                                                    "destination": f"{fablib.FABNETV4_SUBNET}",
                                                                    "nexthop": f"{fabric_gateway_ip}"
                                                                }
                                                            ]
                                                        }
                                                    })

                        print(f"subnet name : {chameleon_subnet['name']}")
                        print(f"subnet : {chameleon_subnet['cidr']}")
                        print(f"gateway_ip : {chameleon_subnet['gateway_ip']}")

                        Start the server

                        import chi.server

                        servers = []

                        for i in range(chameleon_server_count):
                            server_name = f"{chameleon_server_name}_{i+1}"

                            # Create the server
                            servers.append(chi.server.create_server(server_name,
                                                                    reservation_id=chameleon_server_reservation_id,
                                                                    network_name=chameleon_network_name,
                                                                    image_name=chameleon_image_name,
                                                                    key_name=chameleon_key_name
                                                                    ))

                        # Wait until the server is active
                        for server in servers:
                            print(f'Waiting for server: {server.name}')
                            chi.server.wait_for_active(server.id)
                        print('Done!')

                        #6638
                        Komal Thareja
                        Participant
                          #6642
                          Sanjana Das
                          Participant

                            Hi,

                            What did you do for the Chameleon server reservation ID? The notebook has no steps to reserve a server, which is why I tried reserving one using the chi API and took the reservation ID from there.

                            #6646
                            Komal Thareja
                            Participant

                              Please create a lease to reserve a host on Chameleon via Project -> Reservations -> Leases -> Create Lease.

                              Once the lease is created, click on it. You will see a Reservation section; copy the ID from there.

                              This is the ID you need to use in the notebook. Hope this helps.

                              If you create the server on Chameleon manually, please set the IP address and the routes on the server as below:

                              ip addr add 10.130.162.2/24 dev eth1

                              Add route: route add -net 10.130.162.0/24 dev eth1

                              Change the IP and interface as per your FabNet subnet.
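
To adapt the two commands above to a different FabNet subnet and interface, the network part can be derived from the address rather than typed by hand. A minimal sketch using the standard `ipaddress` module follows; `manual_setup_commands` is a hypothetical helper that only builds the command strings and does not run them.

```python
import ipaddress


def manual_setup_commands(address_with_prefix, device):
    """Build the two commands from the post for a given address and interface.

    address_with_prefix: host address plus prefix length, e.g. "10.130.162.2/24".
    device: interface name on the Chameleon server, e.g. "eth1".
    """
    iface = ipaddress.ip_interface(address_with_prefix)
    return [
        f"ip addr add {iface.with_prefixlen} dev {device}",
        f"route add -net {iface.network} dev {device}",
    ]


for command in manual_setup_commands("10.130.162.2/24", "eth1"):
    print(command)
```

The printed commands typically need root privileges when run on the server.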

                              #6651
                              Komal Thareja
                              Participant

                                Attaching the screenshot for Chameleon Lease

                                #6672
                                Sanjana Das
                                Participant

                                  Thank you so much for all your help! Really appreciate it.
