Sunjay Cauligi

Forum Replies Created

Viewing 9 posts - 1 through 9 (of 9 total)
  • in reply to: Lost SSH access to some nodes after node.os_reboot() #8562
    Sunjay Cauligi
    Participant

      Hi Mert,
      I was following the NUMA steps based on this example notebook: iperf3_optimized.ipynb
      which really just boiled down to running the following:

      for node in slc.get_nodes():
          node.pin_cpu(component_name='einic')
      # one node failed to pin cpu here, investigated that with .get_cpu_info()
      for node in slc.get_nodes():
          node.numa_tune()
      # a couple nodes failed to pin memory here, investigated that with .get_numa_info()
      for node in slc.get_nodes():
          node.os_reboot()
      

      The slice I ran this on is a long-running project slice that has been up for the last two weeks. Instead of a notebook, I use a wrapper library and Python scripts directly from my local machine to provision/manage slices. I can email you my library code and an example script of my typical usage if you would like to take a look.

      This slice was provisioned with three nodes at each of three different sites; every node is provisioned identically, with a FABNETv4 NIC (“internal”, with a 10.* IP address) and a FABNETv4Ext NIC (“public”, with a publicly routable IP address).
      I install identical software and run identical code on each node as well.

      in reply to: INDI site, can’t route IPv4 #8371
      Sunjay Cauligi
      Participant

        node2d1 is at site INDI

        ubuntu@node2d1:~$ ip -br a
        lo UNKNOWN 127.0.0.1/8 ::1/128
        enp3s0 UP 10.30.6.120/23 metric 100 2001:18e8:fff0:3:f816:3eff:fe69:e64a/64 fe80::f816:3eff:fe69:e64a/64
        enp7s0 UP 23.134.233.130/28 fe80::44d:c7ff:fe39:ca17/64
        enp8s0 UP 10.140.9.3/24 fe80::8c1:aff:fefb:e8c6/64
        ubuntu@node2d1:~$ route -n
        Kernel IP routing table
        Destination Gateway Genmask Flags Metric Ref Use Iface
        0.0.0.0 23.134.233.129 0.0.0.0 UG 0 0 0 enp7s0
        10.30.6.0 0.0.0.0 255.255.254.0 U 100 0 0 enp3s0
        10.30.6.11 0.0.0.0 255.255.255.255 UH 100 0 0 enp3s0
        10.128.0.0 10.140.9.1 255.192.0.0 UG 0 0 0 enp8s0
        10.140.9.0 0.0.0.0 255.255.255.0 U 0 0 0 enp8s0
        23.134.233.128 0.0.0.0 255.255.255.240 U 0 0 0 enp7s0
        169.254.169.254 10.30.6.11 255.255.255.255 UGH 100 0 0 enp3s0
        ubuntu@node2d1:~$ ping -c1 1.1.1.1
        PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
        
        --- 1.1.1.1 ping statistics ---
        1 packets transmitted, 0 received, 100% packet loss, time 0ms
        
        ubuntu@node2d1:~$ ip route get 1.1.1.1
        1.1.1.1 via 23.134.233.129 dev enp7s0 src 23.134.233.130 uid 1000
        cache
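As a cross-check on the output above, the routing decision can be reproduced offline with Python's `ipaddress` module. This is only a sketch: the `ROUTES` table is transcribed by hand from the `route -n` output (genmasks converted to prefix lengths), and `lookup()` is a hypothetical helper, not part of fablib. It does plain longest-prefix matching and ignores metrics and policy routing.

```python
import ipaddress

# Transcribed from the `route -n` output: (destination/prefix, gateway, interface).
ROUTES = [
    ("0.0.0.0/0", "23.134.233.129", "enp7s0"),
    ("10.30.6.0/23", "0.0.0.0", "enp3s0"),
    ("10.30.6.11/32", "0.0.0.0", "enp3s0"),
    ("10.128.0.0/10", "10.140.9.1", "enp8s0"),
    ("10.140.9.0/24", "0.0.0.0", "enp8s0"),
    ("23.134.233.128/28", "0.0.0.0", "enp7s0"),
    ("169.254.169.254/32", "10.30.6.11", "enp3s0"),
]


def lookup(dest):
    """Return (gateway, interface) of the most specific route containing dest."""
    addr = ipaddress.ip_address(dest)
    best = None
    for cidr, gateway, iface in ROUTES:
        network = ipaddress.ip_network(cidr)
        # Keep the matching route with the longest prefix seen so far.
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, gateway, iface)
    if best is None:
        raise LookupError(f"no route to {dest}")
    return best[1], best[2]


print(lookup("1.1.1.1"))
```

For 1.1.1.1 only the default route matches, so the lookup selects the gateway 23.134.233.129 on enp7s0, consistent with `ip route get`. That suggests, without proving it, that the node's own routing table is not where the traffic is being lost.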

        in reply to: INDI site, can’t route IPv4 #8369
        Sunjay Cauligi
        Participant

          Yes, all the routes are set up; this is a long-running slice that was still working as of yesterday, and I have not made any changes since.

          Sunjay Cauligi
          Participant

            Hi Komal, that doesn’t seem like the same network object?

            >>> fablib.get_slice('qbss-tester').get_network('netSALT').get_sliver().sliver_id
            '7dafe47e-3058-4282-9d09-66fe74588e1c'
            
            Sunjay Cauligi
            Participant

              My nodes have been working since then, but now I’m running into the same problem again with some new nodes while setting up a new slice.

              Slice ID: 1c057890-bc44-450f-b109-995fef665216

              Site: SALT

              My nodes in SALT can reach each other, but no other nodes can ping them (nor can they ping other nodes).

              >>> slc.get_network('netSALT').get_public_ips()
              [IPv4Address('23.134.232.194'), IPv4Address('23.134.232.195'), IPv4Address('23.134.232.196')]
              >>> ping_test()
              [
                  ('100% LOSS', '   ** Ping 4d1 -> 6d1 (23.134.233.117 -> 23.134.232.194) **'),
                  ('100% LOSS', '   ** Ping 4d1 -> 6d2 (23.134.233.117 -> 23.134.232.195) **'),
                  ('100% LOSS', '   ** Ping 4d1 -> 6d3 (23.134.233.117 -> 23.134.232.196) **'),
                  ('100% LOSS', '   ** Ping 4d2 -> 6d1 (23.134.233.118 -> 23.134.232.194) **'),
                  ('100% LOSS', '   ** Ping 4d2 -> 6d2 (23.134.233.118 -> 23.134.232.195) **'),
                  ('100% LOSS', '   ** Ping 4d2 -> 6d3 (23.134.233.118 -> 23.134.232.196) **'),
                  ('100% LOSS', '   ** Ping 4d3 -> 6d1 (23.134.233.119 -> 23.134.232.194) **'),
                  ('100% LOSS', '   ** Ping 4d3 -> 6d2 (23.134.233.119 -> 23.134.232.195) **'),
                  ('100% LOSS', '   ** Ping 4d3 -> 6d3 (23.134.233.119 -> 23.134.232.196) **'),
                  ('100% LOSS', '   ** Ping 5d1 -> 6d1 (23.134.232.226 -> 23.134.232.194) **'),
                  ('100% LOSS', '   ** Ping 5d1 -> 6d2 (23.134.232.226 -> 23.134.232.195) **'),
                  ('100% LOSS', '   ** Ping 5d1 -> 6d3 (23.134.232.226 -> 23.134.232.196) **'),
                  ('100% LOSS', '   ** Ping 5d2 -> 6d1 (23.134.232.227 -> 23.134.232.194) **'),
                  ('100% LOSS', '   ** Ping 5d2 -> 6d2 (23.134.232.227 -> 23.134.232.195) **'),
                  ('100% LOSS', '   ** Ping 5d2 -> 6d3 (23.134.232.227 -> 23.134.232.196) **'),
                  ('100% LOSS', '   ** Ping 5d3 -> 6d1 (23.134.232.228 -> 23.134.232.194) **'),
                  ('100% LOSS', '   ** Ping 5d3 -> 6d2 (23.134.232.228 -> 23.134.232.195) **'),
                  ('100% LOSS', '   ** Ping 5d3 -> 6d3 (23.134.232.228 -> 23.134.232.196) **'),
                  ('100% LOSS', '   ** Ping 6d1 -> 4d1 (23.134.232.194 -> 23.134.233.117) **'),
                  ('100% LOSS', '   ** Ping 6d1 -> 4d2 (23.134.232.194 -> 23.134.233.118) **'),
                  ('100% LOSS', '   ** Ping 6d1 -> 4d3 (23.134.232.194 -> 23.134.233.119) **'),
                  ('100% LOSS', '   ** Ping 6d1 -> 5d1 (23.134.232.194 -> 23.134.232.226) **'),
                  ('100% LOSS', '   ** Ping 6d1 -> 5d2 (23.134.232.194 -> 23.134.232.227) **'),
                  ('100% LOSS', '   ** Ping 6d1 -> 5d3 (23.134.232.194 -> 23.134.232.228) **'),
                  ('100% LOSS', '   ** Ping 6d1 -> monitor (23.134.232.194 -> 10.132.1.2) **'),
                  ('100% LOSS', '   ** Ping 6d2 -> 4d1 (23.134.232.195 -> 23.134.233.117) **'),
                  ('100% LOSS', '   ** Ping 6d2 -> 4d2 (23.134.232.195 -> 23.134.233.118) **'),
                  ('100% LOSS', '   ** Ping 6d2 -> 4d3 (23.134.232.195 -> 23.134.233.119) **'),
                  ('100% LOSS', '   ** Ping 6d2 -> 5d1 (23.134.232.195 -> 23.134.232.226) **'),
                  ('100% LOSS', '   ** Ping 6d2 -> 5d2 (23.134.232.195 -> 23.134.232.227) **'),
                  ('100% LOSS', '   ** Ping 6d2 -> 5d3 (23.134.232.195 -> 23.134.232.228) **'),
                  ('100% LOSS', '   ** Ping 6d2 -> monitor (23.134.232.195 -> 10.132.1.2) **'),
                  ('100% LOSS', '   ** Ping 6d3 -> 4d1 (23.134.232.196 -> 23.134.233.117) **'),
                  ('100% LOSS', '   ** Ping 6d3 -> 4d2 (23.134.232.196 -> 23.134.233.118) **'),
                  ('100% LOSS', '   ** Ping 6d3 -> 4d3 (23.134.232.196 -> 23.134.233.119) **'),
                  ('100% LOSS', '   ** Ping 6d3 -> 5d1 (23.134.232.196 -> 23.134.232.226) **'),
                  ('100% LOSS', '   ** Ping 6d3 -> 5d2 (23.134.232.196 -> 23.134.232.227) **'),
                  ('100% LOSS', '   ** Ping 6d3 -> 5d3 (23.134.232.196 -> 23.134.232.228) **'),
                  ('100% LOSS', '   ** Ping 6d3 -> monitor (23.134.232.196 -> 10.132.1.2) **'),
                  ('100% LOSS', '   ** Ping monitor -> 6d1 (10.132.1.2 -> 23.134.232.194) **'),
                  ('100% LOSS', '   ** Ping monitor -> 6d2 (10.132.1.2 -> 23.134.232.195) **'),
                  ('100% LOSS', '   ** Ping monitor -> 6d3 (10.132.1.2 -> 23.134.232.196) **'),
                  ('OK', '   ** Ping 4d1 -> 4d2 (23.134.233.117 -> 23.134.233.118) **'),
                  ('OK', '   ** Ping 4d1 -> 4d3 (23.134.233.117 -> 23.134.233.119) **'),
                  ('OK', '   ** Ping 4d1 -> 5d1 (23.134.233.117 -> 23.134.232.226) **'),
                  ('OK', '   ** Ping 4d1 -> 5d2 (23.134.233.117 -> 23.134.232.227) **'),
                  ('OK', '   ** Ping 4d1 -> 5d3 (23.134.233.117 -> 23.134.232.228) **'),
                  ('OK', '   ** Ping 4d1 -> monitor (23.134.233.117 -> 10.132.1.2) **'),
                  ('OK', '   ** Ping 4d2 -> 4d1 (23.134.233.118 -> 23.134.233.117) **'),
                  ('OK', '   ** Ping 4d2 -> 4d3 (23.134.233.118 -> 23.134.233.119) **'),
                  ('OK', '   ** Ping 4d2 -> 5d1 (23.134.233.118 -> 23.134.232.226) **'),
                  ('OK', '   ** Ping 4d2 -> 5d2 (23.134.233.118 -> 23.134.232.227) **'),
                  ('OK', '   ** Ping 4d2 -> 5d3 (23.134.233.118 -> 23.134.232.228) **'),
                  ('OK', '   ** Ping 4d2 -> monitor (23.134.233.118 -> 10.132.1.2) **'),
                  ('OK', '   ** Ping 4d3 -> 4d1 (23.134.233.119 -> 23.134.233.117) **'),
                  ('OK', '   ** Ping 4d3 -> 4d2 (23.134.233.119 -> 23.134.233.118) **'),
                  ('OK', '   ** Ping 4d3 -> 5d1 (23.134.233.119 -> 23.134.232.226) **'),
                  ('OK', '   ** Ping 4d3 -> 5d2 (23.134.233.119 -> 23.134.232.227) **'),
                  ('OK', '   ** Ping 4d3 -> 5d3 (23.134.233.119 -> 23.134.232.228) **'),
                  ('OK', '   ** Ping 4d3 -> monitor (23.134.233.119 -> 10.132.1.2) **'),
                  ('OK', '   ** Ping 5d1 -> 4d1 (23.134.232.226 -> 23.134.233.117) **'),
                  ('OK', '   ** Ping 5d1 -> 4d2 (23.134.232.226 -> 23.134.233.118) **'),
                  ('OK', '   ** Ping 5d1 -> 4d3 (23.134.232.226 -> 23.134.233.119) **'),
                  ('OK', '   ** Ping 5d1 -> 5d2 (23.134.232.226 -> 23.134.232.227) **'),
                  ('OK', '   ** Ping 5d1 -> 5d3 (23.134.232.226 -> 23.134.232.228) **'),
                  ('OK', '   ** Ping 5d1 -> monitor (23.134.232.226 -> 10.132.1.2) **'),
                  ('OK', '   ** Ping 5d2 -> 4d1 (23.134.232.227 -> 23.134.233.117) **'),
                  ('OK', '   ** Ping 5d2 -> 4d2 (23.134.232.227 -> 23.134.233.118) **'),
                  ('OK', '   ** Ping 5d2 -> 4d3 (23.134.232.227 -> 23.134.233.119) **'),
                  ('OK', '   ** Ping 5d2 -> 5d1 (23.134.232.227 -> 23.134.232.226) **'),
                  ('OK', '   ** Ping 5d2 -> 5d3 (23.134.232.227 -> 23.134.232.228) **'),
                  ('OK', '   ** Ping 5d2 -> monitor (23.134.232.227 -> 10.132.1.2) **'),
                  ('OK', '   ** Ping 5d3 -> 4d1 (23.134.232.228 -> 23.134.233.117) **'),
                  ('OK', '   ** Ping 5d3 -> 4d2 (23.134.232.228 -> 23.134.233.118) **'),
                  ('OK', '   ** Ping 5d3 -> 4d3 (23.134.232.228 -> 23.134.233.119) **'),
                  ('OK', '   ** Ping 5d3 -> 5d1 (23.134.232.228 -> 23.134.232.226) **'),
                  ('OK', '   ** Ping 5d3 -> 5d2 (23.134.232.228 -> 23.134.232.227) **'),
                  ('OK', '   ** Ping 5d3 -> monitor (23.134.232.228 -> 10.132.1.2) **'),
                  ('OK', '   ** Ping 6d1 -> 6d2 (23.134.232.194 -> 23.134.232.195) **'),
                  ('OK', '   ** Ping 6d1 -> 6d3 (23.134.232.194 -> 23.134.232.196) **'),
                  ('OK', '   ** Ping 6d2 -> 6d1 (23.134.232.195 -> 23.134.232.194) **'),
                  ('OK', '   ** Ping 6d2 -> 6d3 (23.134.232.195 -> 23.134.232.196) **'),
                  ('OK', '   ** Ping 6d3 -> 6d1 (23.134.232.196 -> 23.134.232.194) **'),
                  ('OK', '   ** Ping 6d3 -> 6d2 (23.134.232.196 -> 23.134.232.195) **'),
                  ('OK', '   ** Ping monitor -> 4d1 (10.132.1.2 -> 23.134.233.117) **'),
                  ('OK', '   ** Ping monitor -> 4d2 (10.132.1.2 -> 23.134.233.118) **'),
                  ('OK', '   ** Ping monitor -> 4d3 (10.132.1.2 -> 23.134.233.119) **'),
                  ('OK', '   ** Ping monitor -> 5d1 (10.132.1.2 -> 23.134.232.226) **'),
                  ('OK', '   ** Ping monitor -> 5d2 (10.132.1.2 -> 23.134.232.227) **'),
                  ('OK', '   ** Ping monitor -> 5d3 (10.132.1.2 -> 23.134.232.228) **')
              ]
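A list this long is easier to read once it is collapsed by node group. The sketch below assumes the `(status, label)` tuple format shown above and assumes that the digits before the `d` in a node name (the `6` in `6d1`) identify the group; `summarize()` and `group_of()` are hypothetical helpers, not part of fablib or of the `ping_test()` wrapper.

```python
import re
from collections import defaultdict

# Matches the "Ping <src> -> <dst> (" part of each label.
LABEL = re.compile(r"Ping (\S+) -> (\S+) \(")


def group_of(name):
    """Assumption: names like '6d1' belong to group '6'; other names stand alone."""
    match = re.fullmatch(r"(\d+)d\d+", name)
    return match.group(1) if match else name


def summarize(results):
    """Map (src_group, dst_group) -> [pings_ok, pings_total]."""
    table = defaultdict(lambda: [0, 0])
    for status, label in results:
        match = LABEL.search(label)
        if match is None:
            continue  # skip labels that do not follow the expected format
        key = (group_of(match.group(1)), group_of(match.group(2)))
        table[key][1] += 1
        if status == "OK":
            table[key][0] += 1
    return dict(table)


sample = [
    ("100% LOSS", "   ** Ping 4d1 -> 6d1 (23.134.233.117 -> 23.134.232.194) **"),
    ("OK", "   ** Ping 6d1 -> 6d2 (23.134.232.194 -> 23.134.232.195) **"),
    ("OK", "   ** Ping monitor -> 4d1 (10.132.1.2 -> 23.134.233.117) **"),
]
for pair, (ok, total) in sorted(summarize(sample).items()):
    print(pair, f"{ok}/{total}")
```

Run over the full list, every pair that involves group 6 and a different group would show zero successes, while pairs within group 6 and pairs among the other groups would all succeed.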
              
              in reply to: How to ensure IP addresses are free? #7400
              Sunjay Cauligi
              Participant

                Hi Komal, is there no way to check which IP addresses are already in use on a given site’s network? Otherwise, if I understand correctly, a user would have to call make_ip_publicly_routable() -> modify() -> get_public_ips() in a loop until they finally receive public IP addresses, which could take a significant amount of time due to modify().
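To make the concern concrete, here is a rough sketch of what such a retry loop might look like. It uses only the calls named above plus `get_network()`; the helper `request_public_ips()`, its parameters, and the choice of candidate batches are hypothetical. It also assumes that a rejected request surfaces either as an exception from `modify()` or as an empty result from `get_public_ips()`, which may not match how the testbed actually behaves.

```python
def request_public_ips(slc, net_name, candidate_batches, max_attempts=5):
    """Try successive batches of addresses until the network reports public IPs.

    slc is a fablib-style slice object; candidate_batches is an iterable of
    lists of address strings chosen by the caller (hypothetical input).
    Returns the granted addresses, or None if every attempt failed.
    """
    for _, batch in zip(range(max_attempts), candidate_batches):
        net = slc.get_network(net_name)
        net.make_ip_publicly_routable(ipv4=[str(ip) for ip in batch])
        try:
            slc.modify()  # each call may take a long time
        except Exception:
            # Assumption: a rejected request raises; move on to the next batch.
            continue
        granted = slc.get_network(net_name).get_public_ips()
        if granted:
            return granted
    return None
```

Every iteration pays the full cost of one `modify()` call, which is why being able to query the addresses already in use at a site would be preferable to looping like this.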

                in reply to: How to ensure IP addresses are free? #7397
                Sunjay Cauligi
                Participant

                  Hi Komal, I’m not entirely sure how the code you posted is supposed to help?
                  The .get_public_ips() call only checks the addresses listed in the NetworkService’s FIM user data, and doesn’t seem to check at all what the state of the shared site subnet is.

                  (I updated to fabrictestbed_extensions 1.7.3 before running the following)

                  >>> print(fablib.get_slice('devscript').get_network('netPRIN').get_public_ips())
                  [IPv4Address('23.134.233.114'), IPv4Address('23.134.233.115'), IPv4Address('23.134.233.116')]
                  >>> print(fablib.get_slice('qbss-tester').get_network('netPRIN').get_public_ips())
                  None

                  Additionally, I still don’t think I understand how the IP allocation functions are supposed to be used.
                  For example, after a slice modify request, the list of allocated IPs seems to be reset?

                  >>> [(net.get_name(), net.get_allocated_ips()) for net in slc.get_networks()]
                  [
                      (
                          'netINDI',
                          [IPv4Address('23.134.233.129'), IPv4Address('23.134.233.130'), IPv4Address('23.134.233.131'), IPv4Address('23.134.233.132')]
                      ),
                      (
                          'netPRIN',
                          [IPv4Address('23.134.233.113'), IPv4Address('23.134.233.114'), IPv4Address('23.134.233.115'), IPv4Address('23.134.233.116')]
                      ),
                      (
                          'netCERN',
                          [IPv4Address('23.134.233.209'), IPv4Address('23.134.233.210'), IPv4Address('23.134.233.211'), IPv4Address('23.134.233.212')]
                      )
                  ]
                  >>> slc.modify()
                  Waiting for slice . Slice state: StableOK
                  Waiting for ssh in slice . ssh successful
                  Running post boot config ... Running post boot config threads ...
                  Post boot config node3d2, Done! (8 sec)
                  Post boot config node3d3, Done! (8 sec)
                  Post boot config node3d1, Done! (8 sec)
                  Post boot config node1d1, Done! (9 sec)
                  Post boot config node1d3, Done! (9 sec)
                  Post boot config node1d2, Done! (9 sec)
                  Post boot config node2d2, Done! (13 sec)
                  Post boot config node2d1, Done! (13 sec)
                  Post boot config node2d3, Done! (13 sec)
                  Saving fablib data...  Done!
                  Done!
                  '0e651854-b2fa-4fa2-a4e6-073f76c77fdd'
                  >>> [(net.get_name(), net.get_allocated_ips()) for net in slc.get_networks()]
                  [('netINDI', [IPv4Address('23.134.233.129')]), ('netPRIN', [IPv4Address('23.134.233.113')]), ('netCERN', [IPv4Address('23.134.233.209')])]
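One way to pin down exactly what changes across a modify is to snapshot the allocations before and after and diff them. The sketch below is a minimal illustration; `diff_allocations()` is a hypothetical helper, and the snapshots are assumed to be dicts built from `get_name()` and `get_allocated_ips()` as in the session above.

```python
from ipaddress import IPv4Address


def diff_allocations(before, after):
    """Return {network name: sorted addresses present before but missing after}."""
    lost = {}
    for name, ips in before.items():
        missing = set(ips) - set(after.get(name) or [])
        if missing:
            lost[name] = sorted(missing)
    return lost


# Values copied from the netINDI entries in the session above.
before = {"netINDI": [IPv4Address(f"23.134.233.{n}") for n in (129, 130, 131, 132)]}
after = {"netINDI": [IPv4Address("23.134.233.129")]}
print(diff_allocations(before, after))
```

In a live session the snapshots could be taken with `{net.get_name(): net.get_allocated_ips() for net in slc.get_networks()}` immediately before and after `slc.modify()`.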

                  in reply to: How to ensure IP addresses are free? #7383
                  Sunjay Cauligi
                  Participant

                    Possibly related, I can’t seem to modify my slice with new public IP addresses:

                    >>> slc = fablib.get_slice('qbss-tester')
                    >>> [(net.get_name(), net.get_public_ips()) for net in slc.get_networks()]
                    [('netPRIN', None), ('netGPN', None), ('netSALT', None)]
                    >>> for net in slc.get_networks():
                    ...     net.make_ip_publicly_routable(ipv4=[str(iface.get_ip_addr()) for iface in net.get_interfaces()])
                    ...
                    >>> [(net.get_name(), net.get_public_ips()) for net in slc.get_networks()]
                    [
                        ('netPRIN', [IPv4Address('23.134.233.117'), IPv4Address('23.134.233.118'), IPv4Address('23.134.233.119')]),
                        ('netGPN', [IPv4Address('23.134.232.226'), IPv4Address('23.134.232.227'), IPv4Address('23.134.232.228')]),
                        ('netSALT', [IPv4Address('23.134.232.196'), IPv4Address('23.134.232.194'), IPv4Address('23.134.232.195')])
                    ]
                    >>> slc.modify()
                    ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
                    │ in <module>:1                                                                                    │
                    │                                                                                                  │
                    │ /Users/scauligi/research/icsi/fabric/ei-fablib/venv/lib/python3.10/site-packages/fabrictestbed_e │
                    │ xtensions/fablib/slice.py:2745 in modify                                                         │
                    │                                                                                                  │
                    │   2742 │   │   │   slice_id=self.slice_id, slice_graph=slice_graph                               │
                    │   2743 │   │   )                                                                                 │
                    │   2744 │   │   if return_status != Status.OK:                                                    │
                    │ ❱ 2745 │   │   │   raise Exception(                                                              │
                    │   2746 │   │   │   │   "Failed to submit modify slice: {}, {}".format(                           │
                    │   2747 │   │   │   │   │   return_status, slice_reservations                                     │
                    │   2748 │   │   │   │   )                                                                         │
                    ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
                    Exception: Failed to submit modify slice: Status.FAILURE, (500)
                    Reason: INTERNAL SERVER ERROR
                    HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.6', 'Date': 'Tue, 06 Aug 2024 21:20:26 GMT', 'Content-Type': 'text/html;
                    charset=utf-8', 'Content-Length': '219', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true',
                    'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range, Authorization',
                    'Access-Control-Allow-Methods': 'GET, POST, PUT, PATCH, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*',
                    'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': "'NoneType' object has no attribute 'get_type'"})
                    HTTP response body: b'{\n    "errors": [\n        {\n            "details": "\'NoneType\' object has no attribute \'get_type\'",\n
                    "message": "Internal Server Error"\n        }\n    ],\n    "size": 1,\n    "status": 500,\n    "type": "error"\n}'
                    
                    
                    Sunjay Cauligi
                    Participant

                      Awesome! I was planning to set up a small monitor to periodically check the external reachability of the nodes; I’ll post again if I see anything go down, but great to hear it’s probably good to go for now!
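A monitor of this kind could be as small as the following sketch. Everything in it is an assumption for illustration: the helper names, the five-minute interval, and the use of the system `ping` binary with Linux iputils flags (`-c` for count, `-W` for timeout in seconds). It reports only changes in state, so a node that stays up produces a single line.

```python
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo to host succeeds (Linux `ping` flags)."""
    proc = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0


def check(targets, state, ping=ping_once):
    """Ping each target once; return (name, host, is_up) for every state change.

    targets maps a node name to its public address; state is updated in place.
    """
    changes = []
    for name, host in targets.items():
        up = ping(host)
        if state.get(name) != up:
            changes.append((name, host, up))
            state[name] = up
    return changes


def monitor(targets, interval_s=300, ping=ping_once):
    """Loop forever, printing a timestamped line whenever a node goes up or down."""
    state = {}
    while True:
        for name, host, up in check(targets, state, ping):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {name} ({host}) is {'UP' if up else 'DOWN'}")
        time.sleep(interval_s)
```

Calling `monitor({"node2d1": "23.134.233.130"})` from a machine outside the testbed would then log each transition. Keeping `check()` separate from the loop makes it possible to test the state tracking with a fake `ping` function.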
