1. Management IP Invalid: None when running Python code in Jupyter


Viewing 9 posts - 16 through 24 (of 24 total)
  • #2317
    Paul Ruth
    Keymaster

      It works for me but it didn’t work the first time I tried it. The error I got the first time might be your problem too.

      The first time I ran it I got this:

      pruth@pruth-laptop Desktop % python3 hello_edited.py
      Name      CPUs  Cores    RAM (G)    Disk (G)       Basic (100 Gbps NIC)    ConnectX-6 (100 Gbps x2 NIC)    ConnectX-5 (25 Gbps x2 NIC)    P4510 (NVMe 1TB)    Tesla T4 (GPU)    RTX6000 (GPU)
      ------  ------  -------  ---------  -------------  ----------------------  ------------------------------  -----------------------------  ------------------  ----------------  ---------------
      MICH         6  190/192  1530/1536  60590/60600    381/381                 0/2                             2/2                            10/10               2/2               3/3
      UTAH        10  320/320  2560/2560  116400/116400  635/635                 2/2                             4/4                            16/16               4/4               5/5
      TACC        10  238/320  2328/2560  115590/116400  632/635                 2/2                             4/4                            16/16               4/4               6/6
      WASH         6  188/192  1520/1536  60580/60600    379/381                 2/2                             2/2                            10/10               2/2               3/3
      NCSA         6  192/192  1536/1536  60600/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
      DALL         6  190/192  1528/1536  60590/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
      MAX         10  290/320  2452/2560  116190/116400  619/635                 1/2                             4/4                            16/16               4/4               6/6
      MASS         4  120/128  992/1024   55700/55800    254/254                 1/2                             0/0                            6/6                 0/0               3/3
      SALT         6  184/192  1504/1536  60500/60600    380/381                 2/2                             2/2                            10/10               2/2               3/3
      STAR        12  368/384  3008/3072  121060/121200  757/762                 2/2                             6/6                            20/20               6/6               4/6
      Exception: Failed to submit slice: Status.FAILURE, (500)
      Reason: INTERNAL SERVER ERROR
      HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.6', 'Date': 'Fri, 15 Jul 2022 15:08:55 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '28', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'Slice MySlice already exists'})
      HTTP response body: Slice MySlice already exists
      
      Exception: 'NoneType' object has no attribute 'slice_name'
      -----------------  --------------------------------------------------------------------------------------------------------------------
      ID
      Name               Node1
      Cores
      RAM
      Disk
      Image              default_rocky_8
      Image Type         qcow2
      Host
      Site               UTAH
      Management IP
      Reservation State
      Error Message
      SSH Command        ssh -i /Users/pruth/work/fabric_config/slice-private-key -J pruth_0031379841@bastion-1.fabric-testbed.net rocky@None
      -----------------  --------------------------------------------------------------------------------------------------------------------
      Exception: node.execute: Management IP Invalid: None
      Exception: Failed to delete slice: Status.INVALID_ARGUMENTS, Invalid arguments
      pruth@pruth-laptop Desktop %
      

      Notice the error in the middle that says “HTTP response body: Slice MySlice already exists”.  This is because I already had a slice called “MySlice”.   I deleted that slice and re-ran your script and it worked.  This was the result:
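      The "delete the old slice, then re-run" step can also be done from the script itself. Below is a minimal sketch, not fablib's own method: `delete_slice_if_exists` is a hypothetical helper, and it assumes the lookup callable raises when no slice with that name exists and that the returned slice object has a `delete()` method, as fablib slices appear to.

```python
def delete_slice_if_exists(get_slice, slice_name):
    """Delete the slice named slice_name if it exists; return True if deleted.

    get_slice is any callable that returns a slice object with a delete()
    method, or raises when no slice with that name is found (assumed here).
    """
    try:
        existing = get_slice(slice_name)
    except Exception:
        # Broad on purpose for this sketch: any lookup failure is treated as
        # "no such slice". Real code may want to distinguish auth errors.
        return False
    existing.delete()
    return True


# Possible usage with fablib (import path assumed for the 1.3-era releases):
# from fabrictestbed_extensions.fablib.fablib import fablib
# delete_slice_if_exists(lambda name: fablib.get_slice(name=name), "MySlice")
```

      Passing the lookup in as a callable keeps the helper independent of how fablib is imported or configured in a given environment.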

      pruth@pruth-laptop Desktop % python3 hello_edited.py
      Name      CPUs  Cores    RAM (G)    Disk (G)       Basic (100 Gbps NIC)    ConnectX-6 (100 Gbps x2 NIC)    ConnectX-5 (25 Gbps x2 NIC)    P4510 (NVMe 1TB)    Tesla T4 (GPU)    RTX6000 (GPU)
      ------  ------  -------  ---------  -------------  ----------------------  ------------------------------  -----------------------------  ------------------  ----------------  ---------------
      MICH         6  190/192  1530/1536  60590/60600    381/381                 0/2                             2/2                            10/10               2/2               3/3
      UTAH        10  320/320  2560/2560  116400/116400  635/635                 2/2                             4/4                            16/16               4/4               5/5
      TACC        10  238/320  2328/2560  115590/116400  632/635                 2/2                             4/4                            16/16               4/4               6/6
      WASH         6  188/192  1520/1536  60580/60600    379/381                 2/2                             2/2                            10/10               2/2               3/3
      NCSA         6  192/192  1536/1536  60600/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
      DALL         6  190/192  1528/1536  60590/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
      MAX         10  290/320  2452/2560  116190/116400  619/635                 1/2                             4/4                            16/16               4/4               6/6
      MASS         4  120/128  992/1024   55700/55800    254/254                 1/2                             0/0                            6/6                 0/0               3/3
      SALT         6  184/192  1504/1536  60500/60600    380/381                 2/2                             2/2                            10/10               2/2               3/3
      STAR        12  368/384  3008/3072  121060/121200  757/762                 2/2                             6/6                            20/20               6/6               4/6
      
      Waiting for slice ........... Slice state: StableOK
      Waiting for ssh in slice .. ssh successful
      Running post boot config ... Done!
      ---------------  ------------------------------------
      Slice Name       MySlice
      Slice ID         fba02fd7-423e-4309-9954-c3cbff38870a
      Slice State      StableOK
      Lease End (UTC)  2022-07-16 15:11:53 +0000
      ---------------  ------------------------------------
      -----------------  ------------------------------------------------------------------------------------------------------------------------------------------------------
      ID                 59eda82a-b9b7-4670-b830-40cff59e18cc
      Name               Node1
      Cores              2
      RAM                8
      Disk               10
      Image              default_rocky_8
      Image Type         qcow2
      Host               dall-w3.fabric-testbed.net
      Site               DALL
      Management IP      2001:400:a100:3000:f816:3eff:fe7e:5477
      Reservation State  Active
      Error Message
      SSH Command        ssh -i /Users/pruth/work/fabric_config/slice-private-key -J pruth_0031379841@bastion-1.fabric-testbed.net rocky@2001:400:a100:3000:f816:3eff:fe7e:5477
      -----------------  ------------------------------------------------------------------------------------------------------------------------------------------------------
      Hello, FABRIC from node 59eda82a-b9b7-4670-b830-40cff59e18cc-node1
      
      pruth@pruth-laptop Desktop % 
      

      Is this your issue too?
       

      #2320

      I don’t think so. I get a different error. I made sure that I didn’t have an open slice called MySlice, and when I ran it I got this:

      (base) fabric@jupyter-xweintra-40purdue-2eedu:~/work$ python hello.py
      Name      CPUs  Cores    RAM (G)    Disk (G)       Basic (100 Gbps NIC)    ConnectX-6 (100 Gbps x2 NIC)    ConnectX-5 (25 Gbps x2 NIC)    P4510 (NVMe 1TB)    Tesla T4 (GPU)    RTX6000 (GPU)
      ------  ------  -------  ---------  -------------  ----------------------  ------------------------------  -----------------------------  ------------------  ----------------  ---------------
      MICH         6  190/192  1530/1536  60590/60600    381/381                 0/2                             2/2                            10/10               2/2               3/3
      UTAH        10  320/320  2560/2560  116400/116400  635/635                 2/2                             4/4                            16/16               4/4               5/5
      TACC        10  238/320  2328/2560  115590/116400  632/635                 2/2                             4/4                            16/16               4/4               6/6
      WASH         6  188/192  1520/1536  60580/60600    379/381                 2/2                             2/2                            10/10               2/2               3/3
      NCSA         6  192/192  1536/1536  60600/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
      DALL         6  192/192  1536/1536  60600/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
      MAX         10  290/320  2452/2560  116190/116400  619/635                 1/2                             4/4                            16/16               4/4               6/6
      MASS         4  120/128  992/1024   55700/55800    254/254                 1/2                             0/0                            6/6                 0/0               3/3
      SALT         6  184/192  1504/1536  60500/60600    380/381                 2/2                             2/2                            10/10               2/2               3/3
      STAR        12  368/384  3008/3072  121060/121200  757/762                 2/2                             6/6                            20/20               6/6               4/6
      Running post boot config ... Exception: node.execute: Management IP Invalid: None
      -----------  ------------------------------------
      Slice Name   MySlice
      Slice ID     c26d5e3b-6e81-48f1-b12d-f68a6fbc1ea6
      Slice State  Configuring
      Lease End    2022-07-16 15:22:29 +0000
      -----------  ------------------------------------
      -----------------  ----------------------------------------------------------------------------------------------
      ID
      Name               Node1
      Cores
      RAM
      Disk
      Image              default_rocky_8
      Image Type         qcow2
      Host
      Site               NCSA
      Management IP
      Reservation State
      Error Message
      SSH Command        ssh -i /home/fabric/.ssh/id_rsa -J xweintra_0000014567@bastion-1.fabric-testbed.net rocky@None
      -----------------  ----------------------------------------------------------------------------------------------
      Exception: node.execute: Management IP Invalid: None
      Exception: Failed to delete slice: Status.FAILURE, (500)
      Reason: INTERNAL SERVER ERROR
      HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.6', 'Date': 'Fri, 15 Jul 2022 15:22:31 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '100', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'Unable to delete Slice# c26d5e3b-6e81-48f1-b12d-f68a6fbc1ea6 that is not yet stable, try again later'})
      HTTP response body: Unable to delete Slice# c26d5e3b-6e81-48f1-b12d-f68a6fbc1ea6 that is not yet stable, try again later

      As you can see, the error is “Management IP Invalid: None” just after running post boot config. Does it also work for you if you try to run the script from Jupyter? That’s where I ran it from.

      I haven’t gotten FABRIC to work properly from my local computer yet. I get the error below, which I suspect might be because I’m running it from Windows, but I have no clue:

      Failed to get slice topology: Status.FAILURE, Error [Unable to read graph C:\Users\xwein\AppData\Local\Temp\tmprkqs64qf-graphml] importing graph

      Side note, how do I do the quote segment with overflow? I don’t know how to use this markup very well.

      #2326
      Ilya Baldin
      Participant

        Can you provide a fuller stack trace to the most recent error with reading/importing the graph?

        #2327
        Paul Ruth
        Keymaster

          This might be a Windows issue. I’m going to have to have some other people look at it. Is there any way you could reproduce that graphml error and include a full stack trace? That might help us track this down.

          Re: code in a forum post. Click the “Text” tab next to the “Visual” tab in the top right of the box that you are typing in. Then click the “CODE” button and it will insert a `. Add your code, then click the “/CODE” button to insert another `. Anything between the `s will be in the box that my code was in.

          #2334

          Yeah, here you go:

          
          Traceback (most recent call last):
          File "D:\Research\FABRIC\fabric-scripts\hello_fabric.py", line 37, in
          slice.submit(wait=False)
          File "C:\Users\xwein\AppData\Local\Programs\Python\Python39\lib\site-packages\fabrictestbed_extensions\fablib\slice.py", line 1217, in submit
          self.update()
          File "C:\Users\xwein\AppData\Local\Programs\Python\Python39\lib\site-packages\fabrictestbed_extensions\fablib\slice.py", line 325, in update
          self.update_topology()
          File "C:\Users\xwein\AppData\Local\Programs\Python\Python39\lib\site-packages\fabrictestbed_extensions\fablib\slice.py", line 278, in update_topology
          raise Exception("Failed to get slice topology: {}, {}".format(return_status, new_topo))
          Exception: Failed to get slice topology: Status.FAILURE, Error [Unable to read graph C:\Users\xwein\AppData\Local\Temp\tmpw2z0kyuu-graphml] importing graph
          
          #3329
          Mami Hayashida
          Participant

            I am having the same issue when running hello_fabric.ipynb.
            `

            ---------------------------------------------------------------------------
            Exception Traceback (most recent call last)
            /tmp/ipykernel_279/774655997.py in <module>
            ----> 1 slice.wait_jupyter(timeout=1000, interval=60)

            /opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py in wait_jupyter(self, timeout, interval)
            1174
            1175 print("Running post_boot_config ... ", end="")
            -> 1176 self.post_boot_config()
            1177 print(f"Time to post boot config {time.time() - start:.0f} seconds")
            1178

            /opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py in post_boot_config(self)
            1112
            1113 for node_thread in node_threads:
            -> 1114 node_thread.result()
            1115
            1116 for interface in self.get_interfaces():

            /opt/conda/lib/python3.9/concurrent/futures/_base.py in result(self, timeout)
            436 raise CancelledError()
            437 elif self._state == FINISHED:
            --> 438 return self.__get_result()
            439
            440 self._condition.wait(timeout)

            /opt/conda/lib/python3.9/concurrent/futures/_base.py in __get_result(self)
            388 if self._exception:
            389 try:
            --> 390 raise self._exception
            391 finally:
            392 # Break a reference cycle with the exception in self._exception

            /opt/conda/lib/python3.9/concurrent/futures/thread.py in run(self)
            50
            51 try:
            ---> 52 result = self.fn(*self.args, **self.kwargs)
            53 except BaseException as exc:
            54 self.future.set_exception(exc)

            /opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py in network_manager_stop(self)
            1206 except Exception as e:
            1207 logging.warning(f"Failed to stop network manager: {e}")
            -> 1208 raise e
            1209
            1210 def network_manager_start(self):

            /opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py in network_manager_stop(self)
            1190 # logging.info(f"No conn for device. conn: '{conn}'")
            1191
            -> 1192 stdout, stderr = self.execute(f"sudo systemctl stop NetworkManager")
            1193 logging.info(f"Stopped NetworkManager with 'sudo systemctl stop "
            1194 f"NetworkManager': stdout: {stdout}\nstderr: {stderr}")

            /opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py in execute(self, command, retry, retry_interval, username, private_key_file, private_key_passphrase)
            655 src_addr = ('0:0:0:0:0:0:0:0', 22)
            656 else:
            --> 657 raise Exception(f"node.execute: Management IP Invalid: {management_ip}")
            658 dest_addr = (management_ip, 22)
            659

            Exception: node.execute: Management IP Invalid: None
            `

            #3330
            Paul Ruth
            Keymaster

              This is likely a bug that happens when the testbed is busy and a bit slow.   What happens is that the slice becomes “StableOK” before the management IP is set on the node.   Usually this happens so fast that the management IP is ready when you need it but occasionally there is enough of a delay to trigger this error.

              There are a few ways to work around this.

              One option is to wait a few seconds after the failure and then call fablib.get_slice("<slice_name>") again, which pulls a fresh copy of the slice information that will include the management IP. Depending on when you do this, you may need to re-call “post_boot_config” on the slice as well.
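              The re-fetch workaround can be wrapped in a small polling loop. Below is a minimal sketch under stated assumptions: `refresh_until_management_ips` is a hypothetical helper (not part of fablib), the `attempts` and `delay` defaults are arbitrary, and it relies on slice objects offering `get_nodes()` and nodes offering `get_management_ip()`, which returns a falsy value until the IP is set.

```python
import time


def refresh_until_management_ips(get_slice, slice_name, attempts=10, delay=5.0):
    """Re-fetch the slice until every node reports a management IP.

    get_slice is a callable such as a thin wrapper around fablib.get_slice.
    Returns the freshly fetched slice, or raises TimeoutError if some node
    still has no management IP after the given number of attempts.
    """
    for attempt in range(attempts):
        current = get_slice(slice_name)  # pull a new copy of the slice info
        if all(node.get_management_ip() for node in current.get_nodes()):
            return current
        if attempt < attempts - 1:
            time.sleep(delay)  # give the testbed a few seconds to catch up
    raise TimeoutError(
        f"Nodes in slice {slice_name!r} still lack a management IP "
        f"after {attempts} attempts"
    )


# Possible usage with fablib (import path assumed for the 1.3-era releases):
# from fabrictestbed_extensions.fablib.fablib import fablib
# slice = refresh_until_management_ips(lambda n: fablib.get_slice(name=n), "MySlice")
# slice.post_boot_config()  # re-run if the earlier attempt failed part-way
```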

              Another option is to install a new pre-release version of fablib which has a permanent fix for this.  There are a bunch of bug fixes and some extra features too. Try:

              pip install fabrictestbed-extensions==1.3.2rc3 --user
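              After installing, it may be worth confirming which version the running Python environment actually picks up, and restarting the Jupyter kernel first if one was already running. Below is a small sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="fabrictestbed-extensions"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version seen by this interpreter, or None if it is not installed.
print(installed_version())
```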

               

              #3331
              Sean Cummings
              Participant

                Hi. I just wanted to let you know I am currently having the same issue with the management IP not being assigned to nodes.

                #3335
                Mami Hayashida
                Participant

                  Paul, it looks like calling fablib.get_slice("<slice_name>") worked. I got the management IP for the (one and only) node in that slice and I can ssh into it.
