Management IP Invalid: None when running Python code in Jupyter
- This topic has 23 replies, 5 voices, and was last updated 2 years, 2 months ago by Mami Hayashida.
July 15, 2022 at 10:19 am #2317
It works for me but it didn’t work the first time I tried it. The error I got the first time might be your problem too.
The first time I ran it I got this:
pruth@pruth-laptop Desktop % python3 hello_edited.py
Name CPUs Cores RAM (G) Disk (G) Basic (100 Gbps NIC) ConnectX-6 (100 Gbps x2 NIC) ConnectX-5 (25 Gbps x2 NIC) P4510 (NVMe 1TB) Tesla T4 (GPU) RTX6000 (GPU)
------ ------ ------- --------- ------------- ---------------------- ------------------------------ ----------------------------- ------------------ ---------------- ---------------
MICH 6 190/192 1530/1536 60590/60600 381/381 0/2 2/2 10/10 2/2 3/3
UTAH 10 320/320 2560/2560 116400/116400 635/635 2/2 4/4 16/16 4/4 5/5
TACC 10 238/320 2328/2560 115590/116400 632/635 2/2 4/4 16/16 4/4 6/6
WASH 6 188/192 1520/1536 60580/60600 379/381 2/2 2/2 10/10 2/2 3/3
NCSA 6 192/192 1536/1536 60600/60600 381/381 2/2 2/2 10/10 2/2 3/3
DALL 6 190/192 1528/1536 60590/60600 381/381 2/2 2/2 10/10 2/2 3/3
MAX 10 290/320 2452/2560 116190/116400 619/635 1/2 4/4 16/16 4/4 6/6
MASS 4 120/128 992/1024 55700/55800 254/254 1/2 0/0 6/6 0/0 3/3
SALT 6 184/192 1504/1536 60500/60600 380/381 2/2 2/2 10/10 2/2 3/3
STAR 12 368/384 3008/3072 121060/121200 757/762 2/2 6/6 20/20 6/6 4/6
Exception: Failed to submit slice: Status.FAILURE, (500)
Reason: INTERNAL SERVER ERROR
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.6', 'Date': 'Fri, 15 Jul 2022 15:08:55 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '28', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'Slice MySlice already exists'})
HTTP response body: Slice MySlice already exists
Exception: 'NoneType' object has no attribute 'slice_name'
----------------- --------------------------------------------------------------------------------------------------------------------
ID
Name Node1
Cores
RAM
Disk
Image default_rocky_8
Image Type qcow2
Host
Site UTAH
Management IP
Reservation State
Error Message
SSH Command ssh -i /Users/pruth/work/fabric_config/slice-private-key -J pruth_0031379841@bastion-1.fabric-testbed.net rocky@None
----------------- --------------------------------------------------------------------------------------------------------------------
Exception: node.execute: Management IP Invalid: None
Exception: Failed to delete slice: Status.INVALID_ARGUMENTS, Invalid arguments
pruth@pruth-laptop Desktop %
Notice the error in the middle that says “HTTP response body: Slice MySlice already exists”. This is because I already had a slice called “MySlice”. I deleted that slice and re-ran your script and it worked. This was the result:
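For anyone who wants to script that cleanup step, here is a minimal sketch. It assumes `fablib` is an already-configured FABlib manager object, that `get_slice` raises an exception when no slice with the given name exists, and that the returned slice has a `delete()` method. The helper name `delete_slice_if_exists` is made up for illustration.

```python
def delete_slice_if_exists(fablib, slice_name):
    """Delete the named slice if one exists.

    Returns True if a delete was issued, False if no such slice was found.
    """
    try:
        existing = fablib.get_slice(slice_name)
    except Exception:
        # Assumption: get_slice raises when the name is unknown. Note that
        # this broad except also hides unrelated failures (for example,
        # credential problems), so narrow it if you know the exact exception.
        return False
    existing.delete()
    return True
```

Calling `delete_slice_if_exists(fablib, "MySlice")` before submitting a new slice with the same name should avoid the "already exists" error, although the delete itself may take a little while to complete on the testbed side.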
pruth@pruth-laptop Desktop % python3 hello_edited.py
Name CPUs Cores RAM (G) Disk (G) Basic (100 Gbps NIC) ConnectX-6 (100 Gbps x2 NIC) ConnectX-5 (25 Gbps x2 NIC) P4510 (NVMe 1TB) Tesla T4 (GPU) RTX6000 (GPU)
------ ------ ------- --------- ------------- ---------------------- ------------------------------ ----------------------------- ------------------ ---------------- ---------------
MICH 6 190/192 1530/1536 60590/60600 381/381 0/2 2/2 10/10 2/2 3/3
UTAH 10 320/320 2560/2560 116400/116400 635/635 2/2 4/4 16/16 4/4 5/5
TACC 10 238/320 2328/2560 115590/116400 632/635 2/2 4/4 16/16 4/4 6/6
WASH 6 188/192 1520/1536 60580/60600 379/381 2/2 2/2 10/10 2/2 3/3
NCSA 6 192/192 1536/1536 60600/60600 381/381 2/2 2/2 10/10 2/2 3/3
DALL 6 190/192 1528/1536 60590/60600 381/381 2/2 2/2 10/10 2/2 3/3
MAX 10 290/320 2452/2560 116190/116400 619/635 1/2 4/4 16/16 4/4 6/6
MASS 4 120/128 992/1024 55700/55800 254/254 1/2 0/0 6/6 0/0 3/3
SALT 6 184/192 1504/1536 60500/60600 380/381 2/2 2/2 10/10 2/2 3/3
STAR 12 368/384 3008/3072 121060/121200 757/762 2/2 6/6 20/20 6/6 4/6
Waiting for slice ........... Slice state: StableOK
Waiting for ssh in slice .. ssh successful
Running post boot config ... Done!
--------------- ------------------------------------
Slice Name MySlice
Slice ID fba02fd7-423e-4309-9954-c3cbff38870a
Slice State StableOK
Lease End (UTC) 2022-07-16 15:11:53 +0000
--------------- ------------------------------------
----------------- ------------------------------------------------------------------------------------------------------------------------------------------------------
ID 59eda82a-b9b7-4670-b830-40cff59e18cc
Name Node1
Cores 2
RAM 8
Disk 10
Image default_rocky_8
Image Type qcow2
Host dall-w3.fabric-testbed.net
Site DALL
Management IP 2001:400:a100:3000:f816:3eff:fe7e:5477
Reservation State Active
Error Message
SSH Command ssh -i /Users/pruth/work/fabric_config/slice-private-key -J pruth_0031379841@bastion-1.fabric-testbed.net rocky@2001:400:a100:3000:f816:3eff:fe7e:5477
----------------- ------------------------------------------------------------------------------------------------------------------------------------------------------
Hello, FABRIC from node 59eda82a-b9b7-4670-b830-40cff59e18cc-node1
pruth@pruth-laptop Desktop %
Is this your issue too?
- This reply was modified 2 years, 5 months ago by Paul Ruth.
July 15, 2022 at 10:40 am #2320
I don’t think so. I get a different error. I made sure that I didn’t have an open slice called MySlice, but when I ran the script I got this:
(base) fabric@jupyter-xweintra-40purdue-2eedu:~/work$ python hello.py
Name CPUs Cores RAM (G) Disk (G) Basic (100 Gbps NIC) ConnectX-6 (100 Gbps x2 NIC) ConnectX-5 (25 Gbps x2 NIC) P4510 (NVMe 1TB) Tesla T4 (GPU) RTX6000 (GPU)
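------ ------ ------- --------- ------------- ---------------------- ------------------------------ ----------------------------- ------------------ ---------------- ---------------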
MICH 6 190/192 1530/1536 60590/60600 381/381 0/2 2/2 10/10 2/2 3/3
UTAH 10 320/320 2560/2560 116400/116400 635/635 2/2 4/4 16/16 4/4 5/5
TACC 10 238/320 2328/2560 115590/116400 632/635 2/2 4/4 16/16 4/4 6/6
WASH 6 188/192 1520/1536 60580/60600 379/381 2/2 2/2 10/10 2/2 3/3
NCSA 6 192/192 1536/1536 60600/60600 381/381 2/2 2/2 10/10 2/2 3/3
DALL 6 192/192 1536/1536 60600/60600 381/381 2/2 2/2 10/10 2/2 3/3
MAX 10 290/320 2452/2560 116190/116400 619/635 1/2 4/4 16/16 4/4 6/6
MASS 4 120/128 992/1024 55700/55800 254/254 1/2 0/0 6/6 0/0 3/3
SALT 6 184/192 1504/1536 60500/60600 380/381 2/2 2/2 10/10 2/2 3/3
STAR 12 368/384 3008/3072 121060/121200 757/762 2/2 6/6 20/20 6/6 4/6
Running post boot config ... Exception: node.execute: Management IP Invalid: None
----------- ------------------------------------
Slice Name MySlice
Slice ID c26d5e3b-6e81-48f1-b12d-f68a6fbc1ea6
Slice State Configuring
Lease End 2022-07-16 15:22:29 +0000
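----------- ------------------------------------
----------------- ----------------------------------------------------------------------------------------------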
ID
Name Node1
Cores
RAM
Disk
Image default_rocky_8
Image Type qcow2
Host
Site NCSA
Management IP
Reservation State
Error Message
SSH Command ssh -i /home/fabric/.ssh/id_rsa -J xweintra_0000014567@bastion-1.fabric-testbed.net rocky@None
----------------- ----------------------------------------------------------------------------------------------
Exception: node.execute: Management IP Invalid: None
Exception: Failed to delete slice: Status.FAILURE, (500)
Reason: INTERNAL SERVER ERROR
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.6', 'Date': 'Fri, 15 Jul 2022 15:22:31 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '100', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'Unable to delete Slice# c26d5e3b-6e81-48f1-b12d-f68a6fbc1ea6 that is not yet stable, try again later'})
HTTP response body: Unable to delete Slice# c26d5e3b-6e81-48f1-b12d-f68a6fbc1ea6 that is not yet stable, try again later
As you can see, the error is “Management IP Invalid: None” just after running post boot config. Does it also work for you if you try to run the script from Jupyter? That’s where I ran it from.
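The final delete in that output failed because the slice was still in the "Configuring" state. One way a script could cope is to retry the delete after a pause. This is only a sketch: the helper name `delete_with_retry` and the attempt and delay values are illustrative assumptions, and `slice_obj` stands for whatever slice object the script already holds (anything with a `delete()` method that raises on failure).

```python
import time


def delete_with_retry(slice_obj, attempts=5, delay=30, sleep=time.sleep):
    """Call slice_obj.delete(), retrying while the call raises.

    Returns the number of attempts used. Raises RuntimeError if every
    attempt fails. The attempts and delay defaults are arbitrary examples.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            slice_obj.delete()
            return attempt + 1
        except Exception as exc:  # e.g. "not yet stable, try again later"
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)  # give the slice time to reach a stable state
    raise RuntimeError(f"delete failed after {attempts} attempts") from last_error
```

The `sleep` parameter exists only so the pause can be replaced in tests; in normal use the default `time.sleep` is fine.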
I haven’t gotten FABRIC to work properly from my local computer yet. I get the error below, which I suspect might be because I’m running it from Windows, but I’m not sure:
Failed to get slice topology: Status.FAILURE, Error [Unable to read graph C:\Users\xwein\AppData\Local\Temp\tmprkqs64qf-graphml] importing graph
Side note, how do I do the quote segment with overflow? I don’t know how to use this markup very well.
July 15, 2022 at 11:38 am #2326
Can you provide a fuller stack trace for the most recent error with reading/importing the graph?
July 15, 2022 at 11:40 am #2327
This might be a Windows issue. I’m going to have to have some other people look at it. Is there any way you could reproduce that graphml error and include a full stack trace? That might help us track this down.
Re: Code in a forum post. Click the “Text” tab next to the “Visual” tab at the top right of the box that you are typing in. Then click the “CODE” button and it will insert a `. Add your code, then click the “/CODE” button to insert another. Anything between the `s will be in a box like the one my code was in.
- This reply was modified 2 years, 5 months ago by Paul Ruth.
July 15, 2022 at 12:13 pm #2334
Yeah, here you go:
Traceback (most recent call last):
  File "D:\Research\FABRIC\fabric-scripts\hello_fabric.py", line 37, in <module>
    slice.submit(wait=False)
  File "C:\Users\xwein\AppData\Local\Programs\Python\Python39\lib\site-packages\fabrictestbed_extensions\fablib\slice.py", line 1217, in submit
    self.update()
  File "C:\Users\xwein\AppData\Local\Programs\Python\Python39\lib\site-packages\fabrictestbed_extensions\fablib\slice.py", line 325, in update
    self.update_topology()
  File "C:\Users\xwein\AppData\Local\Programs\Python\Python39\lib\site-packages\fabrictestbed_extensions\fablib\slice.py", line 278, in update_topology
    raise Exception("Failed to get slice topology: {}, {}".format(return_status, new_topo))
Exception: Failed to get slice topology: Status.FAILURE, Error [Unable to read graph C:\Users\xwein\AppData\Local\Temp\tmpw2z0kyuu-graphml] importing graph
October 19, 2022 at 2:43 pm #3329
I am having the same issue when running hello_fabric.ipynb.
`
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/tmp/ipykernel_279/774655997.py in <module>
----> 1 slice.wait_jupyter(timeout=1000, interval=60)

/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py in wait_jupyter(self, timeout, interval)
1174
1175 print("Running post_boot_config ... ", end="")
-> 1176 self.post_boot_config()
1177 print(f"Time to post boot config {time.time() - start:.0f} seconds")
1178

/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py in post_boot_config(self)
1112
1113 for node_thread in node_threads:
-> 1114 node_thread.result()
1115
1116 for interface in self.get_interfaces():

/opt/conda/lib/python3.9/concurrent/futures/_base.py in result(self, timeout)
436 raise CancelledError()
437 elif self._state == FINISHED:
--> 438 return self.__get_result()
439
440 self._condition.wait(timeout)

/opt/conda/lib/python3.9/concurrent/futures/_base.py in __get_result(self)
388 if self._exception:
389 try:
--> 390 raise self._exception
391 finally:
392 # Break a reference cycle with the exception in self._exception

/opt/conda/lib/python3.9/concurrent/futures/thread.py in run(self)
50
51 try:
---> 52 result = self.fn(*self.args, **self.kwargs)
53 except BaseException as exc:
54 self.future.set_exception(exc)

/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py in network_manager_stop(self)
1206 except Exception as e:
1207 logging.warning(f"Failed to stop network manager: {e}")
-> 1208 raise e
1209
1210 def network_manager_start(self):

/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py in network_manager_stop(self)
1190 # logging.info(f"No conn for device. conn: '{conn}'")
1191
-> 1192 stdout, stderr = self.execute(f"sudo systemctl stop NetworkManager")
1193 logging.info(f"Stopped NetworkManager with 'sudo systemctl stop "
1194 f"NetworkManager': stdout: {stdout}\nstderr: {stderr}")

/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py in execute(self, command, retry, retry_interval, username, private_key_file, private_key_passphrase)
655 src_addr = ('0:0:0:0:0:0:0:0', 22)
656 else:
--> 657 raise Exception(f"node.execute: Management IP Invalid: {management_ip}")
658 dest_addr = (management_ip, 22)
659

Exception: node.execute: Management IP Invalid: None
`
October 19, 2022 at 4:05 pm #3330
This is likely a bug that happens when the testbed is busy and a bit slow. What happens is that the slice becomes “StableOK” before the management IP is set on the node. Usually this happens so fast that the management IP is ready when you need it, but occasionally there is enough of a delay to trigger this error.
There are a few ways to work around this.
One option is to wait a few seconds after the failure and then call fablib.get_slice("<slice_name>") again; it will pull a new copy of the slice information, which will have the management IP. Depending on when you do this, you may also need to re-call “post_boot_config” on the slice.
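A rough sketch of that first workaround as a loop, in case it helps. It assumes `fablib` is an already-configured FABlib manager, and that the slice and node objects offer `get_nodes()` and `get_management_ip()`. The helper names `management_ip_ready` and `refetch_until_ready`, along with the attempt count and delay, are made up for illustration and are not part of fablib.

```python
import ipaddress
import time


def management_ip_ready(node):
    """True if the node reports a parseable IPv4/IPv6 management address."""
    try:
        # str() covers both a real None and the string "None".
        ipaddress.ip_address(str(node.get_management_ip()))
        return True
    except ValueError:
        return False


def refetch_until_ready(fablib, slice_name, attempts=10, delay=15, sleep=time.sleep):
    """Re-fetch the slice until every node has a management IP.

    Returns the fresh slice object, or raises TimeoutError if the address
    never shows up. The attempts and delay defaults are arbitrary examples.
    """
    for attempt in range(attempts):
        slice_obj = fablib.get_slice(slice_name)  # pulls a new copy of the slice info
        if all(management_ip_ready(node) for node in slice_obj.get_nodes()):
            return slice_obj
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"no management IP for slice {slice_name!r} after {attempts} attempts")
```

Once this returns, the fresh slice object is the one to use from then on; if the original failure interrupted the post boot configuration, calling `post_boot_config()` on it may also be needed, as noted above.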
Another option is to install a new pre-release version of fablib which has a permanent fix for this. There are a bunch of bug fixes and some extra features too. Try:
pip install fabrictestbed-extensions==1.3.2rc3 --user
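After installing, it can be worth confirming which version the Python environment actually sees. A small sketch using only the standard library follows; the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version found in this environment, or None if not installed.
print(installed_version("fabrictestbed-extensions"))
```

In Jupyter, a kernel that was already running when pip ran will generally keep using the previously imported version until it is restarted.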
- This reply was modified 2 years, 2 months ago by Paul Ruth.
October 19, 2022 at 4:28 pm #3331
Hi. I just wanted to let you know I am currently having the same issue with the management IP not being assigned to nodes.
October 19, 2022 at 5:21 pm #3335
Paul, it looks like calling fablib.get_slice("<slice_name>") worked. I got the management IP for the (one and only) node in that slice and I can ssh into it.