Forum Replies Created
-
AuthorPosts
-
July 27, 2022 at 2:36 pm in reply to: When Creating a Slice, Sometimes Fails to Get NIC Components Correctly #2545
I’ll try that, but I just want to make clear that it’s the NICs that are not being retrieved correctly. The GPUs work fine.
July 27, 2022 at 1:20 pm in reply to: When Creating a Slice, Sometimes Fails to Get NIC Components Correctly #2543
Hi, I know it’s been a while since I posted this but I wanted to update because this problem seems to have gotten worse (or maybe I’m just getting unlucky?) and I finally found the log file. I ran my slice setup and got this output in the notebook:
---------------  ------------------------------------
Slice Name       TestModel
Slice ID         29726f95-fb45-4c94-81a8-01d5e89d32ef
Slice State      StableOK
Lease End (UTC)  2022-07-28 18:10:30 +0000
---------------  ------------------------------------
Retry: 12, Time: 140 sec
ID                                    Name    Site    Host                        Cores    RAM    Disk    Image              Management IP                           State    Error
------------------------------------  ------  ------  --------------------------  -------  -----  ------  -----------------  --------------------------------------  -------  -------
3d40f9a1-0d3c-4e31-b727-883d3331bda9  Node1   STAR    star-w2.fabric-testbed.net  2        8      100     default_ubuntu_20  2001:400:a100:3030:f816:3eff:fe6f:5e32  Active
09f6a983-004e-4239-b27a-8fda35ae7597  Node2   STAR    star-w2.fabric-testbed.net  2        8      100     default_ubuntu_20  2001:400:a100:3030:f816:3eff:feec:63f8  Active
Time to stable 140 seconds
Running post_boot_config ...
Time to post boot config 148 seconds
Name           Node    Network    Bandwidth    VLAN    MAC    Physical OS Interface    OS Interface
-------------  ------  ---------  -----------  ------  -----  -----------------------  --------------
Node1-nic1-p1  Node1   net1       0
Node2-nic2-p1  Node2   net1       0
Time to print interfaces 153 seconds
I checked the logs, and here’s what they say from the time I ran my code:
[18:10:29] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py:144} INFO - Adding node: Node1, slice: TestModel, site: STAR
[18:10:29] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py:144} INFO - Adding node: Node2, slice: TestModel, site: STAR
[18:10:29] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/network_service.py:295} INFO - Create Network Service: Slice: TestModel, Network Name: net1, Type: L2Bridge
[18:10:29] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/network_service.py:590} WARNING - Failed to get reservation_id: 'NoneType' object has no attribute 'reservation_id'
[18:12:55] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:1120} INFO - post_boot_config: slice_name: TestModel, slice_id 29726f95-fb45-4c94-81a8-01d5e89d32ef
[18:12:55] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:1124} INFO - Starting thread: Node1_network_manager_stop
[18:12:55] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:1124} INFO - Starting thread: Node2_network_manager_stop
[18:12:56] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py:1220} INFO - Stopped NetworkManager with 'sudo systemctl stop NetworkManager': stdout: stderr: Failed to stop NetworkManager.service: Unit NetworkManager.service not loaded.
[18:12:56] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py:1220} INFO - Stopped NetworkManager with 'sudo systemctl stop NetworkManager': stdout: stderr: Failed to stop NetworkManager.service: Unit NetworkManager.service not loaded.
[18:13:06] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:163} INFO - Starting get network name thread for iface Node1-nic1-p1
[18:13:06] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:167} INFO - Starting get node name thread for iface Node1-nic1-p1
[18:13:06] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:170} INFO - Starting get physical_os_interface_name_threads for iface Node1-nic1-p1
[18:13:06] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:173} INFO - Starting get get_os_interface_threads for iface Node1-nic1-p1
[18:13:06] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:163} INFO - Starting get network name thread for iface Node2-nic2-p1
[18:13:06] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:167} INFO - Starting get node name thread for iface Node2-nic2-p1
[18:13:06] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:170} INFO - Starting get physical_os_interface_name_threads for iface Node2-nic2-p1
[18:13:06] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:173} INFO - Starting get get_os_interface_threads for iface Node2-nic2-p1
[18:13:08] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:182} INFO - Getting results from get network name thread for iface Node1-nic1-p1
[18:13:08] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:189} INFO - Getting results from get node name thread for iface Node1-nic1-p1
[18:13:08] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:182} INFO - Getting results from get network name thread for iface Node2-nic2-p1
[18:13:08] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py:189} INFO - Getting results from get node name thread for iface Node2-nic2-p1
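With a log this noisy, the one WARNING entry is easy to miss among the INFO entries. A minimal sketch of filtering a log in this shape down to its non-INFO entries, assuming each entry sits on its own line as `[HH:MM:SS] {path:line} LEVEL - message`; the names `LOG_LINE`, `parse_log`, and `problems` are illustrative and not part of fablib:

```python
import re

# Matches entries shaped like:
# [18:10:29] {/path/to/file.py:144} INFO - message text
LOG_LINE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+"
    r"\{(?P<path>[^}]*):(?P<lineno>\d+)\}\s+"
    r"(?P<level>[A-Z]+)\s+-\s+(?P<message>.*)"
)


def parse_log(text):
    """Return one dict per line that matches LOG_LINE; other lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["lineno"] = int(entry["lineno"])
            entries.append(entry)
    return entries


def problems(entries, levels=("WARNING", "ERROR", "CRITICAL")):
    """Keep only the entries whose level suggests something went wrong."""
    return [entry for entry in entries if entry["level"] in levels]


# Two entries copied from the log above, used as sample input.
SAMPLE = """\
[18:10:29] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/node.py:144} INFO - Adding node: Node1, slice: TestModel, site: STAR
[18:10:29] {/opt/conda/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/network_service.py:590} WARNING - Failed to get reservation_id: 'NoneType' object has no attribute 'reservation_id'
"""

if __name__ == "__main__":
    for entry in problems(parse_log(SAMPLE)):
        source_file = entry["path"].rsplit("/", 1)[-1]
        print(entry["time"], source_file, entry["lineno"], entry["message"])
```

Run against the full log, this would surface only the `reservation_id` warning from `network_service.py`, which is logged at the moment the network service is created.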
That worked! Thank you
July 15, 2022 at 12:13 pm in reply to: Management IP Invalid: None when running Python code in Jupyter #2334
Yeah, here you go:
Traceback (most recent call last):
  File "D:\Research\FABRIC\fabric-scripts\hello_fabric.py", line 37, in <module>
    slice.submit(wait=False)
  File "C:\Users\xwein\AppData\Local\Programs\Python\Python39\lib\site-packages\fabrictestbed_extensions\fablib\slice.py", line 1217, in submit
    self.update()
  File "C:\Users\xwein\AppData\Local\Programs\Python\Python39\lib\site-packages\fabrictestbed_extensions\fablib\slice.py", line 325, in update
    self.update_topology()
  File "C:\Users\xwein\AppData\Local\Programs\Python\Python39\lib\site-packages\fabrictestbed_extensions\fablib\slice.py", line 278, in update_topology
    raise Exception("Failed to get slice topology: {}, {}".format(return_status, new_topo))
Exception: Failed to get slice topology: Status.FAILURE, Error [Unable to read graph C:\Users\xwein\AppData\Local\Temp\tmpw2z0kyuu-graphml] importing graph
July 15, 2022 at 10:40 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2320
I don’t think so. I get a different error. I made sure that I didn’t have an opened slice called MySlice, then when I ran it I got this:
(base) fabric@jupyter-xweintra-40purdue-2eedu:~/work$ python hello.py
Name  CPUs  Cores    RAM (G)    Disk (G)       Basic (100 Gbps NIC)  ConnectX-6 (100 Gbps x2 NIC)  ConnectX-5 (25 Gbps x2 NIC)  P4510 (NVMe 1TB)  Tesla T4 (GPU)  RTX6000 (GPU)
----  ----  -------  ---------  -------------  --------------------  ----------------------------  ---------------------------  ----------------  --------------  -------------
MICH  6     190/192  1530/1536  60590/60600    381/381               0/2                           2/2                          10/10             2/2             3/3
UTAH  10    320/320  2560/2560  116400/116400  635/635               2/2                           4/4                          16/16             4/4             5/5
TACC  10    238/320  2328/2560  115590/116400  632/635               2/2                           4/4                          16/16             4/4             6/6
WASH  6     188/192  1520/1536  60580/60600    379/381               2/2                           2/2                          10/10             2/2             3/3
NCSA  6     192/192  1536/1536  60600/60600    381/381               2/2                           2/2                          10/10             2/2             3/3
DALL  6     192/192  1536/1536  60600/60600    381/381               2/2                           2/2                          10/10             2/2             3/3
MAX   10    290/320  2452/2560  116190/116400  619/635               1/2                           4/4                          16/16             4/4             6/6
MASS  4     120/128  992/1024   55700/55800    254/254               1/2                           0/0                          6/6               0/0             3/3
SALT  6     184/192  1504/1536  60500/60600    380/381               2/2                           2/2                          10/10             2/2             3/3
STAR  12    368/384  3008/3072  121060/121200  757/762               2/2                           6/6                          20/20             6/6             4/6
Running post boot config ... Exception: node.execute: Management IP Invalid: None
-----------  ------------------------------------
Slice Name   MySlice
Slice ID     c26d5e3b-6e81-48f1-b12d-f68a6fbc1ea6
Slice State  Configuring
Lease End    2022-07-16 15:22:29 +0000
-----------  ------------------------------------
-----------------  ----------------------------------------------------------------------------------------------
ID
Name               Node1
Cores
RAM
Disk
Image              default_rocky_8
Image Type         qcow2
Host
Site               NCSA
Management IP
Reservation State
Error Message
SSH Command        ssh -i /home/fabric/.ssh/id_rsa -J xweintra_0000014567@bastion-1.fabric-testbed.net rocky@None
-----------------  ----------------------------------------------------------------------------------------------
Exception: node.execute: Management IP Invalid: None
Exception: Failed to delete slice: Status.FAILURE, (500)
Reason: INTERNAL SERVER ERROR
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.6', 'Date': 'Fri, 15 Jul 2022 15:22:31 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '100', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'Unable to delete Slice# c26d5e3b-6e81-48f1-b12d-f68a6fbc1ea6 that is not yet stable, try again later'})
HTTP response body: Unable to delete Slice# c26d5e3b-6e81-48f1-b12d-f68a6fbc1ea6 that is not yet stable, try again later
As you can see, the error is “Management IP Invalid: None” just after running post boot config. Does it also work for you if you try to run the script from Jupyter? That’s where I ran it from.
I haven’t gotten FABRIC to work properly from my local computer yet. I get this error, which I suspect might be because I’m trying to run it from Windows, but I’m not sure:
Failed to get slice topology: Status.FAILURE, Error [Unable to read graph C:\Users\xwein\AppData\Local\Temp\tmprkqs64qf-graphml] importing graph
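One possible explanation, not confirmed anywhere in this thread, is a documented platform difference in Python's `tempfile.NamedTemporaryFile`: on Windows a named temporary file generally cannot be opened a second time by name while the first handle is still open, whereas on Unix it can. The sketch below shows a write, close, then reopen pattern that behaves the same on both platforms. The helper name `write_then_read` and the sample payload are illustrative only, and this is not fablib's actual code.

```python
import os
import tempfile


def write_then_read(payload, suffix="-graphml"):
    """Write payload to a temp file, close it, then reopen it by name."""
    # delete=False keeps the file on disk after close(), so the second
    # open() below works on Windows as well as on Unix.
    tmp = tempfile.NamedTemporaryFile(
        mode="w", suffix=suffix, delete=False, encoding="utf-8"
    )
    try:
        tmp.write(payload)
        tmp.close()  # release the handle before anything else opens the file
        with open(tmp.name, encoding="utf-8") as handle:
            return handle.read()
    finally:
        if not tmp.closed:
            tmp.close()
        os.remove(tmp.name)  # manual cleanup, since delete=False was used


if __name__ == "__main__":
    print(write_then_read("sample graph data"))
```

If the error came from a reader opening the file while a writer still held it open, a pattern like this would avoid it; if the cause is something else, it will not help.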
Side note, how do I do the quote segment with overflow? I don’t know how to use this markup very well.
July 15, 2022 at 10:00 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2311
Let’s try this
July 15, 2022 at 9:29 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2299
I don’t think I have permissions to upload files
- This reply was modified 2 years, 4 months ago by Xander Maddox Weintraut. Reason: Not allowed to upload .py files apparently. You'll have to resave this as a .py before you can run it
- This reply was modified 2 years, 4 months ago by Xander Maddox Weintraut. Reason: Can't upload files
July 14, 2022 at 3:11 pm in reply to: Management IP Invalid: None when running Python code in Jupyter #2286
Right, that’s what I did.
First I made sure the “Hello, FABRIC” notebook ran correctly.
Then I made a Python script with all of the code cells copy/pasted directly back-to-back.
When I ran that script from the terminal, this was the output:
Name  CPUs  Cores    RAM (G)    Disk (G)       Basic (100 Gbps NIC)  ConnectX-6 (100 Gbps x2 NIC)  ConnectX-5 (25 Gbps x2 NIC)  P4510 (NVMe 1TB)  Tesla T4 (GPU)  RTX6000 (GPU)
----  ----  -------  ---------  -------------  --------------------  ----------------------------  ---------------------------  ----------------  --------------  -------------
MICH  6     188/192  1522/1536  60580/60600    381/381               0/2                           2/2                          10/10             2/2             3/3
UTAH  10    316/320  2544/2560  116380/116400  634/635               2/2                           4/4                          16/16             4/4             5/5
TACC  10    220/320  2256/2560  115390/116400  630/635               2/2                           4/4                          16/16             4/4             5/6
WASH  6     188/192  1520/1536  60580/60600    379/381               2/2                           2/2                          10/10             2/2             3/3
NCSA  6     192/192  1536/1536  60600/60600    381/381               2/2                           2/2                          10/10             2/2             3/3
DALL  6     192/192  1536/1536  60600/60600    381/381               2/2                           2/2                          10/10             2/2             3/3
MAX   10    254/320  2332/2560  115920/116400  594/635               0/2                           2/4                          16/16             4/4             6/6
MASS  4     118/128  984/1024   55690/55800    253/254               1/2                           0/0                          6/6               0/0             3/3
SALT  6     192/192  1536/1536  60600/60600    381/381               2/2                           2/2                          10/10             2/2             3/3
STAR  12    366/384  3000/3072  121090/121200  760/762               2/2                           6/6                          20/20             6/6             6/6
Running post boot config ... Exception: node.execute: Management IP Invalid: None
-----------  ------------------------------------
Slice Name   MySlice
Slice ID     b73f5090-e56a-474f-997a-16f6f7681952
Slice State  Configuring
Lease End    2022-07-15 20:03:36 +0000
-----------  ------------------------------------
-----------------  ----------------------------------------------------------------------------------------------
ID
Name               Node1
Cores
RAM
Disk
Image              default_rocky_8
Image Type         qcow2
Host
Site               TACC
Management IP
Reservation State
Error Message
SSH Command        ssh -i /home/fabric/.ssh/id_rsa -J xweintra_0000014567@bastion-1.fabric-testbed.net rocky@None
-----------------  ----------------------------------------------------------------------------------------------
Exception: node.execute: Management IP Invalid: None
Exception: Failed to delete slice: Status.FAILURE, (500)
Reason: INTERNAL SERVER ERROR
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.21.6', 'Date': 'Thu, 14 Jul 2022 20:03:39 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '100', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'DNT, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'Content-Length, Content-Range, X-Error', 'X-Error': 'Unable to delete Slice# b73f5090-e56a-474f-997a-16f6f7681952 that is not yet stable, try again later'})
HTTP response body: Unable to delete Slice# b73f5090-e56a-474f-997a-16f6f7681952 that is not yet stable, try again later
The errors after the “Running post boot config ...” line occur because the submit() call throws an exception before it finishes, so the later calls try to act on a slice that is not stable yet.
The slice does eventually reach StableOK state, but it has no nodes.
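The follow-on errors described above (later calls acting on a slice that is still `Configuring`) can in general be avoided by polling until a stable state is reported before doing anything else. A minimal sketch, assuming a caller-supplied `get_state` function that returns the slice state as a string; `wait_until_stable` and `get_state` are hypothetical names rather than fablib APIs, and only the `StableOK` and `Configuring` state names are taken from the output in this thread:

```python
import time

# Only "StableOK" appears in this thread; add other terminal states as needed.
STABLE_STATES = {"StableOK"}


def wait_until_stable(get_state, timeout=600.0, interval=10.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns a stable state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in STABLE_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(
                f"slice still in state {state!r} after {timeout} seconds"
            )
        sleep(interval)


if __name__ == "__main__":
    # Stand-in for a real state query: two polls in Configuring, then StableOK.
    fake_states = iter(["Configuring", "Configuring", "StableOK"])
    print(wait_until_stable(lambda: next(fake_states), interval=0.0))
```

Waiting this way would only prevent the "not yet stable" delete error. It would not by itself explain why the slice ends up with no nodes.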
July 14, 2022 at 8:42 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2281
I am not having any issues running the notebook, only with running .py scripts.
July 13, 2022 at 9:03 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2272
I’m in ULTIMA. I don’t think we need any more tags at the moment, but will let you know as the need arises. However, all of the Networking examples after “Create a Local Ethernet (Layer 2)” require the Slice.Multisite tag to run.
Regardless, I’m fairly sure that permissions tags aren’t the issue here.
July 12, 2022 at 9:53 am in reply to: Management IP Invalid: None when running Python code in Jupyter #2264
The notebook runs just fine. The only notebooks that have failed have been ones that require project tags I don’t have.
July 11, 2022 at 4:24 pm in reply to: Management IP Invalid: None when running Python code in Jupyter #2262
I just went through the list of sites, and was able to reproduce the issue with every site.