Forum Replies Created
-
AuthorPosts
-
Yeah, strangely I can connect to all of them right now, so it must be intermittent. I may change my FABRIC SSH config to use a specific bastion and see if that changes anything. I’ll also turn up the debug level to see if I can catch it in the act.
-
This reply was modified 8 hours, 33 minutes ago by
Ilya Baldin.
My ip is 136.61.60.222
I do not have any IPv6 on my home network so it isn’t surprising. I’m using a DNS proxy, but even if I ask 8.8.8.8 directly I get:
$ dig @8.8.8.8 bastion.fabric-testbed.net
; <<>> DiG 9.10.6 <<>> @8.8.8.8 bastion.fabric-testbed.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15505
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;bastion.fabric-testbed.net. IN A
;; ANSWER SECTION:
bastion.fabric-testbed.net. 3600 IN A 23.134.235.242
bastion.fabric-testbed.net. 3600 IN A 128.163.180.149
bastion.fabric-testbed.net. 3600 IN A 141.142.140.10
bastion.fabric-testbed.net. 3600 IN A 152.54.15.12
The log is full of the following messages:
[21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:09:13] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:09:13] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:14:12] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:14:12] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
This is what I see (this is from my home on Google Fiber):
nslookup bastion.fabric-testbed.net
Server: 192.168.1.1
Address: 192.168.1.1#53
Non-authoritative answer:
Name: bastion.fabric-testbed.net
Address: 128.163.180.149
Name: bastion.fabric-testbed.net
Address: 23.134.235.242
Name: bastion.fabric-testbed.net
Address: 141.142.140.10
Name: bastion.fabric-testbed.net
Address: 152.54.15.12
I also noticed that some commands sent to VMs over SSH via my laptop-local notebook don’t happen or are very delayed, which I suspect is part of the same issue. Strangely, all these are reachable via ssh.
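Since the name resolves to four addresses, one way to find out whether a single bastion is the flaky one is to try each address separately instead of taking whichever one the resolver hands back. A minimal sketch, assuming the bastions accept TCP connections on port 22 (the standard SSH port, not something confirmed in this thread); `probe` and `probe_all` are hypothetical helper names, and the address list is copied from the lookup output above.

```python
import socket

# Addresses copied from the DNS answers quoted in this thread.
BASTION_ADDRESSES = [
    "23.134.235.242",
    "128.163.180.149",
    "141.142.140.10",
    "152.54.15.12",
]


def probe(address, port=22, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to (address, port) opens within timeout."""
    try:
        conn = connect((address, port), timeout)
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
    conn.close()
    return True


def probe_all(addresses, **kwargs):
    """Map each address to the result of probe()."""
    return {address: probe(address, **kwargs) for address in addresses}
```

Calling `probe_all(BASTION_ADDRESSES)` a few times while the failures are happening should show whether one address stands out. This only tests that the TCP port opens; it says nothing about whether the SSH login or the onward hop to a VM works.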
Perfect, back in business, thank you!
Also – feature request – make the error message from the Credential Manager more informative 🙂
Assuming all your slices have their own instances of FABNetv4 network, simply adding a route pointed at the FABNetv4 gateway to the entire FABNetv4 subnet will let your nodes in different slices talk to each other:
import ipaddress
# Route the entire FABNetv4 subnet through this site's FABNetv4 gateway
full_subnet = ipaddress.IPv4Network('10.128.0.0/10')
node.ip_route_add(subnet=full_subnet, gateway=site_net.get_gateway())
# List available images (this step is optional)
available_images = fablib.get_image_names()
print(f'Available images are: {available_images}')
or in the portal create slice view (attached).
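Going back to the FABNetv4 route suggested earlier: a quick way to sanity-check that a single route covers every slice is to test each slice's subnet against `10.128.0.0/10` with the standard `ipaddress` module. A small sketch; `covered_by_route` is a hypothetical helper and the three subnets are made-up examples, not addresses from any real slice.

```python
import ipaddress

# The FABNetv4 range used in the route example.
FABNET_V4 = ipaddress.IPv4Network("10.128.0.0/10")


def covered_by_route(subnet):
    """True if the whole subnet falls inside the FABNetv4 range."""
    return ipaddress.IPv4Network(subnet).subnet_of(FABNET_V4)


# Hypothetical per-slice subnets, for illustration only.
example_subnets = ["10.128.1.0/24", "10.133.5.0/24", "192.168.1.0/24"]
coverage = {subnet: covered_by_route(subnet) for subnet in example_subnets}
print(coverage)
```

Any subnet that comes back `False` would need its own route, since the `/10` route would not match it.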
Just to add – I stood up the same slice on WASH with no problems. It does also show BusMaster-, so this may have been a red herring.
The slice came up, thank you, Komal!
We are narrowing it down to a PCI (or PCI passthrough) issue. From inside the VM we see this:
ubuntu@LB-node:~/esnet-smartnic-fw/sn-stack$ sudo lspci -Dd 10ee: -vv
0000:1f:00.0 Network controller: Xilinx Corporation Device 903f
Subsystem: Xilinx Corporation Device 0007
Physical Slot: 0-30
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
….
In the Control: line it says BusMaster- when it should instead be BusMaster+ (not sure what is visible from the server side). Perhaps a cold-reboot of the server is needed (?). Not sure.
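For checking this across several VMs, the flag can be pulled out of the `lspci -vv` text with a few lines of Python. A rough sketch: `bus_master_enabled` is a hypothetical helper, the sample text is the output quoted above, and it only looks at the first `Control:` line it finds.

```python
import re

# lspci output as quoted in this post (indentation is illustrative).
SAMPLE = """\
0000:1f:00.0 Network controller: Xilinx Corporation Device 903f
    Subsystem: Xilinx Corporation Device 0007
    Physical Slot: 0-30
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
"""


def bus_master_enabled(lspci_text):
    """Return True/False for BusMaster+/-, or None if no Control: line is found."""
    match = re.search(r"^\s*Control:.*\bBusMaster([+-])", lspci_text, re.MULTILINE)
    if match is None:
        return None
    return match.group(1) == "+"


print(bus_master_enabled(SAMPLE))
```

Returning `None` when there is no `Control:` line keeps "flag is off" distinct from "could not tell", which matters if the device is missing from the VM altogether.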
-
This reply was modified 1 year ago by
Ilya Baldin.
Ah ok. I’ll try later. I thought it may mean some kind of resource exhaustion.
I think this example creates multiple slices to test MTU between sites
I have this artifact notebook that shows how to do (2), but again, for your example I wouldn’t worry about this:
https://artifacts.fabric-testbed.net/artifacts/e1771f8d-ca7a-42fc-b6ec-542df83168a8
At least in my experience you are not likely to succeed in getting a single slice this large in one shot. One of two things is a better approach:
1. Build separate slices (if you are using FABNetv4 or FABNetv4Ext it is easy to get them all to communicate with each other)
2. Build up a single slice by growing it via ‘modify’ (if a modify fails on a given site because it is out of resources, you move on to the next to get more nodes)
I am not sure (2) is worth the trouble for what you are describing.
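For what it's worth, option (2) amounts to a loop over sites that keeps going when one site is full. The sketch below shows only that control flow: `try_add_nodes` is a hypothetical callable standing in for whatever performs the actual modify request (it is not a fablib function), and it is assumed to return how many nodes the site actually granted.

```python
def grow_across_sites(sites, wanted, per_site, try_add_nodes):
    """Request nodes site by site until `wanted` nodes have been granted.

    try_add_nodes(site, count) is a hypothetical callable that attempts a
    modify on one site and returns the number of nodes granted (0 on failure).
    Returns a dict of site -> nodes granted.
    """
    granted = {}
    remaining = wanted
    for site in sites:
        if remaining <= 0:
            break
        count = min(per_site, remaining)
        got = try_add_nodes(site, count)
        if got > 0:
            granted[site] = got
            remaining -= got
        # On failure (got == 0) simply move on to the next site.
    return granted
```

The caller can compare the sum of the returned values against `wanted` to see whether the sites ran out before the target was reached.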
-
This reply was modified 1 year, 1 month ago by
Ilya Baldin.
I think this is operator error, apologies.
-
This reply was modified 8 hours, 33 minutes ago by