Forum Replies Created
-
AuthorPosts
-
OK thanks for getting to the bottom of this. Yes, I’m using an EXT network with both IPv4 and IPv6 public configuration to talk to ESnet. I’m assuming you put enp7s0 back? I’ll work around this for now. This may have to do with the fact that Tom configured it to use the peering at STAR to talk to ESnet, if I’m not mistaken.
Fascinating. I do not have slices at other sites. I have one slice and all of it is in STAR and as far as I can tell all nodes have this problem.
Slice ID is 16c49677-636b-4d3c-b71d-7fff7a75db09
-
This reply was modified 1 month, 1 week ago by
Ilya Baldin.
So for one of the nodes I do something like this:
ssh -i /path/to/slice_key -F ~/path/to/fabric_config ubuntu@2001:400:a100:3030:f816:3eff:fe07:665e
and my fabric_config looks something like this:
UserKnownHostsFile /dev/null
StrictHostKeyChecking no
ServerAliveInterval 120

Host bastion-star-1.fabric-testbed.net
    User username
    ForwardAgent yes
    Hostname %h
    IdentityFile ~/.ssh/mykey
    IdentitiesOnly yes

Host * !bastion-star-1.fabric-testbed.net
    ProxyJump username@bastion-star-1.fabric-testbed.net:22
-
This reply was modified 1 month, 1 week ago by
Ilya Baldin.
That’s suspect: (a) I was not doing anything this morning, and (b) if I configure bastion-star-1 as my bastion host I still cannot log in to my slice; it works if I configure, e.g., bastion-renc-1.
I experimentally determined (by manually specifying which bastion to use) that it is bastion-star-1 that is hanging for me.
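A rough way to repeat this kind of check from a script is to time a plain TCP connect to port 22 on each bastion. This is only a sketch: the function name is made up for illustration, and a successful TCP connect does not prove that the SSH channel through the bastion will open, so a bastion can pass this probe and still hang.

```python
import socket
import time


def probe(host: str, port: int = 22, timeout: float = 5.0) -> tuple[bool, float]:
    """Try a TCP connect to host:port; return (reachable, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and applies the timeout to connect()
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # covers refused connections, timeouts, and DNS failures
        return False, time.monotonic() - start
```

Calling probe("bastion-star-1.fabric-testbed.net") and comparing it with another bastion (for example bastion-renc-1, whose full hostname I am assuming follows the same pattern) gives a quick first indication of which one is slow or unreachable.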
Yeah strangely I can connect to all of them right now, so it must be intermittent. I may change my fabric ssh config to use a specific bastion and see if that changes how things work. I’ll update the debug level to see if I can catch it in the act also.
-
This reply was modified 1 month, 1 week ago by
Ilya Baldin.
My ip is 136.61.60.222
I do not have any IPv6 on my home network so it isn’t surprising. I’m using a DNS proxy, but even if I ask 8.8.8.8 directly I get:
$ dig @8.8.8.8 bastion.fabric-testbed.net

; <<>> DiG 9.10.6 <<>> @8.8.8.8 bastion.fabric-testbed.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15505
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;bastion.fabric-testbed.net. IN A

;; ANSWER SECTION:
bastion.fabric-testbed.net. 3600 IN A 23.134.235.242
bastion.fabric-testbed.net. 3600 IN A 128.163.180.149
bastion.fabric-testbed.net. 3600 IN A 141.142.140.10
bastion.fabric-testbed.net. 3600 IN A 152.54.15.12
The log is full of the following messages
[21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:09:13] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:09:13] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:14:12] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:14:12] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
[22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')

This is what I see (this is from my home on Google Fiber):
nslookup bastion.fabric-testbed.net
Server: 192.168.1.1
Address: 192.168.1.1#53

Non-authoritative answer:
Name: bastion.fabric-testbed.net
Address: 128.163.180.149
Name: bastion.fabric-testbed.net
Address: 23.134.235.242
Name: bastion.fabric-testbed.net
Address: 141.142.140.10
Name: bastion.fabric-testbed.net
Address: 152.54.15.12

I also noticed that some commands sent to VMs over SSH from my laptop-local notebook are never executed or are very delayed, which I suspect is part of the same issue. Strangely, all of these addresses are reachable via ssh.
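Since bastion.fabric-testbed.net resolves to four A records, a given connection may end up on any of them, which would fit an intermittent failure. A minimal sketch for listing the IPv4 addresses behind a name from Python, so that each one can then be tried individually (the function name is illustrative, and the result depends on whatever resolver the machine uses):

```python
import socket


def resolve_ipv4(name: str, port: int = 22) -> list[str]:
    """Return the unique IPv4 addresses a name resolves to, in resolver order."""
    infos = socket.getaddrinfo(name, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

For example, resolve_ipv4("bastion.fabric-testbed.net") should return the same set of addresses that nslookup and dig report above, possibly in a different order.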
Perfect, back in business, thank you!
Also – feature request – make the error message from the Credential Manager more informative 🙂
Assuming all your slices have their own instances of FABNetv4 network, simply adding a route pointed at the FABNetv4 gateway to the entire FABNetv4 subnet will let your nodes in different slices talk to each other:
import ipaddress

# Route the whole FABNetv4 address space through this site's FABNetv4 gateway
full_subnet = ipaddress.IPv4Network('10.128.0.0/10')
node.ip_route_add(subnet=full_subnet, gateway=site_net.get_gateway())

# List available images (this step is optional)
available_images = fablib.get_image_names()
print(f'Available images are: {available_images}')
or in the portal create slice view (attached).
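Going back to the FABNetv4 route mentioned earlier: a quick way to sanity-check whether an address or a per-slice subnet is covered by the 10.128.0.0/10 route is the standard ipaddress module. This is only a sketch, and the sample addresses in the usage note are made-up values, not taken from any real slice.

```python
import ipaddress

# The full FABNetv4 range used in the route above
FABNETV4 = ipaddress.IPv4Network("10.128.0.0/10")


def in_fabnetv4(address: str) -> bool:
    """True if a single IPv4 address falls inside the FABNetv4 range."""
    return ipaddress.IPv4Address(address) in FABNETV4


def subnet_in_fabnetv4(subnet: str) -> bool:
    """True if a whole subnet (e.g. one slice's network) is inside the range."""
    return ipaddress.IPv4Network(subnet).subnet_of(FABNETV4)
```

The /10 spans 10.128.0.0 through 10.191.255.255, so in_fabnetv4("10.128.1.5") is true while in_fabnetv4("10.192.0.1") is false.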
Just to add – I stood up the same slice on WASH with no problems. It also shows BusMaster-, so this may have been a red herring.
The slice came up, thank you, Komal!
We are narrowing it down to a PCI (or PCI passthrough) issue. From inside the VM we see this:
ubuntu@LB-node:~/esnet-smartnic-fw/sn-stack$ sudo lspci -Dd 10ee: -vv
0000:1f:00.0 Network controller: Xilinx Corporation Device 903f
Subsystem: Xilinx Corporation Device 0007
Physical Slot: 0-30
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
….
In the Control: line it says BusMaster- when it should instead be BusMaster+ (not sure what is visible from the server side).
Perhaps a cold reboot of the server is needed (?). Not sure.
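To compare this flag across several nodes without reading the lspci output by eye, the Control: line can be turned into a dictionary of booleans. A minimal sketch, assuming the line has the same shape as the output quoted above; the function name is made up for illustration.

```python
def parse_control_flags(line: str) -> dict[str, bool]:
    """Parse an lspci 'Control:' line into {flag_name: enabled}."""
    _, _, rest = line.partition("Control:")
    flags = {}
    for token in rest.split():
        # each flag is written as Name+ (set) or Name- (clear)
        if token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return flags


control = ("Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- "
           "ParErr- Stepping- SERR- FastB2B- DisINTx-")
flags = parse_control_flags(control)
print("BusMaster enabled:", flags["BusMaster"])
```

For the line quoted in this post, flags["BusMaster"] is False and flags["Mem"] is True.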
-
This reply was modified 1 year, 1 month ago by
Ilya Baldin.