Ilya Baldin

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 29 total)
  • in reply to: Perhaps one of the bastion hosts is out #9416
    Ilya Baldin
    Participant

      OK thanks for getting to the bottom of this. Yes, I’m using an EXT network with both IPv4 and IPv6 public configuration to talk to ESnet. I’m assuming you put enp7s0 back? I’ll work around this for now. This may have to do with the fact that Tom configured it to use the peering at STAR to talk to ESnet, if I’m not mistaken.

      in reply to: Perhaps one of the bastion hosts is out #9413
      Ilya Baldin
      Participant

        Fascinating. I do not have slices at other sites. I have one slice and all of it is in STAR and as far as I can tell all nodes have this problem.

        Slice ID is 16c49677-636b-4d3c-b71d-7fff7a75db09

        • This reply was modified 1 month, 1 week ago by Ilya Baldin.
        in reply to: Perhaps one of the bastion hosts is out #9409
        Ilya Baldin
        Participant

          For one of the nodes I do something like this:

          ssh -i /path/to/slice_key -F ~/path/to/fabric_config ubuntu@2001:400:a100:3030:f816:3eff:fe07:665e

          and my fabric_config looks something like this:

          UserKnownHostsFile /dev/null
          StrictHostKeyChecking no
          ServerAliveInterval 120

          Host bastion-star-1.fabric-testbed.net
          User username
          ForwardAgent yes
          Hostname %h
          IdentityFile ~/.ssh/mykey
          IdentitiesOnly yes

          Host * !bastion-star-1.fabric-testbed.net
          ProxyJump username@bastion-star-1.fabric-testbed.net:22

          • This reply was modified 1 month, 1 week ago by Ilya Baldin.
          in reply to: Perhaps one of the bastion hosts is out #9407
          Ilya Baldin
          Participant

            That’s suspect. (a) I was not doing anything this morning, and (b) if I configure bastion-star-1 as my bastion host I (still) cannot log in to my slice; it works if I configure e.g. bastion-renc-1 instead.

            in reply to: Perhaps one of the bastion hosts is out #9404
            Ilya Baldin
            Participant

              I experimentally determined (by manually specifying which bastion to use) that it is bastion-star-1 that is hanging for me.
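A quick way to narrow this kind of thing down is to probe each bastion individually instead of going through the round-robin name. The following is only a minimal sketch: `ssh_banner` is a hypothetical helper, not part of fablib, and the fully qualified name for bastion-renc-1 is an assumption by analogy with bastion-star-1. It opens a TCP connection and reads the SSH identification line, so it only shows that sshd is answering; it does not exercise the jump through the bastion to a slice node, which is where the hang showed up.

```python
import socket

# Hostnames are assumptions based on the names mentioned in this thread.
BASTIONS = [
    "bastion-star-1.fabric-testbed.net",
    "bastion-renc-1.fabric-testbed.net",
]


def ssh_banner(host, port=22, timeout=5.0):
    """Return the SSH identification line sent by host:port, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return None
    line = data.split(b"\n", 1)[0].strip()
    return line.decode("ascii", errors="replace") or None


def probe_all(hosts, port=22, timeout=5.0):
    """Map each hostname to its banner (or None if it did not answer)."""
    return {host: ssh_banner(host, port, timeout) for host in hosts}
```

Calling `probe_all(BASTIONS)` returns a dictionary keyed by hostname; a `None` value means that host did not answer within the timeout.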

              in reply to: Perhaps one of the bastion hosts is out #9400
              Ilya Baldin
              Participant

                Yeah, strangely I can connect to all of them right now, so it must be intermittent. I may change my fabric ssh config to use a specific bastion and see if that changes how things work. I’ll also raise the debug level to see if I can catch it in the act.

                • This reply was modified 1 month, 1 week ago by Ilya Baldin.
                in reply to: Perhaps one of the bastion hosts is out #9398
                Ilya Baldin
                Participant

                  My IP is 136.61.60.222

                  I do not have any IPv6 on my home network so it isn’t surprising. I’m using a DNS proxy, but even if I ask 8.8.8.8 directly I get:

                  $ dig @8.8.8.8 bastion.fabric-testbed.net 
                  
                  ; <<>> DiG 9.10.6 <<>> @8.8.8.8 bastion.fabric-testbed.net
                  ; (1 server found)
                  ;; global options: +cmd
                  ;; Got answer:
                  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15505
                  ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
                  
                  ;; OPT PSEUDOSECTION:
                  ; EDNS: version: 0, flags:; udp: 512
                  ;; QUESTION SECTION:
                  ;bastion.fabric-testbed.net. IN A
                  
                  ;; ANSWER SECTION:
                  bastion.fabric-testbed.net. 3600 IN A 23.134.235.242
                  bastion.fabric-testbed.net. 3600 IN A 128.163.180.149
                  bastion.fabric-testbed.net. 3600 IN A 141.142.140.10
                  bastion.fabric-testbed.net. 3600 IN A 152.54.15.12

                  The log is full of the following messages

                  [21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:09:13] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [21:09:13] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:14:12] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [21:14:12] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
                  in reply to: Perhaps one of the bastion hosts is out #9393
                  Ilya Baldin
                  Participant

                    This is what I see (this is from my home on Google Fiber):

                    nslookup bastion.fabric-testbed.net
                    Server: 192.168.1.1
                    Address: 192.168.1.1#53

                    Non-authoritative answer:
                    Name: bastion.fabric-testbed.net
                    Address: 128.163.180.149
                    Name: bastion.fabric-testbed.net
                    Address: 23.134.235.242
                    Name: bastion.fabric-testbed.net
                    Address: 141.142.140.10
                    Name: bastion.fabric-testbed.net
                    Address: 152.54.15.12

                    I also noticed that some commands sent to VMs over SSH via my laptop-local notebook don’t happen or are very delayed, which I suspect is part of the same issue. Strangely all these are reachable via ssh.

                    in reply to: Slice gone and token issues #9344
                    Ilya Baldin
                    Participant

                      Perfect, back in business, thank you!

                      Also, a feature request: make the error message from the Credential Manager more informative 🙂

                      in reply to: Component.FPGA #8446
                      Ilya Baldin
                      Participant

                        Please check the bottom of this help page:

                        Using Xilinx U280 FPGAs on FABRIC

                        in reply to: Inter-AS intra-AS routing test #8404
                        Ilya Baldin
                        Participant

                          Assuming each of your slices has its own instance of a FABNetv4 network, simply adding a route to the entire FABNetv4 subnet via the FABNetv4 gateway will let your nodes in different slices talk to each other:

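                          import ipaddress

                          full_subnet = ipaddress.IPv4Network('10.128.0.0/10')
                          # node is a fablib node in the slice; site_net is the FABNetv4 network it is attached to
                          node.ip_route_add(subnet=full_subnet, gateway=site_net.get_gateway())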
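To see which addresses that /10 route actually covers, here is a small standard-library check. It is only an illustration: `covered_by_route` is a hypothetical helper and the two sample addresses are made up, not taken from any real slice.

```python
import ipaddress

full_subnet = ipaddress.IPv4Network("10.128.0.0/10")


def covered_by_route(addr: str) -> bool:
    """True if addr falls inside the subnet the added route points at."""
    return ipaddress.ip_address(addr) in full_subnet


# First and last address of the /10
print(full_subnet[0], full_subnet[-1])   # 10.128.0.0 10.191.255.255

# Hypothetical node addresses, for illustration only
print(covered_by_route("10.130.5.2"))    # True
print(covered_by_route("10.192.0.1"))    # False
```

Any peer whose FABNetv4 address falls inside that range is reached through the gateway once the route is in place; anything outside it is unaffected.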

                           

                          in reply to: List of OS supported #8319
                          Ilya Baldin
                          Participant

                            # List available images (this step is optional)
                            available_images = fablib.get_image_names()

                            print(f'Available images are: {available_images}')

                            or in the portal create slice view (attached).

                             

                            in reply to: STAR probably needs reflashing to P4 workflow #8074
                            Ilya Baldin
                            Participant

                              Just to add – I stood up the same slice on WASH with no problems. It does also show BusMaster- so this may have been a red herring.

                              in reply to: WASH FPGA slice issue #8073
                              Ilya Baldin
                              Participant

                                The slice came up, thank you, Komal!

                                in reply to: STAR probably needs reflashing to P4 workflow #8051
                                Ilya Baldin
                                Participant

                                  We are narrowing it down to a PCI (or PCI passthrough) issue. From inside the VM we see this:

                                  ubuntu@LB-node:~/esnet-smartnic-fw/sn-stack$ sudo lspci -Dd 10ee: -vv
                                  0000:1f:00.0 Network controller: Xilinx Corporation Device 903f
                                  Subsystem: Xilinx Corporation Device 0007
                                  Physical Slot: 0-30
                                  Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
                                  ….
                                  The Control: line says BusMaster- when it should instead say BusMaster+ (not sure what is visible from the server side).

                                  Perhaps a cold-reboot of the server is needed (?). Not sure.
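For checking this across several VMs, the Control: line can be parsed mechanically. The sketch below is an assumption about how one might do it, not an existing tool: `parse_control_flags` is a hypothetical helper that takes the text printed by `lspci -vv` and turns the first Control: line into a flag dictionary.

```python
def parse_control_flags(lspci_text: str) -> dict[str, bool]:
    """Map each flag on the first 'Control:' line to True ('+') or False ('-')."""
    for line in lspci_text.splitlines():
        line = line.strip()
        if line.startswith("Control:"):
            flags = {}
            for token in line[len("Control:"):].split():
                if token[-1] in "+-":
                    flags[token[:-1]] = token.endswith("+")
            return flags
    return {}


# The Control: line quoted in this post
sample = (
    "Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- "
    "ParErr- Stepping- SERR- FastB2B- DisINTx-"
)
flags = parse_control_flags(sample)
print(flags["BusMaster"])  # False
print(flags["Mem"])        # True
```

A `False` for `BusMaster` corresponds to the `BusMaster-` shown above.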

                                  • This reply was modified 1 year, 1 month ago by Ilya Baldin.