Ilya Baldin

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 24 total)
  • in reply to: Perhaps one of the bastion hosts is out #9400
    Ilya Baldin
    Participant

      Yeah, strangely, I can connect to all of them right now, so the problem must be intermittent. I may change my FABRIC SSH config to use a specific bastion and see whether that changes the behavior. I’ll also raise the debug level to see if I can catch it in the act.

      • This reply was modified 8 hours, 33 minutes ago by Ilya Baldin.
      in reply to: Perhaps one of the bastion hosts is out #9398
      Ilya Baldin
      Participant

        My ip is 136.61.60.222

        I do not have any IPv6 on my home network, so that isn’t surprising. I’m using a DNS proxy, but even if I ask 8.8.8.8 directly I get:

        $ dig @8.8.8.8 bastion.fabric-testbed.net 
        
        ; <<>> DiG 9.10.6 <<>> @8.8.8.8 bastion.fabric-testbed.net
        ; (1 server found)
        ;; global options: +cmd
        ;; Got answer:
        ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15505
        ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
        
        ;; OPT PSEUDOSECTION:
        ; EDNS: version: 0, flags:; udp: 512
        ;; QUESTION SECTION:
        ;bastion.fabric-testbed.net. IN A
        
        ;; ANSWER SECTION:
        bastion.fabric-testbed.net. 3600 IN A 23.134.235.242
        bastion.fabric-testbed.net. 3600 IN A 128.163.180.149
        bastion.fabric-testbed.net. 3600 IN A 141.142.140.10
        bastion.fabric-testbed.net. 3600 IN A 152.54.15.12

        The log is full of the following messages:

        [21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:02:48] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:09:13] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [21:09:13] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:14:12] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [21:14:12] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:43:49] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [21:47:35] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/paramiko/transport.py:1944} ERROR - Secsh channel 0 open FAILED: Connection timed out: Connect failed
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
        [22:11:24] {/Users/baldin/venv/fabric/lib/python3.12/site-packages/fabrictestbed_extensions/fablib/node.py:1600} WARNING - Attempt 1 failed: ChannelException(2, 'Connect failed')
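When a log is this repetitive, tallying entries per timestamp and level can make the bursts easier to see. The following is a minimal sketch, assuming the `[HH:MM:SS] {source} LEVEL - message` layout shown above; `LOG_LINE` and `summarize` are illustrative names and are not part of fablib or paramiko.

```python
import re
from collections import Counter

# Assumed layout: [HH:MM:SS] {source_file:line} LEVEL - message
LOG_LINE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+\{(?P<source>[^}]+)\}\s+"
    r"(?P<level>[A-Z]+)\s+-\s+(?P<message>.*)$"
)


def summarize(lines):
    """Count log entries per (timestamp, level); skip lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match["time"], match["level"])] += 1
    return counts
```

Calling `summarize(open(path))` on the log file (with `path` pointing at wherever the fablib log is written) returns a `Counter` whose keys can be sorted to show when the failures cluster.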
        in reply to: Perhaps one of the bastion hosts is out #9393
        Ilya Baldin
        Participant

          This is what I see (this is from my home on Google Fiber):

          nslookup bastion.fabric-testbed.net
          Server: 192.168.1.1
          Address: 192.168.1.1#53

          Non-authoritative answer:
          Name: bastion.fabric-testbed.net
          Address: 128.163.180.149
          Name: bastion.fabric-testbed.net
          Address: 23.134.235.242
          Name: bastion.fabric-testbed.net
          Address: 141.142.140.10
          Name: bastion.fabric-testbed.net
          Address: 152.54.15.12

          I also noticed that some commands sent to VMs over SSH from my laptop-local notebook are never executed or are very delayed, which I suspect is part of the same issue. Strangely, all of these addresses are reachable via SSH.
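To check each resolved address individually rather than whichever one the resolver hands out first, something like the following could be used. It is a minimal sketch using only the Python standard library; the function names are illustrative, and it only tests whether TCP port 22 accepts a connection, which is weaker than a full SSH login.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def tcp_reachable(address, port=22, timeout=5.0):
    """Return True if a TCP connection to (address, port) opens within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_hosts(addresses, port=22, timeout=5.0):
    """Map each address to whether its TCP port accepted a connection."""
    return {address: tcp_reachable(address, port, timeout) for address in addresses}
```

For example, `check_hosts(resolve_ipv4("bastion.fabric-testbed.net"))` returns a dictionary keyed by address. Running it repeatedly over time is more likely to catch an intermittent failure than a single pass.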

          in reply to: Slice gone and token issues #9344
          Ilya Baldin
          Participant

            Perfect, back in business, thank you!

             

            Also – feature request – make the error message from the Credential Manager more informative 🙂

            in reply to: Component.FPGA #8446
            Ilya Baldin
            Participant

              Please check the bottom of this help page:

               

              Using Xilinx U280 FPGAs on FABRIC

              in reply to: Inter-AS intra-AS routing test #8404
              Ilya Baldin
              Participant

                Assuming each of your slices has its own instance of a FABNetv4 network, simply adding a route to the entire FABNetv4 subnet, pointed at the FABNetv4 gateway, will let your nodes in different slices talk to each other:

                import ipaddress

                # Route the entire FABNetv4 range via the FABNetv4 gateway
                full_subnet = ipaddress.IPv4Network('10.128.0.0/10')
                node.ip_route_add(subnet=full_subnet, gateway=site_net.get_gateway())
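To see why a single route is enough, it can help to check which subnets fall inside `10.128.0.0/10`. This is a small standard-library sketch; the candidate subnets below are hypothetical values chosen for illustration, not addresses assigned by FABRIC.

```python
import ipaddress

# The aggregate FABNetv4 range quoted above.
full_subnet = ipaddress.IPv4Network("10.128.0.0/10")


def covered_by_route(subnet, route=full_subnet):
    """Return True if every address in subnet falls inside route."""
    return ipaddress.IPv4Network(subnet).subnet_of(route)


# Hypothetical per-slice subnets, for illustration only.
for candidate in ("10.129.3.0/24", "10.140.7.0/24", "10.192.0.0/24"):
    print(candidate, covered_by_route(candidate))
```

A /10 starting at 10.128.0.0 spans 10.128.0.0 through 10.191.255.255, so the first two candidates are covered and the last one is not.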

                 

                in reply to: List of OS supported #8319
                Ilya Baldin
                Participant

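                  # Standard fablib setup (assumes your FABRIC configuration is already in place)
                  from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
                  fablib = fablib_manager()

                  # List available images (this step is optional)
                  available_images = fablib.get_image_names()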

                  print(f'Available images are: {available_images}')

                  or in the portal create slice view (attached).

                   

                  in reply to: STAR probably needs reflashing to P4 workflow #8074
                  Ilya Baldin
                  Participant

                    Just to add – I stood up the same slice on WASH with no problems. It also shows BusMaster-, so this may have been a red herring.

                    in reply to: WASH FPGA slice issue #8073
                    Ilya Baldin
                    Participant

                      The slice came up, thank you, Komal!

                      in reply to: STAR probably needs reflashing to P4 workflow #8051
                      Ilya Baldin
                      Participant

                        We are narrowing it down to a PCI (or PCI passthrough) issue. From inside the VM we see this:

                        ubuntu@LB-node:~/esnet-smartnic-fw/sn-stack$ sudo lspci -Dd 10ee: -vv
                        0000:1f:00.0 Network controller: Xilinx Corporation Device 903f
                        Subsystem: Xilinx Corporation Device 0007
                        Physical Slot: 0-30
                        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
                        ….
                        In the Control: line it says BusMaster- when it should instead be BusMaster+ (I am not sure what is visible from the server side).

                        Perhaps a cold reboot of the server is needed; I am not sure.
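To check the flag without reading the output by eye, the `Control:` line can be parsed programmatically. This is a minimal sketch that assumes the `lspci -vv` layout shown above; `parse_control_flags` is an illustrative name.

```python
def parse_control_flags(lspci_output):
    """Parse the first 'Control:' line of `lspci -vv` output into {flag: enabled}."""
    for line in lspci_output.splitlines():
        line = line.strip()
        if line.startswith("Control:"):
            tokens = line[len("Control:"):].split()
            # Each token is a flag name followed by '+' (set) or '-' (clear).
            return {t[:-1]: t.endswith("+") for t in tokens if t[-1] in "+-"}
    return {}
```

With the output above, `parse_control_flags(text)["BusMaster"]` evaluates to `False`, matching the BusMaster- state described.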

                        in reply to: Unable to create EDC slice #7954
                        Ilya Baldin
                        Participant

                          Ah ok. I’ll try later. I thought it may mean some kind of resource exhaustion.

                          in reply to: Inter-AS intra-AS routing test #7921
                          Ilya Baldin
                          Participant
                            in reply to: Inter-AS intra-AS routing test #7917
                            Ilya Baldin
                            Participant

                              I have this artifact notebook that shows how to do (2), but again, for your example I wouldn’t worry about it:

                              https://artifacts.fabric-testbed.net/artifacts/e1771f8d-ca7a-42fc-b6ec-542df83168a8

                               

                              in reply to: Inter-AS intra-AS routing test #7915
                              Ilya Baldin
                              Participant

                                At least in my experience, you are unlikely to succeed in getting a single slice this large in one shot. One of two approaches works better:

                                1. Build separate slices (if you are using FABNetv4 or FABNetv4Ext it is easy to get them all to communicate with each other)
                                2. Build up a single slice by growing it via ‘modify’ (if a modify fails on a given site because it is out of resources, you move on to the next to get more nodes)

                                I am not sure (2) is worth the trouble for what you are describing.
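For reference, the control flow of approach (2) can be sketched as below. This shows only the fallback logic: `try_add_node` is a hypothetical callable that you would write yourself to submit a modify adding one node at the given site and return `True` on success. It is not a fablib function.

```python
def grow_across_sites(sites, nodes_wanted, try_add_node):
    """Add nodes site by site until nodes_wanted is reached or sites run out.

    try_add_node(site) is a hypothetical callable returning True if a node
    was added at that site and False if the site is out of resources.
    """
    placed = []
    for site in sites:
        while len(placed) < nodes_wanted:
            if not try_add_node(site):
                break  # this site is exhausted; move on to the next one
            placed.append(site)
        if len(placed) >= nodes_wanted:
            break
    return placed
```

The returned list records which site each node landed on; if it is shorter than `nodes_wanted`, every site was exhausted before the target was reached.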

                                • This reply was modified 1 year, 1 month ago by Ilya Baldin.
                                in reply to: FABNetv4 packet losses from STAR #7913
                                Ilya Baldin
                                Participant

                                  I think this is operator error, apologies.
