yoursunny

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 48 total)
  • in reply to: failed lease update – all units failed in priming #6856
    yoursunny
    Participant

      On the IPv4 sites, the number of VMs that can be provisioned is limited by the available IPv4 addresses in the subnet.

      Can “management IP address” show up as a resource on Fabric Portal – Resources page?
      This would allow the experimenter to avoid this limitation.

      in reply to: Fabric Testbed is open and ready for use! #6270
      yoursunny
      Participant

        Oversubscription support – EDC and EDUKY sites have been enabled to support CPU over subscription.

        I remember the CPU core capacity of STAR site was 384.
        It’s now 768.
        Did this site receive new hardware or is it oversubscription?

        in reply to: Not able to create slice #6236
        yoursunny
        Participant

          The entire FABRIC is red this week: https://portal.fabric-testbed.net/resources/all

          See notice here: https://learn.fabric-testbed.net/forums/topic/multi-day-fabric-maintenance-january-1-5-2024/

          Look at this part:

          On other sites, slices will continue running, and slivers will be accessible during the maintenance, however, we will place the testbed in maintenance mode between Jan 1-5, therefore it will not be possible to perform slice operations (create, extend, delete).

          in reply to: Multi-day FABRIC maintenance (January 1-5, 2024) #6188
          yoursunny
          Participant

            Will the FABNetv4Ext peering point (located in WASH) be affected?

            yoursunny
            Participant

              channel 0: open failed: connect failed: No route to host

              This error typically means the VM is turned off or deleted, or is otherwise unreachable through the management network interface.

              The fastest solution is simply deleting the experiment slice and creating a new one.

              in reply to: error when attempting to numa_tune #6040
              yoursunny
              Participant

                Upper limit for a VM connected with only one component would map to a single Numa Node.

                What happens if there are multiple components that are on distinct NUMA sockets?
                Is it possible to specify how much RAM to pin to each NUMA socket?

                Max limit on memory for a numa node is 64G so exceeding that limit would not work.

                If we pin a CPU core or a certain amount of RAM onto a NUMA socket, does it prevent other VMs from using the same CPU core or RAM capacity?

                in reply to: Communication between nodes on same site #5434
                yoursunny
                Participant

                  Is there a way where I can convert my rspec into a FABRIC native object?

                  This would make a nice hackathon project. It should be feasible to make a converter that covers the topology, links, IP assignments, and startup scripts.

                  It would be even better if the FABRIC platform could directly accept Request RSpec and return Manifest RSpec, so that existing tooling for Emulab can still work.

                  in reply to: Communication between nodes on same site #5431
                  yoursunny
                  Participant

                    Management IP addresses cannot communicate with each other; this is to prevent abuse.

                    You should create a FABNetv4 or FABNetv6 network service in each slice. They can all communicate with each other, regardless of whether the slices are on the same or different sites.

                    in reply to: FABNetv4/FABNetv6 gateway is not IP address #5425
                    yoursunny
                    Participant

                      NetworkService in the affected slice:

                          <node id="73" labels=":GraphNode:NetworkService">
                            <data key="d0">:GraphNode:NetworkService</data>
                            <data key="d1">["bbf6a0a7-8981-4613-b797-0960e7e8ea9d", "node+amst-data-sw:ip+192.168.42.3-ipv4-ns"]</data>
                            <data key="d21">L3</data>
                            <data key="d8">{"fablib_data": {"instantiated": "False", "mode": "manual"}}</data>
                            <data key="d22">{"ipv4": "10.145.7.1", "ipv4_subnet": "10.145.7.0/24"}</data>
                            <data key="d3">{"error_message": "", "reservation_id": "4c9b702b-1346-4fe5-b61e-f5cb7790e75f", "reservation_state": "Active"}</data>
                            <data key="d15">NetworkService</data>
                            <data key="d10">false</data>
                            <data key="d11">AMST</data>
                            <data key="d13">FABNetv4</data>
                            <data key="d14">8</data>
                            <data key="d9">net4</data>
                            <data key="d16">be2e2e72-5bdd-4301-98aa-bb9e3fe23a56</data>
                            <data key="d17">98452967-6246-4517-a030-7d76d7044d05</data>
                          </node>

                      NetworkService in a “normal” slice:

                          <node id="18" labels=":GraphNode:NetworkService">
                            <data key="d0">:GraphNode:NetworkService</data>
                            <data key="d4">{"fablib_data": {"instantiated": "True", "mode": "manual", "subnet": {"subnet": "10.138.131.0/24", "allocated_ips": ["10.138.131.1"], "gateway": "10.138.131.1"}}}</data>
                            <data key="d22">{"ipv4": "10.138.131.1", "ipv4_subnet": "10.138.131.0/24"}</data>
                            <data key="d2">["bbf6a0a7-8981-4613-b797-0960e7e8ea9d", "node+atla-data-sw:ip+192.168.33.3-ipv4-ns"]</data>
                            <data key="d3">{"error_message": "", "reservation_id": "17d077ce-b66e-48e0-aafb-a48033a02ff1", "reservation_state": "Active"}</data>
                            <data key="d10">LAN</data>
                            <data key="d11">false</data>
                            <data key="d12">ATLA</data>
                            <data key="d21">L3</data>
                            <data key="d13">FABNetv4</data>
                            <data key="d15">NetworkService</data>
                            <data key="d16">c0950a90-1312-4ac4-a7e9-a87aa52d8fda</data>
                            <data key="d17">553597ac-26ea-4729-97cd-f9f1b6a23ea9</data>
                            <data key="d14">5</data>
                          </node>

                      I think instantiated: False is the problem. network.get_gateway() would not return the IP in this case.
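                      The comparison above can be automated. Below is a minimal sketch, assuming the GraphML export stores the fablib metadata as a JSON string inside a `<data>` element (as in the two nodes quoted above). The `SAMPLE` document is trimmed from those nodes, and the helper name `uninstantiated_services` is made up for illustration; it is not part of fablib. Because the `<data>` key differs between the two slices (`d8` vs `d4`), the sketch looks for the `fablib_data` JSON by content rather than by key.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed from the two NetworkService nodes quoted above (assumption:
# a real export wraps them in a <graphml> root, possibly namespaced).
SAMPLE = """<graphml>
  <node id="73" labels=":GraphNode:NetworkService">
    <data key="d8">{"fablib_data": {"instantiated": "False", "mode": "manual"}}</data>
    <data key="d13">FABNetv4</data>
  </node>
  <node id="18" labels=":GraphNode:NetworkService">
    <data key="d4">{"fablib_data": {"instantiated": "True", "mode": "manual", "subnet": {"subnet": "10.138.131.0/24", "allocated_ips": ["10.138.131.1"], "gateway": "10.138.131.1"}}}</data>
    <data key="d13">FABNetv4</data>
  </node>
</graphml>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}node'."""
    return tag.rsplit("}", 1)[-1]


def uninstantiated_services(graphml_text):
    """Return ids of NetworkService nodes whose fablib_data is not instantiated."""
    root = ET.fromstring(graphml_text)
    result = []
    for node in root.iter():
        if local_name(node.tag) != "node":
            continue
        if "NetworkService" not in node.get("labels", ""):
            continue
        for data in node:
            text = data.text or ""
            if '"fablib_data"' not in text:
                continue
            flag = json.loads(text)["fablib_data"].get("instantiated")
            if flag != "True":  # the export stores the flag as a string
                result.append(node.get("id"))
    return result


print(uninstantiated_services(SAMPLE))
```

                      On the trimmed sample this reports only node 73, the network service from the affected slice.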

                      in reply to: FABNetv4/FABNetv6 gateway is not IP address #5423
                      yoursunny
                      Participant

                        GraphML file here: https://cdn1.frocdn.ch/oq8kXrhxQRTtAR4.graphml

                        I can definitely see the IP addresses in the GraphML file, but they are not showing up in the list_node_and_networks.ipynb notebook or in the result of the net.get_gateway() function call.

                        I’m using the “default 08/22/2023” JupyterHub environment, with these package versions:

                        fabric@fall:work-10%$ pip list | grep fabric
                        fabric 3.2.2
                        fabric-credmgr-client 1.5.2
                        fabric_fim 1.5.5
                        fabric_fss_utils 1.5.1
                        fabric-orchestrator-client 1.5.5
                        fabrictestbed 1.5.6
                        fabrictestbed-extensions 1.5.4
                        fabrictestbed-mflib 1.0.3
                        
                        • This reply was modified 7 months, 2 weeks ago by yoursunny.
                        in reply to: STAR site power loss, connectivity losses #5352
                        yoursunny
                        Participant

                          FABNetv4Ext establishment is working, but I’m seeing connectivity issues to many destinations.

                          ubuntu@v4gateway:~$ mtr -4bwz -c4 --tcp -P 6363 hobo.cs.arizona.edu
                          Start: 2023-09-20T16:06:38+0000
                          HOST: v4gateway                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
                            1. AS398900 23.134.233.81                                                          0.0%     4    0.5   0.5   0.5   0.5   0.0
                            2. AS???    10.133.0.141                                                           0.0%     4   13.2  13.2  13.1  13.2   0.0
                            3. AS11537  hundredge-0-0-0-28.1000.core1.wash.net.internet2.edu (198.71.45.162)   0.0%     4   15.4  15.1  14.5  15.7   0.5
                            4. AS???    ???                                                                   100.0     4    0.0   0.0   0.0   0.0   0.0
                          
                          ubuntu@v4gateway:~$ mtr -4bwz -c4 --tcp -P 5201 ash.speedtest.clouvider.net
                          Start: 2023-09-20T16:07:49+0000
                          HOST: v4gateway              Loss%   Snt   Last   Avg  Best  Wrst StDev
                            1. AS398900 23.134.233.81   0.0%     4    0.5   0.5   0.5   0.5   0.0
                            2. AS???    10.133.0.141    0.0%     4   13.4  13.2  13.1  13.4   0.1
                            3. AS???    ???            100.0     4    0.0   0.0   0.0   0.0   0.0
                          

                          Maybe some routing adjustment is needed too?

                          in reply to: STAR site power loss, connectivity losses #5347
                          yoursunny
                          Participant

                            The STAR outage seems to be affecting the creation of FABNetv4Ext networks. It seems that the control software is trying to access the STAR switch and it times out. This occurs even if the node is at the WASH site, where the FABNetv4Ext peering connection exists.

                            Slice Exception: Slice Name: v4gateway@1695137544, Slice ID: f20f1cff-11b0-4db9-9ffb-5b265c3653b6: Slice Exception: Slice Name: v4gateway@1695137544, Slice ID: f20f1cff-11b0-4db9-9ffb-5b265c3653b6: Node: gateway, Site: PSC, State: Active,
                            Slice Exception: Slice Name: v4gateway@1695137544, Slice ID: f20f1cff-11b0-4db9-9ffb-5b265c3653b6: Slice Exception: Slice Name: v4gateway@1695137544, Slice ID: f20f1cff-11b0-4db9-9ffb-5b265c3653b6: Node: gateway, Site: PSC, State: Active,

                            failed lease update- all units failed priming: Exception during modify for unit: 5a8383f3-30aa-41d8-9874-46b61ebbe621 Playbook has failed tasks: NSO commit returned JSON-RPC error: type: rpc.method.failed, code: -32000, message: Method failed, data: message: Failed to connect to device star-data-sw: connection refused: NEDCOM CONNECT: The kexTimeout (20000 ms) expired. in new state, internal: jsonrpc_tx_commit357#all units failed priming: Exception during modify for unit: 5a8383f3-30aa-41d8-9874-46b61ebbe621 Playbook has failed tasks: NSO commit returned JSON-RPC error: type: rpc.method.failed, code: -32000, message: Method failed, data: message: Failed to connect to device star-data-sw: connection refused: NEDCOM CONNECT: The kexTimeout (20000 ms) expired. in new state, internal: jsonrpc_tx_commit357#

                            The control software should choose alternate paths to reach the peering port. The control software should skip switches in maintenance, and attempt to re-apply the configuration when the maintenance mode is lifted.
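                            To illustrate the first half of this suggestion, here is a minimal sketch of path selection that skips switches in maintenance. It is not how the FABRIC control software works; the link list is made up for illustration (`ALT` is a hypothetical site), and `find_path` is a hypothetical helper doing a plain breadth-first search.

```python
from collections import deque


def find_path(links, src, dst, in_maintenance):
    """Shortest path from src to dst that avoids switches in maintenance.

    links: iterable of (site_a, site_b) pairs, treated as bidirectional.
    Returns a list of sites, or None if no usable path exists.
    """
    if src in in_maintenance or dst in in_maintenance:
        return None
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt in visited or nxt in in_maintenance:
                continue  # never route through a switch in maintenance
            visited.add(nxt)
            queue.append(path + [nxt])
    return None


# Hypothetical topology: the real FABRIC link layout is not given in this post.
LINKS = [("PSC", "STAR"), ("STAR", "WASH"), ("PSC", "ALT"), ("ALT", "WASH")]

print(find_path(LINKS, "PSC", "WASH", in_maintenance={"STAR"}))
```

                            When no path avoids the switches in maintenance, the function returns None; that is the case where the request would have to be queued and re-applied once maintenance mode is lifted.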

                            yoursunny
                            Participant

                              Instead of having users add a hosts entry (which would require changes at every level, including inside containers), can the DNS64 server be configured to return this IP?

                              • This reply was modified 8 months ago by yoursunny.
                              yoursunny
                              Participant

                                I’m seeing “Unable to establish SSL connection” error when trying to download from GitHub releases:

                                ubuntu@N0:~$ wget --timeout=10s -v https://github.com/TomWright/dasel/releases/download/v2.3.4/dasel_linux_amd64
                                --2023-09-06 17:24:18-- https://github.com/TomWright/dasel/releases/download/v2.3.4/dasel_linux_amd64
                                Resolving github.com (github.com)... 2600:2701:5000:5001::8c52:7104, 140.82.113.4
                                Connecting to github.com (github.com)|2600:2701:5000:5001::8c52:7104|:443... connected.
                                HTTP request sent, awaiting response... 302 Found
                                Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/297615696/dfe35302-5ee7-42cf-939d-345b67a2091d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230906%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230906T172418Z&X-Amz-Expires=300&X-Amz-Signature=cdb822adb0af2026b86b8fae886e28358b27bb48551182c5ee95e03a946b4353&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=297615696&response-content-disposition=attachment%3B%20filename%3Ddasel_linux_amd64&response-content-type=application%2Foctet-stream [following]
                                --2023-09-06 17:24:18-- https://objects.githubusercontent.com/github-production-release-asset-2e65be/297615696/dfe35302-5ee7-42cf-939d-345b67a2091d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230906%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230906T172418Z&X-Amz-Expires=300&X-Amz-Signature=cdb822adb0af2026b86b8fae886e28358b27bb48551182c5ee95e03a946b4353&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=297615696&response-content-disposition=attachment%3B%20filename%3Ddasel_linux_amd64&response-content-type=application%2Foctet-stream
                                Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 2600:2701:5000:5001::b9c7:6d85, 2600:2701:5000:5001::b9c7:6e85, 2600:2701:5000:5001::b9c7:6c85, ...
                                Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|2600:2701:5000:5001::b9c7:6d85|:443... connected.
                                Unable to establish SSL connection.

                                Is the NAT64 gateway being blocked by the GitHub releases download server?

                                tcpdump of the transaction: https://cdn1.frocdn.ch/JTeh94VJIxkXv6P.pcap

                                • This reply was modified 8 months ago by yoursunny.
                                yoursunny
                                Participant

                                  I found an unintended consequence of enabling NAT64:

                                  1. I sometimes want multiple slices to communicate with each other, while each slice can be re-deployed independently.
                                  2. To do so, I’m using FABNetv4 network service, paired with an external domain name that supports dynamic updates.
                                  3. When a “server” slice is re-deployed, it updates the domain name to point to its new FABNetv4 IP address.
                                  4. Previously, this worked well: the “client” slice could find the “server” slice by resolving the domain name.
                                  5. Since NAT64 was deployed, the “client” slice resolves both A and AAAA records on the domain name.
                                  6. If the “client” software tries to connect to the IPv6 address in the AAAA records, it cannot reach the FABNetv4 destination.

                                  My suggestion is to configure the DNS64 server so that it does not return AAAA records if the domain name resolves to an IPv4 address that is part of FABNetv4 or another RFC 1918 private range.
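                                  The rule I have in mind can be sketched as follows. This is only an illustration of the decision, not the configuration syntax of any particular DNS64 server; the function name `should_synthesize_aaaa` is made up. It assumes FABNetv4 addresses fall inside 10.0.0.0/8, which is true of the addresses quoted elsewhere in these posts (such as 10.138.131.1), so checking the three RFC 1918 blocks would cover them.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def should_synthesize_aaaa(a_records):
    """Decide whether DNS64 should synthesize AAAA records for a name.

    a_records: list of IPv4 address strings from the name's A records.
    Returns False if there is nothing to synthesize from, or if any
    address is private and therefore unreachable through NAT64.
    """
    if not a_records:
        return False
    for addr in a_records:
        ip = ipaddress.ip_address(addr)
        if any(ip in net for net in RFC1918_NETWORKS):
            return False
    return True


print(should_synthesize_aaaa(["10.138.131.1"]))  # FABNetv4 address from an earlier post
print(should_synthesize_aaaa(["140.82.113.4"]))  # public address from the wget log
```

                                  With this rule, the “client” slice would only receive the A record for the “server” name and connect over FABNetv4 directly, while names with public addresses would still get synthesized AAAA records.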
