Gregory Daues

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 27 total)
  • in reply to: error when attempting to numa_tune #6062
    Gregory Daues
    Participant

      I created a Slice with 32 GB memory VMs at multiple sites. The Pin and numa_tune steps succeeded
      at GPN, NCSA, EDC, and RUTG, but failed at INDI, STAR, and TACC with errors of the type
      "requested memory 32768 exceeds available: 9323".
      So I suppose this should just be viewed as a matter of the resources available at each site at
      the current time.
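A minimal sketch of the comparison behind that error, assuming the numbers in the message are in MB (32 GB = 32768 MB). The site names and the 65536 figure are made up for illustration; only 9323 comes from the error text, and `sites_that_fit` is a hypothetical helper, not part of fablib.

```python
def sites_that_fit(requested_mb, available_mb_by_site):
    """Split sites into those whose free memory covers the request and those that do not."""
    fits, too_small = [], []
    for site, available_mb in sorted(available_mb_by_site.items()):
        (fits if available_mb >= requested_mb else too_small).append(site)
    return fits, too_small


requested_mb = 32 * 1024  # 32 GB expressed in MB, i.e. the 32768 in the error message

# Hypothetical availability figures; only 9323 is taken from the error text.
available = {"SITE_A": 65536, "SITE_B": 9323}

fits, too_small = sites_that_fit(requested_mb, available)
print("fits:", fits)
print("too small:", too_small)
```

Since availability changes over time, the same request may succeed later at a site where it fails now.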

      Greg

       

      in reply to: error when attempting to numa_tune #6001
      Gregory Daues
      Participant

        OK, I misunderstood the ‘memory fit’. I will try a lower value such as 32 GB and make a new Slice.

        Greg

         

        in reply to: error when attempting to numa_tune #5999
        Gregory Daues
        Participant

          Or perhaps even 256 GB on the VM to ensure that 64 GB is free.

          Greg

           

          in reply to: error when attempting to numa_tune #5998
          Gregory Daues
          Participant

            Hello Komal,

            Yes, I thought I had 128 GB of memory, but indeed it is only 64 GB. Would it be more likely to succeed
            if the VMs had 128 GB of memory?

            Greg

            in reply to: observing problems with IPv4Ext and IPv6Ext #5033
            Gregory Daues
            Participant

              OK, I originally misunderstood the slice.update() comments.
              Assuming that the “make_ip_publicly_routable” call was actually successful, I’ve executed slice.update()
              and proceeded to add routes for the node/network, and I will test that external access through IPv4Ext is working.
              As such, the exception that occurs above is mostly a distraction.

              in reply to: observing problems with IPv4Ext and IPv6Ext #5027
              Gregory Daues
              Participant

                I do not see that adding

                slice.update()

                within this script will affect the issue. Although slice.update() changes ModifyOk to StableOK, the subsequent resubmit seems to put things back into the previous state (“Unable to modify Slice# ea1653aa-e881-49a7-b917-6b4f6729493a that is not yet stable, try again later”).
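One way to avoid resubmitting while the slice is "not yet stable" is to poll the state first. The sketch below is only an illustration: `wait_for_state` and its `get_state` argument are hypothetical, and `get_state` stands in for whatever call reports the slice state after slice.update().

```python
import time


def wait_for_state(get_state, wanted="StableOK", timeout_s=600, interval_s=10,
                   sleep=time.sleep):
    """Poll get_state() until it returns `wanted`.

    Returns True once the wanted state is seen, False if timeout_s elapses first.
    """
    waited = 0
    while True:
        if get_state() == wanted:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Stand-in for a real state query: reports ModifyOk twice, then StableOK.
states = iter(["ModifyOk", "ModifyOk", "StableOK"])
reached = wait_for_state(lambda: next(states), sleep=lambda seconds: None)
print(reached)
```

Only once this returns True would the modify request be submitted; on False the script can report the timeout instead of hitting the exception.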

                 

                in reply to: observing problems with IPv4Ext and IPv6Ext #5026
                Gregory Daues
                Participant

                  The second script, which runs against the existing Slice in StableOK, is the following (just cutting lines from the example):

                  import json
                  import os
                  import time
                  import traceback

                  from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

                  # Create the fablib manager and print the active configuration.
                  try:
                      fablib = fablib_manager()
                      fablib.show_config()
                  except Exception as e:
                      print(f"Exception: {e}")

                  slice_name = 'MySliceAug14D'
                  network1_name = 'net1'

                  try:
                      # Look up the existing slice and its network.
                      slice = fablib.get_slice(name=slice_name)
                      network1 = slice.get_network(name=network1_name)
                      network1_available_ips = network1.get_available_ips()
                      # Enable public IPv4: make the first available address publicly routable.
                      network1.make_ip_publicly_routable(ipv4=[str(network1_available_ips[0])])
                      slice.submit()
                  except Exception as e:
                      print(f"Exception: {e}")
                      traceback.print_exc()

                  in reply to: observing problems with IPv4Ext and IPv6Ext #5024
                  Gregory Daues
                  Participant

                    With this set

                    > pip3 list | grep fabric
                    fabric-credmgr-client 1.5.1
                    fabric_fim 1.5.4
                    fabric_fss_utils 1.5.1
                    fabric-orchestrator-client 1.5.3
                    fabrictestbed 1.5.4
                    fabrictestbed-extensions 1.5.3

                    I see the issue occur. The Slice ID a5019147-99ad-4619-bb26-d468d9cfd82e is still running.

                    I had not used containers so I will look into that.

                    in reply to: observing problems with IPv4Ext and IPv6Ext #5019
                    Gregory Daues
                    Participant

                      I had “the latest of each”

                      ~> pip3 list | grep fabric
                      fabric-credmgr-client 1.5.2
                      fabric_fim 1.5.5
                      fabric_fss_utils 1.5.1
                      fabric-orchestrator-client 1.5.5
                      fabrictestbed 1.5.6
                      fabrictestbed-extensions 1.5.3

                      but there was a warning about inconsistency (with “fabrictestbed-extensions 1.5.3 requires fabrictestbed==1.5.4, but you have fabrictestbed 1.5.6”) and so I’ll backtrack and try again.
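A small sketch of how such a mismatch can be spotted from the `pip3 list` output. The helper names are hypothetical, and the only pin used is the one quoted in the warning (fabrictestbed==1.5.4); no other requirements are assumed.

```python
def parse_pip_list(text):
    """Turn 'name version' lines (as printed by pip list) into a dict."""
    installed = {}
    for line in text.strip().splitlines():
        name, version = line.split()
        installed[name] = version
    return installed


def pin_conflicts(installed, pins):
    """Return {package: (pinned, installed)} for every exact pin that is not met."""
    return {
        pkg: (wanted, installed.get(pkg))
        for pkg, wanted in pins.items()
        if installed.get(pkg) != wanted
    }


pip_output = """
fabric-credmgr-client 1.5.2
fabric_fim 1.5.5
fabric_fss_utils 1.5.1
fabric-orchestrator-client 1.5.5
fabrictestbed 1.5.6
fabrictestbed-extensions 1.5.3
"""

# From the warning: fabrictestbed-extensions 1.5.3 requires fabrictestbed==1.5.4
pins = {"fabrictestbed": "1.5.4"}

conflicts = pin_conflicts(parse_pip_list(pip_output), pins)
print(conflicts)
```

Running `pip check` in the same environment reports this kind of inconsistency directly.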

                       

                      in reply to: Disk-to-Disk network transfer files between Fabric nodes #4149
                      Gregory Daues
                      Participant

                        Yes, I think there is a lot of flexibility with those ssh keys. FABRIC does some setup with the Slice key, so it is probably best not to interfere with it (one could lock oneself out of the nodes). Just make a new ssh key pair somewhere, stage the keys onto the nodes, and add the public key to ~/.ssh/authorized_keys.

                        1 user thanked author for this post.
                        in reply to: Disk-to-Disk network transfer files between Fabric nodes #4141
                        Gregory Daues
                        Participant

                          We in the CMB-S4 project have been copying files between Fabric nodes using scp for testing.

                          The elements of that setup include:
                          1) install scp (something like ‘yum install openssh-clients’, though it depends on the platform)

                          2) create an L3 network on each of the two nodes

                          3) add a route between these

                          4) set up ssh keys (id_rsa, id_rsa.pub) on the nodes and add an entry to ~/.ssh/authorized_keys

                          We can provide more details on each of these steps if it helps. Some snippets of this setup will also be shown in the presentation by Don Petravick et al. on Wed Apr 26 at the meeting.

                          Beyond copying files with scp, we are also looking for a more performant way to copy files. Some testing with bbcp had brief success, but it has not worked consistently, and I don’t think it supports IPv6, so we continue to look for performant file transfer approaches.
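As a rough illustration of the key and copy steps in the scp setup, here is a sketch that only builds the command lines and does not run anything. The paths, file name, and address are made up (2001:db8::10 is a documentation address), and the helper names are hypothetical. One detail worth noting: scp needs an IPv6 literal wrapped in square brackets.

```python
def keygen_cmd(key_path):
    """ssh-keygen invocation for a new RSA key pair with an empty passphrase."""
    return ["ssh-keygen", "-t", "rsa", "-f", key_path, "-N", ""]


def scp_cmd(key_path, local_file, user, host, remote_path):
    """scp invocation; an IPv6 literal is wrapped in brackets, as scp requires."""
    target_host = f"[{host}]" if ":" in host else host
    return ["scp", "-i", key_path, local_file, f"{user}@{target_host}:{remote_path}"]


# Hypothetical values for illustration only.
key_path = "/home/rocky/.ssh/id_rsa_transfer"
make_key = keygen_cmd(key_path)
copy_file = scp_cmd(key_path, "data.fits", "rocky", "2001:db8::10", "/home/rocky/")

print(" ".join(make_key))
print(" ".join(copy_file))
```

The lists can be passed to `subprocess.run(cmd, check=True)` on the sending node once the public key has been added to ~/.ssh/authorized_keys on the receiving node.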

                           

                           

                           

                           

                           

                          Gregory Daues
                          Participant

                            Hello Komal,

                            That seemed to work! I observed this issue today for the first time; is there any change or update
                            that we should be aware of? Overall I think it is working now. Thanks,

                            Greg

                             

                            in reply to: renew slice did not fully work #3157
                            Gregory Daues
                            Participant

                              Following up with the latest test results. I did a very synthetic test: I started up a Slice
                              MySliceSep22A  5e995249-8f5b-45b4-ac11-6b968e9a3f66
                              with a single node at one site (MICH), with no L2/L3 networks added, no additional software installs, etc.
                              I was able to log in with

                              ssh -F ~/.ssh/fabric-ssh-config -i ${FABRIC_SLICE_PRIVATE_KEY_FILE}   rocky@2607:f018:110:11:f816:3eff:fe9e:4eb4

                              for the first day. The original end date was 2022-09-23 10:21:47 and the extended end date is 2022-09-25 19:56:40.
                              This node of the slice is now unreachable:

                              > ssh -F ~/.ssh/fabric-ssh-config -i ${FABRIC_SLICE_PRIVATE_KEY_FILE}   rocky@2607:f018:110:11:f816:3eff:fe9e:4eb4

                              Warning: Permanently added 'bastion-1.fabric-testbed.net,2600:2701:5000:a902::c' (ECDSA) to the list of known hosts.

                              channel 0: open failed: connect failed: No route to host

                              stdio forwarding failed

                              kex_exchange_identification: Connection closed by remote host

                              Each day I generate a new token from the Fabric credential manager; hopefully this is not an issue of needing to keep an original token going for the lifetime of the Slice (I am not even sure that is possible).
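To make the timing question concrete, the sketch below classifies an observation time against the two lease end dates quoted above. The observation time is hypothetical (the exact moment the node became unreachable was not recorded), and `lease_window` is an illustrative helper only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def lease_window(observed, original_end, extended_end):
    """Classify an observation time relative to the original and extended lease ends."""
    t, orig, ext = (datetime.strptime(s, FMT)
                    for s in (observed, original_end, extended_end))
    if t <= orig:
        return "within original lease"
    if t <= ext:
        return "within extension only"
    return "after extended lease"


original_end = "2022-09-23 10:21:47"
extended_end = "2022-09-25 19:56:40"
observed = "2022-09-23 12:00:00"  # hypothetical time of the failed ssh attempt

window = lease_window(observed, original_end, extended_end)
print(window)
```

A failure that falls in the "within extension only" window is the case of interest here: the lease was extended, yet access was lost around the original end date.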

                               

                              in reply to: renew slice did not fully work #3129
                              Gregory Daues
                              Participant

                                I think that I am still seeing the original issue. I have another Slice, MySliceSep18A (1ae8fdff-9514-4042-a9af-e826d0c4b646), that was created yesterday. The Slice was renewed and the Lease End now states 2022-09-23 16:23:41.
                                It is now around the time that the Slice was originally intended to expire, and I see that I have lost the ability to ssh to the nodes. The nodes of this Slice have had no Docker installation at all, from the beginning. Can this be examined in any way?

                                 

                                in reply to: renew slice did not fully work #3092
                                Gregory Daues
                                Participant

                                   

                                  Yes, I can delete this particular Slice or let it expire; it was just a matter of understanding what had happened so that I can apply that to future Slices. I will look into the issues with the Docker configuration. Thanks!
