Forum Replies Created
I created a Slice with 32 GB memory VMs at multiple sites, and I see that the Pin and numa_tune succeeded at GPN, NCSA, EDC, and RUTG, but failed at INDI, STAR, and TACC (requested memory 32768 exceeds available: 9323 type error). So I suppose this should just be viewed as an issue of the available resources at those sites at the current time.

Greg
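One way to avoid this trial and error is to compare each site's advertised free memory against the request before submitting. A minimal sketch with purely hypothetical free-RAM figures; in practice these would come from fablib's resource listing (e.g. something like fablib.list_sites(), which I have not verified here):

```python
def sites_that_fit(free_ram_gb, requested_gb):
    """Return the sites whose advertised free RAM can hold the request."""
    return sorted(site for site, free in free_ram_gb.items()
                  if free >= requested_gb)

# hypothetical snapshot of free RAM per site, in GB (illustration only)
free_ram = {"GPN": 512, "NCSA": 384, "INDI": 9, "STAR": 24, "TACC": 16}
print(sites_that_fit(free_ram, 32))  # → ['GPN', 'NCSA']
```

With the real resource listing substituted in, this lets the script pick only sites that can currently host the requested VM size.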
OK, I misunderstood the ‘memory fit’; I will try lower, such as 32 GB, and make a new Slice.
Greg
Or perhaps even 256 GB on the VM to ensure that 64 GB is free.
Greg
Hello Komal,
Yes, I thought I had 128 GB memory, but indeed it is only 64 GB. Would it be more likely to succeed if the VMs had 128 GB memory?

Greg
OK, I originally misunderstood the slice.update() comments.
Assuming that the “make_ip_publicly_routable” was actually successful, I’ve executed the slice.update()
and proceeded to add routes for the node/network, and I will test that external access thru IPv4Ext is working,
As such, the exception that occurs above is mostly a distraction then. I do not see that adding

slice.update()

within this script will affect the issue. Though slice.update() will change ModifyOK to StableOK, the following resubmit seems to take things back to the current state (“Unable to modify Slice# ea1653aa-e881-49a7-b917-6b4f6729493a that is not yet stable, try again later”).
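Since the resubmit is rejected while the slice is not yet stable, one workaround is to poll the slice state and only submit once it reaches StableOK. A minimal generic sketch; the state names follow the messages above, and a plain `get_state` callable stands in for the real fablib state query, whose exact API I have not verified:

```python
import time

def wait_for_state(get_state, wanted=("StableOK",), attempts=10, delay=0):
    """Poll get_state() until it returns one of `wanted`,
    giving up after `attempts` tries."""
    state = None
    for _ in range(attempts):
        state = get_state()
        if state in wanted:
            return state
        time.sleep(delay)
    raise TimeoutError(f"slice never reached {wanted}, last state {state}")

# usage with a canned state sequence standing in for the real slice
states = iter(["ModifyOK", "ModifyOK", "StableOK"])
print(wait_for_state(lambda: next(states)))  # → StableOK
```

In a real script the lambda would wrap the slice-state call, with a nonzero `delay` between polls.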
The second script that runs against the existing Slice in StableOK is (just cutting lines from the example)
import json
import os
import time
import traceback

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

try:
    fablib = fablib_manager()
    fablib.show_config()
except Exception as e:
    print(f"Exception: {e}")

slice_name = 'MySliceAug14D'
network1_name = 'net1'

try:
    slice = fablib.get_slice(name=slice_name)
    network1 = slice.get_network(name=network1_name)
    network1_available_ips = network1.get_available_ips()
    # Enable a public IPv4 address: make_ip_publicly_routable
    network1.make_ip_publicly_routable(ipv4=[str(network1_available_ips[0])])
    slice.submit()
except Exception as e:
    print(f"Exception: {e}")
    traceback.print_exc()
With this set
> pip3 list | grep fabric
fabric-credmgr-client 1.5.1
fabric_fim 1.5.4
fabric_fss_utils 1.5.1
fabric-orchestrator-client 1.5.3
fabrictestbed 1.5.4
fabrictestbed-extensions 1.5.3

I see the issue occur. Slice a5019147-99ad-4619-bb26-d468d9cfd82e is still running.
I had not used containers so I will look into that.
I had “the latest of each”
~> pip3 list | grep fabric
fabric-credmgr-client 1.5.2
fabric_fim 1.5.5
fabric_fss_utils 1.5.1
fabric-orchestrator-client 1.5.5
fabrictestbed 1.5.6
fabrictestbed-extensions 1.5.3

but there was a warning about an inconsistency (“fabrictestbed-extensions 1.5.3 requires fabrictestbed==1.5.4, but you have fabrictestbed 1.5.6”), so I’ll backtrack and try again.
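A quick way to catch this kind of mismatch before running a script is to compare the installed version against the exact pin declared by the extensions package. A minimal stdlib sketch (the package names match the pip3 output above; note that requirement strings can carry markers or spaces, which this simple parser only partially handles):

```python
from importlib import metadata

def pin_mismatch(package, installed, requirements):
    """Return a warning string if `requirements` contains an exact pin
    package==X with X != installed, else None."""
    for req in requirements:
        req = req.replace(" ", "")
        if req.startswith(f"{package}=="):
            pinned = req.split("==", 1)[1].split(";")[0]
            if pinned != installed:
                return (f"requires {package}=={pinned}, "
                        f"but you have {package} {installed}")
    return None

# usage against the installed packages, if present in this environment
try:
    warning = pin_mismatch(
        "fabrictestbed",
        metadata.version("fabrictestbed"),
        metadata.requires("fabrictestbed-extensions") or [])
    if warning:
        print(warning)
except metadata.PackageNotFoundError:
    pass  # the fabric packages are not installed here
```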
April 27, 2023 at 4:49 pm in reply to: Disk-to-Disk network transfer files between Fabric nodes #4149

Yes, I think there is a lot of flexibility with those ssh keys. FABRIC does some setup with the Slice key, so it is probably best not to interfere with the Slice key (one could lock oneself out of the nodes). Just make a new ssh key pair somewhere, then stage the keys into place on the nodes and add the public key to ~/.ssh/authorized_keys.
April 25, 2023 at 10:51 am in reply to: Disk-to-Disk network transfer files between Fabric nodes #4141

We in the CMB-S4 project have been copying files between FABRIC nodes using scp for testing.
The elements of that setup include
1) install scp (something like ‘yum install openssh-clients’, though it depends on the platform)
2) create an L3 network on each of the two nodes
3) add a route between these
4) set up ssh keys (id_rsa, id_rsa.pub) on the nodes, and add an entry to ~/.ssh/authorized_keys
We can provide more details on each of these steps if it helps. Some snippets of this setup will also appear in the presentation by Don Petravick et al. on Wed Apr 26 in the meeting.
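The four steps above can be sketched as the per-node shell commands we generate. Everything here (interface name, subnet addresses, gateway, key) is a hypothetical example rather than values from a real slice:

```python
def transfer_setup_cmds(iface, local_ip, peer_subnet, gateway, pubkey):
    """Build the per-node shell commands for steps 1-4 above: install scp,
    address the L3 interface, route to the peer's subnet, and authorize
    the transfer key. All argument values are illustrative."""
    return [
        "sudo yum install -y openssh-clients",             # step 1: scp
        f"sudo ip addr add {local_ip} dev {iface}",        # step 2: L3 net
        f"sudo ip route add {peer_subnet} via {gateway}",  # step 3: route
        f"echo '{pubkey}' >> ~/.ssh/authorized_keys",      # step 4: keys
    ]

cmds = transfer_setup_cmds("enp7s0", "10.132.1.2/24",
                           "10.136.1.0/24", "10.132.1.1",
                           "ssh-ed25519 AAAA... transfer-key")
for c in cmds:
    print(c)
```

In practice we run the mirror-image commands on the second node, swapping the local and peer subnets.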
Beyond copying files with scp, we are also looking for a more performant way to copy files; some testing with bbcp had brief success, but it has not worked consistently, and we don’t think it supports IPv6, so we continue to look for performant file transfer approaches.
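One ingredient of the faster tools we have tried (e.g. bbcp's parallel streams) is striping the file across several concurrent transfers. A small sketch of just the offset arithmetic, not a full transfer tool:

```python
def stripe_ranges(size, streams):
    """Split `size` bytes into contiguous (offset, length) chunks,
    one per stream; earlier chunks absorb the remainder bytes."""
    base, rem = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < rem else 0)
        ranges.append((offset, length))
        offset += length
    return ranges

print(stripe_ranges(1000, 3))  # → [(0, 334), (334, 333), (667, 333)]
```

Each (offset, length) pair could then drive one worker reading and sending its own byte range, which is where the throughput gain over single-stream scp comes from.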
February 14, 2023 at 11:47 am in reply to: tokens downloaded from Credential Manager 02/14 report invalid_grant #3850

Hello Komal,
That seemed to work! I observed this matter today for the first time; is there any change / update that we should be aware of? Overall I think it is working now. Thanks,

Greg
Following up with the latest test results. I did a very synthetic test: I started up a Slice
MySliceSep22A 5e995249-8f5b-45b4-ac11-6b968e9a3f66
with a single node at a site (MICH). No L2/L3 networks added, no additional software installs etc.
I was able to log in with

ssh -F ~/.ssh/fabric-ssh-config -i ${FABRIC_SLICE_PRIVATE_KEY_FILE} rocky@2607:f018:110:11:f816:3eff:fe9e:4eb4

for the first day. The original end date was 2022-09-23 10:21:47; the extended end date is 2022-09-25 19:56:40.
This node of the slice is now unreachable:

> ssh -F ~/.ssh/fabric-ssh-config -i ${FABRIC_SLICE_PRIVATE_KEY_FILE} rocky@2607:f018:110:11:f816:3eff:fe9e:4eb4
Warning: Permanently added ‘bastion-1.fabric-testbed.net,2600:2701:5000:a902::c’ (ECDSA) to the list of known hosts.
channel 0: open failed: connect failed: No route to host
stdio forwarding failed
kex_exchange_identification: Connection closed by remote host
Each day I generate a new token from the Fabric credential manager; hopefully this is not an issue of needing to keep an original token going for the lifetime of the Slice (I am not even sure that is possible).
I think that I am still seeing the original issue. I have another Slice MySliceSep18A ( 1ae8fdff-9514-4042-a9af-e826d0c4b646 ) that was created yesterday. The Slice was renewed and the Lease End now states 2022-09-23 16:23:41 .
It is now around the time that the Slice was originally intended to expire, and I see that I have lost the ability to ssh to the nodes. The nodes of this Slice have had no Docker installation at all, from the beginning. Can this be examined in any way?

Yes, I can delete / let expire this particular Slice; it was just a matter of understanding what had happened so that I can apply it to future Slices. I will look into the issues with the Docker configuration. Thanks!