“Expired refresh token” when starting a new JupyterHub server after timeout
- This topic has 8 replies, 2 voices, and was last updated 1 year, 4 months ago by Ilya Baldin.
August 29, 2023 at 10:10 am #5171
Hi, my students and I occasionally encounter an “expired refresh token” error that requires force restarting the JupyterHub container (File > Hub Control Panel) to resolve. (Example, another example.) Recently I have been running some long-running experiments in FABRIC JupyterHub and I noticed that the following sequence reliably reproduces this error –
1. Leave something running in JH for a long time (e.g. overnight).
2. Get back and see “JH server no longer running” message.
3. See message that says “Server unavailable or unreachable” with option to restart the server. Choose “Restart”.
4. Go through the sequence of selecting server, logging in, etc. Get a new, running JH server (I expect this to now be working normally, right?).
5. Run
fablib = fablib_manager()
and see this error:
---------------------------------------------------------------------------
SliceManagerException                     Traceback (most recent call last)
Cell In[1], line 2
      1 from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
----> 2 fablib = fablib_manager()
      3 conf = fablib.show_config()

File /opt/conda/lib/python3.10/site-packages/fabrictestbed_extensions/fablib/fablib.py:796, in FablibManager.__init__(self, fabric_rc, credmgr_host, orchestrator_host, fabric_token, project_id, bastion_username, bastion_key_filename, log_level, log_file, data_dir, output, execute_thread_pool_size, offline)
    793     self.facility_ports = None
    795 if not offline:
--> 796     self.build_slice_manager()

File /opt/conda/lib/python3.10/site-packages/fabrictestbed_extensions/fablib/fablib.py:985, in FablibManager.build_slice_manager(self)
    982 except Exception as e:
    983     # logging.error(f"{e}")
    984     logging.error(e, exc_info=True)
--> 985     raise e
    987 return self.slice_manager

File /opt/conda/lib/python3.10/site-packages/fabrictestbed_extensions/fablib/fablib.py:971, in FablibManager.build_slice_manager(self)
    961 try:
    962     logging.info(
    963         f"oc_host={self.orchestrator_host},"
    964         f"cm_host={self.credmgr_host},"
   (...)
    968         f"scope='all'"
    969     )
--> 971     self.slice_manager = SliceManager(
    972         oc_host=self.orchestrator_host,
    973         cm_host=self.credmgr_host,
    974         project_id=self.project_id,
    975         token_location=self.fabric_token,
    976         initialize=True,
    977         scope="all",
    978     )
    980     # Initialize the slice manager
    981     self.slice_manager.initialize()

File /opt/conda/lib/python3.10/site-packages/fabrictestbed/slice_manager/slice_manager.py:71, in SliceManager.__init__(self, cm_host, oc_host, token_location, project_id, scope, initialize)
     67     raise SliceManagerException(f"Invalid initialization parameters: cm_proxy={self.cm_proxy}, "
     68                                 f"oc_proxy={self.oc_proxy}, token_location={self.token_location}, "
     69                                 f"project_id={self.project_id}")
     70 if initialize:
---> 71     self.initialize()

File /opt/conda/lib/python3.10/site-packages/fabrictestbed/slice_manager/slice_manager.py:80, in SliceManager.initialize(self)
     74 """
     75 Initialize the Slice Manager object
     76 - Load the tokens
     77 - Refresh if needed
     78 """
     79 if not self.initialized:
---> 80     self.__load_tokens()
     81     self.initialized = True

File /opt/conda/lib/python3.10/site-packages/fabrictestbed/slice_manager/slice_manager.py:127, in SliceManager.__load_tokens(self)
    125 refresh_token = os.environ.get(Constants.CILOGON_REFRESH_TOKEN)
    126 # Renew the tokens to ensure any project_id changes are taken into account
--> 127 self.refresh_tokens(refresh_token=refresh_token)

File /opt/conda/lib/python3.10/site-packages/fabrictestbed/slice_manager/slice_manager.py:164, in SliceManager.refresh_tokens(self, refresh_token)
    162     self.tokens = tokens
    163     return tokens.get(CredmgrProxy.ID_TOKEN, None), tokens.get(CredmgrProxy.REFRESH_TOKEN, None)
--> 164 raise SliceManagerException(tokens.get(CredmgrProxy.ERROR))

SliceManagerException: b'{\n "errors": [\n {\n "details": "(invalid_grant) expired refresh token",\n "message": "Internal Server Error"\n }\n ],\n "size": 1,\n "status": 500,\n "type": "error"\n}'
and to resolve it, I need to stop and start the server again using File > Hub Control Panel.
Sharing this sequence in case it helps the team debug this type of error…
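For anyone who wants the notebook to flag this failure more clearly, the check can be sketched as below. This is a minimal sketch and not part of fablib: `is_expired_refresh_token` and `make_fablib_manager` are hypothetical helper names, and matching on the substring "expired refresh token" is an assumption based only on the error text in the traceback above.

```python
def is_expired_refresh_token(exc: Exception) -> bool:
    """Return True if the exception text matches the error shown in this thread."""
    # Assumption: the message wording stays the same as in the traceback above.
    return "expired refresh token" in str(exc)


def make_fablib_manager(factory):
    """Call factory() and turn the expired-token failure into a clearer hint.

    `factory` is whatever builds the manager, for example the
    `fablib_manager` name imported in the notebook cell.
    Hypothetical helper, not a fablib API.
    """
    try:
        return factory()
    except Exception as exc:
        if is_expired_refresh_token(exc):
            raise RuntimeError(
                "Refresh token rejected as expired. Stop and start the server "
                "from File > Hub Control Panel, then retry."
            ) from exc
        # Any other failure is passed through unchanged.
        raise
```

In a notebook this would be called as `fablib = make_fablib_manager(fablib_manager)`.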
August 29, 2023 at 10:36 am #5172
Fraida,
This is working ‘as designed’. Your regular token consists of two parts
– a short-lived API token we use to authorize user actions (4 hours)
– a longer-lived ‘refresh’ token which allows you to refresh the API token (24 hours)
Any time you refresh the token, you actually get back a new tuple of <API token, refresh token>, with the new refresh token again good for 24 hours. This is part of the security posture that allows us to minimize the damage if an API token leaks.
So in general if you run a fablib operation inside Jupyter Hub within a 24 hour period, your token tuple will be auto-refreshed so long as you are logged in.
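The lifetimes described above can be illustrated with a small model. This is only a sketch of the stated 4-hour and 24-hour windows, not the Credential Manager's actual logic; `token_state` is a hypothetical helper, and the real service may apply other checks.

```python
from datetime import datetime, timedelta

# Lifetimes as stated in this reply; treat them as illustrative values.
API_TOKEN_LIFETIME = timedelta(hours=4)
REFRESH_TOKEN_LIFETIME = timedelta(hours=24)


def token_state(issued_at: datetime, now: datetime) -> str:
    """Classify a token tuple purely by its age (hypothetical helper)."""
    age = now - issued_at
    if age < API_TOKEN_LIFETIME:
        return "api token valid"
    if age < REFRESH_TOKEN_LIFETIME:
        # The API token has lapsed, but the refresh token can still
        # be exchanged for a new <API token, refresh token> tuple.
        return "api token expired, refresh possible"
    return "refresh token expired, new login needed"


issued = datetime(2023, 8, 29, 10, 0, 0)
print(token_state(issued, issued + timedelta(hours=1)))   # api token valid
print(token_state(issued, issued + timedelta(hours=10)))  # api token expired, refresh possible
print(token_state(issued, issued + timedelta(hours=30)))  # refresh token expired, new login needed
```

Because each refresh issues a fresh tuple, the 24-hour window restarts at every successful refresh; an overnight run with no fablib call in between lets it lapse.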
We are in the process of developing a long-lived token feature reserved for those who want to run unattended experiments. It will be by special permission.
- This reply was modified 1 year, 4 months ago by Ilya Baldin.
August 29, 2023 at 10:41 am #5174
Thanks, the part that I consider a “bug” is that when I log in again and start a new server in Step 4, it does not get a new “good” token. Is that behavior expected?
August 29, 2023 at 10:52 am #5175
We will look into it. It is likely because the refresh token has expired by then. Restarting the container should generally cure it, but you probably cannot get a new token because your portal cookie has also expired by then.
One way to deal with it is to go to the Credential Manager (from the Portal -> Experiments -> Manage Tokens -> Open Credential Manager). Generate a new token (tuple) and save it into your JH under /home/fabric/.tokens.json
August 30, 2023 at 11:24 am #5187
Following up on this to share more info:
In Step 4 above, the first JH server I start after the timeout does have a new refresh token. The contents of .tokens.json show the token is created when I start the JH server:
{ "refresh_token": "XXX", "created_at": "2023-08-30 15:13:26" }
but when I try to use fablib I get that token error, and no ID token.
After stopping the JH server from the Hub Control Panel and starting it again, it gets another new refresh token. Now .tokens.json has:
{ "refresh_token": "XXX", "created_at": "2023-08-30 15:16:47" }
and this one works. When attempting to use fablib, I get an ID token and no error.
Not clear why the first refresh token does not work, even though it is new.
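The check described in this post can be scripted roughly as follows. This is a sketch under assumptions: the file has the `created_at` key in the format shown above, the timestamp is in UTC (not confirmed in this thread), and `refresh_token_age` is a hypothetical helper. As this post shows, a recent `created_at` does not by itself mean the token will be accepted.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def refresh_token_age(path, now=None):
    """Return how long ago the token file says it was created.

    Assumes "created_at" uses the "YYYY-MM-DD HH:MM:SS" format seen in
    this thread and is expressed in UTC (an unconfirmed assumption).
    """
    data = json.loads(Path(path).read_text())
    created = datetime.strptime(data["created_at"], "%Y-%m-%d %H:%M:%S")
    if now is None:
        # Naive UTC "now", to match the naive timestamp in the file.
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    return now - created
```

For example, `refresh_token_age("/home/fabric/.tokens.json")` would report the age of the token file at the path mentioned earlier in this thread.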
August 30, 2023 at 11:37 am #5189
I’m surprised that your tokens.json above has only the refresh token and no API token.
August 30, 2023 at 11:44 am #5190
In general, though, this is what we recommend: go to Hub Control and restart your server, which reinitializes everything properly. I’m not sure what machinations Jupyter Hub does when you simply do ‘Restart’ but it is different from going into the panel and doing Stop/Start (or sometimes just Start, because it detects the server has stopped).
August 30, 2023 at 11:53 am #5192
Perhaps the instructions at https://learn.fabric-testbed.net/knowledge-base/obtaining-and-using-fabric-api-tokens/#using-tokens-within-the-jupyter-hub can be updated. Currently, they say to generate a new token and upload it to JH when you get a “Refresh Token: (invalid grant)” error. But at least for this instance of that error, it doesn’t work (and my students say that solution has also not worked for them when they encounter this error): whatever is not initialized properly fails even with a new token. It only works if the JH server is stopped and restarted from the Hub Control Panel.
August 30, 2023 at 1:56 pm #5197
I updated the section. For me at least regeneration has worked in the past, but Stop/Start is probably more reliable, although more disruptive since you have to pull up all the tabs with your notebooks again.