1. “Expired refresh token” when starting a new JupyterHub server after timeout

  • #5171
    Fraida Fund
    Participant

      Hi, my students and I occasionally encounter an “expired refresh token” error that requires force restarting the JupyterHub container (File > Hub Control Panel) to resolve. (Example, another example.) Recently I have been running some long-running experiments in FABRIC JupyterHub, and I noticed that the following sequence reliably reproduces this error:

      1. Leave something running in JH for a long time (e.g. overnight).
      2. Get back and see “JH server no longer running” message.
      3. See message that says “Server unavailable or unreachable” with option to restart the server. Choose “Restart”.
      4. Go through the sequence of selecting server, logging in, etc. Get a new, running, JH server – I expect this to now be working normally, right?
      5. Run fablib = fablib_manager() and see this error –
      ---------------------------------------------------------------------------
      SliceManagerException                     Traceback (most recent call last)
      Cell In[1], line 2
            1 from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
      ----> 2 fablib = fablib_manager() 
            3 conf = fablib.show_config()
      
      File /opt/conda/lib/python3.10/site-packages/fabrictestbed_extensions/fablib/fablib.py:796, in FablibManager.__init__(self, fabric_rc, credmgr_host, orchestrator_host, fabric_token, project_id, bastion_username, bastion_key_filename, log_level, log_file, data_dir, output, execute_thread_pool_size, offline)
          793 self.facility_ports = None
          795 if not offline:
      --> 796     self.build_slice_manager()
      
      File /opt/conda/lib/python3.10/site-packages/fabrictestbed_extensions/fablib/fablib.py:985, in FablibManager.build_slice_manager(self)
          982 except Exception as e:
          983     # logging.error(f"{e}")
          984     logging.error(e, exc_info=True)
      --> 985     raise e
          987 return self.slice_manager
      
      File /opt/conda/lib/python3.10/site-packages/fabrictestbed_extensions/fablib/fablib.py:971, in FablibManager.build_slice_manager(self)
          961 try:
          962     logging.info(
          963         f"oc_host={self.orchestrator_host},"
          964         f"cm_host={self.credmgr_host},"
         (...)
          968         f"scope='all'"
          969     )
      --> 971     self.slice_manager = SliceManager(
          972         oc_host=self.orchestrator_host,
          973         cm_host=self.credmgr_host,
          974         project_id=self.project_id,
          975         token_location=self.fabric_token,
          976         initialize=True,
          977         scope="all",
          978     )
          980     # Initialize the slice manager
          981     self.slice_manager.initialize()
      
      File /opt/conda/lib/python3.10/site-packages/fabrictestbed/slice_manager/slice_manager.py:71, in SliceManager.__init__(self, cm_host, oc_host, token_location, project_id, scope, initialize)
           67     raise SliceManagerException(f"Invalid initialization parameters: cm_proxy={self.cm_proxy}, "
           68                                 f"oc_proxy={self.oc_proxy}, token_location={self.token_location}, "
           69                                 f"project_id={self.project_id}")
           70 if initialize:
      ---> 71     self.initialize()
      
      File /opt/conda/lib/python3.10/site-packages/fabrictestbed/slice_manager/slice_manager.py:80, in SliceManager.initialize(self)
           74 """
           75 Initialize the Slice Manager object
           76 - Load the tokens
           77 - Refresh if needed
           78 """
           79 if not self.initialized:
      ---> 80     self.__load_tokens()
           81     self.initialized = True
      
      File /opt/conda/lib/python3.10/site-packages/fabrictestbed/slice_manager/slice_manager.py:127, in SliceManager.__load_tokens(self)
          125     refresh_token = os.environ.get(Constants.CILOGON_REFRESH_TOKEN)
          126 # Renew the tokens to ensure any project_id changes are taken into account
      --> 127 self.refresh_tokens(refresh_token=refresh_token)
      
      File /opt/conda/lib/python3.10/site-packages/fabrictestbed/slice_manager/slice_manager.py:164, in SliceManager.refresh_tokens(self, refresh_token)
          162     self.tokens = tokens
          163     return tokens.get(CredmgrProxy.ID_TOKEN, None), tokens.get(CredmgrProxy.REFRESH_TOKEN, None)
      --> 164 raise SliceManagerException(tokens.get(CredmgrProxy.ERROR))
      
      SliceManagerException: b'{\n    "errors": [\n        {\n            "details": "(invalid_grant) expired refresh token",\n            "message": "Internal Server Error"\n        }\n    ],\n    "size": 1,\n    "status": 500,\n    "type": "error"\n}'
      

       

      and to resolve it, I need to stop and start the server again using File > Hub Control Panel.

      Sharing this sequence in case it helps the team debug this type of error…
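For readers who hit the same traceback, here is a minimal sketch of how a notebook could recognize this specific failure. It assumes only what the traceback above shows: the exception message is the printed form of a bytes object holding a JSON body with an `errors` list whose entries carry a `details` string. The helper name `refresh_token_expired` is hypothetical and is not part of fablib.

```python
import ast
import json


def refresh_token_expired(exc: Exception) -> bool:
    """Return True if the exception text reports an expired refresh token.

    Hypothetical helper: it relies only on the message format visible in
    the traceback above, not on any fablib API.
    """
    text = str(exc)
    try:
        # The message looks like b'{...}', i.e. the repr of a bytes object.
        payload = ast.literal_eval(text)
        if isinstance(payload, bytes):
            payload = payload.decode("utf-8")
        body = json.loads(payload)
        details = [entry.get("details", "") for entry in body.get("errors", [])]
    except (ValueError, SyntaxError, TypeError, AttributeError):
        # Unknown format: fall back to a plain substring check.
        details = [text]
    return any("expired refresh token" in detail for detail in details)
```

In a notebook, one could wrap the `fablib_manager()` call in `try`/`except Exception as e` and, when `refresh_token_expired(e)` is true, print a reminder to stop and start the server from File > Hub Control Panel, since that is the step that resolved the error in this thread.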

      #5172
      Ilya Baldin
      Participant

        Fraida,

        This is working ‘as designed’. Your regular token consists of two parts

        – a short-lived API token we use to authorize user actions (4 hours)

        – a longer-lived ‘refresh’ token which allows you to refresh the API token (24 hours)

        Any time you refresh the token, you actually get back a new tuple of <API token, refresh token>, the latter being good for 24 hours again. This is part of the security posture that helps minimize damage if an API token leaks.

        So in general if you run a fablib operation inside Jupyter Hub within a 24 hour period, your token tuple will be auto-refreshed so long as you are logged in.
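The lifetimes described in this reply can be illustrated with a small toy model. This is only a sketch under the stated assumptions (4-hour API token, 24-hour refresh token, each refresh returning a fresh pair). `TokenPair` and `refresh` are hypothetical names and do not reflect how the Credential Manager is implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

API_TOKEN_LIFETIME = timedelta(hours=4)       # lifetime stated in the reply
REFRESH_TOKEN_LIFETIME = timedelta(hours=24)  # lifetime stated in the reply


@dataclass(frozen=True)
class TokenPair:
    """Hypothetical stand-in for the <API token, refresh token> tuple."""
    issued_at: datetime

    def api_token_valid(self, now: datetime) -> bool:
        return now < self.issued_at + API_TOKEN_LIFETIME

    def refresh_token_valid(self, now: datetime) -> bool:
        return now < self.issued_at + REFRESH_TOKEN_LIFETIME


def refresh(pair: TokenPair, now: datetime) -> TokenPair:
    """Exchange a still-valid refresh token for a brand-new pair."""
    if not pair.refresh_token_valid(now):
        # Mirrors the error text seen in the traceback earlier in the thread.
        raise RuntimeError("(invalid_grant) expired refresh token")
    return TokenPair(issued_at=now)
```

In this model, a pair issued at 09:00 can still be refreshed at 14:00 even though its API token lapsed at 13:00, and the new pair is then refreshable until 14:00 the next day. If nothing triggers a refresh for more than 24 hours, `refresh` fails, which matches the overnight scenario in the first post.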

        We are in the process of developing a long-lived token feature reserved for those who want to run unattended experiments. It will be by special permission.

        #5174
        Fraida Fund
        Participant

          Thanks, the part that I consider a “bug” is that when I log in again and start a new server in Step 4, it does not get a new “good” token. Is that behavior expected?

          #5175
          Ilya Baldin
          Participant

            We will look into it. It is likely because the refresh token has expired by then. Generally, restarting the container should cure it, but you probably cannot get a new token because your portal cookie has also expired by then.

            One way to deal with it is to go to the Credential Manager (from the Portal -> Experiments -> Manage Tokens -> Open Credential Manager). Generate a new token (tuple) and save it into your JH under /home/fabric/.tokens.json

            #5187
            Fraida Fund
            Participant

              Following up on this to share more info –

              In Step 4 above, the first JH server I start after the timeout does have a new refresh token. The contents of .tokens.json show that the token is created when I start the JH server:

              {
                  "refresh_token": "XXX",
                  "created_at": "2023-08-30 15:13:26"
              }
              

              but when I try to use fablib I get that token error, and no ID token.

              After stopping the JH server from the Hub Control Panel and starting it again, it gets another new refresh token; .tokens.json has:

              {
                  "refresh_token": "XXX",
                  "created_at": "2023-08-30 15:16:47"
              }
              

              and this one works. When attempting to use fablib, I get an ID token and no error.

              Not clear why the first refresh token does not work, even though it is new.
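A check like the one described in this post can be scripted. The sketch below assumes the file layout shown above (a JSON object with a `created_at` string in `YYYY-MM-DD HH:MM:SS` form) and the 24-hour refresh lifetime mentioned earlier in the thread. The function name `summarize_token_file` is hypothetical, and the time zone of `created_at` is not stated in the thread, so the caller supplies a `now` value in whatever zone the file uses.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path

CREATED_AT_FORMAT = "%Y-%m-%d %H:%M:%S"       # format seen in the post above
REFRESH_TOKEN_LIFETIME = timedelta(hours=24)  # lifetime stated in the thread


def summarize_token_file(path, now: datetime) -> dict:
    """Report which keys a token file holds and how old the token is.

    Hypothetical helper; `now` must use the same (unstated) time zone
    as the file's created_at field.
    """
    data = json.loads(Path(path).read_text())
    created_at = datetime.strptime(data["created_at"], CREATED_AT_FORMAT)
    age = now - created_at
    return {
        "keys": sorted(data),  # shows whether anything besides refresh_token exists
        "age": age,
        "older_than_refresh_lifetime": age >= REFRESH_TOKEN_LIFETIME,
    }
```

One might call `summarize_token_file("/home/fabric/.tokens.json", datetime.now())` from a notebook. Note that this only rules out plain 24-hour expiry: as reported here, a freshly created token still failed until the server was stopped and started from the Hub Control Panel.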

              #5189
              Ilya Baldin
              Participant

                So I’m surprised that your tokens.json above only has the refresh token and no API token.

                #5190
                Ilya Baldin
                Participant

                  In general though, this is what we recommend: go to Hub Control and restart your server, which reinitializes everything properly. I’m not sure what Jupyter Hub does internally when you simply choose ‘Restart’, but it is different from going into the panel and doing Stop/Start (or sometimes just Start, because it detects the server has stopped).

                  #5192
                  Fraida Fund
                  Participant

                    Perhaps the instructions at https://learn.fabric-testbed.net/knowledge-base/obtaining-and-using-fabric-api-tokens/#using-tokens-within-the-jupyter-hub can be updated. Currently, they say to generate a new token and upload it to JH when you get a “Refresh Token: (invalid grant)” error. But at least for this instance of that error, it doesn’t work (and my students say that solution has not worked for them either when they encounter this error): whatever is not initialized properly fails even with a new token. It only works if the JH server is stopped and restarted from the Hub Control Panel.

                    #5197
                    Ilya Baldin
                    Participant

                      I updated the section. For me at least regeneration has worked in the past, but Stop/Start is probably more reliable, although more disruptive since you have to pull up all the tabs with your notebooks again.
