Tofino bf_switchd process gets killed.
- This topic has 3 replies, 3 voices, and was last updated 6 days, 21 hours ago by
Nishanth Shyamkumar.
May 2, 2025 at 12:23 pm #8458
I am using a variation of the example in https://github.com/fabric-testbed/jupyter-examples/blob/main/fabric_examples/fablib_api/fabric_p4_tofino_l2_network/fabric_p4_tofino_l2_network.ipynb
Everything works fine initially, however after a few cycles of sending packets in a batch to the switch, and reading updates using the bfshell, a SIGHUP is sent to the bf_switchd process and it is killed.
This happens only if the bf_switchd is started as a background process, via the switch.execute_thread() command.
Below is the stack trace for the above issue. On return from a system call, the process services the SIGHUP signal, which causes it to exit.
exit_signals+1
do_exit+336
do_group_exit+45
get_signal+2410
arch_do_signal_or_restart+62
exit_to_user_mode_prepare+405
syscall_exit_to_user_mode+23
do_syscall_64+103
entry_SYSCALL_64_after_hwframe+100
If I run bf_switchd as a foreground process, by ssh’ing into the switch node and starting it there, the SIGHUP issue doesn’t happen.
So for now, when using the notebook alone, its behaviour is unstable: bf_switchd can get killed, which wipes out all program state.
May 2, 2025 at 1:47 pm #8460
Hi Nishanth,
Thank you for sharing this.
Please note that the current implementation of execute_thread maintains the process only for the duration of the specified timeout. As you correctly observed, for longer-running processes, directly accessing the switch via SSH allows you to manually launch switchd.
We will work on enhancing execute_thread to better support this use case and will keep you informed once the update is available.
Thanks,
Komal
May 2, 2025 at 1:53 pm #8461
As said in the quoted notebook:
In this example, the switch daemon automatically terminates after 5 minutes, which may cause the ping to stop working beyond this duration. This is expected behavior.
The timeout is passed as a parameter to execute_thread:
("sleep infinity", r"bf-sde>", 300)
This tuple sends the “sleep infinity” command to the switch and waits up to 300 seconds for the “bf-sde>” prompt. Since the prompt never appears, the timeout expires and the SSH connection is shut down.
In Unix, a disconnected SSH connection triggers SIGHUP, hence the process is killed.
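The effect of SIGHUP can be reproduced locally without a switch. The following is only a minimal sketch: it uses a plain `sleep` child as a stand-in for bf_switchd, and the helper names `_ignore_sighup` and `survives_sighup` are made up for illustration. It shows that a process with the default SIGHUP disposition is terminated by the signal, while one started with SIGHUP ignored (which is what `nohup` arranges) keeps running. Whether starting bf_switchd this way behaves well under execute_thread has not been verified here.

```python
import signal
import subprocess


def _ignore_sighup():
    # Runs in the child between fork and exec. An ignored signal stays
    # ignored across exec, which is the mechanism nohup relies on.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


def survives_sighup(ignore_hup):
    """Start a stand-in long-running child, send it SIGHUP, report survival."""
    proc = subprocess.Popen(
        ["sleep", "30"],
        preexec_fn=_ignore_sighup if ignore_hup else None,
    )
    proc.send_signal(signal.SIGHUP)
    try:
        # With the default disposition the child exits almost immediately.
        proc.wait(timeout=1)
        return False
    except subprocess.TimeoutExpired:
        # Still running after one second: the signal was ignored.
        return True
    finally:
        # Clean up the stand-in child if it is still alive.
        if proc.poll() is None:
            proc.kill()
            proc.wait()


if __name__ == "__main__":
    print("default disposition survives:", survives_sighup(False))
    print("SIGHUP ignored survives:", survives_sighup(True))
```

This runs on Linux and other POSIX systems only, since it depends on SIGHUP and `preexec_fn`.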
May 2, 2025 at 2:52 pm #8463
@yoursunny, yes, SIGHUP is sent when the user closes the terminal.
I was confused because I certainly wasn’t doing anything, so I couldn’t see how it was being generated. Now it makes sense: node.execute_thread, in this specific interactive mode, uses an SSHClientInteraction that terminates if it doesn’t see the prompt before the timeout.
@Komal, I think just informing the user that bf_switchd will exit after 300 seconds (or whatever the timeout is set to), by adding extra information to the comment
# Keep the session open to prevent exit
should be enough guidance for us to increase the timeout as required.
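As a rough illustration of that suggestion, the keep-alive tuple quoted earlier in the thread could be built with an explicit timeout and a more informative comment. The helper name `keep_alive_step` is hypothetical and not part of fablib. Only the tuple layout (command, expected prompt, timeout in seconds) is taken from the notebook excerpt above, and how the tuple is passed to execute_thread is left as in the notebook.

```python
def keep_alive_step(timeout_seconds=300):
    """Build the (command, expected prompt, timeout) keep-alive tuple.

    Hypothetical helper for illustration; the tuple layout follows the
    ("sleep infinity", r"bf-sde>", 300) example quoted in this thread.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    # Keep the session open to prevent exit. The prompt never appears, so
    # the SSH session closes after timeout_seconds and bf_switchd then
    # receives SIGHUP and exits. Raise the timeout for longer experiments.
    return ("sleep infinity", r"bf-sde>", timeout_seconds)


# Example: keep the switch daemon alive for one hour instead of 5 minutes.
one_hour_step = keep_alive_step(60 * 60)
```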