Forum Replies Created
Completed!
July 14, 2023 at 2:09 pm in reply to: Receiving multicast frames on a NIC_Basic interface on an L2Bridge #4676
Fraida,
I’ve added this to the list of issues to look at w.r.t. other behaviors of SharedNICs. We have opened a ticket with NVidia/Mellanox to figure this out (the documentation seems to suggest it should work; we can’t tell whether it is a firmware bug or we are doing something wrong). Thanks for your patience.
PSC is back in service.
I believe this has been resolved now. We ran into some scaling issues with the Kubernetes cluster hosting the Hub. Thank you for reporting it.
Just to close this thread: notebooks starting with Jupyter examples 1.5.0 include an updated GPU notebook, a single one for all GPU types, that properly installs the drivers from the NVidia site and also handles IPv6 sites.
Please indicate which image you are using. Standard images in FABRIC do not require a root password to execute commands via ‘sudo’.
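To check this on a running node, here is a minimal sketch assuming a FABlib `node` object whose `execute` method returns `(stdout, stderr)`, as in the GPU notebook cells. The helper name `has_passwordless_sudo` and the marker strings are hypothetical. `sudo -n` (non-interactive) makes sudo fail instead of prompting when a password would be required, so the echoed marker shows which case occurred.

```python
# Hypothetical helper; assumes `node` is a FABlib node whose execute() returns (stdout, stderr).
SUDO_CHECK = "sudo -n true && echo SUDO_OK || echo SUDO_NEEDS_PASSWORD"


def has_passwordless_sudo(node) -> bool:
    """Return True if sudo on the node runs without asking for a password."""
    # `sudo -n` fails rather than prompting, so the marker tells us the outcome
    # without relying on how the remote exit code is reported.
    stdout, _stderr = node.execute(SUDO_CHECK)
    return "SUDO_OK" in stdout.split()
```

If this returns False on a node, include the image name in your reply so the cause can be narrowed down.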
Dear experimenters,
We want to share an update regarding the maintenance previously mentioned. We wish to confirm that the maintenance will indeed occur between June 12 and June 16. We understand the importance of this process and its impact on your work, and we want to assure you that we have carefully planned the updates to minimize any disruption to your activities.
To ensure a smooth transition, we have divided the sites into two groups. The first group, primarily consisting of Phase 2 sites such as UCSD, GPN, FIU, CLEM, GATECH, LOSA, NEWY, ATLA, SEAT, INDI, and CERN, will be available by June 16. The second group, comprising Phase 1 sites, may require a bit more time. However, we are committed to reopening the facility on June 16, enabling you to resume your experiments with the available sites while we bring back the remaining sites.
Following the maintenance window, FABRIC will emerge with updated software, new capabilities and improved performance. These enhancements will provide you with an even more robust and performant research environment.
As a reminder, we request that you halt all experiments prior to June 12. Unfortunately, we will be unable to recover any experiments after the maintenance period. However, we want to reassure you that your SSH keys, data on persistent volumes (with some exceptions), and experiment notebooks will remain unaffected. We apologize for any inconvenience this may cause.
Thank you for your patience as we work diligently to optimize the FABRIC platform.
FABRIC Tutorials project membership cleanup has been completed.
June 1, 2023 at 12:52 pm in reply to: Building a network topology with l2bridge (switch) and router #4441
Hello,
Assuming you have gone through the QuickStart Guide pinned at the top of this forum (https://learn.fabric-testbed.net/forums/topic/quick-start-guide/), you can find additional notebooks in the Jupyter Hub that show examples of what is possible.
Hello,
That is not possible, at least not in the way you are trying. Each Jupyter Hub container is individual to the user and other users cannot access its contents. We are working on a capability for artifact sharing that will allow you to share your notebooks with others, expected to be deployed in beta form in a few weeks.
In the meantime you can:
- Try using git/GitHub to achieve what you are trying to do.
- Try using Google Colab (we are investigating this ourselves and do not have detailed instructions; part of the problem may be the compatibility of the Python version it uses with what our libraries require)
Yep, so you can modify your notebook as follows:
1. Before the GPU PCI Device section, add these two cells:
command = "sudo dnf upgrade -q -y"
stdout, stderr = node.execute(command)
The first cell upgrades all packages, and the second one reboots the node (it’s exactly the same as the reboot cell further down):
reboot = 'sudo reboot'
print(reboot)
node.execute(reboot)

slice.wait_ssh(timeout=360, interval=10, progress=True)

print("Now testing SSH abilities to reconnect...", end="")
slice.update()
slice.test_ssh()
print("Reconnected!")
2. I changed the commands in the ‘Install Nvidia Drivers’ section (although I am not sure that’s needed – this is just the latest ‘official’ NVidia workflow):
commands = [
    'sudo dnf install -q -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm',
    'sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo',
    'sudo dnf clean expire-cache',
    'sudo dnf module install -q -y nvidia-driver:latest-dkms',
    'sudo dnf install -q -y cuda',
]
These commands then need to be executed in order, followed by a reboot. After that, things should work.
I will patch up the notebooks so this will appear in the next release.
I’ll try a clean slice, adding
sudo dnf -y upgrade
as part of that notebook.
I think something changed in the NVidia install. I was able to load NVidia drivers by doing
sudo dnf -y upgrade
to update everything to the latest versions. This was after I had installed the NVidia packages. After I did
sudo /sbin/reboot
the nvidia drivers were already loaded and
nvidia-smi
worked:
[rocky@2d7fd3c5-c433-4a2b-94c5-6b74d4ecc014-rtx ~]$ nvidia-smi
Wed May 31 21:49:40 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 6000                 Off| 00000000:00:07.0 Off |                    0 |
| N/A   26C    P0               23W / 250W|      0MiB / 23040MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
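Instead of reading the full table, a quick scripted check that the driver is loaded could use the query mode of `nvidia-smi` (`--query-gpu=name,driver_version,memory.total --format=csv,noheader`) and parse its CSV output. This is a sketch: `parse_gpu_query` is a hypothetical helper, and it treats any line without exactly three fields (for example an error message saying the driver could not be reached) as "no GPU reported".

```python
import csv
import io

# Run on the node, e.g. `stdout, stderr = node.execute(GPU_QUERY)` in the notebook.
GPU_QUERY = "nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader"


def parse_gpu_query(output: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader` output into one dict per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(output), skipinitialspace=True):
        if len(row) != 3:
            continue  # blank line or an error message rather than a GPU row
        name, driver, memory = (field.strip() for field in row)
        gpus.append({"name": name, "driver_version": driver, "memory_total": memory})
    return gpus
```

An empty list after a reboot is a sign that the driver did not load, even if the install commands reported no errors.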
I just re-ran the notebook you are using, and I’m seeing the same thing. Something is not quite right with the installation process: there are no errors, but the nvidia modules are not installed. I’ll investigate.
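Because the install finishes without errors, one way to detect this silent failure is to look at `lsmod` output for kernel modules whose names start with `nvidia`. A minimal sketch; `loaded_nvidia_modules` is a hypothetical helper, and it assumes the standard `lsmod` layout (a `Module Size Used by` header line followed by one module per line).

```python
def loaded_nvidia_modules(lsmod_output: str) -> list[str]:
    """Return the names of loaded kernel modules starting with 'nvidia' from `lsmod` output."""
    modules = []
    for line in lsmod_output.splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        if fields and fields[0].startswith("nvidia"):
            modules.append(fields[0])
    return modules
```

On a FABRIC node this would be fed from `stdout, stderr = node.execute("lsmod")`; an empty result means no nvidia module is loaded.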
Any header that is not valid will not pass. Without knowing more about what you are modifying and how, I cannot give a more specific answer. Valid packets should pass through without problems. You can use Wireshark or a similar tool to look at your packet traces and see whether it flags anything different between the unmodified and modified frames.
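The reply does not say which check the modified frames fail. As one common example (an assumption, not a diagnosis of this case), editing an IPv4 header field such as TTL or an address without recomputing the header checksum (the RFC 1071 one's-complement sum) leaves a header that hosts and routers discard. A minimal sketch, assuming you have the raw IPv4 header bytes of a frame; all function names are hypothetical.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum of `data` (RFC 1071), padding odd lengths with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def ipv4_header_checksum(header: bytes) -> int:
    """Checksum for an IPv4 header, computed with the checksum field (bytes 10-11) zeroed."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum16(zeroed) & 0xFFFF


def ipv4_header_is_valid(header: bytes) -> bool:
    """True if this looks like an IPv4 header whose stored checksum matches its contents."""
    if len(header) < 20 or header[0] >> 4 != 4:
        return False
    ihl = (header[0] & 0x0F) * 4  # header length in bytes
    if ihl < 20 or len(header) < ihl:
        return False
    # Summing the header *including* its checksum gives 0xFFFF when the checksum is correct.
    return ones_complement_sum16(header[:ihl]) == 0xFFFF


def fix_ipv4_checksum(header: bytes) -> bytes:
    """Return a copy of `header` with the checksum recomputed after editing other fields."""
    ihl = (header[0] & 0x0F) * 4
    checksum = ipv4_header_checksum(header[:ihl])
    return header[:10] + struct.pack("!H", checksum) + header[12:]
```

Running `ipv4_header_is_valid` on the same header before and after your modification is a quick complement to comparing the traces in Wireshark.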