Issue with NVIDIA driver on basic_gpu_devices
- This topic has 20 replies, 3 voices, and was last updated 1 year, 6 months ago by Ilya Baldin.
May 31, 2023 at 4:28 pm #4403
There are no errors when I run the three install commands from the notebook, but the last command doesn’t output anything, and the lsmod command still produces no output either. Does this mean I’m missing a command for the install, or using an out-of-date command?
- This reply was modified 1 year, 7 months ago by Sarah Maxwell.
May 31, 2023 at 5:03 pm #4422
I just re-ran the notebook you are using and I’m seeing the same thing. Something is not quite right with the installation process: there are no errors, but the nvidia modules are not installed. I’ll investigate.
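For anyone wanting to check this from a notebook cell rather than by eye, here is a minimal sketch that filters `lsmod` output for NVIDIA kernel modules. The function name `loaded_nvidia_modules` and the sample text are made up for illustration (the sizes are not from a real node); the only assumption about FABlib is that `node.execute` returns stdout and stderr, as in the cells quoted later in this thread.

```python
def loaded_nvidia_modules(lsmod_output: str) -> list[str]:
    """Return the names of loaded kernel modules that start with 'nvidia'."""
    modules = []
    for line in lsmod_output.splitlines():
        fields = line.split()
        # The first column of lsmod output is the module name.
        if fields and fields[0].startswith("nvidia"):
            modules.append(fields[0])
    return modules


# Illustrative sample only; real output will differ.
sample = """Module                  Size  Used by
nvidia_drm             73728  0
nvidia_modeset       1306624  1 nvidia_drm
nvidia              56508416  1 nvidia_modeset
drm_kms_helper        200704  1 nvidia_drm
"""
print(loaded_nvidia_modules(sample))
```

In a notebook this would presumably be fed with `stdout, stderr = node.execute("lsmod")`; an empty list corresponds to the symptom described above, where the install reports no errors but no module is loaded.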
May 31, 2023 at 5:50 pm #4423
I think something changed in the NVidia install. I was able to load NVidia drivers by doing
sudo dnf -y upgrade
to basically update everything to the latest. This was after I installed NVidia stuff. After I did
sudo /sbin/reboot
the nvidia drivers were already loaded and
nvidia-smi
worked:
[rocky@2d7fd3c5-c433-4a2b-94c5-6b74d4ecc014-rtx ~]$ nvidia-smi
Wed May 31 21:49:40 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 6000                 Off| 00000000:00:07.0 Off |                    0 |
| N/A   26C    P0               23W / 250W|      0MiB / 23040MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
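If the check needs to be scripted, the driver and CUDA versions can be pulled out of the `nvidia-smi` banner with a small regex. This is only a sketch: `parse_nvidia_smi_banner` is a hypothetical helper, and it assumes the banner keeps the `Driver Version:` and `CUDA Version:` labels seen in the output above.

```python
import re


def parse_nvidia_smi_banner(text: str) -> dict[str, str]:
    """Extract driver and CUDA versions from nvidia-smi's banner line."""
    patterns = {
        "driver": r"Driver Version:\s*([\d.]+)",
        "cuda": r"CUDA Version:\s*([\d.]+)",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            found[key] = match.group(1)
    return found


# Banner line taken from the output posted above.
banner = "| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |"
print(parse_nvidia_smi_banner(banner))
```

An empty dictionary means neither label was found, which would be the case if `nvidia-smi` printed nothing or an error instead of the table.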
May 31, 2023 at 5:51 pm #4424
I’ll try a clean slice with adding
sudo dnf -y upgrade
as part of that notebook.
May 31, 2023 at 6:21 pm #4425
Yep so you can modify your notebook as follows:
1. Before the GPU PCI Device add these two cells:
command = "sudo dnf upgrade -q -y"
stdout, stderr = node.execute(command)
that’s to upgrade all packages, and then the next one to reboot (it’s exactly the same as the reboot below):
reboot = 'sudo reboot'
print(reboot)
node.execute(reboot)
slice.wait_ssh(timeout=360,interval=10,progress=True)
print("Now testing SSH abilities to reconnect...",end="")
slice.update()
slice.test_ssh()
print("Reconnected!")
2. I changed the commands in the ‘Install Nvidia Drivers’ section (although I am not sure that’s needed – this is just the latest ‘official’ NVidia workflow):
commands = [
    'sudo dnf install -q -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm',
    'sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo',
    'sudo dnf clean expire-cache',
    'sudo dnf module install -q -y nvidia-driver:latest-dkms',
    'sudo dnf install -q -y cuda'
]
Then of course these commands need to be executed in order, followed by a reboot. After that things should work.
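One way to run the list in order and stop at the first failing step is sketched below. It is an illustration, not what the notebook does: `run_in_order` and `fake_execute` are hypothetical names, and the `EXIT_STATUS=` marker is a shell trick added here because the cells in this thread only capture stdout and stderr. The sketch assumes `execute` behaves like `node.execute`, taking a command string and returning `(stdout, stderr)`.

```python
def run_in_order(execute, commands):
    """Run commands one at a time; raise on the first non-zero exit status."""
    completed = []
    for command in commands:
        # Have the remote shell print the exit status as the last stdout line.
        stdout, _stderr = execute(f"{command}; echo EXIT_STATUS=$?")
        lines = stdout.strip().splitlines()
        status = lines[-1] if lines else ""
        if status != "EXIT_STATUS=0":
            raise RuntimeError(f"command failed ({status or 'no status'}): {command}")
        completed.append(command)
    return completed


# Stand-in for node.execute so the sketch runs without a slice.
calls = []

def fake_execute(command):
    calls.append(command)
    return "EXIT_STATUS=0\n", ""


done = run_in_order(fake_execute, ["sudo dnf clean expire-cache"])
print(done)
```

With a real node the call would presumably be `run_in_order(node.execute, commands)`, followed by the reboot cell. Note that this only surfaces commands that exit with an error; it would not have caught the original problem in this thread, where every install command succeeded but the modules were still missing.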
I will patch up the notebooks so this will appear in the next release.
June 30, 2023 at 9:58 am #4618
Just to close this thread: notebooks starting with jupyter examples 1.5.0 have an updated GPU notebook, a single one for all GPU types, that properly installs the drivers from the NVidia site and also deals with IPv6 sites.
- The topic ‘Issue with NVIDIA driver on basic_gpu_devices’ is closed to new replies.