I’m trying to run Clouderizer inside Google Colab, but whenever I check for the GPU with nvidia-smi I get this error: “Failed to initialize NVML: Driver/library version mismatch”. I tried installing with and without the CUDA option, purging and reinstalling the NVIDIA driver both before and while running Clouderizer, starting a new Google Colab project, and running it on another Google account, but I still get the same issue. How can I solve it?
I suspect some CUDA driver changes on Google’s host machines where the Colab Docker containers run.
Can you try setting this env variable?
After this, in the same terminal session, try nvidia-smi. Let me know if this works; I will then try to incorporate it into the Clouderizer init itself on Colab.
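(The specific variable isn’t quoted in this thread, but as a generic illustration of why it has to be set in the same terminal session: an exported variable is only visible to processes launched from that shell. EXAMPLE_VAR below is a made-up name, not the actual variable suggested here.)

```shell
# Illustration only: EXAMPLE_VAR is a hypothetical name, not the
# variable suggested above. `export` makes the value visible to
# child processes started from this same shell session:
export EXAMPLE_VAR=/usr/lib64-nvidia
sh -c 'echo "$EXAMPLE_VAR"'    # a child process sees the value
```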
That worked seamlessly, thank you very much!
I’ve had another issue with torch not recognizing CUDA (“CUDA module initialization failed”), though maybe it was something on my part. I searched a bit and found a solution on the NVIDIA forums: it was fixed by adding some symlinks (and creating the mentioned file, with a small modification), because it seems the CUDA installer didn’t create them properly. Thanks to the startup script the process is now automated, so everything works smoothly.
Again, thank you very much!
Can you post the changes you made to get torch working? It might help others running into this issue.
First, I added a symlink for the CUDA library inside the NVIDIA library directory, like so:
ln -s libcuda.so.1 libcuda.so
and then I added a config file “nvidia-lib64.conf”, which was missing from the folder “/etc/ld.so.conf.d”, containing the path to NVIDIA’s library, by running this command (in my case the libraries were under lib64):
echo /usr/lib64-nvidia/ > /etc/ld.so.conf.d/nvidia-lib64.conf
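(For anyone who wants to see what these two steps do before touching the real system, here is a sketch that exercises them against a scratch directory. The real paths on Colab are /usr/lib64-nvidia and /etc/ld.so.conf.d, which require root to modify, and on the real machine the linker cache is rebuilt afterwards.)

```shell
# Sketch only: the symlink and conf-file steps from above, run
# against a scratch directory instead of the real filesystem.
set -e
ROOT=$(mktemp -d)                      # stand-in for the filesystem root
LIBDIR="$ROOT/usr/lib64-nvidia"
CONFDIR="$ROOT/etc/ld.so.conf.d"
mkdir -p "$LIBDIR" "$CONFDIR"
touch "$LIBDIR/libcuda.so.1"           # pretend only the versioned lib exists

# Step 1: the unversioned symlink that the loader (and torch) looks for
ln -s libcuda.so.1 "$LIBDIR/libcuda.so"

# Step 2: conf file pointing the dynamic linker at the NVIDIA libraries
echo /usr/lib64-nvidia/ > "$CONFDIR/nvidia-lib64.conf"

# On the real system you would now rebuild the linker cache (as root,
# e.g. with ldconfig); here we just show the results:
ls -l "$LIBDIR/libcuda.so"
cat "$CONFDIR/nvidia-lib64.conf"
```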
and finally I reloaded the cache of library links so the changes would be detected: