I have an old system with a GTX 770M that's gathering dust. It's still supported by FAH (just barely), but FAH uses a CUDA version that the card doesn't support, so it has to fall back to OpenCL.
How much efficiency loss is there? And is it a loss in utilization efficiency (more CUDA cores will be idle due to bad resource allocation) or in power efficiency (more power is used to get the same amount of work done)?
CUDA vs OpenCL efficiency on Kepler
- Posts: 1661
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D, 7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: CUDA vs OpenCL efficiency on Kepler
You're asking a very interesting question that no one can answer anymore.
Re: CUDA vs OpenCL efficiency on Kepler
It looks like the first part of my question is answered in https://foldingathome.org/2020/09/28/fo ... a-support/ (probably 20-40% for my card). For the second part, I guess I'll find out soon when I measure power usage!
- Posts: 129
- Joined: Fri Apr 16, 2010 11:43 pm
- Hardware configuration: AMD 5800X3D Asus ROG Strix X570-E Gaming WiFi II bios 5031 G-Skill TridentZ Neo 3600mhz Asrock Tachi RX 7900XTX Corsair rm850x psu Asus PG32UQXR EK Elite 360 D-rgb aio Win 11pro/Kubuntu 2404.2 LTS Kernel 6.11.x HWE LowLatency UPS BX1500G
- Location: Galifrey
Re: CUDA vs OpenCL efficiency on Kepler
I remember the day of that announcement and the results. I'm hoping HIP provides a similar boost on the AMD side.
- Posts: 1661
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D, 7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: CUDA vs OpenCL efficiency on Kepler
Early tests show AMD gaining much more than Nvidia did, but Nvidia's OpenCL was stronger to begin with.
- Site Admin
- Posts: 8103
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Studio M1 Max 32 GB smp6, Mac Hack i7-7700K 48 GB smp4
- Location: W. MA
Re: CUDA vs OpenCL efficiency on Kepler
Nvidia was pretty much stuck at OpenCL 1.2 until they recently decided to support a higher version. AMD had moved to OpenCL 2+, but the F@h code was written for OpenCL 1.2 as the lowest common denominator, so any optimizations for 2.x weren't used.
- Posts: 1661
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D, 7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: CUDA vs OpenCL efficiency on Kepler
At the time of the move to CUDA, Nvidia's OpenCL was much better than AMD's.
We've compared Nvidia's OpenCL-to-CUDA speedup with AMD's OpenCL-to-HIP speedup, and AMD's speedup was much higher.
- Posts: 129
- Joined: Fri Apr 16, 2010 11:43 pm
- Hardware configuration: AMD 5800X3D Asus ROG Strix X570-E Gaming WiFi II bios 5031 G-Skill TridentZ Neo 3600mhz Asrock Tachi RX 7900XTX Corsair rm850x psu Asus PG32UQXR EK Elite 360 D-rgb aio Win 11pro/Kubuntu 2404.2 LTS Kernel 6.11.x HWE LowLatency UPS BX1500G
- Location: Galifrey
Re: CUDA vs OpenCL efficiency on Kepler
Great news, something to look forward to. Thank you!
I read on Phoronix that Mesa finally retired Clover in the latest 25.x release. Rusticl is either ready or will soon be the default OpenCL implementation for older cards, such as in arisu's case. Is FAHBench's Core 21 still viable as a performance test?
That is, cards that reach a certain speed when run with specific flags, such as with the NaN check enabled and the run length set to the maximum of 10, could then be whitelisted.
The Phoronix Test Suite provides FAHBench, so it would just be a matter of asking users to run a test profile, which should be easy enough to create.
If Core 21 is too old, perhaps provide a custom work folder containing some newer cores.
Last edited by DarkFoss on Wed Apr 23, 2025 6:22 pm, edited 1 time in total.
Re: CUDA vs OpenCL efficiency on Kepler
My 12-year-old GTX 770M is doing better than I expected. It's getting 100-200k PPD, although that might just be because it's folding those alchemiscale WUs.
As expected, it's not able to use CUDA (11.4), so it falls back to OpenCL (470.256). This led me to another bug in the v8 client: even when CUDA is disabled (lufah config cuda false), the client still advertises CUDA support to the server. In GPUResource.cpp, it shouldn't just check has("cuda"); it should also verify that getConfig().isCUDAEnabled() is true before calling sink.insert("cuda", *get("cuda")). But that's just an aside. I'll report it on GitHub eventually, but it's not a very serious bug as long as the assignment servers don't distinguish between CUDA and OpenCL support when choosing which WUs or cores to send out.
I guess this laptop still has some life in it. And as I write this, P12600 R380 C0 G0 has 5 seconds left before being finished.