CUDA_ERROR_LAUNCH_FAILED

It seems that a lot of GPU problems revolve around specific driver versions. Though NVIDIA has its own support structure, you can often learn from information reported by others who fold.

Tuna_Ertemalp
Posts: 68
Joined: Sun Mar 22, 2020 8:54 pm
Hardware configuration: OS:Win10
GPUs: EVGA

CPU (cores), RAM, (GPU Core OC, Mem OC): GPU(s), Motherboard:

* AMD Ryzen 5 3600 (6C), 32G DDR4-2400, (+0,+0): 3090 FTW3 ULTRA, Gigabyte AB350M-D3H-CF
* Intel Core i7 5960X (8C), 32G DDR4-2400, (+0,+0): 3090 XC3 ULTRA HYBRID, ASUS X99-M WS
* Intel Core i7 5960X (8C), 32G DDR4-2400, (+100,+200): 2x 3090 FTW3 ULTRA, ASUS X99-E WS/USB 3.1
* Intel Core i7 970 (6C), 24G DDR3-1333, (+0,+0): 2x 3080 FTW3 ULTRA HYBRID, ASUS RAMPAGE III GENE
* Intel Core i7 5960X (8C), 16G DDR4-2400, (+100,+0): 1080 Ti FTW3 + HYBRID KIT, ASRock X99 OC Formula/3.1
* AMD Ryzen 7 2700X (8C), 16G DDR4-2666, (+100,+200): 3090 FTW3 ULTRA HYBRID, ASRock B450M Pro4
* AMD Ryzen TR 1950X (16C), 32G DDR4-2133, (+100,+200): 3x 3090 XC3 ULTRA HYBRID, ASRock X399 Taichi
* Intel Core i7 5960X (8C), 64G DDR4-2133, (+100,+0): 1080 Ti FTW3 + HYBRID KIT, 2x 1080 Ti SC2 HYBRID, MSI X99A XPOWER AC
Location: Seattle, WA, USA

Re: CUDA_ERROR_LAUNCH_FAILED

Post by Tuna_Ertemalp »

bruce wrote:All operating systems have commands which allow task X to start only after task Y has been started. Those commands need to be part of the startup script for FAHClient.
Ummm... ??
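
For readers unsure what the quoted advice could look like in practice, here is a minimal sketch of "start task X only after task Y is up", written in Python instead of OS-specific startup commands. The names `start_after`, `is_ready` and `launch` are hypothetical; nothing here comes from FAHClient itself, and a real script would have to supply its own check for task Y and its own command for task X.

```python
import time
from typing import Callable


def start_after(is_ready: Callable[[], bool],
                launch: Callable[[], None],
                timeout_s: float = 60.0,
                poll_s: float = 1.0) -> bool:
    """Call launch() once is_ready() returns True.

    Returns True if launch() was called, False if timeout_s elapsed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            launch()          # task Y is up, so start task X
            return True
        if time.monotonic() >= deadline:
            return False      # give up instead of waiting forever
        time.sleep(poll_s)    # wait a little before checking again


if __name__ == "__main__":
    started = []
    # Stand-in callables: a real script would check for task Y
    # and then start task X here.
    ok = start_after(lambda: True, lambda: started.append("X"),
                     timeout_s=1.0, poll_s=0.1)
    print(ok, started)
```

Polling with a timeout is only one way to express the dependency; it is used here because it does not rely on any particular operating system feature.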
Small things make quality, but quality is no small thing. (Adapted from Henry Royce talking about perfection, not quality)
8 Win10 PCs/22 slots: 8x CPUs (3xAMD+5xIntel=68C/122T), 14x NVIDIA EVGA GPUs (8x 3090, 2x 3080, 4x 1080Ti) [Details in my profile]
Tuna_Ertemalp

Re: CUDA_ERROR_LAUNCH_FAILED

Post by Tuna_Ertemalp »

Tuna_Ertemalp wrote:I will go back to using -disable-cuda expert flag on this machine.
And I just sliced and diced some data using HFM.NET: my 1080Ti GPUs drop from low 2.x million PPD to something like low-to-mid 1.x million PPD when CUDA is disabled. So, right now, I have five 1080Ti cards (out of my 9, i.e. about 56% of my 1080Ti GPUs) running under -disable-cuda, which is costing me about 2.5-5 million PPD. :(
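
The arithmetic behind that estimate can be sketched as follows. The per-card loss of 0.5 to 1.0 million PPD is an assumption read off the rough "low 2.x" versus "low-to-mid 1.x" figures, not a measured value, and `ppd_loss_range` is a hypothetical helper name.

```python
from typing import Tuple


def ppd_loss_range(cards_without_cuda: int,
                   loss_low_m: float,
                   loss_high_m: float) -> Tuple[float, float]:
    """Return the (low, high) total PPD loss in millions."""
    return cards_without_cuda * loss_low_m, cards_without_cuda * loss_high_m


total_1080ti = 9   # 1080Ti cards mentioned in the post
affected = 5       # cards running with -disable-cuda

# Assumed per-card loss: 0.5 to 1.0 million PPD (rough reading of the post).
low, high = ppd_loss_range(affected, 0.5, 1.0)
share = affected / total_1080ti

print(f"{share:.0%} of cards affected, "
      f"about {low:.1f}-{high:.1f} million PPD lost")
```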
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: CUDA_ERROR_LAUNCH_FAILED

Post by bruce »

Is this information up-to-date?

It seems like your GTX1080TI doesn't exist.

Have you left at least one CPU free per GPU to support data transfers to/from it? If that thread has to wait for resources, it will definitely throttle your GPU.

Hardware configuration:
    (OS) CPU (cores), Memory, GPU(s), Motherboard:

    (Win10) AMD Ryzen 5 3600 (6C), 32G DDR4-1200, Titan X, Gigabyte AB350M-D3H-CF
    (Win10) Intel Core i7 5960X (8C), 32G DDR4-2133, 2080 Ti Hybrid, ASUS X99-M WS
    (Win10) Intel Core i7 5960X (8C), 32G DDR4-2400, 2x 3090 FTW3 Ultra, ASUS X99-E WS/USB 3.1
    (Win10) Intel Core i7 970 (6C), 24G DDR3-1333, 1080Ti Hybrid, ASUS RAMPAGE III GENE
    (Win10) Intel Core i7 5960X (8C), 16G DDR4-2400, 1080Ti Hybrid, ASRock X99 OC Formula/3.1
    (Win10) Intel Core i7 2600 (4C), 12G DDR3-1333, Titan X Hybrid, ASUS P8P67
    (Win10) AMD Ryzen TR 1950X (16C), 32G DDR4-2133, 4x 1080Ti Hybrid, ASRock X399 Taichi
    (Win10) Intel Core i7 5960X (8C), 64G DDR4-2400, 3x 1080Ti Hybrid, MSI X99A XPOWER AC
    (Win7) Intel Core i7 2600 (4C), 16G DDR3-1333, GTX 580, Intel DP67BG
Tuna_Ertemalp

Re: CUDA_ERROR_LAUNCH_FAILED

Post by Tuna_Ertemalp »

bruce wrote:Is this information up-to-date?
Yes.
bruce wrote:It seems like your GTX1080TI doesn't exist.
I don't understand what you mean by "it doesn't exist": "(Win10) AMD Ryzen TR 1950X (16C), 32G DDR4-2133, 4x 1080Ti Hybrid, ASRock X399 Taichi"

Maybe it is because my list doesn't use the "GTX"/"RTX" prefixes and you searched for "GTX1080TI" instead of "1080Ti". Given the model of the card, the prefix is basically a fixed marketing name (580-1080 are GTX, 2080...3090 are RTX; the expansions I have seen quoted are "Giga Texel Shader eXtreme" and "Ray Tracing Texel eXtreme"). :)
bruce wrote:Have you left at least one CPU free per GPU to support data transfers to/from it? If that thread has to wait for resources, it will definitely throttle your GPU.
Yes. Without me doing anything, by default, due to the CPUs=-1 setting in the CPU category under the SLOTS tab of the CONFIGURATION dialog, FAH automatically reduces the thread count from 32 to 28, reserving 4 threads for the 4 GPUs. By the way, that is what happens on all my hosts. I don't play around with those "-1" settings, and let FAH decide what to do with the hardware.
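
As a rough illustration of the behavior described above (one thread held back per GPU slot), the count works out like this. It is a sketch of the arithmetic only, under the assumption that exactly one thread is reserved per GPU; `default_cpu_threads` is a hypothetical name, not FAHClient's actual logic, which may apply further adjustments.

```python
def default_cpu_threads(logical_threads: int, gpu_slots: int) -> int:
    """Threads left for the CPU slot after reserving one per GPU slot."""
    if logical_threads < 0 or gpu_slots < 0:
        raise ValueError("counts must be non-negative")
    # Never report a negative thread count on GPU-heavy, low-core hosts.
    return max(logical_threads - gpu_slots, 0)


# The host in this post: 32 logical threads (1950X) and 4 GPUs.
print(default_cpu_threads(32, 4))  # prints 28
```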

So, there is plenty of CPU power, RAM, SSD and hybrid cooling (with temps always <50C for each card) to go around this host.
Last edited by Tuna_Ertemalp on Sat Nov 14, 2020 11:16 pm, edited 1 time in total.
Tuna_Ertemalp

Re: CUDA_ERROR_LAUNCH_FAILED

Post by Tuna_Ertemalp »

Something worth mentioning: Yesterday, I took the time to do one ultimate Hail Mary move. I installed a fresh copy of Win10 Pro on a different, empty SSD in this machine, booted from it, and reinstalled everything on that SSD, from FAH to drivers to whatever else. Essentially, it is the exact same hardware, but with a clean install of every piece of software and the data it downloads. So far it has run for 1 day and 2 hours, as of this post, without a problem. Of course, that doesn't prove anything yet. I hope I didn't just jinx it. I am crossing my fingers that the problem was due to some Windows component triggering a TDR event by mistake, and that running a fresh copy of everything makes that erroneous behavior go away. I guess we'll see if this runs for a week untouched without any crashes.

Yet, I hope you would agree that this would be a drastic fix. The software should be able to deal with such errors without getting stuck on a dialog in the UI, waiting for human intervention.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: CUDA_ERROR_LAUNCH_FAILED

Post by PantherX »

Hopefully, the fresh installation has fixed it. I am curious as to how many other applications you installed since it could be an application that might be causing conflicts with F@H.

I do agree that the software should handle the error without user intervention... however, we need to figure out where the error occurs, before we can see who can fix it (Microsoft, Nvidia, F@H, something else).
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Tuna_Ertemalp

Re: CUDA_ERROR_LAUNCH_FAILED

Post by Tuna_Ertemalp »

PantherX wrote:Hopefully, the fresh installation has fixed it.
I hope so, too. So far, 2 days and 18 hours of uptime, and still ticking...
PantherX wrote:I am curious as to how many other applications you installed since it could be an application that might be causing conflicts with F@H.
I can tell you exactly since I keep an XLS to track what needs to be updated on each host as versions change. Out of my 9 hosts, 1 is also my regular office PC with other stuff installed, but the remaining 8 of them are basically dedicated to compute, running the same bare minimum of apps:

Windows 10: 20H2 19042.630
Chrome: auto-updated to latest
BOINC: 7.16.11 (inactive)
FAHclient: 7.6.21 (active)
VirtualBox: 6.1.16 (inactive)
MSI AfterBurner: 4.6.2 (active, to watch temperatures)
GPUz: 2.35.0 (launched on demand)
CPUz: 1.94.0 (launched on demand)
NZXT CAM: 4.15.0 (launched on demand)
TeamViewer: 15.11.6.0 (for remote access)
EVGA Precision X1: 1.1.1 (launched on demand, used to update GPU BIOS/Firmware)
Nvidia Experience/Driver: 3.20.5.70/457.30
No need to be irked by VBox/BOINC; they are not doing any compute jobs since I started contributing to F@H.
PantherX wrote:I do agree that the software should handle the error without user intervention... however, we need to figure out where the error occurs, before we can see who can fix it (Microsoft, Nvidia, F@H, something else).
If the refreshed host continues to work, we might have lost the repro case.
Tuna_Ertemalp

Re: CUDA_ERROR_LAUNCH_FAILED

Post by Tuna_Ertemalp »

Tuna_Ertemalp wrote:If the refreshed host continues to work, we might have lost the repro case.
To report back on progress: after a full 6d 23h 30m run, the refreshed quad 1080Ti Hybrid host is still working without any CUDA errors.