GPU WUs taking much longer than expected


bana400
Posts: 5
Joined: Sun Feb 14, 2021 10:08 pm

GPU WUs taking much longer than expected

Post by bana400 »

Don't want to bump the thread, so I'll edit it up here.
It's been a while since I posted this, but ultimately this ended up having a boring diagnosis: a bad GPU! I RMA'd my GPU and plugged in the replacement I got from the company. Now everything works fine, and I got through a GPU task with no problem.


Hello, I've attached my logs at the bottom of the post.

I turned off CPU WUs to try to figure out what's going on with GPU performance. The GPU works fine in other programs (including video streaming with NVENC and GPU-accelerated DaVinci Resolve exports), but while running F@H it seems to perform very poorly.

The client is giving me an estimated PPD of 17402, and the ETA for this job (worth 34k credits) is 1.74 days. The previous job it worked on had a similar ETA and took about as long. This is project 17335, though I don't think there's anything wrong with the projects/WUs.
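
As a rough cross-check of those figures, here is a minimal sketch. It assumes PPD is simply a WU's credit divided by the days it takes; `implied_ppd` is a hypothetical helper, "34k" is a rounded figure, and the client's own estimator may well use a different formula.

```python
def implied_ppd(credit, eta_days):
    """Points per day if a WU worth `credit` takes `eta_days` to finish.

    Assumption: a plain credit / time ratio, not the client's actual estimator.
    """
    return credit / eta_days


# Figures quoted in this post: ~34k credits, ETA of 1.74 days.
print(round(implied_ppd(34_000, 1.74)))
```

The result lands in the same ballpark as the 17402 PPD the client reports, so the ETA and the PPD estimate are at least consistent with each other.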

The account/passkey pair is working: it shows 36 WUs successfully completed at a 100% success rate.

The GPU's clock seems to be going high, but the temperature is just 37 °C, which is basically idle temp.

Task Manager, for what it's worth, shows high usage only in the GPU "Copy" graph, and 0% for Compute_0, Compute_1, and CUDA.

I left the task on from 23:15 to 23:45 and the situation is the same (GPU still only apparently being used in the Copy graph). Or are some jobs just expected to be in this phase, and not in the CUDA/compute phase, for that long?

Thanks for your time - please let me know if I'm missing anything.


*********************** Log Started 2021-02-20T23:14:21Z ***********************
23:14:21:******************************* libFAH ********************************
23:14:21:           Date: Oct 20 2020
23:14:21:           Time: 13:36:55
23:14:21:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
23:14:21:         Branch: master
23:14:21:       Compiler: Visual C++ 2015
23:14:21:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
23:14:21:       Platform: win32 10
23:14:21:           Bits: 32
23:14:21:           Mode: Release
23:14:21:****************************** FAHClient ******************************
23:14:21:        Version: 7.6.21
23:14:21:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:14:21:      Copyright: 2020 foldingathome.org
23:14:21:       Homepage: https://foldingathome.org/
23:14:21:           Date: Oct 20 2020
23:14:21:           Time: 13:41:04
23:14:21:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
23:14:21:         Branch: master
23:14:21:       Compiler: Visual C++ 2015
23:14:21:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
23:14:21:       Platform: win32 10
23:14:21:           Bits: 32
23:14:21:           Mode: Release
23:14:21:           Args: --open-web-control
23:14:21:         Config: C:\ProgramData\FAHClient\config.xml
23:14:21:******************************** CBang ********************************
23:14:21:           Date: Oct 20 2020
23:14:21:           Time: 11:36:18
23:14:21:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
23:14:21:         Branch: master
23:14:21:       Compiler: Visual C++ 2015
23:14:21:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
23:14:21:       Platform: win32 10
23:14:21:           Bits: 32
23:14:21:           Mode: Release
23:14:21:******************************* System ********************************
23:14:21:            CPU: AMD Ryzen 9 5900X 12-Core Processor
23:14:21:         CPU ID: AuthenticAMD Family 25 Model 33 Stepping 0
23:14:21:           CPUs: 24
23:14:21:         Memory: 31.93GiB
23:14:21:    Free Memory: 24.88GiB
23:14:21:        Threads: WINDOWS_THREADS
23:14:21:     OS Version: 6.2
23:14:21:    Has Battery: false
23:14:21:     On Battery: false
23:14:21:     UTC Offset: -8
23:14:21:            PID: 4716
23:14:21:            CWD: C:\ProgramData\FAHClient
23:14:21:  Win32 Service: false
23:14:21:             OS: Windows 10 Enterprise
23:14:21:        OS Arch: AMD64
23:14:21:           GPUs: 1
23:14:21:          GPU 0: Bus:7 Slot:0 Func:0 NVIDIA:8 TU104 [GeForce RTX 2070 SUPER]
23:14:21:                 8218
23:14:21:  CUDA Device 0: Platform:0 Device:0 Bus:7 Slot:0 Compute:7.5 Driver:11.2
23:14:21:OpenCL Device 0: Platform:0 Device:0 Bus:7 Slot:0 Compute:1.2 Driver:461.40
23:14:21:***********************************************************************
23:14:21:<config>
23:14:21:  <!-- Network -->
23:14:21:  <proxy v=':8080'/>
23:14:21:
23:14:21:  <!-- Slot Control -->
23:14:21:  <power v='full'/>
23:14:21:
23:14:21:  <!-- User Information -->
23:14:21:  <passkey v='*****'/>
23:14:21:  <team v='------'/>
23:14:21:  <user v='------------'/>
23:14:21:
23:14:21:  <!-- Folding Slots -->
23:14:21:  <slot id='0' type='CPU'>
23:14:21:    <paused v='true'/>
23:14:21:  </slot>
23:14:21:  <slot id='1' type='GPU'>
23:14:21:    <paused v='true'/>
23:14:21:    <pci-bus v='7'/>
23:14:21:    <pci-slot v='0'/>
23:14:21:  </slot>
23:14:21:</config>
23:14:21:Trying to access database...
23:14:21:Successfully acquired database lock
23:14:21:FS00:Initialized folding slot 00: cpu:23
23:14:21:FS01:Initialized folding slot 01: gpu:7:0 TU104 [GeForce RTX 2070 SUPER] 8218
23:14:22:3:127.0.0.1:New Web session
23:14:57:FS01:Unpaused
23:14:57:WU00:FS01:Starting
23:14:57:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 4716 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
23:14:57:WU00:FS01:Started FahCore on PID 25640
23:14:57:WU00:FS01:Core PID:5212
23:14:57:WU00:FS01:FahCore 0x22 started
23:14:58:WU00:FS01:0x22:*********************** Log Started 2021-02-20T23:14:57Z ***********************
23:14:58:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
23:14:58:WU00:FS01:0x22:       Core: Core22
23:14:58:WU00:FS01:0x22:       Type: 0x22
23:14:58:WU00:FS01:0x22:    Version: 0.0.13
23:14:58:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:14:58:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
23:14:58:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
23:14:58:WU00:FS01:0x22:       Date: Sep 19 2020
23:14:58:WU00:FS01:0x22:       Time: 02:35:58
23:14:58:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
23:14:58:WU00:FS01:0x22:     Branch: core22-0.0.13
23:14:58:WU00:FS01:0x22:   Compiler: Visual C++ 2015
23:14:58:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
23:14:58:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
23:14:58:WU00:FS01:0x22:   Platform: win32 10
23:14:58:WU00:FS01:0x22:       Bits: 64
23:14:58:WU00:FS01:0x22:       Mode: Release
23:14:58:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
23:14:58:WU00:FS01:0x22:             <peastman@stanford.edu>
23:14:58:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 25640 -checkpoint 15
23:14:58:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
23:14:58:WU00:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100
23:14:58:WU00:FS01:0x22:************************************ libFAH ************************************
23:14:58:WU00:FS01:0x22:       Date: Sep 7 2020
23:14:58:WU00:FS01:0x22:       Time: 19:09:56
23:14:58:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
23:14:58:WU00:FS01:0x22:     Branch: HEAD
23:14:58:WU00:FS01:0x22:   Compiler: Visual C++ 2015
23:14:58:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
23:14:58:WU00:FS01:0x22:   Platform: win32 10
23:14:58:WU00:FS01:0x22:       Bits: 64
23:14:58:WU00:FS01:0x22:       Mode: Release
23:14:58:WU00:FS01:0x22:************************************ CBang *************************************
23:14:58:WU00:FS01:0x22:       Date: Sep 7 2020
23:14:58:WU00:FS01:0x22:       Time: 19:08:30
23:14:58:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
23:14:58:WU00:FS01:0x22:     Branch: HEAD
23:14:58:WU00:FS01:0x22:   Compiler: Visual C++ 2015
23:14:58:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
23:14:58:WU00:FS01:0x22:   Platform: win32 10
23:14:58:WU00:FS01:0x22:       Bits: 64
23:14:58:WU00:FS01:0x22:       Mode: Release
23:14:58:WU00:FS01:0x22:************************************ System ************************************
23:14:58:WU00:FS01:0x22:        CPU: AMD Ryzen 9 5900X 12-Core Processor
23:14:58:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 25 Model 33 Stepping 0
23:14:58:WU00:FS01:0x22:       CPUs: 24
23:14:58:WU00:FS01:0x22:     Memory: 31.93GiB
23:14:58:WU00:FS01:0x22:Free Memory: 24.79GiB
23:14:58:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
23:14:58:WU00:FS01:0x22: OS Version: 6.2
23:14:58:WU00:FS01:0x22:Has Battery: false
23:14:58:WU00:FS01:0x22: On Battery: false
23:14:58:WU00:FS01:0x22: UTC Offset: -8
23:14:58:WU00:FS01:0x22:        PID: 5212
23:14:58:WU00:FS01:0x22:        CWD: C:\ProgramData\FAHClient\work
23:14:58:WU00:FS01:0x22:************************************ OpenMM ************************************
23:14:58:WU00:FS01:0x22:   Revision: 189320d0
23:14:58:WU00:FS01:0x22:********************************************************************************
23:14:58:WU00:FS01:0x22:Project: 17335 (Run 17, Clone 616, Gen 10)
23:14:58:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
23:14:58:WU00:FS01:0x22:Digital signatures verified
23:14:58:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
23:14:58:WU00:FS01:0x22:Version 0.0.13
23:14:58:WU00:FS01:0x22:  Checkpoint write interval: 15000 steps (2%) [50 total]
23:14:58:WU00:FS01:0x22:  JSON viewer frame write interval: 7500 steps (1%) [100 total]
23:14:58:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (33%) [3 total]
23:14:58:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
23:14:58:WU00:FS01:0x22:There are 4 platforms available.
23:14:58:WU00:FS01:0x22:Platform 0: Reference
23:14:58:WU00:FS01:0x22:Platform 1: CPU
23:14:58:WU00:FS01:0x22:Platform 2: OpenCL
23:14:58:WU00:FS01:0x22:  opencl-device 0 specified
23:14:58:WU00:FS01:0x22:Platform 3: CUDA
23:14:58:WU00:FS01:0x22:  cuda-device 0 specified
23:15:07:WU00:FS01:0x22:Attempting to create CUDA context:
23:15:07:WU00:FS01:0x22:  Configuring platform CUDA
23:15:22:Removing old file 'configs/config-20210217-045908.xml'
23:15:22:Saving configuration to config.xml
23:15:22:<config>
23:15:22:  <!-- Network -->
23:15:22:  <proxy v=':8080'/>
23:15:22:
23:15:22:  <!-- Slot Control -->
23:15:22:  <power v='full'/>
23:15:22:
23:15:22:  <!-- User Information -->
23:15:22:  <passkey v='*****'/>
23:15:22:  <team v='------'/>
23:15:22:  <user v='------------'/>
23:15:22:
23:15:22:  <!-- Folding Slots -->
23:15:22:  <slot id='0' type='CPU'>
23:15:22:    <paused v='true'/>
23:15:22:  </slot>
23:15:22:  <slot id='1' type='GPU'>
23:15:22:    <pci-bus v='7'/>
23:15:22:    <pci-slot v='0'/>
23:15:22:  </slot>
23:15:22:</config>
Last edited by bana400 on Wed May 12, 2021 7:12 pm, edited 2 times in total.
bana400
Posts: 5
Joined: Sun Feb 14, 2021 10:08 pm

Re: GPU WUs taking much longer than expected

Post by bana400 »

Update: Much later, I found that this job printed this in the logs:


... (previous log contents) ...
00:22:44:WU00:FS01:0x22:ERROR:exception: Error invoking kernel: CUDA_ERROR_LAUNCH_TIMEOUT (702)
00:22:44:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
00:22:44:WU00:FS01:0x22:Saving result file science.log
00:22:44:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
00:22:48:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
00:22:48:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)
00:22:48:WU00:FS01:Starting
00:22:48:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 4716 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
00:22:48:WU00:FS01:Started FahCore on PID 16324
00:22:48:WU00:FS01:Core PID:28392
00:22:48:WU00:FS01:FahCore 0x22 started
00:22:49:WU00:FS01:0x22:*********************** Log Started 2021-02-21T00:22:48Z ***********************
00:22:49:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
00:22:49:WU00:FS01:0x22:       Core: Core22
00:22:49:WU00:FS01:0x22:       Type: 0x22
00:22:49:WU00:FS01:0x22:    Version: 0.0.13
00:22:49:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:22:49:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
00:22:49:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
00:22:49:WU00:FS01:0x22:       Date: Sep 19 2020
00:22:49:WU00:FS01:0x22:       Time: 02:35:58
00:22:49:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
00:22:49:WU00:FS01:0x22:     Branch: core22-0.0.13
00:22:49:WU00:FS01:0x22:   Compiler: Visual C++ 2015
00:22:49:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
00:22:49:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
00:22:49:WU00:FS01:0x22:   Platform: win32 10
00:22:49:WU00:FS01:0x22:       Bits: 64
00:22:49:WU00:FS01:0x22:       Mode: Release
00:22:49:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
00:22:49:WU00:FS01:0x22:             <peastman@stanford.edu>
00:22:49:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 16324 -checkpoint 15
00:22:49:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
00:22:49:WU00:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100
00:22:49:WU00:FS01:0x22:************************************ libFAH ************************************
00:22:49:WU00:FS01:0x22:       Date: Sep 7 2020
00:22:49:WU00:FS01:0x22:       Time: 19:09:56
00:22:49:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
00:22:49:WU00:FS01:0x22:     Branch: HEAD
00:22:49:WU00:FS01:0x22:   Compiler: Visual C++ 2015
00:22:49:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
00:22:49:WU00:FS01:0x22:   Platform: win32 10
00:22:49:WU00:FS01:0x22:       Bits: 64
00:22:49:WU00:FS01:0x22:       Mode: Release
00:22:49:WU00:FS01:0x22:************************************ CBang *************************************
00:22:49:WU00:FS01:0x22:       Date: Sep 7 2020
00:22:49:WU00:FS01:0x22:       Time: 19:08:30
00:22:49:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
00:22:49:WU00:FS01:0x22:     Branch: HEAD
00:22:49:WU00:FS01:0x22:   Compiler: Visual C++ 2015
00:22:49:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
00:22:49:WU00:FS01:0x22:   Platform: win32 10
00:22:49:WU00:FS01:0x22:       Bits: 64
00:22:49:WU00:FS01:0x22:       Mode: Release
00:22:49:WU00:FS01:0x22:************************************ System ************************************
00:22:49:WU00:FS01:0x22:        CPU: AMD Ryzen 9 5900X 12-Core Processor
00:22:49:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 25 Model 33 Stepping 0
00:22:49:WU00:FS01:0x22:       CPUs: 24
00:22:49:WU00:FS01:0x22:     Memory: 31.93GiB
00:22:49:WU00:FS01:0x22:Free Memory: 24.98GiB
00:22:49:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
00:22:49:WU00:FS01:0x22: OS Version: 6.2
00:22:49:WU00:FS01:0x22:Has Battery: false
00:22:49:WU00:FS01:0x22: On Battery: false
00:22:49:WU00:FS01:0x22: UTC Offset: -8
00:22:49:WU00:FS01:0x22:        PID: 28392
00:22:49:WU00:FS01:0x22:        CWD: C:\ProgramData\FAHClient\work
00:22:49:WU00:FS01:0x22:************************************ OpenMM ************************************
00:22:49:WU00:FS01:0x22:   Revision: 189320d0
00:22:49:WU00:FS01:0x22:********************************************************************************
00:22:49:WU00:FS01:0x22:Project: 17335 (Run 17, Clone 616, Gen 10)
00:22:49:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
00:22:49:WU00:FS01:0x22:Reading tar file core.xml
00:22:49:WU00:FS01:0x22:Reading tar file integrator.xml.bz2
00:22:49:WU00:FS01:0x22:Reading tar file state.xml.bz2
00:22:49:WU00:FS01:0x22:Reading tar file system.xml.bz2
00:22:49:WU00:FS01:0x22:Digital signatures verified
00:22:49:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:22:49:WU00:FS01:0x22:Version 0.0.13
00:22:49:WU00:FS01:0x22:  Checkpoint write interval: 15000 steps (2%) [50 total]
00:22:49:WU00:FS01:0x22:  JSON viewer frame write interval: 7500 steps (1%) [100 total]
00:22:49:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (33%) [3 total]
00:22:49:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
00:22:49:WU00:FS01:0x22:There are 4 platforms available.
00:22:49:WU00:FS01:0x22:Platform 0: Reference
00:22:49:WU00:FS01:0x22:Platform 1: CPU
00:22:49:WU00:FS01:0x22:Platform 2: OpenCL
00:22:49:WU00:FS01:0x22:  opencl-device 0 specified
00:22:49:WU00:FS01:0x22:Platform 3: CUDA
00:22:49:WU00:FS01:0x22:  cuda-device 0 specified
00:22:58:WU00:FS01:0x22:Attempting to create CUDA context:
00:22:58:WU00:FS01:0x22:  Configuring platform CUDA
00:23:00:WU00:FS01:0x22:  Using CUDA and gpu 0
00:23:00:WU00:FS01:0x22:Completed 0 out of 750000 steps (0%)
00:23:01:WU00:FS01:0x22:Checkpoint completed at step 0
00:33:03:WU00:FS01:0x22:Watchdog triggered, requesting soft shutdown down
00:43:03:WU00:FS01:0x22:Watchdog shutdown failed, hard shutdown triggered
00:43:03:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
00:43:03:WARNING:WU00:FS01:FahCore returned: WU_STALLED (127 = 0x7f)
00:43:03:WU00:FS01:Starting
00:43:03:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 4716 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
00:43:03:WU00:FS01:Started FahCore on PID 20260
00:43:03:WU00:FS01:Core PID:16352
00:43:03:WU00:FS01:FahCore 0x22 started
00:43:04:WU00:FS01:0x22:*********************** Log Started 2021-02-21T00:43:03Z ***********************
00:43:04:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
00:43:04:WU00:FS01:0x22:       Core: Core22
00:43:04:WU00:FS01:0x22:       Type: 0x22
00:43:04:WU00:FS01:0x22:    Version: 0.0.13
00:43:04:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:43:04:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
00:43:04:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
00:43:04:WU00:FS01:0x22:       Date: Sep 19 2020
00:43:04:WU00:FS01:0x22:       Time: 02:35:58
00:43:04:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
00:43:04:WU00:FS01:0x22:     Branch: core22-0.0.13
00:43:04:WU00:FS01:0x22:   Compiler: Visual C++ 2015
00:43:04:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
00:43:04:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
00:43:04:WU00:FS01:0x22:   Platform: win32 10
00:43:04:WU00:FS01:0x22:       Bits: 64
00:43:04:WU00:FS01:0x22:       Mode: Release
00:43:04:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
00:43:04:WU00:FS01:0x22:             <peastman@stanford.edu>
00:43:04:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 20260 -checkpoint 15
00:43:04:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
00:43:04:WU00:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100
00:43:04:WU00:FS01:0x22:************************************ libFAH ************************************
00:43:04:WU00:FS01:0x22:       Date: Sep 7 2020
00:43:04:WU00:FS01:0x22:       Time: 19:09:56
00:43:04:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
00:43:04:WU00:FS01:0x22:     Branch: HEAD
00:43:04:WU00:FS01:0x22:   Compiler: Visual C++ 2015
00:43:04:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
00:43:04:WU00:FS01:0x22:   Platform: win32 10
00:43:04:WU00:FS01:0x22:       Bits: 64
00:43:04:WU00:FS01:0x22:       Mode: Release
00:43:04:WU00:FS01:0x22:************************************ CBang *************************************
00:43:04:WU00:FS01:0x22:       Date: Sep 7 2020
00:43:04:WU00:FS01:0x22:       Time: 19:08:30
00:43:04:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
00:43:04:WU00:FS01:0x22:     Branch: HEAD
00:43:04:WU00:FS01:0x22:   Compiler: Visual C++ 2015
00:43:04:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
00:43:04:WU00:FS01:0x22:   Platform: win32 10
00:43:04:WU00:FS01:0x22:       Bits: 64
00:43:04:WU00:FS01:0x22:       Mode: Release
00:43:04:WU00:FS01:0x22:************************************ System ************************************
00:43:04:WU00:FS01:0x22:        CPU: AMD Ryzen 9 5900X 12-Core Processor
00:43:04:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 25 Model 33 Stepping 0
00:43:04:WU00:FS01:0x22:       CPUs: 24
00:43:04:WU00:FS01:0x22:     Memory: 31.93GiB
00:43:04:WU00:FS01:0x22:Free Memory: 24.74GiB
00:43:04:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
00:43:04:WU00:FS01:0x22: OS Version: 6.2
00:43:04:WU00:FS01:0x22:Has Battery: false
00:43:04:WU00:FS01:0x22: On Battery: false
00:43:04:WU00:FS01:0x22: UTC Offset: -8
00:43:04:WU00:FS01:0x22:        PID: 16352
00:43:04:WU00:FS01:0x22:        CWD: C:\ProgramData\FAHClient\work
00:43:04:WU00:FS01:0x22:************************************ OpenMM ************************************
00:43:04:WU00:FS01:0x22:   Revision: 189320d0
00:43:04:WU00:FS01:0x22:********************************************************************************
00:43:04:WU00:FS01:0x22:Project: 17335 (Run 17, Clone 616, Gen 10)
00:43:04:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
00:43:04:WU00:FS01:0x22:Digital signatures verified
00:43:04:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:43:04:WU00:FS01:0x22:Version 0.0.13
00:43:04:WU00:FS01:0x22:  Checkpoint write interval: 15000 steps (2%) [50 total]
00:43:04:WU00:FS01:0x22:  JSON viewer frame write interval: 7500 steps (1%) [100 total]
00:43:04:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (33%) [3 total]
00:43:04:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
00:43:04:WU00:FS01:0x22:There are 4 platforms available.
00:43:04:WU00:FS01:0x22:Platform 0: Reference
00:43:04:WU00:FS01:0x22:Platform 1: CPU
00:43:04:WU00:FS01:0x22:Platform 2: OpenCL
00:43:04:WU00:FS01:0x22:  opencl-device 0 specified
00:43:04:WU00:FS01:0x22:Platform 3: CUDA
00:43:04:WU00:FS01:0x22:  cuda-device 0 specified
00:43:13:WU00:FS01:0x22:Attempting to create CUDA context:
00:43:13:WU00:FS01:0x22:  Configuring platform CUDA
00:43:16:WU00:FS01:0x22:  Using CUDA and gpu 0
00:43:16:WU00:FS01:0x22:Completed 0 out of 750000 steps (0%)
00:43:16:WU00:FS01:0x22:Checkpoint completed at step 0

And Task Manager shows the CUDA graph fully active. (The ETA was still 1.6+ days for this WU.)

Edit: I let it go for another 15 minutes, and it just went through the same failure loop seen above.
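
For anyone who wants to pick this loop out of a long log without scrolling, here is a minimal sketch. The marker strings are copied from the log lines above; the list is illustrative rather than exhaustive, and `find_failures` is a hypothetical helper, not part of FAHClient.

```python
import re

# Marker strings copied from the log excerpts in this thread (not exhaustive).
FAILURE_MARKERS = (
    "CUDA_ERROR_LAUNCH_TIMEOUT",
    "BAD_WORK_UNIT",
    "Watchdog triggered",
    "Watchdog shutdown failed",
    "WU_STALLED",
)

# Client log lines start with an HH:MM:SS timestamp followed by a colon.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(.*)$")


def find_failures(log_text):
    """Return (timestamp, marker) pairs for log lines containing a known marker."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip banners and anything without a timestamp
        timestamp, rest = match.groups()
        for marker in FAILURE_MARKERS:
            if marker in rest:
                hits.append((timestamp, marker))
    return hits


sample = """\
00:22:44:WU00:FS01:0x22:ERROR:exception: Error invoking kernel: CUDA_ERROR_LAUNCH_TIMEOUT (702)
00:22:44:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
00:23:01:WU00:FS01:0x22:Checkpoint completed at step 0
00:33:03:WU00:FS01:0x22:Watchdog triggered, requesting soft shutdown down
00:43:03:WU00:FS01:0x22:Watchdog shutdown failed, hard shutdown triggered
00:43:03:WARNING:WU00:FS01:FahCore returned: WU_STALLED (127 = 0x7f)
"""

for timestamp, marker in find_failures(sample):
    print(timestamp, marker)
```

To run it against a real log, read the file's contents into a string and pass that to `find_failures` in place of `sample`.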
ajm
Posts: 754
Joined: Sat Mar 21, 2020 5:22 am
Location: Lucerne, Switzerland

Re: GPU WUs taking much longer than expected

Post by ajm »

You probably had to install the CUDA development toolkit and/or a CUDA runtime to use DaVinci?
I read elsewhere on this forum that this can lead to such problems. The exact cause was unknown, at least as of last January. The only fix at the time was to uninstall the toolkit. Maybe worth a try?
bana400
Posts: 5
Joined: Sun Feb 14, 2021 10:08 pm

Re: GPU WUs taking much longer than expected

Post by bana400 »

Thanks, I'll look into it. However, I used DDU to wipe the NVIDIA drivers multiple times a few days ago, and it doesn't look like I have the CUDA toolkit installed as of now.
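
One quick way to double-check for a leftover toolkit is sketched below. It rests on two assumptions: that the toolkit installer sets the `CUDA_PATH` environment variable, and that the toolkit's `nvcc` compiler ends up on `PATH`. `cuda_toolkit_hints` is a hypothetical helper, and an empty result only means neither hint was found, not that no CUDA components remain.

```python
import os
import shutil


def cuda_toolkit_hints(env=None, which=shutil.which):
    """Collect hints that a CUDA toolkit (not just the driver) may be installed."""
    env = os.environ if env is None else env
    hints = []
    # Assumption: the toolkit installer sets CUDA_PATH; the driver alone should not.
    cuda_path = env.get("CUDA_PATH")
    if cuda_path:
        hints.append(f"CUDA_PATH={cuda_path}")
    # nvcc is the toolkit's compiler, so finding it on PATH suggests a toolkit install.
    nvcc = which("nvcc")
    if nvcc:
        hints.append(f"nvcc found at {nvcc}")
    return hints


if __name__ == "__main__":
    found = cuda_toolkit_hints()
    print("\n".join(found) if found else "No CUDA toolkit hints found")
```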
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU WUs taking much longer than expected

Post by bruce »

00:22:44:WU00:FS01:0x22:ERROR:exception: Error invoking kernel: CUDA_ERROR_LAUNCH_TIMEOUT (702)
00:22:44:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
00:22:44:WU00:FS01:0x22:Saving result file science.log
00:22:44:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
00:22:48:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
00:22:48:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)
FAHCore_22 downloads the necessary components of CUDA. If you have installed the CUDA toolkit, it apparently creates conflicts with what's in Core_22. I don't have any conflicts and don't need the toolkit since I'm not a CUDA developer.

If you uninstall that toolkit do the FAH errors go away? Do you or do you not need it for something else?
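
A side note on the `UNKNOWN_ENUM (-1073740791 = 0xc0000409)` line quoted above: the two numbers are the same 32-bit value, shown once as a signed integer and once in hex. A minimal sketch of the conversion (`to_unsigned32` is a hypothetical helper):

```python
def to_unsigned32(value):
    """Reinterpret a signed 32-bit exit code as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


exit_code = -1073740791  # from the "FahCore returned" line quoted above
print(hex(to_unsigned32(exit_code)))
```

On Windows, 0xC0000409 is the NTSTATUS code STATUS_STACK_BUFFER_OVERRUN, which fits the client's own warning that the core probably crashed rather than exiting cleanly.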
bana400
Posts: 5
Joined: Sun Feb 14, 2021 10:08 pm

Re: GPU WUs taking much longer than expected

Post by bana400 »

Alright, had some free time the other day. As far as I know, I no longer have the toolkit installed. I also removed NVIDIA PhysX and FrameView, which came up in another CUDA installation-related thread I found. But System Information still shows I have the CUDA driver. Surely having the driver would be fine, just not the toolkit, right?

(It's just NVCUDA64.DLL, CUDA 11.2.135 driver)

Still having issues with GPU tasks.

Currently in the control panel (I got a new GPU WU in that time), the current GPU WU is at 99.99% with ETA "Unknown," which I read could be the result of something like the GPU driver crashing. That's likely the case, because the logs show 0% progress on this GPU task.


*********************** Log Started 2021-02-26T16:33:43Z ***********************
16:33:43:******************************* libFAH ********************************
16:33:43:           Date: Oct 20 2020
16:33:43:           Time: 13:36:55
16:33:43:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
16:33:43:         Branch: master
16:33:43:       Compiler: Visual C++ 2015
16:33:43:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
16:33:43:       Platform: win32 10
16:33:43:           Bits: 32
16:33:43:           Mode: Release
16:33:43:****************************** FAHClient ******************************
16:33:43:        Version: 7.6.21
16:33:43:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:33:43:      Copyright: 2020 foldingathome.org
16:33:43:       Homepage: https://foldingathome.org/
16:33:43:           Date: Oct 20 2020
16:33:43:           Time: 13:41:04
16:33:43:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
16:33:43:         Branch: master
16:33:43:       Compiler: Visual C++ 2015
16:33:43:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
16:33:43:       Platform: win32 10
16:33:43:           Bits: 32
16:33:43:           Mode: Release
16:33:43:           Args: --open-web-control
16:33:43:         Config: C:\ProgramData\FAHClient\config.xml
16:33:43:******************************** CBang ********************************
16:33:43:           Date: Oct 20 2020
16:33:43:           Time: 11:36:18
16:33:43:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
16:33:43:         Branch: master
16:33:43:       Compiler: Visual C++ 2015
16:33:43:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
16:33:43:       Platform: win32 10
16:33:43:           Bits: 32
16:33:43:           Mode: Release
16:33:43:******************************* System ********************************
16:33:43:            CPU: AMD Ryzen 9 5900X 12-Core Processor
16:33:43:         CPU ID: AuthenticAMD Family 25 Model 33 Stepping 0
16:33:43:           CPUs: 24
16:33:43:         Memory: 31.93GiB
16:33:43:    Free Memory: 25.02GiB
16:33:43:        Threads: WINDOWS_THREADS
16:33:43:     OS Version: 6.2
16:33:43:    Has Battery: false
16:33:43:     On Battery: false
16:33:43:     UTC Offset: -8
16:33:43:            PID: 17612
16:33:43:            CWD: C:\ProgramData\FAHClient
16:33:43:  Win32 Service: false
16:33:43:             OS: Windows 10 Enterprise
16:33:43:        OS Arch: AMD64
16:33:43:           GPUs: 1
16:33:43:          GPU 0: Bus:7 Slot:0 Func:0 NVIDIA:8 TU104 [GeForce RTX 2070 SUPER]
16:33:43:                 8218
16:33:43:  CUDA Device 0: Platform:0 Device:0 Bus:7 Slot:0 Compute:7.5 Driver:11.2
16:33:43:OpenCL Device 0: Platform:0 Device:0 Bus:7 Slot:0 Compute:1.2 Driver:461.40
16:33:43:***********************************************************************
16:33:43:<config>
16:33:43:  <!-- Network -->
16:33:43:  <proxy v=':8080'/>
16:33:43:
16:33:43:  <!-- Slot Control -->
16:33:43:  <power v='full'/>
16:33:43:
16:33:43:  <!-- User Information -->
16:33:43:  <passkey v='*****'/>
16:33:43:  <team v=''/>
16:33:43:  <user v=''/>
16:33:43:
16:33:43:  <!-- Folding Slots -->
16:33:43:  <slot id='0' type='CPU'>
16:33:43:    <paused v='true'/>
16:33:43:  </slot>
16:33:43:  <slot id='1' type='GPU'>
16:33:43:    <pci-bus v='7'/>
16:33:43:    <pci-slot v='0'/>
16:33:43:  </slot>
16:33:43:</config>
16:33:43:Trying to access database...
16:33:43:Successfully acquired database lock
16:33:43:FS00:Initialized folding slot 00: cpu:23
16:33:43:FS01:Initialized folding slot 01: gpu:7:0 TU104 [GeForce RTX 2070 SUPER] 8218
16:33:43:WU00:FS01:Starting
16:33:43:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 17612 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
16:33:43:WU00:FS01:Started FahCore on PID 11352
16:33:43:WU00:FS01:Core PID:7284
16:33:43:WU00:FS01:FahCore 0x22 started
16:33:43:WU00:FS01:0x22:*********************** Log Started 2021-02-26T16:33:43Z ***********************
16:33:43:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
16:33:43:WU00:FS01:0x22:       Core: Core22
16:33:43:WU00:FS01:0x22:       Type: 0x22
16:33:43:WU00:FS01:0x22:    Version: 0.0.13
16:33:43:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:33:43:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
16:33:43:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
16:33:43:WU00:FS01:0x22:       Date: Sep 19 2020
16:33:43:WU00:FS01:0x22:       Time: 02:35:58
16:33:43:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
16:33:43:WU00:FS01:0x22:     Branch: core22-0.0.13
16:33:43:WU00:FS01:0x22:   Compiler: Visual C++ 2015
16:33:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:33:43:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
16:33:43:WU00:FS01:0x22:   Platform: win32 10
16:33:43:WU00:FS01:0x22:       Bits: 64
16:33:43:WU00:FS01:0x22:       Mode: Release
16:33:43:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
16:33:43:WU00:FS01:0x22:             <peastman@stanford.edu>
16:33:43:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 11352 -checkpoint 15
16:33:43:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
16:33:43:WU00:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100
16:33:43:WU00:FS01:0x22:************************************ libFAH ************************************
16:33:43:WU00:FS01:0x22:       Date: Sep 7 2020
16:33:43:WU00:FS01:0x22:       Time: 19:09:56
16:33:43:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
16:33:43:WU00:FS01:0x22:     Branch: HEAD
16:33:43:WU00:FS01:0x22:   Compiler: Visual C++ 2015
16:33:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:33:43:WU00:FS01:0x22:   Platform: win32 10
16:33:43:WU00:FS01:0x22:       Bits: 64
16:33:43:WU00:FS01:0x22:       Mode: Release
16:33:43:WU00:FS01:0x22:************************************ CBang *************************************
16:33:43:WU00:FS01:0x22:       Date: Sep 7 2020
16:33:43:WU00:FS01:0x22:       Time: 19:08:30
16:33:43:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
16:33:43:WU00:FS01:0x22:     Branch: HEAD
16:33:43:WU00:FS01:0x22:   Compiler: Visual C++ 2015
16:33:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:33:43:WU00:FS01:0x22:   Platform: win32 10
16:33:43:WU00:FS01:0x22:       Bits: 64
16:33:43:WU00:FS01:0x22:       Mode: Release
16:33:43:WU00:FS01:0x22:************************************ System ************************************
16:33:43:WU00:FS01:0x22:        CPU: AMD Ryzen 9 5900X 12-Core Processor
16:33:43:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 25 Model 33 Stepping 0
16:33:43:WU00:FS01:0x22:       CPUs: 24
16:33:43:WU00:FS01:0x22:     Memory: 31.93GiB
16:33:43:WU00:FS01:0x22:Free Memory: 25.00GiB
16:33:43:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
16:33:43:WU00:FS01:0x22: OS Version: 6.2
16:33:43:WU00:FS01:0x22:Has Battery: false
16:33:43:WU00:FS01:0x22: On Battery: false
16:33:43:WU00:FS01:0x22: UTC Offset: -8
16:33:43:WU00:FS01:0x22:        PID: 7284
16:33:43:WU00:FS01:0x22:        CWD: C:\ProgramData\FAHClient\work
16:33:43:WU00:FS01:0x22:************************************ OpenMM ************************************
16:33:43:WU00:FS01:0x22:   Revision: 189320d0
16:33:43:WU00:FS01:0x22:********************************************************************************
16:33:43:WU00:FS01:0x22:Project: 17427 (Run 0, Clone 352, Gen 216)
16:33:43:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
16:33:43:WU00:FS01:0x22:Digital signatures verified
16:33:43:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
16:33:43:WU00:FS01:0x22:Version 0.0.13
16:33:43:WU00:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
16:33:43:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
16:33:43:WU00:FS01:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
16:33:43:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
16:33:43:WU00:FS01:0x22:There are 4 platforms available.
16:33:43:WU00:FS01:0x22:Platform 0: Reference
16:33:43:WU00:FS01:0x22:Platform 1: CPU
16:33:43:WU00:FS01:0x22:Platform 2: OpenCL
16:33:43:WU00:FS01:0x22:  opencl-device 0 specified
16:33:43:WU00:FS01:0x22:Platform 3: CUDA
16:33:43:WU00:FS01:0x22:  cuda-device 0 specified
16:33:44:5:127.0.0.1:New Web session
16:33:49:WU00:FS01:0x22:Attempting to create CUDA context:
16:33:49:WU00:FS01:0x22:  Configuring platform CUDA
16:33:51:WU00:FS01:0x22:  Using CUDA and gpu 0
16:33:51:WU00:FS01:0x22:Completed 0 out of 1250000 steps (0%)
16:43:54:WU00:FS01:0x22:Watchdog triggered, requesting soft shutdown down
16:53:53:WU00:FS01:0x22:Watchdog shutdown failed, hard shutdown triggered
16:53:54:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
16:53:54:WARNING:WU00:FS01:FahCore returned: WU_STALLED (127 = 0x7f)
16:53:54:WU00:FS01:Starting
16:53:54:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 17612 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
16:53:54:WU00:FS01:Started FahCore on PID 22376
16:53:54:WU00:FS01:Core PID:19196
16:53:54:WU00:FS01:FahCore 0x22 started
16:53:55:WU00:FS01:0x22:*********************** Log Started 2021-02-26T16:53:54Z ***********************
16:53:55:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
16:53:55:WU00:FS01:0x22:       Core: Core22
16:53:55:WU00:FS01:0x22:       Type: 0x22
16:53:55:WU00:FS01:0x22:    Version: 0.0.13
16:53:55:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:53:55:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
16:53:55:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
16:53:55:WU00:FS01:0x22:       Date: Sep 19 2020
16:53:55:WU00:FS01:0x22:       Time: 02:35:58
16:53:55:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
16:53:55:WU00:FS01:0x22:     Branch: core22-0.0.13
16:53:55:WU00:FS01:0x22:   Compiler: Visual C++ 2015
16:53:55:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:53:55:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
16:53:55:WU00:FS01:0x22:   Platform: win32 10
16:53:55:WU00:FS01:0x22:       Bits: 64
16:53:55:WU00:FS01:0x22:       Mode: Release
16:53:55:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
16:53:55:WU00:FS01:0x22:             <peastman@stanford.edu>
16:53:55:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 22376 -checkpoint 15
16:53:55:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
16:53:55:WU00:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100
16:53:55:WU00:FS01:0x22:************************************ libFAH ************************************
16:53:55:WU00:FS01:0x22:       Date: Sep 7 2020
16:53:55:WU00:FS01:0x22:       Time: 19:09:56
16:53:55:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
16:53:55:WU00:FS01:0x22:     Branch: HEAD
16:53:55:WU00:FS01:0x22:   Compiler: Visual C++ 2015
16:53:55:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:53:55:WU00:FS01:0x22:   Platform: win32 10
16:53:55:WU00:FS01:0x22:       Bits: 64
16:53:55:WU00:FS01:0x22:       Mode: Release
16:53:55:WU00:FS01:0x22:************************************ CBang *************************************
16:53:55:WU00:FS01:0x22:       Date: Sep 7 2020
16:53:55:WU00:FS01:0x22:       Time: 19:08:30
16:53:55:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
16:53:55:WU00:FS01:0x22:     Branch: HEAD
16:53:55:WU00:FS01:0x22:   Compiler: Visual C++ 2015
16:53:55:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:53:55:WU00:FS01:0x22:   Platform: win32 10
16:53:55:WU00:FS01:0x22:       Bits: 64
16:53:55:WU00:FS01:0x22:       Mode: Release
16:53:55:WU00:FS01:0x22:************************************ System ************************************
16:53:55:WU00:FS01:0x22:        CPU: AMD Ryzen 9 5900X 12-Core Processor
16:53:55:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 25 Model 33 Stepping 0
16:53:55:WU00:FS01:0x22:       CPUs: 24
16:53:55:WU00:FS01:0x22:     Memory: 31.93GiB
16:53:55:WU00:FS01:0x22:Free Memory: 24.68GiB
16:53:55:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
16:53:55:WU00:FS01:0x22: OS Version: 6.2
16:53:55:WU00:FS01:0x22:Has Battery: false
16:53:55:WU00:FS01:0x22: On Battery: false
16:53:55:WU00:FS01:0x22: UTC Offset: -8
16:53:55:WU00:FS01:0x22:        PID: 19196
16:53:55:WU00:FS01:0x22:        CWD: C:\ProgramData\FAHClient\work
16:53:55:WU00:FS01:0x22:************************************ OpenMM ************************************
16:53:55:WU00:FS01:0x22:   Revision: 189320d0
16:53:55:WU00:FS01:0x22:********************************************************************************
16:53:55:WU00:FS01:0x22:Project: 17427 (Run 0, Clone 352, Gen 216)
16:53:55:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
16:53:55:WU00:FS01:0x22:Digital signatures verified
16:53:55:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
16:53:55:WU00:FS01:0x22:Version 0.0.13
16:53:55:WU00:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
16:53:55:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
16:53:55:WU00:FS01:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
16:53:55:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
16:53:55:WU00:FS01:0x22:There are 4 platforms available.
16:53:55:WU00:FS01:0x22:Platform 0: Reference
16:53:55:WU00:FS01:0x22:Platform 1: CPU
16:53:55:WU00:FS01:0x22:Platform 2: OpenCL
16:53:55:WU00:FS01:0x22:  opencl-device 0 specified
16:53:55:WU00:FS01:0x22:Platform 3: CUDA
16:53:55:WU00:FS01:0x22:  cuda-device 0 specified
16:54:01:WU00:FS01:0x22:Attempting to create CUDA context:
16:54:01:WU00:FS01:0x22:  Configuring platform CUDA
16:54:03:WU00:FS01:0x22:  Using CUDA and gpu 0
16:54:03:WU00:FS01:0x22:Completed 0 out of 1250000 steps (0%)
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU WUs taking much longer than expected

Post by bruce »

That looks like an old log, though I can't be sure. What error appears after the line
00:43:16:WU00:FS01:0x22:Checkpoint completed at step 0
at the bottom of the log?

Nobody has actually diagnosed exactly where and why such conflicts arise.

I'd probably do a clean install of the driver if it were my system instead of trying to back out the offending component.
bana400
Posts: 5
Joined: Sun Feb 14, 2021 10:08 pm

Re: GPU WUs taking much longer than expected

Post by bana400 »

Gotcha, will try a clean install again.

But regarding what comes after Checkpoint completed at step 0:
First, I get: "Checkpoint completed at step 0"

Exactly 10 minutes after that, I get:
"Watchdog triggered, requesting soft shutdown down"

Exactly 10 minutes after that, I get:
Watchdog shutdown failed, hard shutdown triggered
FahCore returned an unknown error code which probably indicates that it crashed
FahCore returned: WU_STALLED (127 = 0x7f)
Starting

Then the program tries the task again, and the cycle just repeats: start task -> "Watchdog triggered, requesting soft shutdown down" -> "Watchdog shutdown failed, hard shutdown triggered" -> the exact same WU_STALLED (127 = 0x7f) message.
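
For anyone who wants to check their own log for the same pattern, here is a rough Python sketch that looks for the start -> soft shutdown -> hard shutdown sequence and reports the gaps between the steps. The function names (`find_stall_cycles`, `elapsed`) are made up for this example and are not part of FAHClient; the sample lines are copied from the log above. It assumes every log line starts with an HH:MM:SS stamp and that the marker strings match the ones shown in this thread.

```python
import re
from datetime import datetime, timedelta

# FAHClient log lines start with an HH:MM:SS stamp followed by a colon.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(.*)$")

# Lines copied from the log in this thread.
SAMPLE = """\
16:33:51:WU00:FS01:0x22:Completed 0 out of 1250000 steps (0%)
16:43:54:WU00:FS01:0x22:Watchdog triggered, requesting soft shutdown down
16:53:53:WU00:FS01:0x22:Watchdog shutdown failed, hard shutdown triggered
16:53:54:WARNING:WU00:FS01:FahCore returned: WU_STALLED (127 = 0x7f)
"""


def parse_time(stamp):
    """Turn 'HH:MM:SS' into a datetime (the date part is a dummy)."""
    return datetime.strptime(stamp, "%H:%M:%S")


def elapsed(earlier, later):
    """Time between two stamps, wrapping around midnight if needed."""
    return (later - earlier) % timedelta(days=1)


def find_stall_cycles(log_text):
    """Return (start, soft_shutdown, hard_shutdown) triples, one per stalled attempt."""
    cycles = []
    start = soft = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not a timestamped log line
        stamp, rest = parse_time(match.group(1)), match.group(2)
        if "Completed 0 out of" in rest:
            start, soft = stamp, None  # a new attempt begins at step 0
        elif "Watchdog triggered" in rest and start is not None:
            soft = stamp
        elif "hard shutdown triggered" in rest and start is not None and soft is not None:
            cycles.append((start, soft, stamp))
            start = soft = None
    return cycles


if __name__ == "__main__":
    for start, soft, hard in find_stall_cycles(SAMPLE):
        print("stalled attempt: soft shutdown after", elapsed(start, soft),
              "then hard shutdown after", elapsed(soft, hard))
```

On the sample lines this finds one stalled attempt with about 10 minutes between each step, which matches what I described above.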
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU WUs taking much longer than expected

Post by bruce »

Apparently the driver that came with the development kit is trying to invoke some other software component that you removed, so I expect that the clean install will work.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: GPU WUs taking much longer than expected

Post by PantherX »

Welcome to the F@H Forum bana400,

As stated, it is a known issue and Nvidia is aware of it. However, there's no ETA as to when it will be fixed.

As bruce mentioned, a fresh installation of only the drivers from here: https://www.nvidia.com/Download/index.aspx?lang=en-us# would do the trick :) You can also verify the OpenCL/CUDA functionality by using GPU-Z: https://www.techpowerup.com/download/techpowerup-gpu-z/ and seeing what features are available.

When it comes to Task Manager, it will show different values depending on which Windows 10 version you're using, and you may need to change the component from the drop-down menus. However, GPU-Z does a reliable job and is more intuitive IMO.
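
If you'd rather check from a script, one alternative to GPU-Z (not a replacement for it) is `nvidia-smi`, the command-line tool that ships with the Nvidia driver. The sketch below is only an illustration: the helper names are made up, the sample row uses a placeholder driver version, and it assumes `nvidia-smi` is on your PATH. A GPU that is really folding should normally report a high `utilization.gpu` value and a temperature well above idle.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi's --query-gpu option.
QUERY_FIELDS = ["name", "driver_version", "utilization.gpu", "temperature.gpu"]

# Illustrative row only: the driver version is a placeholder, not a real one.
SAMPLE_OUTPUT = "GeForce RTX 2070 SUPER, 000.00, 0, 37\n"


def parse_gpu_rows(csv_text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(record) != len(QUERY_FIELDS):
            continue  # skip blank or unexpected lines
        rows.append(dict(zip(QUERY_FIELDS, (field.strip() for field in record))))
    return rows


def query_gpus():
    """Run nvidia-smi and return the parsed rows (needs the Nvidia driver installed)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    # Swap parse_gpu_rows(SAMPLE_OUTPUT) for query_gpus() on a machine with nvidia-smi.
    for gpu in parse_gpu_rows(SAMPLE_OUTPUT):
        print(gpu["name"], "| load %:", gpu["utilization.gpu"],
              "| temp C:", gpu["temperature.gpu"])
```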
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
