Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Thu Oct 29, 2020 10:25 am
by VxJasonxV
Took me a minute to realize I had to "commit the change" to the Power Limit in Afterburner because I couldn't save 75% to Profile 2, but I've done that now. Profile 1 = 100% Power Limit, normal, etc. Profile 2 = 75% Power Limit, with Temperature Limit linked so it's set to 77 C. I've committed Profile 2 and will run FAH overnight. I don't want to deal with this back and forth on a regular basis so if my computer survives the night I will also look into disabling CUDA. I'm not optimistic because I've had crashes even when set to "Low" Folding power.

Also I noticed that my previous logs aren't complete; Due to the sudden crash nature of the problem, $FahDataDir\log.txt doesn't get moved into a $FahDataDir\logs\log-yyyymmdd-hhmmss.txt file. First thing tomorrow I'm going to look at the last \logs\*.txt file as well as \log.txt and see if there's something I've been missing ever since. I don't know if FAH recovers, retains logs, moves them out, etc., but that's what I will find out.

Random aside, more for my own memory than anything else: earlier today, while FAH was running and I was playing with Afterburner and various things, my computer fully rebooted; I saw the EFI screen and everything. That happened even though I have "Automatically Restart" off. I also got a random "IRQL_NOT_LESS_OR_EQUAL" a week or two back. Who knows.

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Sun Nov 01, 2020 6:27 am
by PantherX
VxJasonxV wrote:...I'm not optimistic because I've had crashes even when set to "Low" Folding power...
The impact of that setting on GPU would only mean that it would fold on the GPU when it's idle. It doesn't have any impact on how much GPU will be utilized. Note that the behavior of CPU folding would be to use 50% of its CPUs.
VxJasonxV wrote:...Also I noticed that my previous logs aren't complete; Due to the sudden crash nature of the problem, $FahDataDir\log.txt doesn't get moved into a $FahDataDir\logs\log-yyyymmdd-hhmmss.txt file. First thing tomorrow I'm going to look at the last \logs\*.txt file as well as \log.txt and see if there's something I've been missing ever since. I don't know if FAH recovers, retains logs, moves them out, etc., but that's what I will find out....
That's weird... FAHClient will always move the existing log file to the logs folder (with the date time string) and open up a new log file when it is restarted. If the log file isn't being written, then that's a disk IO issue since F@H doesn't do anything special there.
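
For anyone wanting to check which logs survived a crash, here is a minimal Python sketch that picks out the newest archived log plus the live log.txt. It relies only on the file naming quoted above (logs\log-yyyymmdd-hhmmss.txt); the function names are hypothetical and the data directory path is whatever your client uses, so treat it as an illustration rather than anything FAHClient ships.

```python
from pathlib import Path
import re

# File name pattern quoted in this thread: log-yyyymmdd-hhmmss.txt
ARCHIVE_NAME = re.compile(r"^log-(\d{8})-(\d{6})\.txt$")


def archived_logs(data_dir):
    """Return archived logs found in <data_dir>/logs, oldest first."""
    logs_dir = Path(data_dir) / "logs"
    if not logs_dir.is_dir():
        return []
    matches = [p for p in logs_dir.iterdir() if ARCHIVE_NAME.match(p.name)]
    # The timestamp is zero-padded, so sorting by name is chronological.
    return sorted(matches, key=lambda p: p.name)


def logs_to_inspect(data_dir):
    """Return the newest archived log (if any) plus the live log.txt (if present)."""
    result = archived_logs(data_dir)[-1:]
    live = Path(data_dir) / "log.txt"
    if live.is_file():
        result.append(live)
    return result
```

Calling logs_to_inspect() with the client's data directory (the CWD line in the client log shows where that is) returns at most two paths. If the newest archive is much older than the last crash, the log probably was not rotated on that restart.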

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Sun Nov 01, 2020 6:34 am
by VxJasonxV
Same issue at 75%, but I turned it all the way down to 40% (the lowest Afterburner allows) and it has survived for just under 2 days now.

I would rather not have to remember to change it back and forth, so I'm interested in disabling CUDA. Reminder that this issue even happened at "Low" Folding Power. I can't seem to find docs about it. F@H 7.6.21 got rid of the OpenCL and CUDA index settings in the GPU slot tab so I can't set it to 0 there (assuming 0 = disable), I assume it's available via a key/value pair client option I can put in "Expert"?

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Sun Nov 01, 2020 6:45 am
by ajm
You can disable CUDA in the Expert options:
FAHControl -> Configure -> Expert -> Click Add under Extra Core Options -> -disable-cuda -> OK -> Save
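
To double-check that the option was actually saved, a small Python sketch along these lines could read config.xml and look for the flag. It assumes the option is stored as an extra-core-args element with a v attribute, which is how it appears in the config dump posted later in this thread; the function names and the sample string are made up for illustration.

```python
import xml.etree.ElementTree as ET


def extra_core_args(config_xml):
    """Return every extra core argument found in a config.xml string."""
    root = ET.fromstring(config_xml)
    args = []
    for node in root.iter("extra-core-args"):
        # Assumption: several arguments may share one space-separated value.
        args.extend(node.get("v", "").split())
    return args


def cuda_disabled(config_xml):
    """True if -disable-cuda is among the extra core arguments."""
    return "-disable-cuda" in extra_core_args(config_xml)


# Trimmed-down sample in the same shape as the config dump in this thread.
SAMPLE = """<config>
  <extra-core-args v='-disable-cuda'/>
  <slot id='1' type='GPU'/>
</config>"""

print(cuda_disabled(SAMPLE))  # prints True
```

To run it against a real installation, read the file named on the "Config:" line of the client log and pass its contents to cuda_disabled().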

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Sun Nov 01, 2020 11:52 pm
by VxJasonxV
I suspect CUDA was my problem. https://foldingathome.org/2020/09/28/fo ... a-support/ is dated September 28, this post is less than a week later, first mentioning it to friends was just a day or two before. It survived overnight, though only about ~7 hours, but considering the crash happens much faster most of the time… Gaming up late, up early to game :D.

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Mon Nov 02, 2020 6:16 am
by ajm
Great, glad to see that you sorted it out!
But looking forward, I would check your PSU and cables. Are they OK? Maybe you use only one cable from the PSU, with two 8-pin connectors? Then if you feel like experimenting, perhaps try with two separate cables.

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Thu Jan 14, 2021 6:37 am
by VxJasonxV
These issues have slowly but surely started coming back. It used to be once every few days, but now it's back to within an overnight again. I haven't updated the F@H client at all; I'm still on 7.6.21, which I believe I was on by the conclusion of this thread. I still have -disable-cuda in my "extra core options".

I feel like when I first configured that, the log file said that the CUDA device was "null" or "none" or something, but now:

Code:

04:28:36:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:11.2

No, I still haven't checked my PSU and cables. I meant to, but when things otherwise were working so well, … I also can't imagine that a runtime argument would change without a core app change. Full log just because:

Code:

*********************** Log Started 2021-01-14T04:28:36Z ***********************
04:28:36:******************************* libFAH ********************************
04:28:36:           Date: Oct 20 2020
04:28:36:           Time: 13:36:55
04:28:36:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
04:28:36:         Branch: master
04:28:36:       Compiler: Visual C++ 2015
04:28:36:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
04:28:36:       Platform: win32 10
04:28:36:           Bits: 32
04:28:36:           Mode: Release
04:28:36:****************************** FAHClient ******************************
04:28:36:        Version: 7.6.21
04:28:36:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
04:28:36:      Copyright: 2020 foldingathome.org
04:28:36:       Homepage: https://foldingathome.org/
04:28:36:           Date: Oct 20 2020
04:28:36:           Time: 13:41:04
04:28:36:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
04:28:36:         Branch: master
04:28:36:       Compiler: Visual C++ 2015
04:28:36:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
04:28:36:       Platform: win32 10
04:28:36:           Bits: 32
04:28:36:           Mode: Release
04:28:36:         Config: C:\Users\vxjas\AppData\Roaming\FAHClient\config.xml
04:28:36:******************************** CBang ********************************
04:28:36:           Date: Oct 20 2020
04:28:36:           Time: 11:36:18
04:28:36:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
04:28:36:         Branch: master
04:28:36:       Compiler: Visual C++ 2015
04:28:36:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
04:28:36:       Platform: win32 10
04:28:36:           Bits: 32
04:28:36:           Mode: Release
04:28:36:******************************* System ********************************
04:28:36:            CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
04:28:36:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 12
04:28:36:           CPUs: 16
04:28:36:         Memory: 31.92GiB
04:28:36:    Free Memory: 27.45GiB
04:28:36:        Threads: WINDOWS_THREADS
04:28:36:     OS Version: 6.2
04:28:36:    Has Battery: false
04:28:36:     On Battery: false
04:28:36:     UTC Offset: -7
04:28:36:            PID: 15108
04:28:36:            CWD: C:\Users\vxjas\AppData\Roaming\FAHClient
04:28:36:  Win32 Service: false
04:28:36:             OS: Windows 10 Home
04:28:36:        OS Arch: AMD64
04:28:36:           GPUs: 1
04:28:36:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti] M
04:28:36:                 13448
04:28:36:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:11.2
04:28:36:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:461.9
04:28:36:***********************************************************************
04:28:36:<config>
04:28:36:  <!-- Folding Slot Configuration -->
04:28:36:  <extra-core-args v='-disable-cuda'/>
04:28:36:
04:28:36:  <!-- HTTP Server -->
04:28:36:  <allow v='127.0.0.1/32 192.168.107.1/24'/>
04:28:36:
04:28:36:  <!-- Network -->
04:28:36:  <proxy v=':8080'/>
04:28:36:
04:28:36:  <!-- Remote Command Server -->
04:28:36:  <command-allow-no-pass v='127.0.0.1/32 192.168.107.1/24'/>
04:28:36:  <password v='*****'/>
04:28:36:
04:28:36:  <!-- Slot Control -->
04:28:36:  <power v='full'/>
04:28:36:
04:28:36:  <!-- User Information -->
04:28:36:  <passkey v='*****'/>
04:28:36:  <team v='232992'/>
04:28:36:  <user v='VxJasonxV'/>
04:28:36:
04:28:36:  <!-- Folding Slots -->
04:28:36:  <slot id='0' type='CPU'>
04:28:36:    <idle v='true'/>
04:28:36:  </slot>
04:28:36:  <slot id='1' type='GPU'>
04:28:36:    <idle v='true'/>
04:28:36:    <pci-bus v='1'/>
04:28:36:    <pci-slot v='0'/>
04:28:36:  </slot>
04:28:36:</config>
04:28:36:Trying to access database...
04:28:36:Successfully acquired database lock
04:28:36:FS00:Initialized folding slot 00: cpu:15
04:28:36:FS01:Initialized folding slot 01: gpu:1:0 TU102 [GeForce RTX 2080 Ti] M 13448

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Thu Jan 14, 2021 12:39 pm
by PantherX
The initial section will always have CUDA if you have a Nvidia GPU and the correct drivers installed. The log file shows that you have a RTX 2080 Ti with the latest Game Ready Drivers. What we need to see is the actual GPU Slot starting up where we can see if it is using OpenCL or CUDA. I am 99% sure that it is using OpenCL since the flag is correctly configured.
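
If you'd rather not read the whole log by eye, a rough Python sketch like the one below could pull out just the lines that show which platform the GPU slot ended up on. The "Disabling CUDA platform" and "Using OpenCL on platformId" wording is what Core22 0.0.13 prints in the log posted in this thread; matching "Using CUDA" as well is an assumption about how the CUDA case is worded, and the function name is made up.

```python
import re

# The OpenCL wording comes from the Core22 log in this thread.
# The CUDA alternative is an assumption about the equivalent message.
USING = re.compile(r"Using (OpenCL|CUDA)\b")
DISABLED = re.compile(r"Disabling CUDA platform")


def gpu_platform_events(log_text, slot="FS01"):
    """List platform-related events logged for one folding slot, in order."""
    events = []
    for line in log_text.splitlines():
        if f":{slot}:" not in line:
            continue  # skip lines that belong to other slots
        if DISABLED.search(line):
            events.append("cuda-disabled")
        match = USING.search(line)
        if match:
            events.append("using-" + match.group(1).lower())
    return events


SAMPLE = """22:52:54:WU02:FS01:0x22:Disabling CUDA platform because 'disable-cuda' argument was specified.
22:53:18:WU00:FS00:0xa7:Completed 410000 out of 500000 steps (82%)
22:53:35:WU02:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0"""

print(gpu_platform_events(SAMPLE))  # prints ['cuda-disabled', 'using-opencl']
```

The slot label defaults to FS01 because that is the GPU slot in this particular configuration; other setups may number their slots differently.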

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Fri Jan 15, 2021 9:15 pm
by VxJasonxV
Ah, ok. Yeah, it's definitely using CL;

Code:

22:52:54:WU02:FS01:0x22:*********************** Log Started 2021-01-11T22:52:54Z ***********************
22:52:54:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
22:52:54:WU02:FS01:0x22:       Core: Core22
22:52:54:WU02:FS01:0x22:       Type: 0x22
22:52:54:WU02:FS01:0x22:    Version: 0.0.13
22:52:54:WU02:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
22:52:54:WU02:FS01:0x22:  Copyright: 2020 foldingathome.org
22:52:54:WU02:FS01:0x22:   Homepage: https://foldingathome.org/
22:52:54:WU02:FS01:0x22:       Date: Sep 19 2020
22:52:54:WU02:FS01:0x22:       Time: 02:35:58
22:52:54:WU02:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
22:52:54:WU02:FS01:0x22:     Branch: core22-0.0.13
22:52:54:WU02:FS01:0x22:   Compiler: Visual C++ 2015
22:52:54:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
22:52:54:WU02:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
22:52:54:WU02:FS01:0x22:   Platform: win32 10
22:52:54:WU02:FS01:0x22:       Bits: 64
22:52:54:WU02:FS01:0x22:       Mode: Release
22:52:54:WU02:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
22:52:54:WU02:FS01:0x22:             <peastman@stanford.edu>
22:52:54:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 706 -lifeline 20656 -checkpoint 15
22:52:54:WU02:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
22:52:54:WU02:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100 -disable-cuda
22:52:54:WU02:FS01:0x22:************************************ libFAH ************************************
22:52:54:WU02:FS01:0x22:       Date: Sep 7 2020
22:52:54:WU02:FS01:0x22:       Time: 19:09:56
22:52:54:WU02:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
22:52:54:WU02:FS01:0x22:     Branch: HEAD
22:52:54:WU02:FS01:0x22:   Compiler: Visual C++ 2015
22:52:54:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
22:52:54:WU02:FS01:0x22:   Platform: win32 10
22:52:54:WU02:FS01:0x22:       Bits: 64
22:52:54:WU02:FS01:0x22:       Mode: Release
22:52:54:WU02:FS01:0x22:************************************ CBang *************************************
22:52:54:WU02:FS01:0x22:       Date: Sep 7 2020
22:52:54:WU02:FS01:0x22:       Time: 19:08:30
22:52:54:WU02:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
22:52:54:WU02:FS01:0x22:     Branch: HEAD
22:52:54:WU02:FS01:0x22:   Compiler: Visual C++ 2015
22:52:54:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
22:52:54:WU02:FS01:0x22:   Platform: win32 10
22:52:54:WU02:FS01:0x22:       Bits: 64
22:52:54:WU02:FS01:0x22:       Mode: Release
22:52:54:WU02:FS01:0x22:************************************ System ************************************
22:52:54:WU02:FS01:0x22:        CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
22:52:54:WU02:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 12
22:52:54:WU02:FS01:0x22:       CPUs: 16
22:52:54:WU02:FS01:0x22:     Memory: 31.92GiB
22:52:54:WU02:FS01:0x22:Free Memory: 24.44GiB
22:52:54:WU02:FS01:0x22:    Threads: WINDOWS_THREADS
22:52:54:WU02:FS01:0x22: OS Version: 6.2
22:52:54:WU02:FS01:0x22:Has Battery: false
22:52:54:WU02:FS01:0x22: On Battery: false
22:52:54:WU02:FS01:0x22: UTC Offset: -7
22:52:54:WU02:FS01:0x22:        PID: 8184
22:52:54:WU02:FS01:0x22:        CWD: C:\Users\vxjas\AppData\Roaming\FAHClient\work
22:52:54:WU02:FS01:0x22:************************************ OpenMM ************************************
22:52:54:WU02:FS01:0x22:   Revision: 189320d0
22:52:54:WU02:FS01:0x22:********************************************************************************
22:52:54:WU02:FS01:0x22:Project: 17320 (Run 0, Clone 404, Gen 58)
22:52:54:WU02:FS01:0x22:Unit: 0x00000000000000000000000000000000
22:52:54:WU02:FS01:0x22:Reading tar file core.xml
22:52:54:WU02:FS01:0x22:Reading tar file integrator.xml.bz2
22:52:54:WU02:FS01:0x22:Reading tar file state.xml.bz2
22:52:54:WU02:FS01:0x22:Reading tar file system.xml.bz2
22:52:54:WU02:FS01:0x22:Digital signatures verified
22:52:54:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
22:52:54:WU02:FS01:0x22:Version 0.0.13
22:52:54:WU02:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
22:52:54:WU02:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
22:52:54:WU02:FS01:0x22:  XTC frame write interval: 250000 steps (20%) [5 total]
22:52:54:WU02:FS01:0x22:  Global context and integrator variables write interval: disabled
22:52:54:WU02:FS01:0x22:There are 4 platforms available.
22:52:54:WU02:FS01:0x22:Platform 0: Reference
22:52:54:WU02:FS01:0x22:Platform 1: CPU
22:52:54:WU02:FS01:0x22:Platform 2: OpenCL
22:52:54:WU02:FS01:0x22:  opencl-device 0 specified
22:52:54:WU02:FS01:0x22:Platform 3: CUDA
22:52:54:WU02:FS01:0x22:  cuda-device 0 specified
22:52:54:WU02:FS01:0x22:Disabling CUDA platform because 'disable-cuda' argument was specified.
22:52:59:WU01:FS01:Upload 48.60%
22:53:05:WU01:FS01:Upload 99.49%
22:53:05:WU01:FS01:Upload complete
22:53:06:WU01:FS01:Server responded WORK_ACK (400)
22:53:06:WU01:FS01:Final credit estimate, 108038.00 points
22:53:06:WU01:FS01:Cleaning up
22:53:18:WU00:FS00:0xa7:Completed 410000 out of 500000 steps (82%)
22:53:26:WU02:FS01:0x22:Attempting to create OpenCL context:
22:53:26:WU02:FS01:0x22:  Configuring platform OpenCL
22:53:35:WU02:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0
22:53:36:WU02:FS01:0x22:Completed 0 out of 1250000 steps (0%)
22:53:38:WU02:FS01:0x22:Checkpoint completed at step 0
…

I have to dig into log archives for that because a new log file is started when my system recovers from the quasi-restarted state. So the latest log file that I can get from the app itself is just a boot and not anything with a run log. I could disable "on idle", let it start up, then re-enable "on idle", or I can just get a slightly older log, as I have here.

No other suggestions besides diving into the physical cabling and health of my machine?

Revisiting my posts here, it seems like throttling it down to 40% did allow the machine to survive, but 75% didn't. I don't know what the actual power draw difference between those two is. Frankly, I feel like this is a video-card-specific problem, but that's hard to prove. I will dive into the guts of the machine soon, absent any other recommendations.
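
As a rough idea of what those percentages mean in watts, here is a back-of-the-envelope Python sketch. It assumes the Afterburner slider scales the card's default board power limit, and it assumes a 250 W default, which is the reference figure for an RTX 2080 Ti; factory-overclocked cards often ship with a higher limit, so substitute whatever your card reports.

```python
def power_cap_watts(board_power_w, limit_percent):
    """Power cap implied by a percentage limit on the default board power."""
    return board_power_w * limit_percent / 100.0


# Assumption: 250 W default board power (reference RTX 2080 Ti figure).
ASSUMED_BOARD_POWER_W = 250.0

for pct in (100, 75, 40):
    watts = power_cap_watts(ASSUMED_BOARD_POWER_W, pct)
    print(f"{pct:>3}% -> {watts:.1f} W")
```

Under that assumption, 75% works out to about 187.5 W and 40% to about 100 W, so the gap between the setting that crashed and the one that survived would be roughly 90 W.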

Can I also disable OpenCL? Is OpenCL also more demanding than a classic GPU computation? I vaguely recall CL is like a CPU/GPU cooperation or something? I dunno.

Anyway, rambling. Suggestions welcomed.

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Fri Jan 15, 2021 9:34 pm
by Neil-B
Your "choice" is OpenCL or CUDA, and CUDA works the GPU harder than OpenCL. I think you have disabled CUDA, which would normally be the preferred option, so OpenCL it is.

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Fri Jan 15, 2021 9:39 pm
by Neil-B
Given that disabling CUDA hasn't resolved the issues, I'd revert to using it and try other changes from there. But I will now read the whole thread to get the history :)

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Fri Jan 15, 2021 9:48 pm
by Neil-B
Given that you have had issues with GOG in the past, have you tried removing it, as bruce suggested?

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Fri Jan 15, 2021 10:20 pm
by VxJasonxV
The issues with GOG were strictly during updates. I can't imagine 3rd party library problems would be causing this issue. That would crash Folding at most, but take Windows as a whole one step shy of a Blue Screen? No, I can't blame GOG. I'll shut it off next time I'm upstairs, but I'm near positive it won't make a difference.

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Fri Jan 15, 2021 10:30 pm
by PantherX
Occasionally, there has been a variety of issues with GOG over the years (search.php?st=0&sk=t&sd=d&sr=posts&keywords=GOG). I am aware that software can do all sorts of weird stuff on a system, so I personally would test it out to cross it off the list.

I skimmed the topic but can't see what model of PSU you have. You have a decent system that's a typical gaming machine, so I am perplexed as to why you can't use CUDA while thousands of systems can :| It would be nice to discover the root cause and hopefully fix it!

Re: F@H 7.6.13 + RTX 2080 Ti 456.55 Windows Crash

Posted: Fri Jan 15, 2021 11:05 pm
by Neil-B
Shutting it off may not resolve some of the potential clashes, but switching it off is a good first test?