FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Tue Sep 29, 2020 6:11 pm
by CarolusMagnus
I had set the checkpoint frequency to 3 min for testing. Now that I'm trying to get back to 30 min, Core 22 v0.0.13 still saves a checkpoint every 3 min.
I tried restarting the session, then the PC, with no change.

Here's the present log:

*********************** Log Started 2020-09-29T17:33:01Z ***********************
17:33:01:Trying to access database...
17:33:01:Successfully acquired database lock
17:33:01:Read GPUs.txt
17:33:01:Enabled folding slot 01: READY gpu:0:TU104 [GeForce RTX 2070 SUPER] 8218
17:33:01:****************************** FAHClient ******************************
17:33:01:      Version: 7.6.13
17:33:01:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:33:01:    Copyright: 2020 foldingathome.org
17:33:01:     Homepage: https://foldingathome.org/
17:33:01:         Date: Apr 27 2020
17:33:01:         Time: 21:21:01
17:33:01:     Revision: 5a652817f46116b6e135503af97f18e094414e3b
17:33:01:       Branch: master
17:33:01:     Compiler: Visual C++ 2008
17:33:01:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
17:33:01:     Platform: win32 10
17:33:01:         Bits: 32
17:33:01:         Mode: Release
17:33:01:       Config: C:\Users\Charles\AppData\Roaming\FAHClient\config.xml
17:33:01:******************************** CBang ********************************
17:33:01:         Date: Apr 24 2020
17:33:01:         Time: 17:07:55
17:33:01:     Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
17:33:01:       Branch: master
17:33:01:     Compiler: Visual C++ 2008
17:33:01:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
17:33:01:     Platform: win32 10
17:33:01:         Bits: 32
17:33:01:         Mode: Release
17:33:01:******************************* System ********************************
17:33:01:          CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
17:33:01:       CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
17:33:01:         CPUs: 8
17:33:01:       Memory: 23.84GiB
17:33:01:  Free Memory: 19.69GiB
17:33:01:      Threads: WINDOWS_THREADS
17:33:01:   OS Version: 6.1
17:33:01:  Has Battery: false
17:33:01:   On Battery: false
17:33:01:   UTC Offset: 2
17:33:01:          PID: 4536
17:33:01:          CWD: C:\Users\Charles\AppData\Roaming\FAHClient
17:33:01:Win32 Service: false
17:33:01:           OS: Windows 7 Professional
17:33:01:      OS Arch: AMD64
17:33:01:         GPUs: 1
17:33:01:        GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU104 [GeForce RTX 2070 SUPER] 8218
17:33:01:CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:11.1
17:33:01:       OpenCL: Not detected: clGetDeviceIDs() returned -1
17:33:01:******************************* libFAH ********************************
17:33:01:         Date: Apr 15 2020
17:33:01:         Time: 14:53:14
17:33:01:     Revision: 216968bc7025029c841ed6e36e81a03a316890d3
17:33:01:       Branch: master
17:33:01:     Compiler: Visual C++ 2008
17:33:01:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
17:33:01:     Platform: win32 10
17:33:01:         Bits: 32
17:33:01:         Mode: Release
17:33:01:***********************************************************************
17:33:01:<config>
17:33:01:  <!-- Folding Core -->
17:33:01:  <checkpoint v='30'/>
17:33:01:  <core-priority v='low'/>
17:33:01:
17:33:01:  <!-- Folding Slot Configuration -->
17:33:01:  <cause v='HIGH_PRIORITY'/>
17:33:01:
17:33:01:  <!-- HTTP Server -->
17:33:01:  <allow v='127.0.0.1 192.168.1.0/24'/>
17:33:01:
17:33:01:  <!-- Network -->
17:33:01:  <proxy v=':8080'/>
17:33:01:
17:33:01:  <!-- Remote Command Server -->
17:33:01:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
17:33:01:
17:33:01:  <!-- Slot Control -->
17:33:01:  <pause-on-battery v='false'/>
17:33:01:  <power v='FULL'/>
17:33:01:
17:33:01:  <!-- User Information -->
17:33:01:  <passkey v='*****'/>
17:33:01:  <team v='51'/>
17:33:01:  <user v='CarolusMagnus'/>
17:33:01:
17:33:01:  <!-- Folding Slots -->
17:33:01:  <slot id='1' type='GPU'>
17:33:01:    <opencl-index v='0'/>
17:33:01:  </slot>
17:33:01:</config>
17:33:01:WU00:FS01:Starting
17:33:01:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Charles\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 4536 -checkpoint 30 -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
17:33:01:WU00:FS01:Started FahCore on PID 4792
17:33:01:WU00:FS01:Core PID:4804
17:33:01:WU00:FS01:FahCore 0x22 started
17:33:01:WU00:FS01:0x22:*********************** Log Started 2020-09-29T17:33:01Z ***********************
17:33:01:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
17:33:01:WU00:FS01:0x22:       Core: Core22
17:33:01:WU00:FS01:0x22:       Type: 0x22
17:33:01:WU00:FS01:0x22:    Version: 0.0.13
17:33:01:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:33:01:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
17:33:01:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
17:33:01:WU00:FS01:0x22:       Date: Sep 19 2020
17:33:01:WU00:FS01:0x22:       Time: 02:35:58
17:33:01:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
17:33:01:WU00:FS01:0x22:     Branch: core22-0.0.13
17:33:01:WU00:FS01:0x22:   Compiler: Visual C++ 2015
17:33:01:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:33:01:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
17:33:01:WU00:FS01:0x22:   Platform: win32 10
17:33:01:WU00:FS01:0x22:       Bits: 64
17:33:01:WU00:FS01:0x22:       Mode: Release
17:33:01:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
17:33:01:WU00:FS01:0x22:             <peastman@stanford.edu>
17:33:01:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 4792 -checkpoint 30
17:33:01:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
17:33:01:WU00:FS01:0x22:************************************ libFAH ************************************
17:33:01:WU00:FS01:0x22:       Date: Sep 7 2020
17:33:01:WU00:FS01:0x22:       Time: 19:09:56
17:33:01:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
17:33:01:WU00:FS01:0x22:     Branch: HEAD
17:33:01:WU00:FS01:0x22:   Compiler: Visual C++ 2015
17:33:01:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:33:01:WU00:FS01:0x22:   Platform: win32 10
17:33:01:WU00:FS01:0x22:       Bits: 64
17:33:01:WU00:FS01:0x22:       Mode: Release
17:33:01:WU00:FS01:0x22:************************************ CBang *************************************
17:33:01:WU00:FS01:0x22:       Date: Sep 7 2020
17:33:01:WU00:FS01:0x22:       Time: 19:08:30
17:33:01:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
17:33:01:WU00:FS01:0x22:     Branch: HEAD
17:33:01:WU00:FS01:0x22:   Compiler: Visual C++ 2015
17:33:01:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:33:01:WU00:FS01:0x22:   Platform: win32 10
17:33:01:WU00:FS01:0x22:       Bits: 64
17:33:01:WU00:FS01:0x22:       Mode: Release
17:33:01:WU00:FS01:0x22:************************************ System ************************************
17:33:01:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
17:33:01:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
17:33:01:WU00:FS01:0x22:       CPUs: 8
17:33:01:WU00:FS01:0x22:     Memory: 23.84GiB
17:33:01:WU00:FS01:0x22:Free Memory: 19.66GiB
17:33:01:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
17:33:01:WU00:FS01:0x22: OS Version: 6.1
17:33:01:WU00:FS01:0x22:Has Battery: false
17:33:01:WU00:FS01:0x22: On Battery: false
17:33:01:WU00:FS01:0x22: UTC Offset: 2
17:33:01:WU00:FS01:0x22:        PID: 4804
17:33:01:WU00:FS01:0x22:        CWD: C:\Users\Charles\AppData\Roaming\FAHClient\work
17:33:01:WU00:FS01:0x22:************************************ OpenMM ************************************
17:33:01:WU00:FS01:0x22:   Revision: 189320d0
17:33:01:WU00:FS01:0x22:********************************************************************************
17:33:01:WU00:FS01:0x22:Project: 16918 (Run 113, Clone 74, Gen 68)
17:33:01:WU00:FS01:0x22:Unit: 0x0000005b0002894c5f17618adab4ba51
17:33:01:WU00:FS01:0x22:Digital signatures verified
17:33:01:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
17:33:01:WU00:FS01:0x22:Version 0.0.13
17:33:01:WU00:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]
17:33:01:WU00:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
17:33:01:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
17:33:01:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
17:33:01:WU00:FS01:0x22:There are 4 platforms available.
17:33:01:WU00:FS01:0x22:Platform 0: Reference
17:33:01:WU00:FS01:0x22:Platform 1: CPU
17:33:01:WU00:FS01:0x22:Platform 2: OpenCL
17:33:01:WU00:FS01:0x22:  opencl-device 0 specified
17:33:01:WU00:FS01:0x22:Platform 3: CUDA
17:33:01:WU00:FS01:0x22:  cuda-device 0 specified
17:33:04:WU00:FS01:0x22:Attempting to create CUDA context:
17:33:04:WU00:FS01:0x22:  Configuring platform CUDA
17:33:07:WU00:FS01:0x22:  Using CUDA and gpu 0
17:33:07:WU00:FS01:0x22:Completed 1200000 out of 5000000 steps (24%)
17:34:21:WU00:FS01:0x22:Completed 1250000 out of 5000000 steps (25%)
17:35:36:WU00:FS01:0x22:Completed 1300000 out of 5000000 steps (26%)
17:35:37:WU00:FS01:0x22:Checkpoint completed at step 1300000
17:36:51:WU00:FS01:0x22:Completed 1350000 out of 5000000 steps (27%)
17:38:06:WU00:FS01:0x22:Completed 1400000 out of 5000000 steps (28%)
17:38:06:WU00:FS01:0x22:Checkpoint completed at step 1400000
17:39:21:WU00:FS01:0x22:Completed 1450000 out of 5000000 steps (29%)
17:40:35:WU00:FS01:0x22:Completed 1500000 out of 5000000 steps (30%)
17:40:35:WU00:FS01:0x22:Checkpoint completed at step 1500000
17:41:50:WU00:FS01:0x22:Completed 1550000 out of 5000000 steps (31%)
17:43:04:WU00:FS01:0x22:Completed 1600000 out of 5000000 steps (32%)
17:43:04:WU00:FS01:0x22:Checkpoint completed at step 1600000
17:43:11:Saving configuration to config.xml
17:43:11:<config>
17:43:11:  <!-- Folding Core -->
17:43:11:  <core-priority v='low'/>
17:43:11:
17:43:11:  <!-- Folding Slot Configuration -->
17:43:11:  <cause v='HIGH_PRIORITY'/>
17:43:11:
17:43:11:  <!-- HTTP Server -->
17:43:11:  <allow v='127.0.0.1 192.168.1.0/24'/>
17:43:11:
17:43:11:  <!-- Network -->
17:43:11:  <proxy v=':8080'/>
17:43:11:
17:43:11:  <!-- Remote Command Server -->
17:43:11:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
17:43:11:
17:43:11:  <!-- Slot Control -->
17:43:11:  <pause-on-battery v='false'/>
17:43:11:  <power v='FULL'/>
17:43:11:
17:43:11:  <!-- User Information -->
17:43:11:  <passkey v='*****'/>
17:43:11:  <team v='51'/>
17:43:11:  <user v='CarolusMagnus'/>
17:43:11:
17:43:11:  <!-- Folding Slots -->
17:43:11:  <slot id='1' type='GPU'>
17:43:11:    <opencl-index v='0'/>
17:43:11:  </slot>
17:43:11:</config>
17:44:12:Saving configuration to config.xml
17:44:12:<config>
17:44:12:  <!-- Folding Core -->
17:44:12:  <checkpoint v='30'/>
17:44:12:  <core-priority v='low'/>
17:44:12:
17:44:12:  <!-- Folding Slot Configuration -->
17:44:12:  <cause v='HIGH_PRIORITY'/>
17:44:12:
17:44:12:  <!-- HTTP Server -->
17:44:12:  <allow v='127.0.0.1 192.168.1.0/24'/>
17:44:12:
17:44:12:  <!-- Network -->
17:44:12:  <proxy v=':8080'/>
17:44:12:
17:44:12:  <!-- Remote Command Server -->
17:44:12:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
17:44:12:
17:44:12:  <!-- Slot Control -->
17:44:12:  <pause-on-battery v='false'/>
17:44:12:  <power v='FULL'/>
17:44:12:
17:44:12:  <!-- User Information -->
17:44:12:  <passkey v='*****'/>
17:44:12:  <team v='51'/>
17:44:12:  <user v='CarolusMagnus'/>
17:44:12:
17:44:12:  <!-- Folding Slots -->
17:44:12:  <slot id='1' type='GPU'>
17:44:12:    <opencl-index v='0'/>
17:44:12:  </slot>
17:44:12:</config>
17:44:18:WU00:FS01:0x22:Completed 1650000 out of 5000000 steps (33%)
17:45:33:WU00:FS01:0x22:Completed 1700000 out of 5000000 steps (34%)
17:45:33:WU00:FS01:0x22:Checkpoint completed at step 1700000
17:46:47:WU00:FS01:0x22:Completed 1750000 out of 5000000 steps (35%)
17:48:02:WU00:FS01:0x22:Completed 1800000 out of 5000000 steps (36%)
17:48:02:WU00:FS01:0x22:Checkpoint completed at step 1800000
17:49:17:WU00:FS01:0x22:Completed 1850000 out of 5000000 steps (37%)
17:50:31:WU00:FS01:0x22:Completed 1900000 out of 5000000 steps (38%)
17:50:31:WU00:FS01:0x22:Checkpoint completed at step 1900000
17:51:45:WU00:FS01:0x22:Completed 1950000 out of 5000000 steps (39%)
17:52:59:WU00:FS01:0x22:Completed 2000000 out of 5000000 steps (40%)
17:53:00:WU00:FS01:0x22:Checkpoint completed at step 2000000
17:54:14:WU00:FS01:0x22:Completed 2050000 out of 5000000 steps (41%)
17:55:29:WU00:FS01:0x22:Completed 2100000 out of 5000000 steps (42%)
17:55:29:WU00:FS01:0x22:Checkpoint completed at step 2100000
17:56:43:WU00:FS01:0x22:Completed 2150000 out of 5000000 steps (43%)
17:57:58:WU00:FS01:0x22:Completed 2200000 out of 5000000 steps (44%)
17:57:58:WU00:FS01:0x22:Checkpoint completed at step 2200000
17:59:13:WU00:FS01:0x22:Completed 2250000 out of 5000000 steps (45%)
18:00:28:WU00:FS01:0x22:Completed 2300000 out of 5000000 steps (46%)
18:00:29:WU00:FS01:0x22:Checkpoint completed at step 2300000
18:01:43:WU00:FS01:0x22:Completed 2350000 out of 5000000 steps (47%)
18:02:58:WU00:FS01:0x22:Completed 2400000 out of 5000000 steps (48%)
18:02:58:WU00:FS01:0x22:Checkpoint completed at step 2400000
18:04:13:WU00:FS01:0x22:Completed 2450000 out of 5000000 steps (49%)
18:05:28:WU00:FS01:0x22:Completed 2500000 out of 5000000 steps (50%)
18:05:28:WU00:FS01:0x22:Checkpoint completed at step 2500000
18:06:42:WU00:FS01:0x22:Completed 2550000 out of 5000000 steps (51%)
18:07:57:WU00:FS01:0x22:Completed 2600000 out of 5000000 steps (52%)
18:07:57:WU00:FS01:0x22:Checkpoint completed at step 2600000
Any hints?

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Tue Sep 29, 2020 6:13 pm
by Neil-B
IIRC, the checkpoint frequency in the client is for CPU WUs. GPU checkpoints are set by the project owner, which from your log looks like every 2% for that WU.
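
The 2% figure can be read straight from the core's startup line in the log. A minimal Python sketch, assuming the line format shown in the log above (the function name `parse_checkpoint_interval` is made up for illustration, not part of any FAH tool):

```python
import re

# Matches the Core22 startup line as it appears in the log above, e.g.
# "Checkpoint write interval: 100000 steps (2%) [50 total]"
_INTERVAL_RE = re.compile(
    r"Checkpoint write interval: (\d+) steps \((\d+(?:\.\d+)?)%\) \[(\d+) total\]"
)


def parse_checkpoint_interval(line):
    """Return (steps, percent, total_checkpoints), or None if the line does not match."""
    m = _INTERVAL_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), int(m.group(3))


line = "17:33:01:WU00:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]"
print(parse_checkpoint_interval(line))  # (100000, 2.0, 50)
```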

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Tue Sep 29, 2020 6:17 pm
by CarolusMagnus
Is this new in Core 22 v0.0.13?
It was running as expected with the previous Core 22 v0.0.11.

Edit:

Seems the checkpoints are saved at fixed compute steps:

17:33:01:WU00:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]
17:33:01:WU00:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
17:33:01:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
17:33:01:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
This seems too frequent to me. How can it be modified?
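
For what it's worth, the actual spacing can be checked from the timestamps of the "Checkpoint completed" lines. A rough Python sketch, assuming the log line format shown in the first post (the helper name `checkpoint_gaps` is hypothetical, and it does not handle a log that crosses midnight):

```python
import re
from datetime import datetime

# Captures the leading HH:MM:SS timestamp and the step number.
_CKPT_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):.*Checkpoint completed at step (\d+)")


def checkpoint_gaps(log_lines):
    """Return the seconds elapsed between consecutive 'Checkpoint completed' lines."""
    times = []
    for line in log_lines:
        m = _CKPT_RE.match(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Three consecutive checkpoint lines copied from the log in the first post.
sample = [
    "17:35:37:WU00:FS01:0x22:Checkpoint completed at step 1300000",
    "17:38:06:WU00:FS01:0x22:Checkpoint completed at step 1400000",
    "17:40:35:WU00:FS01:0x22:Checkpoint completed at step 1500000",
]
print(checkpoint_gaps(sample))  # [149.0, 149.0]
```

On these lines the gap is about 149 seconds, roughly 2.5 minutes per 100000 steps (2%) for this WU on this GPU, which may be why it looks as if the old 3 min setting were still in effect.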

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Tue Sep 29, 2020 6:27 pm
by bruce
The Project Owner has to modify it. You can only change the settings of WUs to be processed by your CPU.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Tue Sep 29, 2020 6:35 pm
by Neil-B
Not new. I'm fairly sure GPU checkpoints have been set by the project owner, not the client, for a fair while (since before Core 21, IIRC). I think, however, that the checkpoint interval may be stated more explicitly in the log with the latest client/core combination.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Tue Sep 29, 2020 6:57 pm
by gunnarre
With checkpointing every 2% on an RTX 3080, write speeds would become a problem if you're running on a slow disk, because that means checkpointing more than once per minute.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Tue Sep 29, 2020 7:13 pm
by psaam0001
I am already looking at buying a new SATA III HD and controller for what will soon be the former Windows 7.0 SP1 system. IDE/PATA drives are so 1980's--not worth spending my time with worrying about them crashing and not keeping up with the F@H client anymore.

Once that new drive & controller arrive, I'm installing Fedora on it (so I am no longer dealing w/legacy Windows issues). :D

Paul

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Tue Sep 29, 2020 9:13 pm
by aetch
You shouldn't need a new HDD controller. Any system powerful enough to fold should have at least SATA II interfaces.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Tue Sep 29, 2020 10:27 pm
by PantherX
FYI, the checkpoint timer was designed for CPU WUs and has never been used by GPUs. Prior to FahCore_22 version 0.0.13, the checkpoint details were only mentioned in the science.log file. However, to make it easier for donors to understand and for troubleshooting, that information is now also printed in the log file.

The reason GPUs can't use the feature to create a checkpoint every X minutes is that during a GPU checkpoint, sanity checks and verification need to run (this explains the jump in CPU usage during the checkpoint), and they can only run after a certain amount of data has been generated by the GPU to be verified.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Wed Sep 30, 2020 3:32 am
by psaam0001
aetch wrote: You shouldn't need a new HDD controller. Any system powerful enough to fold should have at least SATA II interfaces.
I am having second thoughts on that controller, as I think I can still get away with using a SATA III drive on a SATA II supporting motherboard.

Paul

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Wed Sep 30, 2020 7:42 am
by CarolusMagnus
PantherX wrote: FYI, the checkpoint timer was designed for CPU WUs and has never been used by GPUs. Prior to FahCore_22 version 0.0.13, the checkpoint details were only mentioned in the science.log file. However, to make it easier for donors to understand and for troubleshooting, that information is now also printed in the log file.

The reason GPUs can't use the feature to create a checkpoint every X minutes is that during a GPU checkpoint, sanity checks and verification need to run (this explains the jump in CPU usage during the checkpoint), and they can only run after a certain amount of data has been generated by the GPU to be verified.
That's much clearer, thanks.

Still, it would be appreciated if these parameters were adjustable for each configuration.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Wed Sep 30, 2020 4:22 pm
by Joe_H
CarolusMagnus wrote: Still, it would be appreciated if these parameters were adjustable for each configuration.
With the current codebase for the GPU folding cores, a user-adjustable parameter is just not going to happen. It is, however, adjusted on a per-project basis by the researchers. Some projects checkpoint every 2%, others as rarely as every 5%. If I recall correctly, the short WUs from the project used to benchmark different GPUs checkpoint every 25%.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Wed Sep 30, 2020 4:47 pm
by bruce
Joe_H wrote:
CarolusMagnus wrote: Still, it would be appreciated if these parameters were adjustable for each configuration.
With the current codebase for the GPU folding cores, a user-adjustable parameter is just not going to happen. It is, however, adjusted on a per-project basis by the researchers.
I, too, would appreciate being able to adjust it but OpenMM checkpoints can only happen when several internal calculation events coincide, and you don't control those other events. If they don't coincide, the FAHCore wouldn't be able to restart from the checkpoint.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Mon Dec 28, 2020 9:56 pm
by Frogging101
I don't know what factors are considered when deciding checkpoint intervals, but I have to say that I find it inconvenient when I have a WU with a long checkpoint interval. It means that pausing it can waste a fair bit of work. For example right now I'm folding Project:13435 (Run 150, Clone 6, Gen 0). This project checkpoints at every 5% and on my GPU this WU is averaging a TPF (time per frame; effectively time per %) of 7.5 minutes. So if it's at 63% progress and I sit down at my PC and want to do something with the GPU, I either have to wait 15 minutes for it to reach the next checkpoint at 65%, or pause it now and knock it back to 60% and waste the last 22 minutes of work.
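
The arithmetic above can be written out as a small Python sketch. It assumes progress is linear (a constant TPF) and that checkpoints fall on exact multiples of the interval; the function name `pause_cost` is just for illustration.

```python
def pause_cost(progress_pct, checkpoint_pct, tpf_minutes):
    """Return (minutes until the next checkpoint, minutes of work lost if paused now).

    Assumes a constant time per frame (1% of the WU) and checkpoints at
    exact multiples of checkpoint_pct.
    """
    done_since_checkpoint = progress_pct % checkpoint_pct
    to_next = (checkpoint_pct - done_since_checkpoint) % checkpoint_pct
    return to_next * tpf_minutes, done_since_checkpoint * tpf_minutes


# The case described above: 63% done, checkpoint every 5%, TPF of 7.5 minutes.
wait, lost = pause_cost(progress_pct=63, checkpoint_pct=5, tpf_minutes=7.5)
print(wait, lost)  # 15.0 22.5
```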

This wouldn't be an issue if I had a dedicated folding machine, but unfortunately I only have one decent GPU and it's in the PC I use daily, so folding shares time with gaming.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Posted: Mon Dec 28, 2020 10:12 pm
by Neil-B
The tough part is that checkpoints take time and resources to write. The way GPU checkpoints work, the researcher has to set a global checkpoint interval that balances the work lost when resetting to a checkpoint against the resources spent creating many checkpoints that are never used, along with some other science reasons for checkpointing, of course. It may be that at some point in the future new GPU cores and a new client will change this, but until then the slower GPUs, or those running large WUs, suffer the inconvenience you mention. In my case my GPUs waste endless resources writing checkpoints that are never used apart from some sanity checking, I believe, and even then I don't get WU failures on my GPUs, so those sanity checks aren't actually productive either. I wouldn't say no to being able to reduce the number of checkpoints: same functionality, just used in the other direction. Time will tell if we ever get our wishes. Fingers crossed :)