FAHCore 22 v0.0.13 and Checkpoint frequency

CarolusMagnus
Posts: 8
Joined: Sun Aug 02, 2020 9:36 pm
Hardware configuration: 1. Windows 7 pro 64 bits, Intel Core i7 4790K 4.4 GHz, 24 GB RAM, 512 GB SAMSUNG 860 PRO SSD, NVIDIA RTX 2070 SUPER from MSI (OC up to 1980 MHz)

2. Windows 7 pro 64 bits, AMD Phenom IIx4 3.8 GHz (OC from 3.5 GHz), 4 GB RAM, 256 GB SAMSUNG 850 PRO SSD, AMD RX580 8GB from MSI (OC @ 1400 MHz)
Location: 44 Derval, FRANCE

FAHCore 22 v0.0.13 and Checkpoint frequency

Post by CarolusMagnus »

I had set the checkpoint frequency to 3 min for testing. Now that I am trying to get back to 30 min, Core 22 v0.0.13 still saves a checkpoint about every 3 min.
I tried restarting the session, then the PC, with no change.

Here's the present log:

*********************** Log Started 2020-09-29T17:33:01Z ***********************
17:33:01:Trying to access database...
17:33:01:Successfully acquired database lock
17:33:01:Read GPUs.txt
17:33:01:Enabled folding slot 01: READY gpu:0:TU104 [GeForce RTX 2070 SUPER] 8218
17:33:01:****************************** FAHClient ******************************
17:33:01:      Version: 7.6.13
17:33:01:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:33:01:    Copyright: 2020 foldingathome.org
17:33:01:     Homepage: https://foldingathome.org/
17:33:01:         Date: Apr 27 2020
17:33:01:         Time: 21:21:01
17:33:01:     Revision: 5a652817f46116b6e135503af97f18e094414e3b
17:33:01:       Branch: master
17:33:01:     Compiler: Visual C++ 2008
17:33:01:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
17:33:01:     Platform: win32 10
17:33:01:         Bits: 32
17:33:01:         Mode: Release
17:33:01:       Config: C:\Users\Charles\AppData\Roaming\FAHClient\config.xml
17:33:01:******************************** CBang ********************************
17:33:01:         Date: Apr 24 2020
17:33:01:         Time: 17:07:55
17:33:01:     Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
17:33:01:       Branch: master
17:33:01:     Compiler: Visual C++ 2008
17:33:01:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
17:33:01:     Platform: win32 10
17:33:01:         Bits: 32
17:33:01:         Mode: Release
17:33:01:******************************* System ********************************
17:33:01:          CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
17:33:01:       CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
17:33:01:         CPUs: 8
17:33:01:       Memory: 23.84GiB
17:33:01:  Free Memory: 19.69GiB
17:33:01:      Threads: WINDOWS_THREADS
17:33:01:   OS Version: 6.1
17:33:01:  Has Battery: false
17:33:01:   On Battery: false
17:33:01:   UTC Offset: 2
17:33:01:          PID: 4536
17:33:01:          CWD: C:\Users\Charles\AppData\Roaming\FAHClient
17:33:01:Win32 Service: false
17:33:01:           OS: Windows 7 Professional
17:33:01:      OS Arch: AMD64
17:33:01:         GPUs: 1
17:33:01:        GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU104 [GeForce RTX 2070 SUPER] 8218
17:33:01:CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:11.1
17:33:01:       OpenCL: Not detected: clGetDeviceIDs() returned -1
17:33:01:******************************* libFAH ********************************
17:33:01:         Date: Apr 15 2020
17:33:01:         Time: 14:53:14
17:33:01:     Revision: 216968bc7025029c841ed6e36e81a03a316890d3
17:33:01:       Branch: master
17:33:01:     Compiler: Visual C++ 2008
17:33:01:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
17:33:01:     Platform: win32 10
17:33:01:         Bits: 32
17:33:01:         Mode: Release
17:33:01:***********************************************************************
17:33:01:<config>
17:33:01:  <!-- Folding Core -->
17:33:01:  <checkpoint v='30'/>
17:33:01:  <core-priority v='low'/>
17:33:01:
17:33:01:  <!-- Folding Slot Configuration -->
17:33:01:  <cause v='HIGH_PRIORITY'/>
17:33:01:
17:33:01:  <!-- HTTP Server -->
17:33:01:  <allow v='127.0.0.1 192.168.1.0/24'/>
17:33:01:
17:33:01:  <!-- Network -->
17:33:01:  <proxy v=':8080'/>
17:33:01:
17:33:01:  <!-- Remote Command Server -->
17:33:01:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
17:33:01:
17:33:01:  <!-- Slot Control -->
17:33:01:  <pause-on-battery v='false'/>
17:33:01:  <power v='FULL'/>
17:33:01:
17:33:01:  <!-- User Information -->
17:33:01:  <passkey v='*****'/>
17:33:01:  <team v='51'/>
17:33:01:  <user v='CarolusMagnus'/>
17:33:01:
17:33:01:  <!-- Folding Slots -->
17:33:01:  <slot id='1' type='GPU'>
17:33:01:    <opencl-index v='0'/>
17:33:01:  </slot>
17:33:01:</config>
17:33:01:WU00:FS01:Starting
17:33:01:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Charles\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 4536 -checkpoint 30 -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
17:33:01:WU00:FS01:Started FahCore on PID 4792
17:33:01:WU00:FS01:Core PID:4804
17:33:01:WU00:FS01:FahCore 0x22 started
17:33:01:WU00:FS01:0x22:*********************** Log Started 2020-09-29T17:33:01Z ***********************
17:33:01:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
17:33:01:WU00:FS01:0x22:       Core: Core22
17:33:01:WU00:FS01:0x22:       Type: 0x22
17:33:01:WU00:FS01:0x22:    Version: 0.0.13
17:33:01:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:33:01:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
17:33:01:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
17:33:01:WU00:FS01:0x22:       Date: Sep 19 2020
17:33:01:WU00:FS01:0x22:       Time: 02:35:58
17:33:01:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
17:33:01:WU00:FS01:0x22:     Branch: core22-0.0.13
17:33:01:WU00:FS01:0x22:   Compiler: Visual C++ 2015
17:33:01:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:33:01:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
17:33:01:WU00:FS01:0x22:   Platform: win32 10
17:33:01:WU00:FS01:0x22:       Bits: 64
17:33:01:WU00:FS01:0x22:       Mode: Release
17:33:01:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
17:33:01:WU00:FS01:0x22:             <peastman@stanford.edu>
17:33:01:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 4792 -checkpoint 30
17:33:01:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
17:33:01:WU00:FS01:0x22:************************************ libFAH ************************************
17:33:01:WU00:FS01:0x22:       Date: Sep 7 2020
17:33:01:WU00:FS01:0x22:       Time: 19:09:56
17:33:01:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
17:33:01:WU00:FS01:0x22:     Branch: HEAD
17:33:01:WU00:FS01:0x22:   Compiler: Visual C++ 2015
17:33:01:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:33:01:WU00:FS01:0x22:   Platform: win32 10
17:33:01:WU00:FS01:0x22:       Bits: 64
17:33:01:WU00:FS01:0x22:       Mode: Release
17:33:01:WU00:FS01:0x22:************************************ CBang *************************************
17:33:01:WU00:FS01:0x22:       Date: Sep 7 2020
17:33:01:WU00:FS01:0x22:       Time: 19:08:30
17:33:01:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
17:33:01:WU00:FS01:0x22:     Branch: HEAD
17:33:01:WU00:FS01:0x22:   Compiler: Visual C++ 2015
17:33:01:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:33:01:WU00:FS01:0x22:   Platform: win32 10
17:33:01:WU00:FS01:0x22:       Bits: 64
17:33:01:WU00:FS01:0x22:       Mode: Release
17:33:01:WU00:FS01:0x22:************************************ System ************************************
17:33:01:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
17:33:01:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
17:33:01:WU00:FS01:0x22:       CPUs: 8
17:33:01:WU00:FS01:0x22:     Memory: 23.84GiB
17:33:01:WU00:FS01:0x22:Free Memory: 19.66GiB
17:33:01:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
17:33:01:WU00:FS01:0x22: OS Version: 6.1
17:33:01:WU00:FS01:0x22:Has Battery: false
17:33:01:WU00:FS01:0x22: On Battery: false
17:33:01:WU00:FS01:0x22: UTC Offset: 2
17:33:01:WU00:FS01:0x22:        PID: 4804
17:33:01:WU00:FS01:0x22:        CWD: C:\Users\Charles\AppData\Roaming\FAHClient\work
17:33:01:WU00:FS01:0x22:************************************ OpenMM ************************************
17:33:01:WU00:FS01:0x22:   Revision: 189320d0
17:33:01:WU00:FS01:0x22:********************************************************************************
17:33:01:WU00:FS01:0x22:Project: 16918 (Run 113, Clone 74, Gen 68)
17:33:01:WU00:FS01:0x22:Unit: 0x0000005b0002894c5f17618adab4ba51
17:33:01:WU00:FS01:0x22:Digital signatures verified
17:33:01:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
17:33:01:WU00:FS01:0x22:Version 0.0.13
17:33:01:WU00:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]
17:33:01:WU00:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
17:33:01:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
17:33:01:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
17:33:01:WU00:FS01:0x22:There are 4 platforms available.
17:33:01:WU00:FS01:0x22:Platform 0: Reference
17:33:01:WU00:FS01:0x22:Platform 1: CPU
17:33:01:WU00:FS01:0x22:Platform 2: OpenCL
17:33:01:WU00:FS01:0x22:  opencl-device 0 specified
17:33:01:WU00:FS01:0x22:Platform 3: CUDA
17:33:01:WU00:FS01:0x22:  cuda-device 0 specified
17:33:04:WU00:FS01:0x22:Attempting to create CUDA context:
17:33:04:WU00:FS01:0x22:  Configuring platform CUDA
17:33:07:WU00:FS01:0x22:  Using CUDA and gpu 0
17:33:07:WU00:FS01:0x22:Completed 1200000 out of 5000000 steps (24%)
17:34:21:WU00:FS01:0x22:Completed 1250000 out of 5000000 steps (25%)
17:35:36:WU00:FS01:0x22:Completed 1300000 out of 5000000 steps (26%)
17:35:37:WU00:FS01:0x22:Checkpoint completed at step 1300000
17:36:51:WU00:FS01:0x22:Completed 1350000 out of 5000000 steps (27%)
17:38:06:WU00:FS01:0x22:Completed 1400000 out of 5000000 steps (28%)
17:38:06:WU00:FS01:0x22:Checkpoint completed at step 1400000
17:39:21:WU00:FS01:0x22:Completed 1450000 out of 5000000 steps (29%)
17:40:35:WU00:FS01:0x22:Completed 1500000 out of 5000000 steps (30%)
17:40:35:WU00:FS01:0x22:Checkpoint completed at step 1500000
17:41:50:WU00:FS01:0x22:Completed 1550000 out of 5000000 steps (31%)
17:43:04:WU00:FS01:0x22:Completed 1600000 out of 5000000 steps (32%)
17:43:04:WU00:FS01:0x22:Checkpoint completed at step 1600000
17:43:11:Saving configuration to config.xml
17:43:11:<config>
17:43:11:  <!-- Folding Core -->
17:43:11:  <core-priority v='low'/>
17:43:11:
17:43:11:  <!-- Folding Slot Configuration -->
17:43:11:  <cause v='HIGH_PRIORITY'/>
17:43:11:
17:43:11:  <!-- HTTP Server -->
17:43:11:  <allow v='127.0.0.1 192.168.1.0/24'/>
17:43:11:
17:43:11:  <!-- Network -->
17:43:11:  <proxy v=':8080'/>
17:43:11:
17:43:11:  <!-- Remote Command Server -->
17:43:11:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
17:43:11:
17:43:11:  <!-- Slot Control -->
17:43:11:  <pause-on-battery v='false'/>
17:43:11:  <power v='FULL'/>
17:43:11:
17:43:11:  <!-- User Information -->
17:43:11:  <passkey v='*****'/>
17:43:11:  <team v='51'/>
17:43:11:  <user v='CarolusMagnus'/>
17:43:11:
17:43:11:  <!-- Folding Slots -->
17:43:11:  <slot id='1' type='GPU'>
17:43:11:    <opencl-index v='0'/>
17:43:11:  </slot>
17:43:11:</config>
17:44:12:Saving configuration to config.xml
17:44:12:<config>
17:44:12:  <!-- Folding Core -->
17:44:12:  <checkpoint v='30'/>
17:44:12:  <core-priority v='low'/>
17:44:12:
17:44:12:  <!-- Folding Slot Configuration -->
17:44:12:  <cause v='HIGH_PRIORITY'/>
17:44:12:
17:44:12:  <!-- HTTP Server -->
17:44:12:  <allow v='127.0.0.1 192.168.1.0/24'/>
17:44:12:
17:44:12:  <!-- Network -->
17:44:12:  <proxy v=':8080'/>
17:44:12:
17:44:12:  <!-- Remote Command Server -->
17:44:12:  <command-allow-no-pass v='127.0.0.1 192.168.1.0/24'/>
17:44:12:
17:44:12:  <!-- Slot Control -->
17:44:12:  <pause-on-battery v='false'/>
17:44:12:  <power v='FULL'/>
17:44:12:
17:44:12:  <!-- User Information -->
17:44:12:  <passkey v='*****'/>
17:44:12:  <team v='51'/>
17:44:12:  <user v='CarolusMagnus'/>
17:44:12:
17:44:12:  <!-- Folding Slots -->
17:44:12:  <slot id='1' type='GPU'>
17:44:12:    <opencl-index v='0'/>
17:44:12:  </slot>
17:44:12:</config>
17:44:18:WU00:FS01:0x22:Completed 1650000 out of 5000000 steps (33%)
17:45:33:WU00:FS01:0x22:Completed 1700000 out of 5000000 steps (34%)
17:45:33:WU00:FS01:0x22:Checkpoint completed at step 1700000
17:46:47:WU00:FS01:0x22:Completed 1750000 out of 5000000 steps (35%)
17:48:02:WU00:FS01:0x22:Completed 1800000 out of 5000000 steps (36%)
17:48:02:WU00:FS01:0x22:Checkpoint completed at step 1800000
17:49:17:WU00:FS01:0x22:Completed 1850000 out of 5000000 steps (37%)
17:50:31:WU00:FS01:0x22:Completed 1900000 out of 5000000 steps (38%)
17:50:31:WU00:FS01:0x22:Checkpoint completed at step 1900000
17:51:45:WU00:FS01:0x22:Completed 1950000 out of 5000000 steps (39%)
17:52:59:WU00:FS01:0x22:Completed 2000000 out of 5000000 steps (40%)
17:53:00:WU00:FS01:0x22:Checkpoint completed at step 2000000
17:54:14:WU00:FS01:0x22:Completed 2050000 out of 5000000 steps (41%)
17:55:29:WU00:FS01:0x22:Completed 2100000 out of 5000000 steps (42%)
17:55:29:WU00:FS01:0x22:Checkpoint completed at step 2100000
17:56:43:WU00:FS01:0x22:Completed 2150000 out of 5000000 steps (43%)
17:57:58:WU00:FS01:0x22:Completed 2200000 out of 5000000 steps (44%)
17:57:58:WU00:FS01:0x22:Checkpoint completed at step 2200000
17:59:13:WU00:FS01:0x22:Completed 2250000 out of 5000000 steps (45%)
18:00:28:WU00:FS01:0x22:Completed 2300000 out of 5000000 steps (46%)
18:00:29:WU00:FS01:0x22:Checkpoint completed at step 2300000
18:01:43:WU00:FS01:0x22:Completed 2350000 out of 5000000 steps (47%)
18:02:58:WU00:FS01:0x22:Completed 2400000 out of 5000000 steps (48%)
18:02:58:WU00:FS01:0x22:Checkpoint completed at step 2400000
18:04:13:WU00:FS01:0x22:Completed 2450000 out of 5000000 steps (49%)
18:05:28:WU00:FS01:0x22:Completed 2500000 out of 5000000 steps (50%)
18:05:28:WU00:FS01:0x22:Checkpoint completed at step 2500000
18:06:42:WU00:FS01:0x22:Completed 2550000 out of 5000000 steps (51%)
18:07:57:WU00:FS01:0x22:Completed 2600000 out of 5000000 steps (52%)
18:07:57:WU00:FS01:0x22:Checkpoint completed at step 2600000
Any hints?
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by Neil-B »

IIRC, the checkpoint frequency setting in the client is for CPU WUs. GPU checkpoints are set by the project owner, which from your log looks like every 2% for that WU.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
CarolusMagnus
Posts: 8
Joined: Sun Aug 02, 2020 9:36 pm
Hardware configuration: 1. Windows 7 pro 64 bits, Intel Core i7 4790K 4.4 GHz, 24 GB RAM, 512 GB SAMSUNG 860 PRO SSD, NVIDIA RTX 2070 SUPER from MSI (OC up to 1980 MHz)

2. Windows 7 pro 64 bits, AMD Phenom IIx4 3.8 GHz (OC from 3.5 GHz), 4 GB RAM, 256 GB SAMSUNG 850 PRO SSD, AMD RX580 8GB from MSI (OC @ 1400 MHz)
Location: 44 Derval, FRANCE

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by CarolusMagnus »

Is this new in Core 22 v0.0.13?
It was running as expected with the previous Core 22 v0.0.11.

Edit:

Seems the checkpoints are saved at fixed compute steps:

17:33:01:WU00:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]
17:33:01:WU00:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
17:33:01:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
17:33:01:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
This seems too frequent to me. How can it be modified?
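For anyone who wants to confirm the fixed-step behaviour from their own log, here is a minimal Python sketch. It is not part of FAHClient, and the function name `checkpoint_gaps` is made up for illustration. It assumes the "Checkpoint completed at step N" line format shown in the log above and that all timestamps fall on the same day.

```python
import re
from datetime import datetime

# Sample lines copied from the log posted earlier in this thread.
LOG = """\
17:35:37:WU00:FS01:0x22:Checkpoint completed at step 1300000
17:38:06:WU00:FS01:0x22:Checkpoint completed at step 1400000
17:40:35:WU00:FS01:0x22:Checkpoint completed at step 1500000
17:43:04:WU00:FS01:0x22:Checkpoint completed at step 1600000
"""

# Timestamp at the start of the line, step count at the end.
PATTERN = re.compile(r"^(\d\d:\d\d:\d\d):.*Checkpoint completed at step (\d+)$")


def checkpoint_gaps(log_text):
    """Return (step_delta, seconds_delta) for each pair of consecutive checkpoints.

    Assumption: no midnight rollover between the first and last line.
    """
    points = []
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            points.append((int(match.group(2)), stamp))
    return [
        (step2 - step1, int((time2 - time1).total_seconds()))
        for (step1, time1), (step2, time2) in zip(points, points[1:])
    ]


gaps = checkpoint_gaps(LOG)
for steps, seconds in gaps:
    print(f"{steps} steps in {seconds} s")
```

On these four lines every gap is 100000 steps, which matches the "Checkpoint write interval" reported by the core, and the wall-clock gap is roughly two and a half minutes regardless of the client's checkpoint setting.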
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by bruce »

The Project Owner has to modify it. You can only change the settings of WUs to be processed by your CPU.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by Neil-B »

Not new. I'm fairly sure GPU checkpoints have been set by the project owner, not the client, for a fair while (since before Core 21, IIRC). I think, however, that the checkpoint interval may be stated more explicitly in the log with the latest client/core combination.
gunnarre
Posts: 567
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by gunnarre »

With checkpointing every 2% on an RTX 3080, write speeds would become a problem if you're running on a slow disk, because that means checkpointing more than once per minute.
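To put rough numbers on that, here is a minimal sketch. It assumes checkpoints are written at a fixed percentage of the WU and that TPF is the time to complete 1%. The 74.5 s figure is derived from the log earlier in the thread (about 149 s per 2%); the 25 s TPF is a hypothetical value for a much faster GPU, not a measurement.

```python
def seconds_between_checkpoints(tpf_seconds: float, checkpoint_percent: float) -> float:
    """Wall-clock seconds between checkpoints, given time per frame (1%) in seconds."""
    return tpf_seconds * checkpoint_percent


def max_tpf_for_one_per_minute(checkpoint_percent: float) -> float:
    """TPF (seconds) below which checkpoints happen more than once per minute."""
    return 60.0 / checkpoint_percent


# RTX 2070 SUPER from the posted log: about 74.5 s per 1%, checkpoint every 2%.
from_log = seconds_between_checkpoints(74.5, 2)

# Hypothetical faster GPU with a 25 s TPF on a similar WU (assumed value).
hypothetical_fast = seconds_between_checkpoints(25, 2)

threshold = max_tpf_for_one_per_minute(2)

print(from_log, hypothetical_fast, threshold)
```

With a 2% interval, any GPU whose TPF drops below 30 seconds on a given WU writes a checkpoint more than once per minute.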
Online: GTX 1660 Super, GTX 1080, GTX 1050 Ti 4G OC, RX580 + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 960, GTX 950
psaam0001
Posts: 383
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by psaam0001 »

I am already looking at buying a new SATA III HD and controller for what will soon be the former Windows 7 SP1 system. IDE/PATA drives are so 1980s; it's not worth my time worrying about them crashing or no longer keeping up with the F@H client.

Once that new drive & controller arrive, I'm installing Fedora on it (so I am no longer dealing w/legacy Windows issues). :D

Paul
aetch
Posts: 447
Joined: Thu Jun 25, 2020 3:04 pm
Location: Between chair and keyboard

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by aetch »

You shouldn't need a new HDD controller. Any system powerful enough to fold should have at least SATA II interfaces.
Folding Rigs - None (25-Jun-2022)
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by PantherX »

FYI, the checkpoint timer was designed for CPU WUs and never used by GPUs. Prior to FahCore_22 version 0.0.13, the checkpoint details would only be mentioned in science.log file. However, to make it easier for donors to understand and for troubleshooting, that information is also now printed in the log file.

The reason GPUs can't create a checkpoint every X minutes is that, during a GPU checkpoint, sanity checks and verification need to run (this explains the jump in CPU usage during the checkpoint) after a certain amount of data has been generated by the GPU to be verified.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
psaam0001
Posts: 383
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by psaam0001 »

aetch wrote:You shouldn't need a new HDD controller. Any system powerful enough to fold should have at least SATA II interfaces.
I am having second thoughts on that controller, as I think I can still get away with using a SATA III drive on a SATA II supporting motherboard.

Paul
CarolusMagnus
Posts: 8
Joined: Sun Aug 02, 2020 9:36 pm
Hardware configuration: 1. Windows 7 pro 64 bits, Intel Core i7 4790K 4.4 GHz, 24 GB RAM, 512 GB SAMSUNG 860 PRO SSD, NVIDIA RTX 2070 SUPER from MSI (OC up to 1980 MHz)

2. Windows 7 pro 64 bits, AMD Phenom IIx4 3.8 GHz (OC from 3.5 GHz), 4 GB RAM, 256 GB SAMSUNG 850 PRO SSD, AMD RX580 8GB from MSI (OC @ 1400 MHz)
Location: 44 Derval, FRANCE

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by CarolusMagnus »

PantherX wrote:FYI, the checkpoint timer was designed for CPU WUs and never used by GPUs. Prior to FahCore_22 version 0.0.13, the checkpoint details would only be mentioned in science.log file. However, to make it easier for donors to understand and for troubleshooting, that information is also now printed in the log file.

The reason GPUs can't create a checkpoint every X minutes is that, during a GPU checkpoint, sanity checks and verification need to run (this explains the jump in CPU usage during the checkpoint) after a certain amount of data has been generated by the GPU to be verified.
That's much clearer, thanks.

Still, it would be appreciated if these parameters were adjustable for each configuration.
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by Joe_H »

CarolusMagnus wrote:Still, it would be appreciated if these parameters were adjustable for each configuration.
With the current codebase for the GPU folding cores, this is just not going to become a user-adjustable parameter. It is, however, adjusted on a per-project basis by the researchers. Some projects checkpoint every 2%, others at up to every 5%. If I recall correctly, the short WUs from the project used to benchmark different GPUs checkpoint every 25%.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by bruce »

Joe_H wrote:
CarolusMagnus wrote:Still, it would be appreciated if these parameters were adjustable for each configuration.
With the current codebase for the GPU folding cores, this is just not going to become a user-adjustable parameter. It is, however, adjusted on a per-project basis by the researchers.
I, too, would appreciate being able to adjust it but OpenMM checkpoints can only happen when several internal calculation events coincide, and you don't control those other events. If they don't coincide, the FAHCore wouldn't be able to restart from the checkpoint.
Frogging101
Posts: 85
Joined: Wed Mar 25, 2020 2:39 am
Location: Canada

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by Frogging101 »

I don't know what factors are considered when deciding checkpoint intervals, but I have to say that I find it inconvenient when I have a WU with a long checkpoint interval. It means that pausing it can waste a fair bit of work. For example, right now I'm folding Project:13435 (Run 150, Clone 6, Gen 0). This project checkpoints every 5%, and on my GPU this WU is averaging a TPF (time per frame; effectively time per %) of 7.5 minutes. So if it's at 63% progress and I sit down at my PC and want to do something with the GPU, I either have to wait 15 minutes for it to reach the next checkpoint at 65%, or pause it now, knock it back to 60%, and waste roughly the last 22 minutes of work.
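The arithmetic behind those two options, as a minimal sketch. It assumes checkpoints land at exact multiples of the checkpoint percentage and that TPF stays constant; the function name `pause_options` is made up for illustration.

```python
import math


def pause_options(progress_pct: float, checkpoint_pct: float, tpf_minutes: float):
    """Return (minutes to wait for the next checkpoint, minutes of work lost if paused now).

    Assumes checkpoints fall at exact multiples of checkpoint_pct, counted from 0%.
    """
    last_checkpoint = math.floor(progress_pct / checkpoint_pct) * checkpoint_pct
    next_checkpoint = last_checkpoint + checkpoint_pct
    wait_minutes = (next_checkpoint - progress_pct) * tpf_minutes
    lost_minutes = (progress_pct - last_checkpoint) * tpf_minutes
    return wait_minutes, lost_minutes


# Numbers from the example above: 63% done, checkpoint every 5%, TPF 7.5 min.
wait, lost = pause_options(63, 5, 7.5)
print(f"wait {wait} min for the checkpoint, or lose {lost} min by pausing now")
```

The worst case for any project is pausing just before a checkpoint, which costs almost one full checkpoint interval of work (here close to 5 × 7.5 = 37.5 minutes).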

This wouldn't be an issue if I had a dedicated folding machine, but unfortunately I only have one decent GPU and it's in the PC I use daily, so folding shares time with gaming.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: FAHCore 22 v0.0.13 and Checkpoint frequency

Post by Neil-B »

The tough part is that checkpoints take time and resources to write. The way GPU checkpoints work, the researcher has to set one global checkpoint interval that balances the work lost when resetting back to a checkpoint against the resources spent creating many checkpoints that are never used, along with some other science reasons for checkpointing, of course. At some point in the future, new GPU cores and a new client may change this, but until then the slower GPUs and those running large WUs will suffer the inconvenience you mention. In my case, my GPUs waste endless resources writing checkpoints that are never used, apart from some sanity checking I believe, and since I don't get WU failures on my GPUs, those sanity checks aren't actually productive either. I wouldn't say no to being able to reduce the number of checkpoints: same functionality, just used in the other direction. Time will tell if we ever get our wishes. Fingers crossed :)