Project: 16484 lots of NaN ?


Knish
Posts: 232
Joined: Tue Mar 17, 2020 5:20 am

Project: 16484 lots of NaN ?

Post by Knish »

I guess it's running as it should, and fortunately it restarts successfully each time and keeps on chugging, but that does seem like a lot of restarts along the way.


18:50:03:******************************* libFAH ********************************
18:50:03:           Date: Oct 20 2020
18:50:03:           Time: 13:36:55
18:50:03:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
18:50:03:         Branch: master
18:50:03:       Compiler: Visual C++ 2015
18:50:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
18:50:03:       Platform: win32 10
18:50:03:           Bits: 32
18:50:03:           Mode: Release
18:50:03:****************************** FAHClient ******************************
18:50:03:        Version: 7.6.21
18:50:03:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:50:03:      Copyright: 2020 foldingathome.org
18:50:03:       Homepage: https://foldingathome.org/
18:50:03:           Date: Oct 20 2020
18:50:03:           Time: 13:41:04
18:50:03:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
18:50:03:         Branch: master
18:50:03:       Compiler: Visual C++ 2015
18:50:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
18:50:03:       Platform: win32 10
18:50:03:           Bits: 32
18:50:03:           Mode: Release
18:50:03:         Config: C:\ProgramData\FAHClient\config.xml
18:50:03:******************************** CBang ********************************
18:50:03:           Date: Oct 20 2020
18:50:03:           Time: 11:36:18
18:50:03:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
18:50:03:         Branch: master
18:50:03:       Compiler: Visual C++ 2015
18:50:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
18:50:03:       Platform: win32 10
18:50:03:           Bits: 32
18:50:03:           Mode: Release
18:50:03:******************************* System ********************************
18:50:03:            CPU: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
18:50:03:         CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
18:50:03:           CPUs: 4
18:50:03:         Memory: 7.94GiB
18:50:03:    Free Memory: 6.26GiB
18:50:03:        Threads: WINDOWS_THREADS
18:50:03:     OS Version: 6.2
18:50:03:    Has Battery: false
18:50:03:     On Battery: false
18:50:03:     UTC Offset: -8
18:50:03:            PID: 7756
18:50:03:            CWD: C:\ProgramData\FAHClient
18:50:03:  Win32 Service: false
18:50:03:             OS: Windows 10 Home
18:50:03:        OS Arch: AMD64
18:50:03:           GPUs: 1
18:50:03:          GPU 0: Bus:1 Slot:0 Func:0 AMD:5 Amethyst XT [Radeon R9 M295X]
18:50:03:           CUDA: Not detected: Failed to open dynamic library 'nvcuda.dll': The
18:50:03:                 specified module could not be found.
18:50:03:
18:50:03:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:3075.13
18:50:03:***********************************************************************


18:50:03:Trying to access database...
18:50:03:Successfully acquired database lock
18:50:03:FS00:Initialized folding slot 00: cpu:3
18:50:03:FS01:Initialized folding slot 01: gpu:1:0 Amethyst XT [Radeon R9 M295X]
18:50:03:WU01:FS01:Starting
18:50:03:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 7756 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
18:50:03:WU01:FS01:Started FahCore on PID 7968
18:50:04:WU01:FS01:Core PID:7992


16:08:32:WU00:FS01:0x22:*********************** Log Started 2021-11-18T16:08:32Z ***********************
16:08:32:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
16:08:32:WU00:FS01:0x22:       Core: Core22
16:08:32:WU00:FS01:0x22:       Type: 0x22
16:08:32:WU00:FS01:0x22:    Version: 0.0.13
16:08:32:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:08:32:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
16:08:32:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
16:08:32:WU00:FS01:0x22:       Date: Sep 19 2020
16:08:32:WU00:FS01:0x22:       Time: 02:35:58
16:08:32:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
16:08:32:WU00:FS01:0x22:     Branch: core22-0.0.13
16:08:32:WU00:FS01:0x22:   Compiler: Visual C++ 2015
16:08:32:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:08:32:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
16:08:32:WU00:FS01:0x22:   Platform: win32 10
16:08:32:WU00:FS01:0x22:       Bits: 64
16:08:32:WU00:FS01:0x22:       Mode: Release
16:08:32:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
16:08:32:WU00:FS01:0x22:             <peastman@stanford.edu>
16:08:32:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 7084 -checkpoint 15
16:08:32:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -gpu-vendor amd -gpu 0
16:08:32:WU00:FS01:0x22:             -gpu-usage 100
16:08:32:WU00:FS01:0x22:************************************ libFAH ************************************
16:08:32:WU00:FS01:0x22:       Date: Sep 7 2020
16:08:32:WU00:FS01:0x22:       Time: 19:09:56
16:08:32:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
16:08:32:WU00:FS01:0x22:     Branch: HEAD
16:08:32:WU00:FS01:0x22:   Compiler: Visual C++ 2015
16:08:32:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:08:32:WU00:FS01:0x22:   Platform: win32 10
16:08:32:WU00:FS01:0x22:       Bits: 64
16:08:32:WU00:FS01:0x22:       Mode: Release
16:08:32:WU00:FS01:0x22:************************************ CBang *************************************
16:08:32:WU00:FS01:0x22:       Date: Sep 7 2020
16:08:32:WU00:FS01:0x22:       Time: 19:08:30
16:08:32:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
16:08:32:WU00:FS01:0x22:     Branch: HEAD
16:08:32:WU00:FS01:0x22:   Compiler: Visual C++ 2015
16:08:32:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
16:08:32:WU00:FS01:0x22:   Platform: win32 10
16:08:32:WU00:FS01:0x22:       Bits: 64
16:08:32:WU00:FS01:0x22:       Mode: Release
16:08:32:WU00:FS01:0x22:************************************ System ************************************
16:08:32:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
16:08:32:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
16:08:32:WU00:FS01:0x22:       CPUs: 4
16:08:32:WU00:FS01:0x22:     Memory: 7.94GiB
16:08:32:WU00:FS01:0x22:Free Memory: 3.67GiB
16:08:32:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
16:08:32:WU00:FS01:0x22: OS Version: 6.2
16:08:32:WU00:FS01:0x22:Has Battery: false
16:08:32:WU00:FS01:0x22: On Battery: false
16:08:32:WU00:FS01:0x22: UTC Offset: -8
16:08:32:WU00:FS01:0x22:        PID: 12068
16:08:32:WU00:FS01:0x22:        CWD: C:\ProgramData\FAHClient\work
16:08:32:WU00:FS01:0x22:************************************ OpenMM ************************************
16:08:32:WU00:FS01:0x22:   Revision: 189320d0
16:08:32:WU00:FS01:0x22:********************************************************************************
16:08:32:WU00:FS01:0x22:Project: 16484 (Run 0, Clone 213, Gen 30)
16:08:32:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
16:08:32:WU00:FS01:0x22:Reading tar file core.xml
16:08:32:WU00:FS01:0x22:Reading tar file integrator.xml.bz2
16:08:32:WU00:FS01:0x22:Reading tar file state.xml.bz2
16:08:32:WU00:FS01:0x22:Reading tar file system.xml.bz2
16:08:32:WU00:FS01:0x22:Digital signatures verified
16:08:32:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
16:08:32:WU00:FS01:0x22:Version 0.0.13
16:08:32:WU00:FS01:0x22:  Checkpoint write interval: 125000 steps (5%) [20 total]
16:08:32:WU00:FS01:0x22:  JSON viewer frame write interval: 25000 steps (1%) [100 total]
16:08:32:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (10%) [10 total]
16:08:32:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
16:08:32:WU00:FS01:0x22:There are 3 platforms available.
16:08:32:WU00:FS01:0x22:Platform 0: Reference
16:08:32:WU00:FS01:0x22:Platform 1: CPU
16:08:32:WU00:FS01:0x22:Platform 2: OpenCL
16:08:32:WU00:FS01:0x22:  opencl-device 0 specified
16:08:44:WU00:FS01:0x22:Attempting to create OpenCL context:
16:08:44:WU00:FS01:0x22:  Configuring platform OpenCL
16:08:52:WU00:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0
16:08:52:WU00:FS01:0x22:Completed 0 out of 2500000 steps (0%)
16:08:53:WU00:FS01:0x22:Checkpoint completed at step 0
16:18:19:WU00:FS01:0x22:Completed 25000 out of 2500000 steps (1%)
16:27:44:WU00:FS01:0x22:Completed 50000 out of 2500000 steps (2%)
16:37:05:WU00:FS01:0x22:Completed 75000 out of 2500000 steps (3%)
16:46:29:WU00:FS01:0x22:Completed 100000 out of 2500000 steps (4%)
16:55:51:WU00:FS01:0x22:Completed 125000 out of 2500000 steps (5%)
16:55:53:WU00:FS01:0x22:Checkpoint completed at step 125000
17:05:16:WU00:FS01:0x22:Completed 150000 out of 2500000 steps (6%)
17:14:36:WU00:FS01:0x22:Completed 175000 out of 2500000 steps (7%)
17:22:41:WU00:FS01:0x22:An exception occurred at step 196783: Particle coordinate is nan
17:22:41:WU00:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
17:22:41:WU00:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
17:22:42:WARNING:WU00:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
17:22:42:WU00:FS01:Starting
17:22:42:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 7756 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
17:22:42:WU00:FS01:Started FahCore on PID 3204
17:22:42:WU00:FS01:Core PID:7896
17:22:42:WU00:FS01:FahCore 0x22 started
17:22:43:WU00:FS01:0x22:*********************** Log Started 2021-11-18T17:22:42Z
17:22:43:WU00:FS01:0x22:*****************************************************************
17:22:43:WU00:FS01:0x22:Project: 16484 (Run 0, Clone 213, Gen 30)
17:22:54:WU00:FS01:0x22:Attempting to create OpenCL context:
17:22:54:WU00:FS01:0x22:  Configuring platform OpenCL
17:23:02:WU00:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0
17:23:02:WU00:FS01:0x22:Completed 125000 out of 2500000 steps (5%)
18:10:09:WU00:FS01:0x22:Completed 250000 out of 2500000 steps (10%)
18:10:11:WU00:FS01:0x22:Checkpoint completed at step 250000
18:19:37:WU00:FS01:0x22:Completed 275000 out of 2500000 steps (11%)
19:35:01:WU00:FS01:0x22:Completed 475000 out of 2500000 steps (19%)
19:35:05:WU00:FS01:0x22:An exception occurred at step 475142: Particle coordinate is nan
19:35:05:WU00:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
19:35:05:WU00:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
19:35:05:WARNING:WU00:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
19:35:06:WU00:FS01:Starting
19:35:06:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 7756 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
19:35:06:WU00:FS01:Started FahCore on PID 324
19:35:06:WU00:FS01:Core PID:308
19:35:06:WU00:FS01:FahCore 0x22 started
19:35:07:WU00:FS01:0x22:*********************** Log Started 2021-11-18T19:35:06Z 
19:35:07:WU00:FS01:0x22:Project: 16484 (Run 0, Clone 213, Gen 30)

19:35:19:WU00:FS01:0x22:Attempting to create OpenCL context:
19:35:27:WU00:FS01:0x22:Completed 375000 out of 2500000 steps (15%)
20:22:27:WU00:FS01:0x22:Completed 500000 out of 2500000 steps (20%)
20:22:29:WU00:FS01:0x22:Checkpoint completed at step 500000
20:31:46:WU00:FS01:0x22:Completed 525000 out of 2500000 steps (21%)
20:40:33:WU00:FS01:0x22:An exception occurred at step 548685: Particle coordinate is nan
20:40:33:WU00:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
20:40:33:WU00:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
20:40:34:WARNING:WU00:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
20:40:34:WU00:FS01:Starting
20:40:35:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 7756 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100
20:40:35:WU00:FS01:Started FahCore on PID 8104
20:40:35:WU00:FS01:Core PID:11964
20:40:35:WU00:FS01:FahCore 0x22 started

20:40:35:WU00:FS01:0x22:Project: 16484 (Run 0, Clone 213, Gen 30)

20:40:54:WU00:FS01:0x22:Completed 500000 out of 2500000 steps (20%)

23:08:24:WU00:FS01:0x22:Completed 900000 out of 2500000 steps (36%)
23:09:00:WU00:FS01:0x22:An exception occurred at step 901591: Particle coordinate is nan
23:09:00:WU00:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
23:09:00:WU00:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
23:09:01:WARNING:WU00:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
23:09:01:WU00:FS01:Starting

23:09:02:WU00:FS01:0x22:Project: 16484 (Run 0, Clone 213, Gen 30)
23:09:19:WU00:FS01:0x22:Completed 875000 out of 2500000 steps (35%)

03:30:35:WU00:FS01:0x22:Completed 1575000 out of 2500000 steps (63%)
03:37:23:WU00:FS01:0x22:An exception occurred at step 1593096: Particle coordinate is nan
03:37:23:WU00:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
03:37:23:WU00:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
03:37:24:WARNING:WU00:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
03:37:24:WU00:FS01:Starting

03:37:25:WU00:FS01:0x22:Project: 16484 (Run 0, Clone 213, Gen 30)

03:37:46:WU00:FS01:0x22:Completed 1500000 out of 2500000 steps (60%)

05:46:25:WU00:FS01:0x22:Completed 1850000 out of 2500000 steps (74%)
05:50:35:WU00:FS01:0x22:An exception occurred at step 1859156: Particle coordinate is nan
05:50:35:WU00:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
05:50:35:WU00:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
05:50:35:WARNING:WU00:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
05:50:35:WU00:FS01:Starting
05:50:36:WU00:FS01:0x22:Project: 16484 (Run 0, Clone 213, Gen 30)

05:50:54:WU00:FS01:0x22:Completed 1750000 out of 2500000 steps (70%)

07:13:43:WU00:FS01:0x22:Completed 1975000 out of 2500000 steps (79%)
07:22:56:WU00:FS01:0x22:An exception occurred at step 1999465: Particle coordinate is nan
07:22:56:WU00:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
07:22:56:WU00:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
07:22:57:WARNING:WU00:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
07:22:57:WU00:FS01:Starting

07:22:59:WU00:FS01:0x22:Project: 16484 (Run 0, Clone 213, Gen 30)
07:23:19:WU00:FS01:0x22:Completed 1875000 out of 2500000 steps (75%)
******************************* Date: 2021-11-19 **********************
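
To put a number on "a lot of restarts", here is a rough Python sketch that pairs each "Particle coordinate is nan" line with the next "Completed N" line and reports how many steps had to be redone. The function name `summarize_nan_restarts` and the regular expressions are my own assumptions based only on the log lines quoted above; this is not an official FAH tool, and other core versions may word these lines differently.

```python
import re

# Patterns are assumptions derived from the Core22 0.0.13 lines in this post.
NAN_RE = re.compile(r"An exception occurred at step (\d+): Particle coordinate is nan")
DONE_RE = re.compile(r"Completed (\d+) out of (\d+) steps")


def summarize_nan_restarts(log_text):
    """Return a list of (failed_step, resumed_step, steps_redone) tuples.

    resumed_step is taken from the first "Completed N" line that follows
    each NaN exception, i.e. the step the restarted core reports first.
    """
    events = []
    pending = None  # step of a NaN exception still waiting for its resume line
    for line in log_text.splitlines():
        nan = NAN_RE.search(line)
        if nan:
            pending = int(nan.group(1))
            continue
        done = DONE_RE.search(line)
        if done and pending is not None:
            resumed = int(done.group(1))
            events.append((pending, resumed, pending - resumed))
            pending = None
    return events


# The first two restarts, copied from the log above.
sample = """\
17:14:36:WU00:FS01:0x22:Completed 175000 out of 2500000 steps (7%)
17:22:41:WU00:FS01:0x22:An exception occurred at step 196783: Particle coordinate is nan
17:23:02:WU00:FS01:0x22:Completed 125000 out of 2500000 steps (5%)
19:35:01:WU00:FS01:0x22:Completed 475000 out of 2500000 steps (19%)
19:35:05:WU00:FS01:0x22:An exception occurred at step 475142: Particle coordinate is nan
19:35:27:WU00:FS01:0x22:Completed 375000 out of 2500000 steps (15%)
"""

for failed, resumed, redone in summarize_nan_restarts(sample):
    print(f"NaN at step {failed}, resumed at {resumed}, redid {redone} steps")
```

Adding up the same difference by hand for all seven exceptions in the excerpt gives 573,918 redone steps, or roughly 23% of the 2,500,000-step WU.
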
PaulTV
Posts: 179
Joined: Mon Jan 25, 2021 4:53 pm
Location: Netherlands

Re: Project: 16484 lots of NaN ?

Post by PaulTV »

Hmmm... I've had 3 jobs from that project without any issues, but on an nVidia card. I'm not an expert, but I've read before that these kinds of issues may indicate hardware problems. One such error every once in a while isn't a big issue, but your log shows that error too often. It kinda looks like this is a laptop, correct? Are the CPU and GPU temperatures within reasonable limits?

Ryzen 5800X / RTX 4090 / Windows 11
Ryzen 5600X / RTX 3070 Ti / Ubuntu 20.04
Ryzen 5600 / RTX 3060 Ti / Windows 11
Knish
Posts: 232
Joined: Tue Mar 17, 2020 5:20 am

Re: Project: 16484 lots of NaN ?

Post by Knish »

PC. As far as I can tell, GPU-Z says it's a steady 71C, but I haven't been checking at all other than with the human skin test, which didn't raise any concerns. Oh well, it uploaded successfully, yay. Maybe it's just one of those silicon lottery things that blessed my card.
Knish
Posts: 232
Joined: Tue Mar 17, 2020 5:20 am

Re: Project: 16484 lots of NaN ?

Post by Knish »

Oh, that particular machine just so happened to finally download the 0.0.18 core once that WU finished.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Project: 16484 lots of NaN ?

Post by Neil-B »

The researcher for p16484 has chosen to keep using 0.0.13 for that series, so the client wouldn't have downloaded the new core for it ... as it stands, if you see any more of that series they will probably still use the old core.
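
For anyone wanting to confirm which core a given WU actually ran on, a small Python sketch like the following can pull the version out of a "Running FahCore" log line. It assumes the version always appears in the path as `<type>-<version>/Core_<type>.fah/`, as it does in the log posted in this thread; the helper name `core_version` is hypothetical, and other client versions may lay out the path differently.

```python
import re

# Assumption: the core directory is named "<type>-<version>" and is followed
# by "Core_<type>.fah/", as in the log line quoted earlier in this thread.
CORE_PATH_RE = re.compile(r"/(\d+)-(\d+(?:\.\d+)+)/Core_\d+\.fah/")


def core_version(log_line):
    """Return (core_type, version) from a "Running FahCore" line, or None."""
    match = CORE_PATH_RE.search(log_line)
    return (match.group(1), match.group(2)) if match else None


# Line copied from the log in the first post.
line = r'17:22:42:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 7756 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -gpu-vendor amd -gpu 0 -gpu-usage 100'

print(core_version(line))
```
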
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)