Re: SOLVED: CPU stuck at Ready, waiting for FahCore Run

Posted: Thu Dec 03, 2020 9:04 pm
by Crunchtimer
JimboPalmer wrote: And thank you for telling us what worked and what didn't. We need feedback and you always gave us feedback.
Hi, I'm back!
Everything had been running super smooth for 5 months until recently, when CPU folding started misbehaving.
I'm still running the following setup:

- 1 CPU slot with 16 CPUs (threads) assigned
- 1 CPU slot with 6 CPUs (threads) assigned
- 2 GPU slots with 1 CPU (thread) each

Is F@H disliking 2*3 now?

Thanks!

Code:

20:57:59:WU05:FS04:Removing old file 'work/05/logfile_01-20201203-202558.txt'
20:57:59:WU05:FS04:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8 -dir 05 -suffix 01 -version 706 -lifeline 1442 -checkpoint 15 -np 6
20:57:59:WU05:FS04:Started FahCore on PID 49270
20:57:59:WU05:FS04:Core PID:49274
20:57:59:WU05:FS04:FahCore 0xa8 started
20:57:59:WU05:FS04:0xa8:*********************** Log Started 2020-12-03T20:57:59Z ***********************
20:57:59:WU05:FS04:0xa8:************************** Gromacs Folding@home Core ***************************
20:57:59:WU05:FS04:0xa8:       Core: Gromacs
20:57:59:WU05:FS04:0xa8:       Type: 0xa8
20:57:59:WU05:FS04:0xa8:    Version: 0.0.9
20:57:59:WU05:FS04:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
20:57:59:WU05:FS04:0xa8:  Copyright: 2020 foldingathome.org
20:57:59:WU05:FS04:0xa8:   Homepage: https://foldingathome.org/
20:57:59:WU05:FS04:0xa8:       Date: Oct 28 2020
20:57:59:WU05:FS04:0xa8:       Time: 22:15:07
20:57:59:WU05:FS04:0xa8:   Compiler: GNU 8.3.0
20:57:59:WU05:FS04:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
20:57:59:WU05:FS04:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
20:57:59:WU05:FS04:0xa8:   Platform: linux2 4.15.0-108-generic
20:57:59:WU05:FS04:0xa8:       Bits: 64
20:57:59:WU05:FS04:0xa8:       Mode: Release
20:57:59:WU05:FS04:0xa8:       SIMD: avx2_256
20:57:59:WU05:FS04:0xa8:     OpenMP: ON
20:57:59:WU05:FS04:0xa8:       CUDA: OFF
20:57:59:WU05:FS04:0xa8:       Args: -dir 05 -suffix 01 -version 706 -lifeline 49270 -checkpoint 15 -np
20:57:59:WU05:FS04:0xa8:             6
20:57:59:WU05:FS04:0xa8:************************************ libFAH ************************************
20:57:59:WU05:FS04:0xa8:       Date: Oct 28 2020
20:57:59:WU05:FS04:0xa8:       Time: 22:12:00
20:57:59:WU05:FS04:0xa8:   Compiler: GNU 8.3.0
20:57:59:WU05:FS04:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
20:57:59:WU05:FS04:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
20:57:59:WU05:FS04:0xa8:   Platform: linux2 4.15.0-108-generic
20:57:59:WU05:FS04:0xa8:       Bits: 64
20:57:59:WU05:FS04:0xa8:       Mode: Release
20:57:59:WU05:FS04:0xa8:************************************ CBang *************************************
20:57:59:WU05:FS04:0xa8:       Date: Oct 28 2020
20:57:59:WU05:FS04:0xa8:       Time: 22:11:46
20:57:59:WU05:FS04:0xa8:   Compiler: GNU 8.3.0
20:57:59:WU05:FS04:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
20:57:59:WU05:FS04:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
20:57:59:WU05:FS04:0xa8:   Platform: linux2 4.15.0-108-generic
20:57:59:WU05:FS04:0xa8:       Bits: 64
20:57:59:WU05:FS04:0xa8:       Mode: Release
20:57:59:WU05:FS04:0xa8:************************************ System ************************************
20:57:59:WU05:FS04:0xa8:        CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
20:57:59:WU05:FS04:0xa8:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
20:57:59:WU05:FS04:0xa8:       CPUs: 24
20:57:59:WU05:FS04:0xa8:     Memory: 15.54GiB
20:57:59:WU05:FS04:0xa8:Free Memory: 9.35GiB
20:57:59:WU05:FS04:0xa8:    Threads: POSIX_THREADS
20:57:59:WU05:FS04:0xa8: OS Version: 5.4
20:57:59:WU05:FS04:0xa8:Has Battery: false
20:57:59:WU05:FS04:0xa8: On Battery: false
20:57:59:WU05:FS04:0xa8: UTC Offset: 1
20:57:59:WU05:FS04:0xa8:        PID: 49274
20:57:59:WU05:FS04:0xa8:        CWD: /var/lib/fahclient/work
20:57:59:WU05:FS04:0xa8:********************************************************************************
20:57:59:WU05:FS04:0xa8:Project: 16926 (Run 54, Clone 59, Gen 7)
20:57:59:WU05:FS04:0xa8:Unit: 0x0000000c8120d1cc5fbd3a70ac803505
20:57:59:WU05:FS04:0xa8:Reading tar file core.xml
20:57:59:WU05:FS04:0xa8:Reading tar file frame7.tpr
20:57:59:WU05:FS04:0xa8:Digital signatures verified
20:57:59:WU05:FS04:0xa8:Calling: mdrun -c frame7.gro -s frame7.tpr -x frame7.xtc -cpt 15 -nt 6 -ntmpi 1
20:57:59:WU05:FS04:0xa8:Steps: first=0 total=0
20:58:00:WU05:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
20:58:59:WU05:FS04:Starting
20:58:59:WU05:FS04:Removing old file 'work/05/logfile_01-20201203-202658.txt'
20:58:59:WU05:FS04:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8 -dir 05 -suffix 01 -version 706 -lifeline 1442 -checkpoint 15 -np 6
20:58:59:WU05:FS04:Started FahCore on PID 49300
20:58:59:WU05:FS04:Core PID:49304
20:58:59:WU05:FS04:FahCore 0xa8 started
20:58:59:WU05:FS04:FahCore returned: INTERRUPTED (102 = 0x66)

Re: SOLVED: CPU stuck at Ready, waiting for FahCore Run

Posted: Thu Dec 03, 2020 9:30 pm
by Joe_H
Something went wrong in Project 16826 over the past weekend. Assignment of its WUs was suspended early this week, so you can discard that WU. There are a couple of other recent topics that cover the problem.

Which version of the F@h client are you running? There is also a known issue with the Linux versions of the client: it does not discard a WU that hits this type of error after a few retries. That is supposedly fixed in the most recent version, but I have not seen any reports confirming it either way.
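
If you want to check whether the client keeps retrying the same WU, you can scan the log for repeated "FahCore returned" lines. Below is a minimal Python sketch, assuming the log lines look like the ones posted earlier in this thread. The function names and the retry threshold are hypothetical and are not part of the F@h client.

```python
import re
from collections import Counter

# Matches client log lines such as:
#   20:58:00:WU05:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
# The pattern is inferred from the log posted in this thread (an assumption).
RETURN_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):"
    r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)


def count_core_returns(log_text):
    """Count FahCore return statuses per (work unit, slot, status)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RETURN_RE.match(line.strip())
        if match:
            wu, slot, status, _code = match.groups()
            counts[(wu, slot, status)] += 1
    return counts


def stuck_units(log_text, threshold=2):
    """Return (work unit, slot) pairs with at least `threshold` INTERRUPTED returns.

    The threshold is an arbitrary choice for illustration.
    """
    counts = count_core_returns(log_text)
    return sorted(
        (wu, slot)
        for (wu, slot, status), n in counts.items()
        if status == "INTERRUPTED" and n >= threshold
    )


# Sample lines copied from the log earlier in this thread.
sample = """\
20:57:59:WU05:FS04:0xa8:Steps: first=0 total=0
20:58:00:WU05:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
20:58:59:WU05:FS04:Starting
20:58:59:WU05:FS04:FahCore 0xa8 started
20:58:59:WU05:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
"""

print(stuck_units(sample))  # [('WU05', 'FS04')]
```

A WU that shows up here over and over is a candidate for discarding by hand; the script only reads log text and does not touch the client.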

Re: SOLVED: CPU stuck at Ready, waiting for FahCore Run

Posted: Sat Dec 05, 2020 7:45 am
by Crunchtimer
Joe_H wrote: Something went wrong in Project 16826 over the past weekend. Assignment of its WUs was suspended early this week, so you can discard that WU. There are a couple of other recent topics that cover the problem.

Which version of the F@h client are you running? There is also a known issue with the Linux versions of the client: it does not discard a WU that hits this type of error after a few retries. That is supposedly fixed in the most recent version, but I have not seen any reports confirming it either way.
I'm currently having a problem with project 16926, and I'm running 7.6.21 on Ubuntu, so maybe that's the problem then?

Thanks!