core22 0.0.10 released to full FAH!

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

Nuitari
Posts: 80
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Re: core22 0.0.10 released to full FAH!

Post by Nuitari »

Pausing folding and resuming it also seems to cause forces to blow up; I lost all progress on 6 WUs that way...
JohnChodera
Pande Group Member
Posts: 470
Joined: Fri Feb 22, 2013 9:59 pm

Re: core22 0.0.10 released to full FAH!

Post by JohnChodera »

Thanks for the reports about standby/hibernate, all---we're working on improving that behavior in upcoming core22 builds.

@Nuitari: Are you saying that pausing and resuming *without* hibernating is causing forces to *immediately* blow up in 0.0.10? Does this happen reliably on all projects, or only certain projects, and does it happen at the beginning as well?
If you could post a few logs or list some PROJ,RUN,CLONE,GEN tuples that this happens with, that would really help us!

We've solved some other problems with compressed WUs, so expect 0.0.11 to go into testing in the next day or two. We're also working on addressing these other issues as quickly as we can.

~ John Chodera // MSKCC
JohnChodera
Pande Group Member
Posts: 470
Joined: Fri Feb 22, 2013 9:59 pm

Re: core22 0.0.10 released to full FAH!

Post by JohnChodera »

One other quick note: 13414-5 will be complete in the next couple of days, and the next generation of these will make improvements that aim to mitigate some of the reported issues with them.
We're hoping to get a progress bar posted so you can see how much work is left in each sprint for the COVID Moonshot, and hope to hit a steady pace of a new sprint every ~2 weeks featuring a new pair of projects full of compounds to prioritize.

~ John Chodera // MSKCC
JimF
Posts: 652
Joined: Thu Jan 21, 2010 2:03 pm

Re: core22 0.0.10 released to full FAH!

Post by JimF »

Thank you. For those of us who have been around here for a while, that is remarkable progress.
TPL
Posts: 104
Joined: Sun Apr 19, 2020 11:37 am

Re: core22 0.0.10 released to full FAH!

Post by TPL »

For these very slow WUs, has anyone else tested stopping the CPU slot? I have one now which had a TPF of 1.5h and very low, fluctuating GPU load but 100% CPU load. When I stopped the CPU slot, it went to 60% GPU load with CPU load jumping between 5% and 25%, and TPF dropped to between 5min 10sec and 5min 30sec. This one is Project: 13415 (Run 3483, Clone 8, Gen 1)
BobWilliams757
Posts: 493
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: core22 0.0.10 released to full FAH!

Post by BobWilliams757 »

TPL wrote: For these very slow WUs, has anyone else tested stopping the CPU slot? I have one now which had a TPF of 1.5h and very low, fluctuating GPU load but 100% CPU load. When I stopped the CPU slot, it went to 60% GPU load with CPU load jumping between 5% and 25%, and TPF dropped to between 5min 10sec and 5min 30sec. This one is Project: 13415 (Run 3483, Clone 8, Gen 1)
Interesting find. I was wondering about CPU resources myself, since even with a slow onboard GPU my CPU use can be 10% or so. For those of you (most people) with much more powerful GPUs, it stands to reason that the GPU needs more CPU power to feed the beast. This would especially impact those with dated or lower-powered CPUs, those running more PCI lanes with multiple GPU cards, those folding on both GPU and CPU, and quite a few other variables, I'm sure.
Fold them if you get them!
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: core22 0.0.10 released to full FAH!

Post by foldy »

One CPU thread should be enough to feed one GPU. Maybe keep one additional CPU thread free for the operating system. The remaining CPU threads can run folding too.
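
As a rough sketch of that rule of thumb (one CPU thread per GPU slot, one spare thread for the OS, the rest for CPU folding), the thread count for a CPU slot could be worked out like this. The function name and the `reserve_for_os` parameter are made up for illustration; this is not something the FAH client exposes.

```python
def cpu_folding_threads(total_threads: int, gpu_slots: int, reserve_for_os: int = 1) -> int:
    """Threads left for a CPU folding slot under the rule of thumb above.

    Hypothetical helper: reserves one thread per GPU slot plus
    `reserve_for_os` threads, and never returns a negative count.
    """
    reserved = gpu_slots + reserve_for_os
    return max(0, total_threads - reserved)


if __name__ == "__main__":
    # 8-thread CPU feeding one GPU: 8 - 1 (GPU) - 1 (OS) leaves 6 threads
    print(cpu_folding_threads(total_threads=8, gpu_slots=1))
    # 4-thread CPU feeding seven GPU slots: nothing left for CPU folding
    print(cpu_folding_threads(total_threads=4, gpu_slots=7))
```

Whether one thread per GPU is really enough for the slow 13415 units is exactly what is being questioned in this thread, so treat the numbers as a starting point for testing.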
ajm
Posts: 754
Joined: Sat Mar 21, 2020 5:22 am
Location: Lucerne, Switzerland

Re: core22 0.0.10 released to full FAH!

Post by ajm »

But those 1341x are experimental in some ways. Maybe those of them that are really slow can benefit from more CPU work? It would be worth a few systematic tests.
TPL
Posts: 104
Joined: Sun Apr 19, 2020 11:37 am

Re: core22 0.0.10 released to full FAH!

Post by TPL »

Only with 13415. Other "normal" WUs load the GPU fully. 13415 has always been lower, but some runs are really low. I also tested with 2/4 threads: GPU load was about 10% less than "normal" for this WU, but it slowed down more and more the longer it ran. Then I stopped CPU folding completely and TPF returned to close to the usual value for 13415. The usual load for 13415 on my GPU has been around 80% at most, and this is a very slow GPU.

But still, those "normal" WUs are too large for it to complete before timeout.

That load jumping up and down is strange. If I'm using 3 threads for CPU folding, then GPU load will be jumping. If not, GPU load is pretty stable but CPU load is jumping.

Maybe this is something useful to know for John.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: core22 0.0.10 released to full FAH!

Post by Neil-B »

Actually, for my slow GPUs the 13415s give really nice (2x) points, but yes, there are slower ones mixed in even among those, iirc - I'll now have to go and check :)
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
HaloJones
Posts: 920
Joined: Thu Jul 24, 2008 10:16 am

Re: core22 0.0.10 released to full FAH!

Post by HaloJones »

I don't run the cpus on my gpu rigs and can say that slow units are slow units even if there are 3 cpu threads available to feed them.
single 1070

Nuitari
Posts: 80
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Re: core22 0.0.10 released to full FAH!

Post by Nuitari »

JohnChodera wrote: @Nuitari: Are you saying that pausing and resuming *without* hibernating is causing forces to *immediately* blow up in 0.0.10? Does this happen reliably on all projects, or only certain projects, and does it happen at the beginning as well?
If you could post a few logs or list some PROJ,RUN,CLONE,GEN tuples that this happens with, that would really help us!
Yes, without hibernating. I had to restart FAHClient (for one of the slots it kept retrying an unresponsive server without getting reassigned).
This is all on project 13415; slots with other projects resumed without issues.
That rig has 7 GPU slots: 5x rx570 (all on "USB" PCIe 1x), 1x rx560 (direct PCIe 16x), 1x built-in APU (AMD Carrizo).

I didn't fully parse the logs before (it is time-consuming, after all), but it's not guaranteed to fail. Some units did continue processing.
I'm attaching the split-out logs for all FS slots, covering the WUs when they resumed.
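
For anyone who wants to split a combined client log the same way, here is a minimal Python sketch. It assumes only the line prefix visible in the logs in this thread (`HH:MM:SS:WUxx:FSxx:`, optionally with a level such as `WARNING:` before `WUxx`); the function name `split_by_slot` is my own.

```python
import re
from collections import defaultdict

# Matches e.g. "18:34:25:WU04:FS02:0x22:..." and
# "19:05:34:WARNING:WU16:FS03:FahCore returned: ..."
SLOT_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:(?:[A-Z]+:)?WU\d+:(FS\d+):")


def split_by_slot(lines):
    """Group log lines by folding slot (FS00, FS01, ...).

    Lines without a WUxx:FSxx prefix are skipped.
    """
    by_slot = defaultdict(list)
    for line in lines:
        match = SLOT_RE.match(line)
        if match:
            by_slot[match.group(1)].append(line.rstrip("\n"))
    return dict(by_slot)


if __name__ == "__main__":
    sample = [
        "18:34:25:WU04:FS02:0x22:Version 0.0.10",
        "19:05:34:WARNING:WU16:FS03:FahCore returned: WU_STALLED (127 = 0x7f)",
        "18:34:41:WU04:FS02:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0",
    ]
    for slot, slot_lines in sorted(split_by_slot(sample).items()):
        print(slot, len(slot_lines))
```

To run it on a real log, pass an open file object instead of the sample list.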

Startup Situation:
FS00: Project: 16907 (Run 2, Clone 17, Gen 40) (Core 0x21) -> This one goes on with no problem

FS02: Project: 13415 (Run 3563, Clone 11, Gen 1)

18:34:25:WU04:FS02:0x22:*********************** Log Started 2020-06-26T18:34:24Z ***********************
18:34:25:WU04:FS02:0x22:*************************** Core22 Folding@home Core ***************************
18:34:25:WU04:FS02:0x22:       Core: Core22
18:34:25:WU04:FS02:0x22:       Type: 0x22
18:34:25:WU04:FS02:0x22:    Version: 0.0.10
18:34:25:WU04:FS02:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:34:25:WU04:FS02:0x22:  Copyright: 2020 foldingathome.org
18:34:25:WU04:FS02:0x22:   Homepage: https://foldingathome.org/
18:34:25:WU04:FS02:0x22:       Date: Jun 16 2020
18:34:25:WU04:FS02:0x22:       Time: 15:55:31
18:34:25:WU04:FS02:0x22:   Revision: 147051aad40bcbec7d4b25105bbedfab425f1dc2
18:34:25:WU04:FS02:0x22:     Branch: core22-0.0.10
18:34:25:WU04:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU04:FS02:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU04:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU04:FS02:0x22:       Bits: 64
18:34:25:WU04:FS02:0x22:       Mode: Release
18:34:25:WU04:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
18:34:25:WU04:FS02:0x22:             <peastman@stanford.edu>
18:34:25:WU04:FS02:0x22:       Args: -dir 04 -suffix 01 -version 706 -lifeline 6532 -checkpoint 15
18:34:25:WU04:FS02:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 6 -gpu 6
18:34:25:WU04:FS02:0x22:************************************ libFAH ************************************
18:34:25:WU04:FS02:0x22:       Date: Jun 2 2020
18:34:25:WU04:FS02:0x22:       Time: 00:07:31
18:34:25:WU04:FS02:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
18:34:25:WU04:FS02:0x22:     Branch: HEAD
18:34:25:WU04:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU04:FS02:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU04:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU04:FS02:0x22:       Bits: 64
18:34:25:WU04:FS02:0x22:       Mode: Release
18:34:25:WU04:FS02:0x22:************************************ CBang *************************************
18:34:25:WU04:FS02:0x22:       Date: May 31 2020
18:34:25:WU04:FS02:0x22:       Time: 20:16:34
18:34:25:WU04:FS02:0x22:   Revision: 75fcee0b8e713cb47f5191a3689d5f4f07244c7f
18:34:25:WU04:FS02:0x22:     Branch: HEAD
18:34:25:WU04:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU04:FS02:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU04:FS02:0x22:             -fPIC
18:34:25:WU04:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU04:FS02:0x22:       Bits: 64
18:34:25:WU04:FS02:0x22:       Mode: Release
18:34:25:WU04:FS02:0x22:************************************ System ************************************
18:34:25:WU04:FS02:0x22:        CPU: AMD A8-9600 RADEON R7, 10 COMPUTE CORES 4C+6G
18:34:25:WU04:FS02:0x22:     CPU ID: AuthenticAMD Family 21 Model 101 Stepping 1
18:34:25:WU04:FS02:0x22:       CPUs: 4
18:34:25:WU04:FS02:0x22:     Memory: 14.59GiB
18:34:25:WU04:FS02:0x22:Free Memory: 13.16GiB
18:34:25:WU04:FS02:0x22:    Threads: POSIX_THREADS
18:34:25:WU04:FS02:0x22: OS Version: 5.6
18:34:25:WU04:FS02:0x22:Has Battery: false
18:34:25:WU04:FS02:0x22: On Battery: false
18:34:25:WU04:FS02:0x22: UTC Offset: -4
18:34:25:WU04:FS02:0x22:        PID: 6538
18:34:25:WU04:FS02:0x22:        CWD: /root/fahclient_saruman/work
18:34:25:WU04:FS02:0x22:********************************************************************************
18:34:25:WU04:FS02:0x22:Project: 13415 (Run 3563, Clone 11, Gen 1)
18:34:25:WU04:FS02:0x22:Unit: 0x0000000312bc7d9a5ef50d25f2c7ba97
18:34:25:WU04:FS02:0x22:Digital signatures verified
18:34:25:WU04:FS02:0x22:Folding@home GPU Core22 Folding@home Core
18:34:25:WU04:FS02:0x22:Version 0.0.10
18:34:25:WU04:FS02:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:34:25:WU04:FS02:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:34:25:WU04:FS02:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:34:25:WU04:FS02:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
18:34:41:WU04:FS02:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
18:34:41:WU04:FS02:0x22:Saving result file ../logfile_01.txt
18:34:41:WU04:FS02:0x22:Saving result file science.log
18:34:41:WU04:FS02:0x22:Saving result file state.xml
FS02: that slot then gets assigned another unit, Project: 13415 (Run 4543, Clone 12, Gen 0), which it proceeds to complete.

FS03: Project: 13415 (Run 4439, Clone 10, Gen 0)

18:34:25:WU16:FS03:0x22:*********************** Log Started 2020-06-26T18:34:24Z ***********************
18:34:25:WU16:FS03:0x22:*************************** Core22 Folding@home Core ***************************
18:34:25:WU16:FS03:0x22:       Core: Core22
18:34:25:WU16:FS03:0x22:       Type: 0x22
18:34:25:WU16:FS03:0x22:    Version: 0.0.10
18:34:25:WU16:FS03:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:34:25:WU16:FS03:0x22:  Copyright: 2020 foldingathome.org
18:34:25:WU16:FS03:0x22:   Homepage: https://foldingathome.org/
18:34:25:WU16:FS03:0x22:       Date: Jun 16 2020
18:34:25:WU16:FS03:0x22:       Time: 15:55:31
18:34:25:WU16:FS03:0x22:   Revision: 147051aad40bcbec7d4b25105bbedfab425f1dc2
18:34:25:WU16:FS03:0x22:     Branch: core22-0.0.10
18:34:25:WU16:FS03:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU16:FS03:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU16:FS03:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU16:FS03:0x22:       Bits: 64
18:34:25:WU16:FS03:0x22:       Mode: Release
18:34:25:WU16:FS03:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
18:34:25:WU16:FS03:0x22:             <peastman@stanford.edu>
18:34:25:WU16:FS03:0x22:       Args: -dir 16 -suffix 01 -version 706 -lifeline 6520 -checkpoint 15
18:34:25:WU16:FS03:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 1 -gpu 1
18:34:25:WU16:FS03:0x22:************************************ libFAH ************************************
18:34:25:WU16:FS03:0x22:       Date: Jun 2 2020
18:34:25:WU16:FS03:0x22:       Time: 00:07:31
18:34:25:WU16:FS03:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
18:34:25:WU16:FS03:0x22:     Branch: HEAD
18:34:25:WU16:FS03:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU16:FS03:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU16:FS03:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU16:FS03:0x22:       Bits: 64
18:34:25:WU16:FS03:0x22:       Mode: Release
18:34:25:WU16:FS03:0x22:************************************ CBang *************************************
18:34:25:WU16:FS03:0x22:       Date: May 31 2020
18:34:25:WU16:FS03:0x22:       Time: 20:16:34
18:34:25:WU16:FS03:0x22:   Revision: 75fcee0b8e713cb47f5191a3689d5f4f07244c7f
18:34:25:WU16:FS03:0x22:     Branch: HEAD
18:34:25:WU16:FS03:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU16:FS03:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU16:FS03:0x22:             -fPIC
18:34:25:WU16:FS03:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU16:FS03:0x22:       Bits: 64
18:34:25:WU16:FS03:0x22:       Mode: Release
18:34:25:WU16:FS03:0x22:************************************ System ************************************
18:34:25:WU16:FS03:0x22:        CPU: AMD A8-9600 RADEON R7, 10 COMPUTE CORES 4C+6G
18:34:25:WU16:FS03:0x22:     CPU ID: AuthenticAMD Family 21 Model 101 Stepping 1
18:34:25:WU16:FS03:0x22:       CPUs: 4
18:34:25:WU16:FS03:0x22:     Memory: 14.59GiB
18:34:25:WU16:FS03:0x22:Free Memory: 13.18GiB
18:34:25:WU16:FS03:0x22:    Threads: POSIX_THREADS
18:34:25:WU16:FS03:0x22: OS Version: 5.6
18:34:25:WU16:FS03:0x22:Has Battery: false
18:34:25:WU16:FS03:0x22: On Battery: false
18:34:25:WU16:FS03:0x22: UTC Offset: -4
18:34:25:WU16:FS03:0x22:        PID: 6524
18:34:25:WU16:FS03:0x22:        CWD: /root/fahclient_saruman/work
18:34:25:WU16:FS03:0x22:********************************************************************************
18:34:25:WU16:FS03:0x22:Project: 13415 (Run 4439, Clone 10, Gen 0)
18:34:25:WU16:FS03:0x22:Unit: 0x0000000112bc7d9a5ef50d06e8b200ee
18:34:25:WU16:FS03:0x22:Digital signatures verified
18:34:25:WU16:FS03:0x22:Folding@home GPU Core22 Folding@home Core
18:34:25:WU16:FS03:0x22:Version 0.0.10
18:34:25:WU16:FS03:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:34:25:WU16:FS03:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:34:25:WU16:FS03:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:34:25:WU16:FS03:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
18:34:41:WU16:FS03:0x22:Completed 750000 out of 1000000 steps (75%)
18:37:04:WU16:FS03:0x22:Completed 760000 out of 1000000 steps (76%)
18:39:19:WU16:FS03:0x22:Completed 770000 out of 1000000 steps (77%)
18:41:33:WU16:FS03:0x22:Completed 780000 out of 1000000 steps (78%)
18:43:48:WU16:FS03:0x22:Completed 790000 out of 1000000 steps (79%)
18:55:34:WU16:FS03:0x22:Watchdog triggered, requesting soft shutdown down
19:05:34:WU16:FS03:0x22:Watchdog shutdown failed, hard shutdown triggered
19:05:34:WARNING:WU16:FS03:FahCore returned: WU_STALLED (127 = 0x7f)
19:05:35:WU16:FS03:Starting
--- SNIP the startup of the core ---
19:05:35:WU16:FS03:0x22:Project: 13415 (Run 4439, Clone 10, Gen 0)
19:05:35:WU16:FS03:0x22:Unit: 0x0000000112bc7d9a5ef50d06e8b200ee
19:05:35:WU16:FS03:0x22:Digital signatures verified
19:05:35:WU16:FS03:0x22:Folding@home GPU Core22 Folding@home Core
19:05:35:WU16:FS03:0x22:Version 0.0.10
19:05:35:WU16:FS03:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
19:05:35:WU16:FS03:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
19:05:35:WU16:FS03:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
19:05:35:WU16:FS03:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
19:05:53:WU16:FS03:0x22:ERROR:Discrepancy: Forces are blowing up! 3196 2
19:05:53:WU16:FS03:0x22:Saving result file ../logfile_01.txt
19:05:53:WU16:FS03:0x22:Saving result file science.log
19:05:53:WU16:FS03:0x22:Saving result file state.xml
19:05:53:WU16:FS03:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

At that point a new WU is sent, which promptly fails:

19:05:55:WU05:FS03:0x22:Project: 13415 (Run 3825, Clone 12, Gen 1)
19:05:55:WU05:FS03:0x22:Unit: 0x0000000112bc7d9a5ef50d1b1bcf99b5
19:05:55:WU05:FS03:0x22:Reading tar file core.xml
19:05:55:WU05:FS03:0x22:Reading tar file integrator.xml
19:05:55:WU05:FS03:0x22:Reading tar file state.xml
19:05:55:WU05:FS03:0x22:Reading tar file system.xml
19:05:55:WU05:FS03:0x22:Digital signatures verified
19:05:55:WU05:FS03:0x22:Folding@home GPU Core22 Folding@home Core
19:05:55:WU05:FS03:0x22:Version 0.0.10
19:05:55:WU05:FS03:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
19:05:55:WU05:FS03:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
19:05:55:WU05:FS03:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
19:05:55:WU05:FS03:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
19:06:13:WU05:FS03:0x22:ERROR:Discrepancy: Forces are blowing up! 12 0
19:06:13:WU05:FS03:0x22:Saving result file ../logfile_01.txt
19:06:14:WU05:FS03:0x22:Saving result file science.log
19:06:14:WU05:FS03:0x22:Saving result file state.xml
19:06:14:WU05:FS03:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
It then receives:
19:06:15:WU08:FS03:0x22:Project: 13415 (Run 3942, Clone 3, Gen 1)
That one gets processed successfully.
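
Since TPF came up earlier in the thread, here is a small sketch that estimates it from the "Completed ... steps" lines in a log like the FS03 one. It assumes consecutive progress lines belong to the same WU and are 1% apart, as they are in that excerpt; the function name `frame_times` is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. "18:37:04:WU16:FS03:0x22:Completed 760000 out of 1000000 steps (76%)"
PROGRESS_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed \d+ out of \d+ steps \(\d+%\)"
)


def frame_times(lines):
    """Return the seconds elapsed between consecutive progress lines."""
    stamps = []
    for line in lines:
        match = PROGRESS_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    deltas = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = cur - prev
        if delta < timedelta(0):  # timestamps wrapped past midnight
            delta += timedelta(days=1)
        deltas.append(int(delta.total_seconds()))
    return deltas


if __name__ == "__main__":
    log = [
        "18:34:41:WU16:FS03:0x22:Completed 750000 out of 1000000 steps (75%)",
        "18:37:04:WU16:FS03:0x22:Completed 760000 out of 1000000 steps (76%)",
        "18:39:19:WU16:FS03:0x22:Completed 770000 out of 1000000 steps (77%)",
        "18:41:33:WU16:FS03:0x22:Completed 780000 out of 1000000 steps (78%)",
        "18:43:48:WU16:FS03:0x22:Completed 790000 out of 1000000 steps (79%)",
    ]
    deltas = frame_times(log)
    print(deltas)                     # [143, 135, 134, 135]
    print(sum(deltas) / len(deltas))  # 136.75
```

For the FS03 excerpt that works out to a TPF of a bit over two minutes, so the 11+ minutes of silence between the 79% line and the watchdog trigger really is a stall rather than a slow frame.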

FS04: Project: 13415 (Run 3423, Clone 10, Gen 1)

18:34:25:WU12:FS04:0x22:*********************** Log Started 2020-06-26T18:34:24Z ***********************
18:34:25:WU12:FS04:0x22:*************************** Core22 Folding@home Core ***************************
18:34:25:WU12:FS04:0x22:       Core: Core22
18:34:25:WU12:FS04:0x22:       Type: 0x22
18:34:25:WU12:FS04:0x22:    Version: 0.0.10
18:34:25:WU12:FS04:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:34:25:WU12:FS04:0x22:  Copyright: 2020 foldingathome.org
18:34:25:WU12:FS04:0x22:   Homepage: https://foldingathome.org/
18:34:25:WU12:FS04:0x22:       Date: Jun 16 2020
18:34:25:WU12:FS04:0x22:       Time: 15:55:31
18:34:25:WU12:FS04:0x22:   Revision: 147051aad40bcbec7d4b25105bbedfab425f1dc2
18:34:25:WU12:FS04:0x22:     Branch: core22-0.0.10
18:34:25:WU12:FS04:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU12:FS04:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU12:FS04:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU12:FS04:0x22:       Bits: 64
18:34:25:WU12:FS04:0x22:       Mode: Release
18:34:25:WU12:FS04:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
18:34:25:WU12:FS04:0x22:             <peastman@stanford.edu>
18:34:25:WU12:FS04:0x22:       Args: -dir 12 -suffix 01 -version 706 -lifeline 6525 -checkpoint 15
18:34:25:WU12:FS04:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 2 -gpu 2
18:34:25:WU12:FS04:0x22:************************************ libFAH ************************************
18:34:25:WU12:FS04:0x22:       Date: Jun 2 2020
18:34:25:WU12:FS04:0x22:       Time: 00:07:31
18:34:25:WU12:FS04:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
18:34:25:WU12:FS04:0x22:     Branch: HEAD
18:34:25:WU12:FS04:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU12:FS04:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU12:FS04:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU12:FS04:0x22:       Bits: 64
18:34:25:WU12:FS04:0x22:       Mode: Release
18:34:25:WU12:FS04:0x22:************************************ CBang *************************************
18:34:25:WU12:FS04:0x22:       Date: May 31 2020
18:34:25:WU12:FS04:0x22:       Time: 20:16:34
18:34:25:WU12:FS04:0x22:   Revision: 75fcee0b8e713cb47f5191a3689d5f4f07244c7f
18:34:25:WU12:FS04:0x22:     Branch: HEAD
18:34:25:WU12:FS04:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU12:FS04:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU12:FS04:0x22:             -fPIC
18:34:25:WU12:FS04:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU12:FS04:0x22:       Bits: 64
18:34:25:WU12:FS04:0x22:       Mode: Release
18:34:25:WU12:FS04:0x22:************************************ System ************************************
18:34:25:WU12:FS04:0x22:        CPU: AMD A8-9600 RADEON R7, 10 COMPUTE CORES 4C+6G
18:34:25:WU12:FS04:0x22:     CPU ID: AuthenticAMD Family 21 Model 101 Stepping 1
18:34:25:WU12:FS04:0x22:       CPUs: 4
18:34:25:WU12:FS04:0x22:     Memory: 14.59GiB
18:34:25:WU12:FS04:0x22:Free Memory: 13.17GiB
18:34:25:WU12:FS04:0x22:    Threads: POSIX_THREADS
18:34:25:WU12:FS04:0x22: OS Version: 5.6
18:34:25:WU12:FS04:0x22:Has Battery: false
18:34:25:WU12:FS04:0x22: On Battery: false
18:34:25:WU12:FS04:0x22: UTC Offset: -4
18:34:25:WU12:FS04:0x22:        PID: 6531
18:34:25:WU12:FS04:0x22:        CWD: /root/fahclient_saruman/work
18:34:25:WU12:FS04:0x22:********************************************************************************
18:34:25:WU12:FS04:0x22:Project: 13415 (Run 3423, Clone 10, Gen 1)
18:34:25:WU12:FS04:0x22:Unit: 0x0000000212bc7d9a5ef50d2cb5d52d3f
18:34:25:WU12:FS04:0x22:Digital signatures verified
18:34:25:WU12:FS04:0x22:Folding@home GPU Core22 Folding@home Core
18:34:25:WU12:FS04:0x22:Version 0.0.10
18:34:25:WU12:FS04:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:34:25:WU12:FS04:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:34:25:WU12:FS04:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:34:25:WU12:FS04:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
18:34:41:WU12:FS04:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
18:34:41:WU12:FS04:0x22:Saving result file ../logfile_01.txt
18:34:41:WU12:FS04:0x22:Saving result file science.log
18:34:41:WU12:FS04:0x22:Saving result file state.xml
18:34:41:WU12:FS04:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

18:34:45:WU06:FS04:0x22:Project: 13415 (Run 4544, Clone 12, Gen 0)
18:34:45:WU06:FS04:0x22:Unit: 0x0000000012bc7d9a5ef50d017d45764c
18:34:45:WU06:FS04:0x22:Reading tar file core.xml
18:34:45:WU06:FS04:0x22:Reading tar file integrator.xml
18:34:45:WU06:FS04:0x22:Reading tar file state.xml
18:34:45:WU06:FS04:0x22:Reading tar file system.xml
18:34:45:WU06:FS04:0x22:Digital signatures verified
18:34:45:WU06:FS04:0x22:Folding@home GPU Core22 Folding@home Core
18:34:45:WU06:FS04:0x22:Version 0.0.10
18:34:45:WU06:FS04:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:34:45:WU06:FS04:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:34:45:WU06:FS04:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:34:45:WU06:FS04:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
18:35:01:WU06:FS04:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
18:35:01:WU06:FS04:0x22:Saving result file ../logfile_01.txt
18:35:01:WU06:FS04:0x22:Saving result file science.log
18:35:01:WU06:FS04:0x22:Saving result file state.xml
18:35:01:WU06:FS04:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

18:35:03:WU07:FS04:0x22:Project: 13415 (Run 3272, Clone 12, Gen 1)
18:35:03:WU07:FS04:0x22:Unit: 0x0000000212bc7d9a5ef50d33ecb1bcf7
18:35:03:WU07:FS04:0x22:Reading tar file core.xml
18:35:03:WU07:FS04:0x22:Reading tar file integrator.xml
18:35:03:WU07:FS04:0x22:Reading tar file state.xml
18:35:03:WU07:FS04:0x22:Reading tar file system.xml
18:35:03:WU07:FS04:0x22:Digital signatures verified
18:35:03:WU07:FS04:0x22:Folding@home GPU Core22 Folding@home Core
18:35:03:WU07:FS04:0x22:Version 0.0.10
18:35:03:WU07:FS04:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:35:03:WU07:FS04:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:35:03:WU07:FS04:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:35:03:WU07:FS04:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
18:35:20:WU07:FS04:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
18:35:20:WU07:FS04:0x22:Saving result file ../logfile_01.txt
18:35:20:WU07:FS04:0x22:Saving result file science.log
18:35:20:WU07:FS04:0x22:Saving result file state.xml
18:35:20:WU07:FS04:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
It then receives:
18:35:22:WU06:FS04:0x22:Project: 13415 (Run 3127, Clone 8, Gen 1)
At that point it processes successfully.
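
To pull out the PROJ,RUN,CLONE,GEN tuples that John asked for without reading every log by hand, something like the following sketch might help. It only relies on the two line shapes seen in the logs in this post (the `Project: ... (Run ..., Clone ..., Gen ...)` line and the `Forces are blowing up!` error); the function name `failed_prcg` is my own, and other failure types are not covered.

```python
import re

# "18:34:25:WU12:FS04:0x22:Project: 13415 (Run 3423, Clone 10, Gen 1)"
PRCG_RE = re.compile(
    r":(WU\d+):(FS\d+):0x[0-9a-fA-F]+:"
    r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
# "18:34:41:WU12:FS04:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0"
BLOWUP_RE = re.compile(
    r":(WU\d+):(FS\d+):0x[0-9a-fA-F]+:ERROR:Discrepancy: Forces are blowing up!"
)


def failed_prcg(lines):
    """Return (project, run, clone, gen) tuples that hit a force blow-up."""
    current = {}  # (WU, FS) -> PRCG most recently announced for that pair
    failed = []
    for line in lines:
        match = PRCG_RE.search(line)
        if match:
            key = (match.group(1), match.group(2))
            current[key] = tuple(int(g) for g in match.groups()[2:])
            continue
        match = BLOWUP_RE.search(line)
        if match:
            prcg = current.get((match.group(1), match.group(2)))
            if prcg is not None and prcg not in failed:
                failed.append(prcg)
    return failed


if __name__ == "__main__":
    log = [
        "18:34:25:WU12:FS04:0x22:Project: 13415 (Run 3423, Clone 10, Gen 1)",
        "18:34:41:WU12:FS04:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0",
        "18:34:45:WU06:FS04:0x22:Project: 13415 (Run 4544, Clone 12, Gen 0)",
        "18:35:01:WU06:FS04:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0",
        "18:35:22:WU06:FS04:0x22:Project: 13415 (Run 3127, Clone 8, Gen 1)",
    ]
    for prcg in failed_prcg(log):
        print(prcg)
```

Tracking the last announced project per (WU, FS) pair matters here because the WU number gets reused on the same slot, as with WU06 on FS04 above.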

FS05: Nothing assigned to start with

18:34:27:WU00:FS05:Assigned to work server 18.188.125.154
18:34:27:WU00:FS05:Requesting new work unit for slot 05: READY gpu:3:Ellesmere XT [Radeon RX 470/480/570/580/590] from 18.188.125.154
18:34:27:WU00:FS05:Connecting to 18.188.125.154:8080
18:34:28:WU00:FS05:Downloading 438.99KiB
18:34:29:WU00:FS05:Download complete
18:34:29:WU00:FS05:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13415 run:3864 clone:12 gen:1 core:0x22 unit:0x0000000112bc7d9a5ef50d1aa0139002
18:34:29:WU00:FS05:Starting
18:34:29:WU00:FS05:Running FahCore: /usr/bin/FAHCoreWrapper /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 6490 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 3 -gpu 3
18:34:29:WU00:FS05:Started FahCore on PID 6569
18:34:29:WU00:FS05:Core PID:6573
18:34:29:WU00:FS05:FahCore 0x22 started
18:34:30:WU00:FS05:0x22:*********************** Log Started 2020-06-26T18:34:29Z ***********************
18:34:30:WU00:FS05:0x22:*************************** Core22 Folding@home Core ***************************
18:34:30:WU00:FS05:0x22:       Core: Core22
18:34:30:WU00:FS05:0x22:       Type: 0x22
18:34:30:WU00:FS05:0x22:    Version: 0.0.10
18:34:30:WU00:FS05:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:34:30:WU00:FS05:0x22:  Copyright: 2020 foldingathome.org
18:34:30:WU00:FS05:0x22:   Homepage: https://foldingathome.org/
18:34:30:WU00:FS05:0x22:       Date: Jun 16 2020
18:34:30:WU00:FS05:0x22:       Time: 15:55:31
18:34:30:WU00:FS05:0x22:   Revision: 147051aad40bcbec7d4b25105bbedfab425f1dc2
18:34:30:WU00:FS05:0x22:     Branch: core22-0.0.10
18:34:30:WU00:FS05:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:30:WU00:FS05:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:30:WU00:FS05:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:30:WU00:FS05:0x22:       Bits: 64
18:34:30:WU00:FS05:0x22:       Mode: Release
18:34:30:WU00:FS05:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
18:34:30:WU00:FS05:0x22:             <peastman@stanford.edu>
18:34:30:WU00:FS05:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 6569 -checkpoint 15
18:34:30:WU00:FS05:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 3 -gpu 3
18:34:30:WU00:FS05:0x22:************************************ libFAH ************************************
18:34:30:WU00:FS05:0x22:       Date: Jun 2 2020
18:34:30:WU00:FS05:0x22:       Time: 00:07:31
18:34:30:WU00:FS05:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
18:34:30:WU00:FS05:0x22:     Branch: HEAD
18:34:30:WU00:FS05:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:30:WU00:FS05:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:30:WU00:FS05:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:30:WU00:FS05:0x22:       Bits: 64
18:34:30:WU00:FS05:0x22:       Mode: Release
18:34:30:WU00:FS05:0x22:************************************ CBang *************************************
18:34:30:WU00:FS05:0x22:       Date: May 31 2020
18:34:30:WU00:FS05:0x22:       Time: 20:16:34
18:34:30:WU00:FS05:0x22:   Revision: 75fcee0b8e713cb47f5191a3689d5f4f07244c7f
18:34:30:WU00:FS05:0x22:     Branch: HEAD
18:34:30:WU00:FS05:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:30:WU00:FS05:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:30:WU00:FS05:0x22:             -fPIC
18:34:30:WU00:FS05:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:30:WU00:FS05:0x22:       Bits: 64
18:34:30:WU00:FS05:0x22:       Mode: Release
18:34:30:WU00:FS05:0x22:************************************ System ************************************
18:34:30:WU00:FS05:0x22:        CPU: AMD A8-9600 RADEON R7, 10 COMPUTE CORES 4C+6G
18:34:30:WU00:FS05:0x22:     CPU ID: AuthenticAMD Family 21 Model 101 Stepping 1
18:34:30:WU00:FS05:0x22:       CPUs: 4
18:34:30:WU00:FS05:0x22:     Memory: 14.59GiB
18:34:30:WU00:FS05:0x22:Free Memory: 12.64GiB
18:34:30:WU00:FS05:0x22:    Threads: POSIX_THREADS
18:34:30:WU00:FS05:0x22: OS Version: 5.6
18:34:30:WU00:FS05:0x22:Has Battery: false
18:34:30:WU00:FS05:0x22: On Battery: false
18:34:30:WU00:FS05:0x22: UTC Offset: -4
18:34:30:WU00:FS05:0x22:        PID: 6573
18:34:30:WU00:FS05:0x22:        CWD: /root/fahclient_saruman/work
18:34:30:WU00:FS05:0x22:********************************************************************************
18:34:30:WU00:FS05:0x22:Project: 13415 (Run 3864, Clone 12, Gen 1)
18:34:30:WU00:FS05:0x22:Unit: 0x0000000112bc7d9a5ef50d1aa0139002
18:34:30:WU00:FS05:0x22:Reading tar file core.xml
18:34:30:WU00:FS05:0x22:Reading tar file integrator.xml
18:34:30:WU00:FS05:0x22:Reading tar file state.xml
18:34:30:WU00:FS05:0x22:Reading tar file system.xml
18:34:30:WU00:FS05:0x22:Digital signatures verified
18:34:30:WU00:FS05:0x22:Folding@home GPU Core22 Folding@home Core
18:34:30:WU00:FS05:0x22:Version 0.0.10
18:34:30:WU00:FS05:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:34:30:WU00:FS05:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:34:30:WU00:FS05:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:34:30:WU00:FS05:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
22:27:56:WU00:FS05:0x22:Folding@home Core Shutdown: FINISHED_UNIT

FS06: Project: 16435 (Run 3033, Clone 4, Gen 8)

Code:

18:34:25:WU02:FS06:0x22:*********************** Log Started 2020-06-26T18:34:24Z ***********************
18:34:25:WU02:FS06:0x22:*************************** Core22 Folding@home Core ***************************
18:34:25:WU02:FS06:0x22:       Core: Core22
18:34:25:WU02:FS06:0x22:       Type: 0x22
18:34:25:WU02:FS06:0x22:    Version: 0.0.10
18:34:25:WU02:FS06:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:34:25:WU02:FS06:0x22:  Copyright: 2020 foldingathome.org
18:34:25:WU02:FS06:0x22:   Homepage: https://foldingathome.org/
18:34:25:WU02:FS06:0x22:       Date: Jun 16 2020
18:34:25:WU02:FS06:0x22:       Time: 15:55:31
18:34:25:WU02:FS06:0x22:   Revision: 147051aad40bcbec7d4b25105bbedfab425f1dc2
18:34:25:WU02:FS06:0x22:     Branch: core22-0.0.10
18:34:25:WU02:FS06:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU02:FS06:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU02:FS06:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU02:FS06:0x22:       Bits: 64
18:34:25:WU02:FS06:0x22:       Mode: Release
18:34:25:WU02:FS06:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
18:34:25:WU02:FS06:0x22:             <peastman@stanford.edu>
18:34:25:WU02:FS06:0x22:       Args: -dir 02 -suffix 01 -version 706 -lifeline 6511 -checkpoint 15
18:34:25:WU02:FS06:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 4 -gpu 4
18:34:25:WU02:FS06:0x22:************************************ libFAH ************************************
18:34:25:WU02:FS06:0x22:       Date: Jun 2 2020
18:34:25:WU02:FS06:0x22:       Time: 00:07:31
18:34:25:WU02:FS06:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
18:34:25:WU02:FS06:0x22:     Branch: HEAD
18:34:25:WU02:FS06:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU02:FS06:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU02:FS06:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU02:FS06:0x22:       Bits: 64
18:34:25:WU02:FS06:0x22:       Mode: Release
18:34:25:WU02:FS06:0x22:************************************ CBang *************************************
18:34:25:WU02:FS06:0x22:       Date: May 31 2020
18:34:25:WU02:FS06:0x22:       Time: 20:16:34
18:34:25:WU02:FS06:0x22:   Revision: 75fcee0b8e713cb47f5191a3689d5f4f07244c7f
18:34:25:WU02:FS06:0x22:     Branch: HEAD
18:34:25:WU02:FS06:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU02:FS06:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU02:FS06:0x22:             -fPIC
18:34:25:WU02:FS06:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU02:FS06:0x22:       Bits: 64
18:34:25:WU02:FS06:0x22:       Mode: Release
18:34:25:WU02:FS06:0x22:************************************ System ************************************
18:34:25:WU02:FS06:0x22:        CPU: AMD A8-9600 RADEON R7, 10 COMPUTE CORES 4C+6G
18:34:25:WU02:FS06:0x22:     CPU ID: AuthenticAMD Family 21 Model 101 Stepping 1
18:34:25:WU02:FS06:0x22:       CPUs: 4
18:34:25:WU02:FS06:0x22:     Memory: 14.59GiB
18:34:25:WU02:FS06:0x22:Free Memory: 13.20GiB
18:34:25:WU02:FS06:0x22:    Threads: POSIX_THREADS
18:34:25:WU02:FS06:0x22: OS Version: 5.6
18:34:25:WU02:FS06:0x22:Has Battery: false
18:34:25:WU02:FS06:0x22: On Battery: false
18:34:25:WU02:FS06:0x22: UTC Offset: -4
18:34:25:WU02:FS06:0x22:        PID: 6517
18:34:25:WU02:FS06:0x22:        CWD: /root/fahclient_saruman/work
18:34:25:WU02:FS06:0x22:********************************************************************************
18:34:25:WU02:FS06:0x22:Project: 16435 (Run 3033, Clone 4, Gen 8)
18:34:25:WU02:FS06:0x22:Unit: 0x0000001703854c135e9a4ef747f4d0a8
18:34:25:WU02:FS06:0x22:Digital signatures verified
18:34:25:WU02:FS06:0x22:Folding@home GPU Core22 Folding@home Core
18:34:25:WU02:FS06:0x22:Version 0.0.10
18:34:25:WU02:FS06:0x22:  Checkpoint write interval: 250000 steps (5%) [20 total]
18:34:25:WU02:FS06:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
18:34:25:WU02:FS06:0x22:  XTC frame write interval: 10000 steps (0.2%) [500 total]
18:34:25:WU02:FS06:0x22:  Global context and integrator variables write interval: disabled
Completes!

FS07: Project: 13415 (Run 4168, Clone 10, Gen 0)

Code:

18:34:25:WU01:FS07:0x22:*********************** Log Started 2020-06-26T18:34:24Z ***********************
18:34:25:WU01:FS07:0x22:*************************** Core22 Folding@home Core ***************************
18:34:25:WU01:FS07:0x22:       Core: Core22
18:34:25:WU01:FS07:0x22:       Type: 0x22
18:34:25:WU01:FS07:0x22:    Version: 0.0.10
18:34:25:WU01:FS07:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:34:25:WU01:FS07:0x22:  Copyright: 2020 foldingathome.org
18:34:25:WU01:FS07:0x22:   Homepage: https://foldingathome.org/
18:34:25:WU01:FS07:0x22:       Date: Jun 16 2020
18:34:25:WU01:FS07:0x22:       Time: 15:55:31
18:34:25:WU01:FS07:0x22:   Revision: 147051aad40bcbec7d4b25105bbedfab425f1dc2
18:34:25:WU01:FS07:0x22:     Branch: core22-0.0.10
18:34:25:WU01:FS07:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU01:FS07:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU01:FS07:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU01:FS07:0x22:       Bits: 64
18:34:25:WU01:FS07:0x22:       Mode: Release
18:34:25:WU01:FS07:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
18:34:25:WU01:FS07:0x22:             <peastman@stanford.edu>
18:34:25:WU01:FS07:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 6505 -checkpoint 15
18:34:25:WU01:FS07:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 5 -gpu 5
18:34:25:WU01:FS07:0x22:************************************ libFAH ************************************
18:34:25:WU01:FS07:0x22:       Date: Jun 2 2020
18:34:25:WU01:FS07:0x22:       Time: 00:07:31
18:34:25:WU01:FS07:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
18:34:25:WU01:FS07:0x22:     Branch: HEAD
18:34:25:WU01:FS07:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU01:FS07:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU01:FS07:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU01:FS07:0x22:       Bits: 64
18:34:25:WU01:FS07:0x22:       Mode: Release
18:34:25:WU01:FS07:0x22:************************************ CBang *************************************
18:34:25:WU01:FS07:0x22:       Date: May 31 2020
18:34:25:WU01:FS07:0x22:       Time: 20:16:34
18:34:25:WU01:FS07:0x22:   Revision: 75fcee0b8e713cb47f5191a3689d5f4f07244c7f
18:34:25:WU01:FS07:0x22:     Branch: HEAD
18:34:25:WU01:FS07:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
18:34:25:WU01:FS07:0x22:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
18:34:25:WU01:FS07:0x22:             -fPIC
18:34:25:WU01:FS07:0x22:   Platform: linux2 4.19.76-linuxkit
18:34:25:WU01:FS07:0x22:       Bits: 64
18:34:25:WU01:FS07:0x22:       Mode: Release
18:34:25:WU01:FS07:0x22:************************************ System ************************************
18:34:25:WU01:FS07:0x22:        CPU: AMD A8-9600 RADEON R7, 10 COMPUTE CORES 4C+6G
18:34:25:WU01:FS07:0x22:     CPU ID: AuthenticAMD Family 21 Model 101 Stepping 1
18:34:25:WU01:FS07:0x22:       CPUs: 4
18:34:25:WU01:FS07:0x22:     Memory: 14.59GiB
18:34:25:WU01:FS07:0x22:Free Memory: 13.21GiB
18:34:25:WU01:FS07:0x22:    Threads: POSIX_THREADS
18:34:25:WU01:FS07:0x22: OS Version: 5.6
18:34:25:WU01:FS07:0x22:Has Battery: false
18:34:25:WU01:FS07:0x22: On Battery: false
18:34:25:WU01:FS07:0x22: UTC Offset: -4
18:34:25:WU01:FS07:0x22:        PID: 6509
18:34:25:WU01:FS07:0x22:        CWD: /root/fahclient_saruman/work
18:34:25:WU01:FS07:0x22:********************************************************************************
18:34:25:WU01:FS07:0x22:Project: 13415 (Run 4168, Clone 10, Gen 0)
18:34:25:WU01:FS07:0x22:Unit: 0x0000000212bc7d9a5ef50d0ff6ab07cb
18:34:25:WU01:FS07:0x22:Digital signatures verified
18:34:25:WU01:FS07:0x22:Folding@home GPU Core22 Folding@home Core
18:34:25:WU01:FS07:0x22:Version 0.0.10
18:34:25:WU01:FS07:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:34:25:WU01:FS07:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:34:25:WU01:FS07:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:34:25:WU01:FS07:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
18:34:40:WU01:FS07:0x22:ERROR:Force RMSE error of 38.7167 with threshold of 5
18:34:40:WU01:FS07:0x22:Saving result file ../logfile_01.txt
18:34:40:WU01:FS07:0x22:Saving result file science.log
18:34:40:WU01:FS07:0x22:Saving result file state.xml
18:34:41:WU01:FS07:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
18:34:41:WARNING:WU01:FS07:FahCore returned: BAD_WORK_UNIT (114 = 0x72)

18:35:44:WU05:FS07:0x22:Project: 13415 (Run 3385, Clone 12, Gen 1)
18:35:44:WU05:FS07:0x22:Unit: 0x0000000212bc7d9a5ef50d2f831645f7
18:35:44:WU05:FS07:0x22:Reading tar file core.xml
18:35:44:WU05:FS07:0x22:Reading tar file integrator.xml
18:35:44:WU05:FS07:0x22:Reading tar file state.xml
18:35:44:WU05:FS07:0x22:Reading tar file system.xml
18:35:44:WU05:FS07:0x22:Digital signatures verified
18:35:44:WU05:FS07:0x22:Folding@home GPU Core22 Folding@home Core
18:35:44:WU05:FS07:0x22:Version 0.0.10
18:35:44:WU05:FS07:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:35:44:WU05:FS07:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:35:44:WU05:FS07:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:35:44:WU05:FS07:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
18:36:03:WU05:FS07:0x22:ERROR:Force RMSE error of 8.68969 with threshold of 5
18:36:03:WU05:FS07:0x22:Saving result file ../logfile_01.txt
18:36:03:WU05:FS07:0x22:Saving result file science.log
18:36:03:WU05:FS07:0x22:Saving result file state.xml
18:36:03:WU05:FS07:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

18:36:05:WU07:FS07:0x22:Project: 13415 (Run 3088, Clone 12, Gen 1)
18:36:05:WU07:FS07:0x22:Unit: 0x0000000212bc7d9a5ef50d39afa3d594
18:36:05:WU07:FS07:0x22:Reading tar file core.xml
18:36:05:WU07:FS07:0x22:Reading tar file integrator.xml
18:36:05:WU07:FS07:0x22:Reading tar file state.xml
18:36:05:WU07:FS07:0x22:Reading tar file system.xml
18:36:05:WU07:FS07:0x22:Digital signatures verified
18:36:05:WU07:FS07:0x22:Folding@home GPU Core22 Folding@home Core
18:36:05:WU07:FS07:0x22:Version 0.0.10
18:36:05:WU07:FS07:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:36:05:WU07:FS07:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:36:05:WU07:FS07:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:36:05:WU07:FS07:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
18:36:23:WU07:FS07:0x22:ERROR:Force RMSE error of 9.70296 with threshold of 5
18:36:23:WU07:FS07:0x22:Saving result file ../logfile_01.txt
18:36:23:WU07:FS07:0x22:Saving result file science.log
18:36:23:WU07:FS07:0x22:Saving result file state.xml
18:36:23:WU07:FS07:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

18:36:25:WU05:FS07:0x22:Project: 13415 (Run 2529, Clone 4, Gen 1)
18:36:25:WU05:FS07:0x22:Unit: 0x0000000812bc7d9a5ef1ae63304cfb44
18:36:25:WU05:FS07:0x22:Reading tar file core.xml
18:36:25:WU05:FS07:0x22:Reading tar file integrator.xml
18:36:25:WU05:FS07:0x22:Reading tar file state.xml
18:36:25:WU05:FS07:0x22:Reading tar file system.xml
18:36:25:WU05:FS07:0x22:Digital signatures verified
18:36:25:WU05:FS07:0x22:Folding@home GPU Core22 Folding@home Core
18:36:25:WU05:FS07:0x22:Version 0.0.10
18:36:25:WU05:FS07:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
18:36:25:WU05:FS07:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
18:36:25:WU05:FS07:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
18:36:25:WU05:FS07:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
18:36:44:WU05:FS07:0x22:ERROR:Force RMSE error of 9.79447 with threshold of 5
18:36:44:WU05:FS07:0x22:Saving result file ../logfile_01.txt
18:36:44:WU05:FS07:0x22:Saving result file science.log
18:36:44:WU05:FS07:0x22:Saving result file state.xml
18:36:44:WU05:FS07:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
It then goes on to get and complete 18:36:46:WU07:FS07:0x22:Project: 13415 (Run 3401, Clone 5, Gen 1)

Looking at the logs, I also found this anomaly:

Code:

08:21:16:WU15:FS00:Sending unit results: id:15 state:SEND error:NO_ERROR project:16907 run:2 clone:17 gen:40 core:0x21 unit:0x0000002e0002894c5ecaffdea07863b9

08:21:40:WU20:FS00:0x22:Project: 13415 (Run 4274, Clone 19, Gen 1)
08:21:40:WU20:FS00:0x22:Unit: 0x0000000112bc7d9a5ef50d0cd585700b
08:21:40:WU20:FS00:0x22:Reading tar file core.xml
08:21:40:WU20:FS00:0x22:Reading tar file integrator.xml
08:21:40:WU20:FS00:0x22:Reading tar file state.xml
08:21:40:WU20:FS00:0x22:Reading tar file system.xml
08:21:40:WU20:FS00:0x22:Digital signatures verified
08:21:40:WU20:FS00:0x22:Folding@home GPU Core22 Folding@home Core
08:21:40:WU20:FS00:0x22:Version 0.0.10
08:21:40:WU20:FS00:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
08:21:40:WU20:FS00:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
08:21:40:WU20:FS00:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
08:21:40:WU20:FS00:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
08:21:49:WU20:FS00:0x22:ERROR:Discrepancy: Forces are blowing up! 0 1
08:21:49:WU20:FS00:0x22:Saving result file ../logfile_01.txt
08:21:49:WU20:FS00:0x22:Saving result file science.log
08:21:49:WU20:FS00:0x22:Saving result file state.xml
08:21:49:WU20:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

08:21:51:WU22:FS00:0x22:Project: 13415 (Run 3384, Clone 20, Gen 0)
08:21:51:WU22:FS00:0x22:Unit: 0x0000000012bc7d9a5ef50d2f139facf2
08:21:51:WU22:FS00:0x22:Reading tar file core.xml
08:21:51:WU22:FS00:0x22:Reading tar file integrator.xml
08:21:51:WU22:FS00:0x22:Reading tar file state.xml
08:21:51:WU22:FS00:0x22:Reading tar file system.xml
08:21:51:WU22:FS00:0x22:Digital signatures verified
08:21:51:WU22:FS00:0x22:Folding@home GPU Core22 Folding@home Core
08:21:51:WU22:FS00:0x22:Version 0.0.10
08:21:51:WU22:FS00:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
08:21:51:WU22:FS00:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
08:21:51:WU22:FS00:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
08:21:51:WU22:FS00:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
08:22:01:WU22:FS00:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
08:22:01:WU22:FS00:0x22:Saving result file ../logfile_01.txt
08:22:01:WU22:FS00:0x22:Saving result file science.log
08:22:01:WU22:FS00:0x22:Saving result file state.xml
08:22:01:WU22:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

08:22:03:WU20:FS00:0x22:Project: 13415 (Run 2844, Clone 19, Gen 1)
08:22:03:WU20:FS00:0x22:Unit: 0x0000000412bc7d9a5ef1ae52aa15101e
08:22:03:WU20:FS00:0x22:Reading tar file core.xml
08:22:03:WU20:FS00:0x22:Reading tar file integrator.xml
08:22:03:WU20:FS00:0x22:Reading tar file state.xml
08:22:03:WU20:FS00:0x22:Reading tar file system.xml
08:22:03:WU20:FS00:0x22:Digital signatures verified
08:22:03:WU20:FS00:0x22:Folding@home GPU Core22 Folding@home Core
08:22:03:WU20:FS00:0x22:Version 0.0.10
08:22:03:WU20:FS00:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
08:22:03:WU20:FS00:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
08:22:03:WU20:FS00:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
08:22:03:WU20:FS00:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
08:22:13:WU20:FS00:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
08:22:13:WU20:FS00:0x22:Saving result file ../logfile_01.txt
08:22:13:WU20:FS00:0x22:Saving result file science.log
08:22:13:WU20:FS00:0x22:Saving result file state.xml
08:22:13:WU20:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

08:22:15:WU22:FS00:0x22:Project: 13415 (Run 3389, Clone 20, Gen 0)
08:22:15:WU22:FS00:0x22:Unit: 0x0000000012bc7d9a5ef50d2ea9b8a7b2
08:22:15:WU22:FS00:0x22:Reading tar file core.xml
08:22:15:WU22:FS00:0x22:Reading tar file integrator.xml
08:22:15:WU22:FS00:0x22:Reading tar file state.xml
08:22:15:WU22:FS00:0x22:Reading tar file system.xml
08:22:15:WU22:FS00:0x22:Digital signatures verified
08:22:15:WU22:FS00:0x22:Folding@home GPU Core22 Folding@home Core
08:22:15:WU22:FS00:0x22:Version 0.0.10
08:22:15:WU22:FS00:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
08:22:15:WU22:FS00:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
08:22:15:WU22:FS00:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
08:22:15:WU22:FS00:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
08:22:24:WU22:FS00:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
08:22:24:WU22:FS00:0x22:Saving result file ../logfile_01.txt
08:22:24:WU22:FS00:0x22:Saving result file science.log
08:22:24:WU22:FS00:0x22:Saving result file state.xml
08:22:24:WU22:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

08:22:26:WU20:FS00:0x22:Project: 13415 (Run 2367, Clone 20, Gen 0)
08:22:26:WU20:FS00:0x22:Unit: 0x0000000412bc7d9a5ef1ae6c869d3b8e
08:22:26:WU20:FS00:0x22:Reading tar file core.xml
08:22:26:WU20:FS00:0x22:Reading tar file integrator.xml
08:22:26:WU20:FS00:0x22:Reading tar file state.xml
08:22:26:WU20:FS00:0x22:Reading tar file system.xml
08:22:26:WU20:FS00:0x22:Digital signatures verified
08:22:26:WU20:FS00:0x22:Folding@home GPU Core22 Folding@home Core
08:22:26:WU20:FS00:0x22:Version 0.0.10
08:22:26:WU20:FS00:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
08:22:26:WU20:FS00:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
08:22:26:WU20:FS00:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
08:22:26:WU20:FS00:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
08:22:36:WU20:FS00:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
08:22:36:WU20:FS00:0x22:Saving result file ../logfile_01.txt
08:22:36:WU20:FS00:0x22:Saving result file science.log
08:22:36:WU20:FS00:0x22:Saving result file state.xml
08:22:36:WU20:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

08:22:38:WU22:FS00:0x22:Project: 13415 (Run 3396, Clone 20, Gen 0)
08:22:38:WU22:FS00:0x22:Unit: 0x0000000012bc7d9a5ef50d2e13340be0
08:22:38:WU22:FS00:0x22:Reading tar file core.xml
08:22:38:WU22:FS00:0x22:Reading tar file integrator.xml
08:22:38:WU22:FS00:0x22:Reading tar file state.xml
08:22:38:WU22:FS00:0x22:Reading tar file system.xml
08:22:38:WU22:FS00:0x22:Digital signatures verified
08:22:38:WU22:FS00:0x22:Folding@home GPU Core22 Folding@home Core
08:22:38:WU22:FS00:0x22:Version 0.0.10
08:22:38:WU22:FS00:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
08:22:38:WU22:FS00:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
08:22:38:WU22:FS00:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
08:22:38:WU22:FS00:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
08:22:47:WU22:FS00:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
08:22:48:WU22:FS00:0x22:Saving result file ../logfile_01.txt
08:22:48:WU22:FS00:0x22:Saving result file science.log
08:22:48:WU22:FS00:0x22:Saving result file state.xml
08:22:48:WU22:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT

08:22:50:WU20:FS00:0x22:Project: 13415 (Run 3231, Clone 20, Gen 0)
08:22:50:WU20:FS00:0x22:Unit: 0x0000000112bc7d9a5ef50d343d1d640f
08:22:50:WU20:FS00:0x22:Reading tar file core.xml
08:22:50:WU20:FS00:0x22:Reading tar file integrator.xml
08:22:50:WU20:FS00:0x22:Reading tar file state.xml
08:22:50:WU20:FS00:0x22:Reading tar file system.xml
08:22:50:WU20:FS00:0x22:Digital signatures verified
08:22:50:WU20:FS00:0x22:Folding@home GPU Core22 Folding@home Core
08:22:50:WU20:FS00:0x22:Version 0.0.10
08:22:50:WU20:FS00:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
08:22:50:WU20:FS00:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
08:22:50:WU20:FS00:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
08:22:50:WU20:FS00:0x22:  Global context and integrator variables write interval: 250 steps (0.025%) [4000 total]
08:22:59:WU20:FS00:0x22:ERROR:Discrepancy: Forces are blowing up! 4 0
08:22:59:WU20:FS00:0x22:Saving result file ../logfile_01.txt
08:22:59:WU20:FS00:0x22:Saving result file science.log
08:22:59:WU20:FS00:0x22:Saving result file state.xml
08:22:59:WU20:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
At that point, FS00 gets Project: 14253 (Run 1430, Clone 4, Gen 87), which has been processing fine since then.
I really don't know why that sequence of faulty units happened. The slot did manage to complete a 13415 WU in the past with no problems.
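
Since the request earlier in the thread was for PROJ,RUN,CLONE,GEN tuples, a small script can pull them out of a log like the ones above instead of copying them by hand. This is only a rough sketch, assuming the line layout seen in these excerpts (`HH:MM:SS:WUnn:FSnn:0xNN:...`); the function name `failed_units` and the `SAMPLE` data are made up for illustration and are not part of FAHClient.

```python
import re

# Line layout assumed from the excerpts in this thread; not an official format spec.
PREFIX = r"^(?P<time>\d\d:\d\d:\d\d):(?P<wu>WU\d+):(?P<slot>FS\d+):0x[0-9a-fA-F]+:"
PROJECT_RE = re.compile(
    PREFIX + r"Project: (?P<project>\d+) "
    r"\(Run (?P<run>\d+), Clone (?P<clone>\d+), Gen (?P<gen>\d+)\)"
)
ERROR_RE = re.compile(PREFIX + r"ERROR:(?P<msg>.*)$")
SHUTDOWN_RE = re.compile(PREFIX + r"Folding@home Core Shutdown: (?P<status>\w+)")


def failed_units(lines):
    """Return one record per work unit whose core shutdown was not FINISHED_UNIT."""
    running = {}   # (wu, slot) -> record for the unit currently in that queue entry
    failures = []
    for raw in lines:
        line = raw.strip()
        m = PROJECT_RE.match(line)
        if m:
            prcg = tuple(int(m[k]) for k in ("project", "run", "clone", "gen"))
            running[(m["wu"], m["slot"])] = {"slot": m["slot"], "prcg": prcg, "error": None}
            continue
        m = ERROR_RE.match(line)
        if m:
            rec = running.get((m["wu"], m["slot"]))
            if rec is not None and rec["error"] is None:
                rec["error"] = m["msg"]   # keep the first error reported for the unit
            continue
        m = SHUTDOWN_RE.match(line)
        if m:
            rec = running.pop((m["wu"], m["slot"]), None)
            if rec is not None and m["status"] != "FINISHED_UNIT":
                rec["status"] = m["status"]
                failures.append(rec)
    return failures


# A few lines copied from the logs in this post, used as hypothetical input.
SAMPLE = """\
18:34:30:WU00:FS05:0x22:Project: 13415 (Run 3864, Clone 12, Gen 1)
22:27:56:WU00:FS05:0x22:Folding@home Core Shutdown: FINISHED_UNIT
18:36:25:WU05:FS07:0x22:Project: 13415 (Run 2529, Clone 4, Gen 1)
18:36:44:WU05:FS07:0x22:ERROR:Force RMSE error of 9.79447 with threshold of 5
18:36:44:WU05:FS07:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
08:22:50:WU20:FS00:0x22:Project: 13415 (Run 3231, Clone 20, Gen 0)
08:22:59:WU20:FS00:0x22:ERROR:Discrepancy: Forces are blowing up! 4 0
08:22:59:WU20:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
""".splitlines()

for rec in failed_units(SAMPLE):
    print(rec["slot"], rec["prcg"], rec["status"], rec["error"])
```

Running it over a full log (for example `failed_units(open("log.txt"))`, where `log.txt` is whatever file the log was saved to) should list each failed unit with its slot and first error message. The unit that finished normally is skipped.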
Last edited by Joe_H on Sat Jun 27, 2020 8:21 pm, edited 1 time in total.
Reason: change Quote tags to Code for log segments
rofwuliya
Posts: 1
Joined: Sun May 17, 2020 12:42 pm

Re: core22 0.0.10 released to full FAH!

Post by rofwuliya »

It seems that my Linux desktop is much snappier than with previous GPU cores. Well done! Thank you!
JohnChodera
Pande Group Member
Posts: 470
Joined: Fri Feb 22, 2013 9:59 pm

Re: core22 0.0.10 released to full FAH!

Post by JohnChodera »

@Nuitari: Thanks for the wide variety of reports here. We've been investigating the RMSE errors, but I hadn't seen those stalled errors before---did you do anything to trigger that, or did it happen on its own?

Code:

18:43:48:WU16:FS03:0x22:Completed 790000 out of 1000000 steps (79%)
18:55:34:WU16:FS03:0x22:Watchdog triggered, requesting soft shutdown down
19:05:34:WU16:FS03:0x22:Watchdog shutdown failed, hard shutdown triggered
19:05:34:WARNING:WU16:FS03:FahCore returned: WU_STALLED (127 = 0x7f)
19:05:35:WU16:FS03:Starting
--- SNIP the startup of the core ---
Will go through this in more detail after we get 0.0.11 out, which will hopefully fix some segfaults we've been seeing.
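
For reference, the gaps between the timestamps in that excerpt can be worked out with a few lines of Python. This is only a sketch: the helper names `stamp` and `gap_seconds` are invented here, it assumes every line starts with an `HH:MM:SS` stamp, and the gaps it reports are just what this one excerpt shows, not documented watchdog timeouts.

```python
from datetime import datetime, timedelta


def stamp(line):
    """Parse the leading HH:MM:SS of a log line (these lines carry no date)."""
    return datetime.strptime(line[:8], "%H:%M:%S")


def gap_seconds(earlier, later):
    """Seconds between two log lines, assuming they are less than 24 hours apart."""
    delta = stamp(later) - stamp(earlier)
    if delta < timedelta(0):          # the later line falls after midnight
        delta += timedelta(days=1)
    return int(delta.total_seconds())


progress = "18:43:48:WU16:FS03:0x22:Completed 790000 out of 1000000 steps (79%)"
soft = "18:55:34:WU16:FS03:0x22:Watchdog triggered, requesting soft shutdown down"
hard = "19:05:34:WU16:FS03:0x22:Watchdog shutdown failed, hard shutdown triggered"

print(gap_seconds(progress, soft))  # 706 s, about 11 min 46 s without a progress line
print(gap_seconds(soft, hard))      # 600 s between the soft and the hard shutdown
```

In this excerpt the soft shutdown request comes roughly 12 minutes after the last progress line, and the hard shutdown exactly 10 minutes after that.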

~ John Chodera // MSKCC
Nuitari
Posts: 80
Joined: Sun Jun 09, 2019 4:03 am

Re: core22 0.0.10 released to full FAH!

Post by Nuitari »

It happened on its own.
Nothing in the log mentions a clock skew being detected.