Project 17118 on 4090

Posted: Mon Oct 30, 2023 4:52 am
by PaulTV
Hello,

Last night my 4090 was assigned a work unit from project 17118, probably by mistake. It finished in only about 6 minutes and earned me just 18k points. This project is probably meant for smaller GPUs.

https://apps.foldingathome.org/wu#proje ... =338&gen=0

Cheers,
Paul

Re: Project 17118 on 4090

Posted: Mon Oct 30, 2023 2:43 pm
by pyrocyborg
This is a very fast benchmark, I think. Happens sometimes.

Re: Project 17118 on 4090

Posted: Mon Oct 30, 2023 3:38 pm
by Joe_H
pyrocyborg wrote: Mon Oct 30, 2023 2:43 pm This is a very fast benchmark, I think. Happens sometimes.
Exactly. There are a few benchmarking projects that check performance on various GPUs. In this case the description was copied and pasted from a previous Core_22 project, but this project is testing Core_23.

Project description - https://stats.foldingathome.org/project/17118

Re: Project 17118 on 4090

Posted: Mon Oct 30, 2023 4:16 pm
by PaulTV
Thanks! I should have realized it's a performance project. I guess I was still half asleep when I posted this morning...

Re: Project 17118 on 4090

Posted: Tue Oct 31, 2023 11:55 pm
by wdanwatts
How do I get my machine to stop jamming up on this project?

Re: Project 17118 on 4090

Posted: Tue Oct 31, 2023 11:56 pm
by wdanwatts
This system keeps cycling

Code:

23:51:31:WU00:FS00:Starting
23:51:31:WU00:FS00:Removing old file 'work/00/logfile_01-20231031-231156.txt'
23:51:31:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/openmm-core-23/centos-7.9.2009-64bit/release/0x23-8.0.3/Core_23.fah/FahCore_23 -dir 00 -suffix 01 -version 706 -lifeline 1446 -checkpoint 30 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
23:51:31:WU00:FS00:Started FahCore on PID 4787
23:51:31:WU00:FS00:Core PID:4791
23:51:31:WU00:FS00:FahCore 0x23 started
23:51:32:WU00:FS00:0x23:*********************** Log Started 2023-10-31T23:51:31Z ***********************
23:51:32:WU00:FS00:0x23:*************************** Core23 Folding@home Core ***************************
23:51:32:WU00:FS00:0x23:       Core: Core23
23:51:32:WU00:FS00:0x23:       Type: 0x23
23:51:32:WU00:FS00:0x23:    Version: 8.0.3
23:51:32:WU00:FS00:0x23:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:51:32:WU00:FS00:0x23:  Copyright: 2022 foldingathome.org
23:51:32:WU00:FS00:0x23:   Homepage: https://foldingathome.org/
23:51:32:WU00:FS00:0x23:       Date: Aug 3 2023
23:51:32:WU00:FS00:0x23:       Time: 08:28:22
23:51:32:WU00:FS00:0x23:   Revision: 199cb870317d05441d0a301287d9ef61254fa32b
23:51:32:WU00:FS00:0x23:     Branch: HEAD
23:51:32:WU00:FS00:0x23:   Compiler: GNU 7.5.0
23:51:32:WU00:FS00:0x23:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
23:51:32:WU00:FS00:0x23:             -fdata-sections -O3 -funroll-loops -fno-pie
23:51:32:WU00:FS00:0x23:             -DOPENMM_VERSION="\"8.0.0\""
23:51:32:WU00:FS00:0x23:   Platform: linux 5.15.0-1041-azure
23:51:32:WU00:FS00:0x23:       Bits: 64
23:51:32:WU00:FS00:0x23:       Mode: Release
23:51:32:WU00:FS00:0x23:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
23:51:32:WU00:FS00:0x23:             <peastman@stanford.edu>
23:51:32:WU00:FS00:0x23:       Args: -dir 00 -suffix 01 -version 706 -lifeline 4787 -checkpoint 30
23:51:32:WU00:FS00:0x23:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
23:51:32:WU00:FS00:0x23:             0 -gpu 0
23:51:32:WU00:FS00:0x23:************************************ libFAH ************************************
23:51:32:WU00:FS00:0x23:       Date: Aug 3 2023
23:51:32:WU00:FS00:0x23:       Time: 08:27:48
23:51:32:WU00:FS00:0x23:   Revision: 112c2234abe20611a05652defc3c7f854cbf927f
23:51:32:WU00:FS00:0x23:     Branch: HEAD
23:51:32:WU00:FS00:0x23:   Compiler: GNU 7.5.0
23:51:32:WU00:FS00:0x23:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
23:51:32:WU00:FS00:0x23:             -fdata-sections -O3 -funroll-loops -fno-pie
23:51:32:WU00:FS00:0x23:   Platform: linux 5.15.0-1041-azure
23:51:32:WU00:FS00:0x23:       Bits: 64
23:51:32:WU00:FS00:0x23:       Mode: Release
23:51:32:WU00:FS00:0x23:************************************ CBang *************************************
23:51:32:WU00:FS00:0x23:    Version: 1.7.2
23:51:32:WU00:FS00:0x23:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:51:32:WU00:FS00:0x23:        Org: Cauldron Development LLC
23:51:32:WU00:FS00:0x23:  Copyright: Cauldron Development LLC, 2003-2023
23:51:32:WU00:FS00:0x23:   Homepage: https://cauldrondevelopment.com/
23:51:32:WU00:FS00:0x23:    License: GPL 2+
23:51:32:WU00:FS00:0x23:       Date: Aug 3 2023
23:51:32:WU00:FS00:0x23:       Time: 08:27:30
23:51:32:WU00:FS00:0x23:   Revision: eae4b58965bdd4d54ea9eb77972674352b37a547
23:51:32:WU00:FS00:0x23:     Branch: HEAD
23:51:32:WU00:FS00:0x23:   Compiler: GNU 7.5.0
23:51:32:WU00:FS00:0x23:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
23:51:32:WU00:FS00:0x23:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
23:51:32:WU00:FS00:0x23:   Platform: linux 5.15.0-1041-azure
23:51:32:WU00:FS00:0x23:       Bits: 64
23:51:32:WU00:FS00:0x23:       Mode: Release
23:51:32:WU00:FS00:0x23:************************************ System ************************************
23:51:32:WU00:FS00:0x23:        CPU: AMD Phenom(tm) II X2 545 Processor
23:51:32:WU00:FS00:0x23:     CPU ID: AuthenticAMD Family 16 Model 4 Stepping 2
23:51:32:WU00:FS00:0x23:       CPUs: 2
23:51:32:WU00:FS00:0x23:     Memory: 3.81GiB
23:51:32:WU00:FS00:0x23:Free Memory: 824.84MiB
23:51:32:WU00:FS00:0x23:    Threads: POSIX_THREADS
23:51:32:WU00:FS00:0x23: OS Version: 6.5
23:51:32:WU00:FS00:0x23:Has Battery: false
23:51:32:WU00:FS00:0x23: On Battery: false
23:51:32:WU00:FS00:0x23: UTC Offset: -5
23:51:32:WU00:FS00:0x23:        PID: 4791
23:51:32:WU00:FS00:0x23:        CWD: /var/lib/fahclient/work
23:51:32:WU00:FS00:0x23:       Exec: /var/lib/fahclient/cores/cores.foldingathome.org/openmm-core-23/centos-7.9.2009-64bit/release/0x23-8.0.3/Core_23.fah/FahCore_23
23:51:32:WU00:FS00:0x23:************************************ OpenMM ************************************
23:51:32:WU00:FS00:0x23:    Version: 8.0.0
23:51:32:WU00:FS00:0x23:********************************************************************************
23:51:32:WU00:FS00:0x23:Project: 17118 (Run 2, Clone 406, Gen 0)
23:51:32:WU00:FS00:0x23:Digital signatures verified
23:51:32:WU00:FS00:0x23:Folding@home GPU Core23 Folding@home Core
23:51:32:WU00:FS00:0x23:Version 8.0.3
23:51:32:WU00:FS00:0x23:  Checkpoint write interval: 433117 steps (50%) [2 total]
23:51:32:WU00:FS00:0x23:  JSON viewer frame write interval: 8662 steps (1%) [100 total]
23:51:32:WU00:FS00:0x23:  XTC frame write interval: 866234 steps (1e+02%) [1 total]
23:51:32:WU00:FS00:0x23:  Global context and integrator variables write interval: disabled
23:51:32:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

Re: Project 17118 on 4090

Posted: Wed Nov 01, 2023 1:20 am
by BobWilliams757
wdanwatts wrote: Tue Oct 31, 2023 11:55 pm How do I get my machine to stop jamming up on this project?
It might be worth showing more of your log. I can see the last line you posted and assume the core crashed after that, but the reason or the specifics might be further down in the log. If the log doesn't go beyond that point and the work unit just restarts, at least we will know that.


Also....

Does it reboot the machine, or just kill the work unit?

What OS, and basic (at least) system configuration?

Is the machine usually stable for folding?

Has it been multiple work units, every unit from that project, etc?


And anything else that might be of importance. Some of these benchmark work units are experimental to some extent, and knowing what machines they cause issues on might help them straighten things out.

Re: Project 17118 on 4090

Posted: Wed Nov 01, 2023 12:25 pm
by wdanwatts
It got better (after a day or two). I'm running Fedora Linux 38 (Workstation Edition) on an AMD Phenom™ II X2 545 (2 cores) with an NVIDIA GeForce GTX 1660 SUPER GPU. It had been earning about 1 million points per day.
This is what the end of the problem looked like:

Code:

... 05:53:01:WU00:FS00:0x23:************************************ OpenMM ************************************
05:53:01:WU00:FS00:0x23:    Version: 8.0.0
05:53:01:WU00:FS00:0x23:********************************************************************************
05:53:01:WU00:FS00:0x23:Project: 17118 (Run 2, Clone 406, Gen 0)
05:53:01:WU00:FS00:0x23:Digital signatures verified
05:53:01:WU00:FS00:0x23:Folding@home GPU Core23 Folding@home Core
05:53:01:WU00:FS00:0x23:Version 8.0.3
05:53:01:WU00:FS00:0x23:  Checkpoint write interval: 433117 steps (50%) [2 total]
05:53:01:WU00:FS00:0x23:  JSON viewer frame write interval: 8662 steps (1%) [100 total]
05:53:01:WU00:FS00:0x23:  XTC frame write interval: 866234 steps (1e+02%) [1 total]
05:53:01:WU00:FS00:0x23:  Global context and integrator variables write interval: disabled
05:53:02:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
05:53:04:WARNING:WU00:FS00:Past final deadline 2023-11-01T05:53:03Z, dumping
05:53:04:WU00:FS00:Cleaning up
05:53:04:WU00:FS00:Connecting to assign1.foldingathome.org:80
05:53:04:WU00:FS00:Assigned to work server 129.32.209.200
05:53:04:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:TU116 [GeForce GTX 1660 SUPER] from 129.32.209.200
05:53:04:WU00:FS00:Connecting to 129.32.209.200:8080
Now it is back to running jobs again. It would be useful to know how to short-circuit this problem if it happens again, so I don't have to wait until the job times out.
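
For anyone who wants to spot this kind of restart loop before the deadline passes, here is a minimal Python sketch that counts "FahCore returned:" lines per work unit and slot. It assumes the client log uses the line format shown in the excerpts above. The function names and the default threshold of 3 are hypothetical choices for illustration, not part of the Folding@home client.

```python
import re
from collections import Counter

# Matches client lines such as:
# "23:51:32:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)"
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)"
)


def count_core_returns(lines):
    """Count FahCore exit statuses keyed by (work unit, slot, status)."""
    counts = Counter()
    for line in lines:
        match = RETURN_RE.match(line.strip())
        if match:
            counts[(match["wu"], match["slot"], match["status"])] += 1
    return counts


def looks_stuck(counts, threshold=3):
    """Return (wu, slot) pairs whose core was INTERRUPTED at least `threshold` times."""
    return sorted(
        (wu, slot)
        for (wu, slot, status), n in counts.items()
        if status == "INTERRUPTED" and n >= threshold
    )
```

Passing the open client log file to count_core_returns() and the result to looks_stuck() gives a list of slots that keep restarting. It only reports the loop; it does not stop or dump the work unit.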

Re: Project 17118 on 4090

Posted: Thu Nov 02, 2023 3:53 am
by BobWilliams757
Interesting that it just hung like that until it timed out. In any case, if you get continued problems you should report them, since it's a benchmarking project and the researchers are looking for any possible issues.

Re: Project 17118 on 4090

Posted: Sun Nov 05, 2023 2:12 pm
by toTOW
wdanwatts wrote: Tue Oct 31, 2023 11:56 pm This system keeps cycling
Try running this command in a terminal. It should print the actual issue that is causing the INTERRUPTED error:

Code:

/var/lib/fahclient/cores/cores.foldingathome.org/openmm-core-23/centos-7.9.2009-64bit/release/0x23-8.0.3/Core_23.fah/FahCore_23 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
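
The same check can be scripted so the core's output ends up in one place. The sketch below is an illustration only: run_core is a hypothetical helper, not part of the Folding@home client. The optional lib_dir argument rests on the assumption that the core's shared libraries sit next to the binary and that the dynamic loader needs LD_LIBRARY_PATH to find them there.

```python
import os
import subprocess

# Path and arguments copied from the command above.
CORE_DIR = (
    "/var/lib/fahclient/cores/cores.foldingathome.org/openmm-core-23/"
    "centos-7.9.2009-64bit/release/0x23-8.0.3/Core_23.fah"
)
CORE_ARGS = [
    "-gpu-vendor", "nvidia",
    "-opencl-platform", "0",
    "-opencl-device", "0",
    "-cuda-device", "0",
    "-gpu", "0",
]


def run_core(cmd, lib_dir=None, timeout=60):
    """Run a command and return (exit_code, combined stdout and stderr text).

    Raises FileNotFoundError if the executable does not exist and
    subprocess.TimeoutExpired if it runs longer than `timeout` seconds.
    """
    env = os.environ.copy()
    if lib_dir is not None:
        # Assumption: prepending lib_dir lets the loader find libraries there.
        existing = env.get("LD_LIBRARY_PATH")
        env["LD_LIBRARY_PATH"] = (
            lib_dir if not existing else lib_dir + os.pathsep + existing
        )
    result = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error messages are captured
        text=True,
        timeout=timeout,
        check=False,  # a non-zero exit is expected here, so do not raise
    )
    return result.returncode, result.stdout
```

A call would look like run_core([CORE_DIR + "/FahCore_23", *CORE_ARGS], lib_dir=CORE_DIR), followed by printing the returned text.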

Re: Project 17118 on 4090

Posted: Wed Nov 29, 2023 12:51 pm
by wdanwatts
On Fedora 39 I get:

Code:

./FahCore_23: error while loading shared libraries: libOpenMM.so.8.0: cannot open shared object file: No such file or directory
How do I get libOpenMM.so.8.0?

Re: Project 17118 on 4090

Posted: Wed Nov 29, 2023 2:02 pm
by bikeaddict
wdanwatts wrote: Wed Nov 29, 2023 12:51 pm How do I get libOpenMM.so.8.0 ?
This library and many more libOpenMM files should be downloaded by the client as part of the core files. It's here on my Fedora system:

Code:

/var/lib/fahclient/cores/cores.foldingathome.org/openmm-core-23/centos-7.9.2009-64bit/release/0x23-8.0.3/Core_23.fah/libOpenMM.so.8.0
You can try removing that directory and then restarting the client to make it download the core again. Also make sure no directory permission problems are preventing the files from being written, and check for errors in /var/lib/fahclient/log.txt.
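
Those checks can be sketched in a few lines of Python. This is a rough illustration, not an official diagnostic: check_core_dir and log_errors are hypothetical helper names, the writability test applies to the user running the script (which may differ from the account the client runs under), and the "ERROR" marker is an assumption modeled on the "WARNING" lines seen in the client log.

```python
import os
from pathlib import Path


def check_core_dir(core_dir, lib_name="libOpenMM.so.8.0"):
    """Report whether the core directory and the expected library are in place."""
    core = Path(core_dir)
    is_dir = core.is_dir()
    return {
        "dir_exists": is_dir,
        "lib_present": (core / lib_name).is_file(),
        # Checked for the current user only, not for the client's service account.
        "dir_writable": bool(is_dir and os.access(core, os.W_OK | os.X_OK)),
        "openmm_libs": sorted(p.name for p in core.glob("libOpenMM*")) if is_dir else [],
    }


def log_errors(log_path, markers=("ERROR", "WARNING")):
    """Return the log lines that contain any of the given marker strings."""
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        return [
            line.rstrip("\n")
            for line in fh
            if any(marker in line for marker in markers)
        ]
```

Calling check_core_dir() with the Core_23.fah directory above and log_errors("/var/lib/fahclient/log.txt") gives a quick picture of whether the library is missing, the directory is unwritable, or the log already names a download problem.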

Re: Project 17118 on 4090

Posted: Wed Nov 29, 2023 5:46 pm
by wdanwatts
A system reboot started a string of Core 22 jobs, so my lack of a Core 23 library is not critical at this time.