Computer goes blank, won't revive-cold reboot required

hrsetrdr
Posts: 112
Joined: Sun Dec 02, 2007 4:29 pm
Location: In the Fold somewhere in SoCal.

Post by hrsetrdr »

Started folding on 4/13/2022 (5 days) with a new GeForce RTX 2060 6GB in the following system:
O/S: Linux-x86_64
Nvidia driver version : 470.103.01
AMD Phenom(tm) II X4 965 Processor
System RAM: 8 GB DDR3
EVGA Supernova 750W G6 (brand new)

For the first couple of days everything ran as hoped. Over the last couple of days (and twice today) the display has gone blank and the system can't be woken, requiring a power off/on cycle.
I'm not seeing anything in the FAHControl log or the Ubuntu system logs that would shed light on this, but perhaps I'm not looking in the right place.
FAH log:

Code: Select all

*********************** Log Started 2022-04-18T17:22:18Z ***********************
17:22:18:******************************* libFAH ********************************
17:22:18:         Date: Oct 20 2020
17:22:18:         Time: 20:36:39
17:22:18:     Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
17:22:18:       Branch: master
17:22:18:     Compiler: GNU 8.3.0
17:22:18:      Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
17:22:18:               -fdata-sections -O3 -funroll-loops -fno-pie
17:22:18:     Platform: linux2 5.8.0-1-amd64
17:22:18:         Bits: 64
17:22:18:         Mode: Release
17:22:18:****************************** FAHClient ******************************
17:22:18:      Version: 7.6.21
17:22:18:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:22:18:    Copyright: 2020 foldingathome.org
17:22:18:     Homepage: https://foldingathome.org/
17:22:18:         Date: Oct 20 2020
17:22:18:         Time: 20:39:00
17:22:18:     Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
17:22:18:       Branch: master
17:22:18:     Compiler: GNU 8.3.0
17:22:18:      Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
17:22:18:               -fdata-sections -O3 -funroll-loops -fno-pie
17:22:18:     Platform: linux2 5.8.0-1-amd64
17:22:18:         Bits: 64
17:22:18:         Mode: Release
17:22:18:         Args: --child /etc/fahclient/config.xml --run-as fahclient
17:22:18:               --pid-file=/var/run/fahclient.pid --daemon
17:22:18:       Config: /etc/fahclient/config.xml
17:22:18:******************************** CBang ********************************
17:22:18:         Date: Oct 20 2020
17:22:18:         Time: 18:37:59
17:22:18:     Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
17:22:18:       Branch: master
17:22:18:     Compiler: GNU 8.3.0
17:22:18:      Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
17:22:18:               -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
17:22:18:     Platform: linux2 5.8.0-1-amd64
17:22:18:         Bits: 64
17:22:18:         Mode: Release
17:22:18:******************************* System ********************************
17:22:18:          CPU: AMD Phenom(tm) II X4 965 Processor
17:22:18:       CPU ID: AuthenticAMD Family 16 Model 4 Stepping 3
17:22:18:         CPUs: 4
17:22:18:       Memory: 7.77GiB
17:22:18:  Free Memory: 6.99GiB
17:22:18:      Threads: POSIX_THREADS
17:22:18:   OS Version: 5.4
17:22:18:  Has Battery: false
17:22:18:   On Battery: false
17:22:18:   UTC Offset: -7
17:22:18:          PID: 1275
17:22:18:          CWD: /var/lib/fahclient
17:22:18:           OS: Linux 5.4.0-107-generic x86_64
17:22:18:      OS Arch: AMD64
17:22:18:         GPUs: 1
17:22:18:        GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 TU106 [Geforce RTX 2060]
17:22:18:CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:11.4
17:22:18:       OpenCL: Not detected: Failed to open dynamic library 'libOpenCL.so':
17:22:18:               libOpenCL.so: cannot open shared object file: No such file or
17:22:18:               directory
17:22:18:***********************************************************************
17:22:18:<config>
17:22:18:  <!-- Client Control -->
17:22:18:  <fold-anon v='true'/>
17:22:18:
17:22:18:  <!-- Folding Core -->
17:22:18:  <core-priority v='low'/>
17:22:18:
17:22:18:  <!-- Network -->
17:22:18:  <proxy v=':8080'/>
17:22:18:
17:22:18:  <!-- Slot Control -->
17:22:18:  <power v='full'/>
17:22:18:
17:22:18:  <!-- User Information -->
17:22:18:  <passkey v='*****'/>
17:22:18:  <team v='32'/>
17:22:18:  <user v='hrsetrdr'/>
17:22:18:
17:22:18:  <!-- Folding Slots -->
17:22:18:  <slot id='0' type='CPU'/>
17:22:18:  <slot id='1' type='GPU'>
17:22:18:    <pci-bus v='1'/>
17:22:18:    <pci-slot v='0'/>
17:22:18:  </slot>
17:22:18:</config>
17:22:18:Trying to access database...
17:22:18:Successfully acquired database lock
17:22:18:FS00:Initialized folding slot 00: cpu:3
17:22:18:FS01:Initialized folding slot 01: gpu:1:0 TU106 [Geforce RTX 2060]
17:22:18:WU02:FS00:Starting
17:22:18:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-sse2/a8-0.0.12/Core_a8.fah/FahCore_a8 -dir 02 -suffix 01 -version 706 -lifeline 1275 -checkpoint 15 -np 3
17:22:18:WU02:FS00:Started FahCore on PID 1297
17:22:18:WU02:FS00:Core PID:1302
17:22:18:WU02:FS00:FahCore 0xa8 started
17:22:18:WU01:FS01:Starting
17:22:18:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.20/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 1275 -checkpoint 15 -cuda-device 0 -gpu-vendor nvidia -gpu -1 -gpu-usage 100
17:22:18:WU01:FS01:Started FahCore on PID 1341
17:22:18:WU01:FS01:Core PID:1345
17:22:18:WU01:FS01:FahCore 0x22 started
17:22:19:WU02:FS00:0xa8:*********************** Log Started 2022-04-18T17:22:18Z ***********************
17:22:19:WU02:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
17:22:19:WU02:FS00:0xa8:       Core: Gromacs
17:22:19:WU02:FS00:0xa8:       Type: 0xa8
17:22:19:WU02:FS00:0xa8:    Version: 0.0.12
17:22:19:WU02:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:22:19:WU02:FS00:0xa8:  Copyright: 2020 foldingathome.org
17:22:19:WU02:FS00:0xa8:   Homepage: https://foldingathome.org/
17:22:19:WU02:FS00:0xa8:       Date: Jan 16 2021
17:22:19:WU02:FS00:0xa8:       Time: 19:22:05
17:22:19:WU02:FS00:0xa8:   Compiler: GNU 8.3.0
17:22:19:WU02:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
17:22:19:WU02:FS00:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
17:22:19:WU02:FS00:0xa8:   Platform: linux2 4.15.0-128-generic
17:22:19:WU02:FS00:0xa8:       Bits: 64
17:22:19:WU02:FS00:0xa8:       Mode: Release
17:22:19:WU02:FS00:0xa8:       SIMD: sse2
17:22:19:WU02:FS00:0xa8:     OpenMP: ON
17:22:19:WU02:FS00:0xa8:       CUDA: OFF
17:22:19:WU02:FS00:0xa8:       Args: -dir 02 -suffix 01 -version 706 -lifeline 1297 -checkpoint 15 -np 3
17:22:19:WU02:FS00:0xa8:************************************ libFAH ************************************
17:22:19:WU02:FS00:0xa8:       Date: Jan 16 2021
17:22:19:WU02:FS00:0xa8:       Time: 19:21:38
17:22:19:WU02:FS00:0xa8:   Compiler: GNU 8.3.0
17:22:19:WU02:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
17:22:19:WU02:FS00:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
17:22:19:WU02:FS00:0xa8:   Platform: linux2 4.15.0-128-generic
17:22:19:WU02:FS00:0xa8:       Bits: 64
17:22:19:WU02:FS00:0xa8:       Mode: Release
17:22:19:WU02:FS00:0xa8:************************************ CBang *************************************
17:22:19:WU02:FS00:0xa8:       Date: Jan 16 2021
17:22:19:WU02:FS00:0xa8:       Time: 19:21:24
17:22:19:WU02:FS00:0xa8:   Compiler: GNU 8.3.0
17:22:19:WU02:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
17:22:19:WU02:FS00:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
17:22:19:WU02:FS00:0xa8:   Platform: linux2 4.15.0-128-generic
17:22:19:WU02:FS00:0xa8:       Bits: 64
17:22:19:WU02:FS00:0xa8:       Mode: Release
17:22:19:WU02:FS00:0xa8:************************************ System ************************************
17:22:19:WU02:FS00:0xa8:        CPU: AMD Phenom(tm) II X4 965 Processor
17:22:19:WU02:FS00:0xa8:     CPU ID: AuthenticAMD Family 16 Model 4 Stepping 3
17:22:19:WU02:FS00:0xa8:       CPUs: 4
17:22:19:WU02:FS00:0xa8:     Memory: 7.77GiB
17:22:19:WU02:FS00:0xa8:Free Memory: 6.98GiB
17:22:19:WU02:FS00:0xa8:    Threads: POSIX_THREADS
17:22:19:WU02:FS00:0xa8: OS Version: 5.4
17:22:19:WU02:FS00:0xa8:Has Battery: false
17:22:19:WU02:FS00:0xa8: On Battery: false
17:22:19:WU02:FS00:0xa8: UTC Offset: -7
17:22:19:WU02:FS00:0xa8:        PID: 1302
17:22:19:WU02:FS00:0xa8:        CWD: /var/lib/fahclient/work
17:22:19:WU02:FS00:0xa8:********************************************************************************
17:22:19:WU02:FS00:0xa8:Project: 17444 (Run 0, Clone 4150, Gen 16)
17:22:19:WU02:FS00:0xa8:Unit: 0x00000000000000000000000000000000
17:22:19:WU02:FS00:0xa8:Digital signatures verified
17:22:19:WU02:FS00:0xa8:Calling: mdrun -c frame16.gro -s frame16.tpr -x frame16.xtc -cpi state.cpt -cpt 15 -nt 3 -ntmpi 1
17:22:19:WU02:FS00:0xa8:Steps: first=2000000 total=2125000
17:22:19:WU01:FS01:0x22:*********************** Log Started 2022-04-18T17:22:18Z ***********************
17:22:19:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
17:22:19:WU01:FS01:0x22:       Core: Core22
17:22:19:WU01:FS01:0x22:       Type: 0x22
17:22:19:WU01:FS01:0x22:    Version: 0.0.20
17:22:19:WU01:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:22:19:WU01:FS01:0x22:  Copyright: 2020 foldingathome.org
17:22:19:WU01:FS01:0x22:   Homepage: https://foldingathome.org/
17:22:19:WU01:FS01:0x22:       Date: Jan 20 2022
17:22:19:WU01:FS01:0x22:       Time: 00:57:52
17:22:19:WU01:FS01:0x22:   Revision: 3f211b8a4346514edbff34e3cb1c0e0ec951373c
17:22:19:WU01:FS01:0x22:     Branch: HEAD
17:22:19:WU01:FS01:0x22:   Compiler: GNU 9.4.0
17:22:19:WU01:FS01:0x22:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
17:22:19:WU01:FS01:0x22:             -fdata-sections -O3 -funroll-loops -fno-pie
17:22:19:WU01:FS01:0x22:             -DOPENMM_VERSION="\"7.7.0\""
17:22:19:WU01:FS01:0x22:   Platform: linux 5.11.0-1025-azure
17:22:19:WU01:FS01:0x22:       Bits: 64
17:22:19:WU01:FS01:0x22:       Mode: Release
17:22:19:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
17:22:19:WU01:FS01:0x22:             <peastman@stanford.edu>
17:22:19:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 1341 -checkpoint 15
17:22:19:WU01:FS01:0x22:             -cuda-device 0 -gpu-vendor nvidia -gpu -1 -gpu-usage 100
17:22:19:WU01:FS01:0x22:************************************ libFAH ************************************
17:22:19:WU01:FS01:0x22:       Date: Jan 20 2022
17:22:19:WU01:FS01:0x22:       Time: 00:57:22
17:22:19:WU01:FS01:0x22:   Revision: 9f4ad694e75c2350d4bb6b8b5b769ba27e483a2f
17:22:19:WU01:FS01:0x22:     Branch: HEAD
17:22:19:WU01:FS01:0x22:   Compiler: GNU 9.4.0
17:22:19:WU01:FS01:0x22:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
17:22:19:WU01:FS01:0x22:             -fdata-sections -O3 -funroll-loops -fno-pie
17:22:19:WU01:FS01:0x22:   Platform: linux 5.11.0-1025-azure
17:22:19:WU01:FS01:0x22:       Bits: 64
17:22:19:WU01:FS01:0x22:       Mode: Release
17:22:19:WU01:FS01:0x22:************************************ CBang *************************************
17:22:19:WU01:FS01:0x22:       Date: Jan 20 2022
17:22:19:WU01:FS01:0x22:       Time: 00:57:00
17:22:19:WU01:FS01:0x22:   Revision: ab023d155b446906d55b0f6c9a1eedeea04f7a1a
17:22:19:WU01:FS01:0x22:     Branch: HEAD
17:22:19:WU01:FS01:0x22:   Compiler: GNU 9.4.0
17:22:19:WU01:FS01:0x22:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
17:22:19:WU01:FS01:0x22:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
17:22:19:WU01:FS01:0x22:   Platform: linux 5.11.0-1025-azure
17:22:19:WU01:FS01:0x22:       Bits: 64
17:22:19:WU01:FS01:0x22:       Mode: Release
17:22:19:WU01:FS01:0x22:************************************ System ************************************
17:22:19:WU01:FS01:0x22:        CPU: AMD Phenom(tm) II X4 965 Processor
17:22:19:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 16 Model 4 Stepping 3
17:22:19:WU01:FS01:0x22:       CPUs: 4
17:22:19:WU01:FS01:0x22:     Memory: 7.77GiB
17:22:19:WU01:FS01:0x22:Free Memory: 6.94GiB
17:22:19:WU01:FS01:0x22:    Threads: POSIX_THREADS
17:22:19:WU01:FS01:0x22: OS Version: 5.4
17:22:19:WU01:FS01:0x22:Has Battery: false
17:22:19:WU01:FS01:0x22: On Battery: false
17:22:19:WU01:FS01:0x22: UTC Offset: -7
17:22:19:WU01:FS01:0x22:        PID: 1345
17:22:19:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
17:22:19:WU01:FS01:0x22:************************************ OpenMM ************************************
17:22:19:WU01:FS01:0x22:    Version: 7.7.0
17:22:19:WU01:FS01:0x22:********************************************************************************
17:22:19:WU01:FS01:0x22:Project: 18037 (Run 7, Clone 38, Gen 20)
17:22:19:WU01:FS01:0x22:Digital signatures verified
17:22:19:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
17:22:19:WU01:FS01:0x22:Version 0.0.20
17:22:19:WU01:FS01:0x22:  Checkpoint write interval: 250000 steps (5%) [20 total]
17:22:19:WU01:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
17:22:19:WU01:FS01:0x22:  XTC frame write interval: 50000 steps (1%) [100 total]
17:22:19:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
17:22:19:WU01:FS01:0x22:No -opencl-device specified; using deprecated -gpu argument as an alias for -opencl-device.
17:22:19:WU01:FS01:0x22:Please consider upgrading your client version.
17:22:19:WU01:FS01:0x22:There are 3 platforms available.
17:22:19:WU01:FS01:0x22:Platform 0: Reference
17:22:19:WU01:FS01:0x22:Platform 1: OpenCL
17:22:19:WU01:FS01:0x22:  opencl-device -1 specified
17:22:19:WU01:FS01:0x22:Platform 2: CUDA
17:22:19:WU01:FS01:0x22:  cuda-device 0 specified
17:22:23:WU02:FS00:0xa8:Completed 56791 out of 125000 steps (45%)
17:22:26:WU01:FS01:0x22:Attempting to create CUDA context:
17:22:26:WU01:FS01:0x22:  Configuring platform CUDA
17:23:04:WU01:FS01:0x22:  Using CUDA and gpu 0
17:23:04:WU01:FS01:0x22:Completed 500000 out of 5000000 steps (10%)
17:25:09:WU02:FS00:0xa8:Completed 57500 out of 125000 steps (46%)
17:26:05:WU01:FS01:0x22:Completed 550000 out of 5000000 steps (11%)
17:29:07:WU01:FS01:0x22:Completed 600000 out of 5000000 steps (12%)
17:30:23:WU02:FS00:0xa8:Completed 58750 out of 125000 steps (47%)
17:32:12:WU01:FS01:0x22:Completed 650000 out of 5000000 steps (13%)
17:35:15:WU01:FS01:0x22:Completed 700000 out of 5000000 steps (14%)
17:35:44:WU02:FS00:0xa8:Completed 60000 out of 125000 steps (48%)
17:38:18:WU01:FS01:0x22:Completed 750000 out of 5000000 steps (15%)
17:38:33:WU01:FS01:0x22:Checkpoint completed at step 750000
17:40:41:WU02:FS00:0xa8:Completed 61250 out of 125000 steps (49%)
17:41:37:WU01:FS01:0x22:Completed 800000 out of 5000000 steps (16%)
17:44:40:WU01:FS01:0x22:Completed 850000 out of 5000000 steps (17%)
17:45:40:WU02:FS00:0xa8:Completed 62500 out of 125000 steps (50%)
17:47:44:WU01:FS01:0x22:Completed 900000 out of 5000000 steps (18%)
17:50:44:WU02:FS00:0xa8:Completed 63750 out of 125000 steps (51%)
17:50:47:WU01:FS01:0x22:Completed 950000 out of 5000000 steps (19%)
17:53:51:WU01:FS01:0x22:Completed 1000000 out of 5000000 steps (20%)
17:54:05:WU01:FS01:0x22:Checkpoint completed at step 1000000
 
Any thoughts?
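On the question of where to look: on a systemd-based Ubuntu install, the kernel messages from the boot that ended in the hang are often more telling than the FAH log. The sketch below assumes the journal is persistent (so the previous boot is still available) and that a GPU fault would show up as an NVIDIA "NVRM: Xid" line. The function name scan_kernel_log and the pattern list are illustrative, not exhaustive.

```shell
# Filter kernel messages for lines that commonly show up before a hard hang.
# Reads log text on stdin and prints matching lines (nothing if none match).
# The pattern list is an assumption, not a complete set.
scan_kernel_log() {
    grep -iE 'nvrm|xid|thermal|machine check|mce:' || true
}

# Typical use on the affected machine (previous boot, kernel messages only):
#   journalctl -k -b -1 --no-pager | scan_kernel_log
```

If this prints nothing, that does not rule out a hardware fault; a machine that locks up hard may not get the chance to write anything.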
Folding rig:Supermicro X9DRD-7LN4F-JBOD | (2) Xeon E5-2670 | 128GB DDR3 ECC Registered

Install Folding@Home on Linux without Python dependancy issues
BobWilliams757
Posts: 493
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Post by BobWilliams757 »

Without a doubt folding can be tough on a system, but I view it more as the best system stability tool available.

Did you do any monitoring of temps and such with the new card and PS?
Fold them if you get them!
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Post by Neil-B »

Do you have another system from which you could remotely monitor this rig? I ask because in the past I have had issues where a sleeping monitor would not come back to life under various circumstances even though the rig was actually still working fine. (In those cases, updating the monitor firmware and grabbing the latest drivers for the monitors cleared my issues once I rebooted after the updates.) As for inspecting the logs, it may be worth checking the log previous to the current working one, as it might have relevant messages at the end of it.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
aetch
Posts: 447
Joined: Thu Jun 25, 2020 3:04 pm
Location: Between chair and keyboard

Post by aetch »

Every time FAHClient starts it archives the previous log and starts a new one.
You can find the previous logs in the folder - /var/lib/fahclient/logs
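To read the end of the log from the session that hung, something like the following should do. It assumes the archive directory named above and simply picks the most recently modified file in it. The function name newest_log is made up for this example, and reading under /var/lib/fahclient may need sudo depending on permissions.

```shell
# Print the path of the most recently modified file in a log directory.
# Defaults to the archive directory mentioned above.
newest_log() {
    dir=${1:-/var/lib/fahclient/logs}
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] && printf '%s/%s\n' "$dir" "$newest"
}

# Show the last 40 lines written before the client (or the machine) went down:
#   tail -n 40 "$(newest_log)"
```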

Beyond that, I'd suggest a few things:-
1). Download and install the proprietary driver directly from the nvidia/geforce website.
It's an absolute pain to install, as it has to be done from the command line, but it needs to be done to get your GPU to fold correctly.
This is pretty much how I installed the driver:-
Use the web browser to download the driver to the machine (I was using a desktop Linux; use the method that suits you best).
Remote in from another machine using SSH/PuTTY.
sudo telinit 3 (this will terminate the desktop session on the Linux machine)
Navigate to the folder you downloaded the driver to.
sudo bash nvidia-linux-x86_64_xxx.xx.run (note:- the install will likely fail due to missing dependencies; take note of them and install them)
Rinse and repeat until the install succeeds.
Reboot the system.
Note:- during later operating system updates/upgrades you will very likely have to re-install the driver. I don't know why, but they seem to uninstall it.

2). Disable automatic updates. This can be done through the "Software & Updates" control panel on the desktop; I don't know how to do it from the command line. Automatic updates are the absolute enemy of folders, as they interrupt running folding projects and bring them to a crawl.

3). Manually set the size of your CPU slot to 2 cores. This is to give the operating system room to operate; without it the operating system is sluggish at best and locks up at worst. Just remember the Phenom II is nearly 15 years old and it needs space to breathe.
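For the CPU slot, the config in the first post has <slot id='0' type='CPU'/> with no core count. A sketch of pinning it to 2 cores follows. It assumes the v7 client's per-slot cpus option and that the slot line appears exactly as in that log; the function name set_cpu_slot_cores is made up for this example. The same setting can be made from the slot configuration in FAHControl, which is likely the safer route.

```shell
# Rewrite the self-closing CPU slot element so it carries a fixed core count.
# Reads config.xml text on stdin and writes the modified text to stdout.
# Assumes the slot is written exactly as <slot id='0' type='CPU'/>.
set_cpu_slot_cores() {
    cores=$1
    sed "s|<slot id='0' type='CPU'/>|<slot id='0' type='CPU'><cpus v='${cores}'/></slot>|"
}

# Example (stop the client first, and keep a copy of the original file):
#   set_cpu_slot_cores 2 < /etc/fahclient/config.xml > /tmp/config.xml.new
```

Writing to a separate file first makes it easy to compare the result against the original before putting it in place.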
Folding Rigs - None (25-Jun-2022)


Post by hrsetrdr »

BobWilliams757 wrote: Mon Apr 18, 2022 6:21 pm Without a doubt folding can be tough on a system, but I view it more as the best system stability tool available.

Did you do any monitoring of temps and such with the new card and PS?
I've got the GPU fans set to 70%, GPU temp running 82 C.

lm-sensors reports the system temps as:

Code: Select all

~$ sensors
it8728-isa-0228
Adapter: ISA adapter
in0:           1.32 V  (min =  +0.00 V, max =  +3.06 V)
in1:           1.50 V  (min =  +0.00 V, max =  +3.06 V)
in2:           2.00 V  (min =  +0.00 V, max =  +3.06 V)
+3.3V:         3.34 V  (min =  +0.00 V, max =  +6.12 V)
in4:           1.99 V  (min =  +0.00 V, max =  +3.06 V)
in5:           2.23 V  (min =  +0.00 V, max =  +3.06 V)
in6:           2.23 V  (min =  +0.00 V, max =  +3.06 V)
3VSB:          3.38 V  (min =  +0.00 V, max =  +6.12 V)
Vbat:          3.14 V  
fan1:        3125 RPM  (min =    0 RPM)
fan2:           0 RPM  (min =    0 RPM)
fan3:           0 RPM  (min =    0 RPM)
temp1:        +63.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:        +78.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
temp3:        +94.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = Intel PECI
intrusion0:  ALARM

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +78.0°C  (high = +70.0°C)
                       (crit = +79.0°C, hyst = +77.0°C)
 

"temp3: +94.0°C (low = +127.0°C, high = +127.0°C) sensor = Intel PECI"
Hmmm, +94.0°C seems pretty warm. Maybe I'd better replace the 120mm front case fan; I had unplugged it because it was so loud.
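A quick way to pick the readings that matter out of that output is to compare each temperature against its own "high" limit. The sketch below does that with awk; over_limit is an illustrative name. The it8728 limits of +127.0°C look like unconfigured defaults, so with the output above only the k10temp line (78.0 against a high of 70.0) gets reported, and whether temp3 is a real sensor on this AMD board is an open question.

```shell
# Print every tempN line from `sensors` output whose reading exceeds its
# own "high" limit. Reads the sensors output on stdin.
over_limit() {
    awk '
    /^temp[0-9]+:/ && /high = / {
        cur = $2
        gsub(/[^0-9.]/, "", cur)          # "+78.0°C" -> "78.0"
        lim = $0
        sub(/.*high = [+]?/, "", lim)     # drop everything up to the limit value
        sub(/[^0-9.].*/, "", lim)         # drop the unit and the rest of the line
        if (cur + 0 > lim + 0)
            print $1, cur, "exceeds high limit", lim
    }'
}

# Example:
#   sensors | over_limit
# GPU side, refreshed every 5 seconds:
#   nvidia-smi --query-gpu=temperature.gpu,fan.speed,power.draw --format=csv -l 5
```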

Post by hrsetrdr »

@Neil-B & @aetch,

I got my punch list of tasks to check into, thanks.
another system to remote monitor this rig
download and install the proprietary driver
disable automatic updates
manually set the size of your CPU slot to 2 cores

Post by hrsetrdr »

Wow, cooling [actually the lack of it] really hit home when I shut the rig down and opened the case. I replaced both the front and rear 120mm fans and am seeing lower temps all around, but I'm still keeping an eye on this machine. If the problem persists I'll likely blame the age of the AM3 system and pull the trigger on parts for a new build.

Post by hrsetrdr »

As a backup, or for migration to another machine, can I just copy the .FAHClient directory to a USB drive?
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Post by Joe_H »

As a backup that should work fine. It is a bit trickier to use for migration. The client is assigned a unique ID the first time it connects to the servers, so as long as the original machine is never started up for folding again it should be okay. But having two machines with the same ID active can cause problems.

The other issue is that the data folder includes configuration information and slot definitions that may not apply to the new machine. On startup the client may reconfigure successfully, but there may be problems.

So in general it would be better to install the client from the downloaded installer. You can use the info stored in the config.xml file from the backup copy as a source to copy and paste your user info such as username, team, and passkey.
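Pulling those three values out of a backed-up config.xml can be done with a one-line grep. The sketch assumes the element layout shown in the client log in the first post (<user v='...'/> and so on); show_identity is an illustrative name, and the backup path in the example is hypothetical.

```shell
# Print the user, team and passkey elements from a FAHClient config.xml.
# Takes the path to the (backup) config file as its only argument.
show_identity() {
    grep -E "<(user|team|passkey) v=" "$1"
}

# Example, with a hypothetical backup location:
#   show_identity /media/usb/fah-backup/config.xml
```

The passkey is tied to your account, so treat the output as private.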

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
hrsetrdr
Posts: 112
Joined: Sun Dec 02, 2007 4:29 pm
Location: In the Fold somewhere in SoCal.

Re: Computer goes blank, won't revive-cold reboot required

Post by hrsetrdr »

Well, I'm going to throw caution to the wind and say that this problem has been solved. I replaced the original Ubuntu (MATE) install with Ubuntu Studio, which by default comes with the Xfce DE. The Nvidia 470.xx.xx driver happened to be installed automatically. So far so good. If the problem returns I'm going to have to regard it as a hardware issue, and of course I'll test this [new] RTX 2060 in a different machine. But I've been thinking this AM3 system might be showing signs of age, and that I need to move on.

Post by aetch »

You really need to download and install the proprietary driver from the nvidia/geforce website.
The drivers available from the ubuntu repository don't have all the components required to fold.

The GPU section of your system info should look more like this

Code: Select all

21:40:45:            CWD: /var/lib/fahclient
21:40:45:             OS: Linux 5.13.0-30-generic x86_64
21:40:45:        OS Arch: AMD64
21:40:45:           GPUs: 2
21:40:45:          GPU 0: Bus:0 Slot:2 Func:0 INTEL:1 KBL GT2 [HD Graphics 630]
21:40:45:          GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:7 GP102 [GeForce GTX 1080 Ti] 11380
21:40:45:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:11.6
21:40:45:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:3.0 Driver:510.54
with both OpenCL and CUDA being available to the GPU.

Post by hrsetrdr »

aetch wrote: Wed Apr 20, 2022 6:45 pm You really need to download and install the proprietary driver from the nvidia/geforce website.
The drivers available from the ubuntu repository don't have all the components required to fold.

Hmmm, O.K., mine has CUDA but not OpenCL. How can one tell whether a given driver will have either feature? Here is the latest Production Branch version, but nothing is mentioned about OpenCL or CUDA.
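One way to tell, independent of the driver's release notes, is to look at what the client itself detected at startup. The sketch below reads a FAHClient log and reports whether an OpenCL device line is present, going by the two log excerpts in this thread. opencl_status is an illustrative name, and the log path in the example assumes the client's working directory shown in the first post.

```shell
# Report whether a FAHClient log (read from stdin) lists an OpenCL device.
# Based on the two log formats quoted in this thread:
#   "OpenCL: Not detected: ..."        -> no usable OpenCL
#   "OpenCL Device 0: Platform:0 ..."  -> OpenCL available
opencl_status() {
    if grep -qE 'OpenCL Device [0-9]+:'; then
        echo "OpenCL detected"
    else
        echo "OpenCL not detected"
    fi
}

# Example:
#   opencl_status < /var/lib/fahclient/log.txt
```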

Post by hrsetrdr »

aetch wrote: Wed Apr 20, 2022 6:45 pm You really need to download and install the proprietary driver from the nvidia/geforce website.
The drivers available from the ubuntu repository don't have all the components required to fold.
What worked to add the OpenCL libraries:

Code: Select all

sudo apt install ocl-icd-opencl-dev
viewtopic.php?t=36824
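A plausible reason this worked, offered as an assumption rather than a confirmed diagnosis: the client log complained about 'libOpenCL.so' (the unversioned name), and on Debian/Ubuntu that unversioned link normally ships in a -dev package, while the runtime library is installed as libOpenCL.so.1. The check below looks for the unversioned name in the linker cache; has_unversioned_opencl is an illustrative name.

```shell
# Succeed if the linker cache listing on stdin contains the unversioned
# libOpenCL.so entry (as opposed to only libOpenCL.so.1).
has_unversioned_opencl() {
    grep -qE '(^|[[:space:]])libOpenCL\.so[[:space:]]'
}

# Example:
#   ldconfig -p | has_unversioned_opencl && echo "libOpenCL.so found" || echo "libOpenCL.so missing"
```

After installing the package, restarting the client and checking that an "OpenCL Device 0:" line appears in the new log is the more direct confirmation.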

Post by aetch »

If it works it's all good, there's a lot of answers I don't know.

Post by hrsetrdr »

aetch wrote: Fri Apr 22, 2022 8:20 am If it works it's all good, there's a lot of answers I don't know.
All thoughts and ideas lead us to solving the common goal. ;)