
NVIDIA GPU units now failing on CENTOS7

Posted: Mon Jan 03, 2022 2:36 pm
by stfarley
I have been using this hardware for over a year.
Recently the GPU work units have been failing.
If I pause and restart the slot, it will eventually get a work unit that succeeds.
I have installed the latest NVIDIA drivers.

Here is a sample from the log:


14:26:38:WU02:FS02:Sending unit results: id:02 state:SEND error:FAILED project:18432 run:93 clone:4 gen:149 core:0x22 unit:0x0000000400000095000048000000005d
14:26:38:WU02:FS02:Connecting to 129.32.209.202:8080
14:26:38:WU01:FS02:Connecting to assign1.foldingathome.org:80
14:26:38:WU02:FS02:Server responded WORK_ACK (400)
14:26:38:WU02:FS02:Cleaning up
14:26:39:WU01:FS02:Assigned to work server 34.72.228.44
14:26:39:WU01:FS02:Requesting new work unit for slot 02: gpu:1:0 TU106 [Geforce RTX 2060] from 34.72.228.44
14:26:39:WU01:FS02:Connecting to 34.72.228.44:8080
14:26:39:WU01:FS02:Downloading 8.27MiB
14:26:45:WU01:FS02:Download 100.00%
14:26:45:WU01:FS02:Download complete
14:26:45:WU01:FS02:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:18021 run:22 clone:10 gen:70 core:0x22 unit:0x0000000a000000460000466500000016
14:26:45:WU01:FS02:Starting
14:26:45:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /root/folding/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 20495 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
14:26:45:WU01:FS02:Started FahCore on PID 31517
14:26:45:WU01:FS02:Core PID:31521
14:26:45:WU01:FS02:FahCore 0x22 started
14:26:45:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
14:26:45:WU01:FS02:Starting
14:26:45:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /root/folding/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 20495 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
14:26:45:WU01:FS02:Started FahCore on PID 31523
14:26:45:WU01:FS02:Core PID:31527
14:26:45:WU01:FS02:FahCore 0x22 started
14:26:46:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
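
For anyone triaging a longer log, the repeated failures above can be tallied with a short script. This is only a rough sketch: the line pattern is inferred from the excerpt in this post rather than from any documented log format, and the function name count_core_failures is made up for illustration.

```python
import re

# Pattern inferred from the log excerpt in this thread (an assumption, not a
# documented format): HH:MM:SS:[WARNING:]WUnn:FSnn:FahCore returned: RESULT (...)
RETURN_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(?:WARNING:)?(WU\d+):(FS\d+):FahCore returned: (\w+)"
)

def count_core_failures(lines):
    """Count 'FahCore returned: FAILED...' lines per (work unit, slot) pair."""
    counts = {}
    for line in lines:
        match = RETURN_LINE.match(line.strip())
        if match is None:
            continue  # not a "FahCore returned" line
        work_unit, slot, result = match.groups()
        if result.startswith("FAILED"):
            key = (work_unit, slot)
            counts[key] = counts.get(key, 0) + 1
    return counts

if __name__ == "__main__":
    sample = [
        "14:26:45:WU01:FS02:FahCore 0x22 started",
        "14:26:45:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)",
        "14:26:46:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)",
    ]
    for (work_unit, slot), n in count_core_failures(sample).items():
        print(f"{work_unit} on {slot}: {n} failed core start(s)")
```

A core that fails within a second of starting, every time, points at a startup problem rather than at a bad work unit.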

Re: NVIDIA GPU units now failing on CENTOS7

Posted: Mon Jan 03, 2022 5:02 pm
by Joe_H
The recent update to Core_22 uses a newer glibc than is available in CentOS 7. See this topic - viewtopic.php?f=74&t=37598.

They are looking into it. There may be a fixed Core_22 version out in the near future that will use the version of glibc available on CentOS 7.
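
If you want to confirm which glibc your system has, something like the following may help. It is a minimal sketch, not an official check: the REQUIRED value is a hypothetical placeholder (the exact glibc version the new Core_22 needs is not stated in this thread), and the helper names are made up for illustration. os.confstr("CS_GNU_LIBC_VERSION") is a standard Python call on glibc-based Linux systems and returns a string such as "glibc 2.17", which is the version CentOS 7 ships.

```python
import os

def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.17' into a tuple like (2, 17)."""
    version = text.split()[-1]
    return tuple(int(part) for part in version.split("."))

def installed_glibc_version():
    """Return the running system's glibc version, or None if unavailable."""
    try:
        raw = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        return None  # not a glibc-based system, or the name is unsupported
    return parse_glibc_version(raw) if raw else None

def is_new_enough(installed, required):
    """Compare version tuples, e.g. (2, 17) >= (2, 18) is False."""
    return installed >= required

if __name__ == "__main__":
    REQUIRED = (2, 18)  # hypothetical placeholder, NOT the real Core_22 requirement
    found = installed_glibc_version()
    if found is None:
        print("Could not determine the glibc version on this system.")
    else:
        verdict = "ok" if is_new_enough(found, REQUIRED) else "too old"
        print("glibc", ".".join(map(str, found)), "is", verdict, "for", REQUIRED)
```

Running ldd --version in a shell reports the same glibc version without needing Python.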

Re: NVIDIA GPU units now failing on CENTOS7

Posted: Tue Jan 04, 2022 10:35 am
by toTOW
This issue will be fixed in the upcoming v0.0.19 of Core_22.

To avoid future issues like this, it would be a good idea to update to a distribution that provides more frequent updates.