NVIDIA GPU units now failing on CENTOS7

Moderators: Site Moderators, FAHC Science Team

stfarley
Posts: 1
Joined: Fri Nov 19, 2021 1:18 pm


Post by stfarley »

I have been using this hardware for over a year, but recently the GPU work units have started failing.
If I pause and restart the slot, it eventually gets a work unit that succeeds.
I have installed the latest NVIDIA drivers.

Here is a sample from the log:

14:26:38:WU02:FS02:Sending unit results: id:02 state:SEND error:FAILED project:18432 run:93 clone:4 gen:149 core:0x22 unit:0x0000000400000095000048000000005d
14:26:38:WU02:FS02:Connecting to 129.32.209.202:8080
14:26:38:WU01:FS02:Connecting to assign1.foldingathome.org:80
14:26:38:WU02:FS02:Server responded WORK_ACK (400)
14:26:38:WU02:FS02:Cleaning up
14:26:39:WU01:FS02:Assigned to work server 34.72.228.44
14:26:39:WU01:FS02:Requesting new work unit for slot 02: gpu:1:0 TU106 [Geforce RTX 2060] from 34.72.228.44
14:26:39:WU01:FS02:Connecting to 34.72.228.44:8080
14:26:39:WU01:FS02:Downloading 8.27MiB
14:26:45:WU01:FS02:Download 100.00%
14:26:45:WU01:FS02:Download complete
14:26:45:WU01:FS02:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:18021 run:22 clone:10 gen:70 core:0x22 unit:0x0000000a000000460000466500000016
14:26:45:WU01:FS02:Starting
14:26:45:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /root/folding/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 20495 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
14:26:45:WU01:FS02:Started FahCore on PID 31517
14:26:45:WU01:FS02:Core PID:31521
14:26:45:WU01:FS02:FahCore 0x22 started
14:26:45:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
14:26:45:WU01:FS02:Starting
14:26:45:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /root/folding/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 20495 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
14:26:45:WU01:FS02:Started FahCore on PID 31523
14:26:45:WU01:FS02:Core PID:31527
14:26:45:WU01:FS02:FahCore 0x22 started
14:26:46:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
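To see how often this happens across a longer log, a minimal Python sketch like the following could tally the immediate core failures per slot. It assumes, based only on the sample above, that failures appear as `WARNING` lines of the form `HH:MM:SS:WARNING:WUxx:FSxx:FahCore returned: STATUS`. The function name `count_core_failures` is hypothetical and not part of the FAHClient tooling.

```python
import re
from collections import Counter

# Matches lines shaped like the sample above, e.g.
# 14:26:45:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
FAILURE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WARNING:(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+)"
)


def count_core_failures(log_text):
    """Count WARNING-level 'FahCore returned' lines, keyed by (slot, status)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            counts[(match.group("slot"), match.group("status"))] += 1
    return counts
```

Run against the excerpt above, this would report two `FAILED_2` returns for slot `FS02`, both within about a second of the core starting, which points to the core failing at launch rather than partway through the simulation.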
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: NVIDIA GPU units now failing on CENTOS7

Post by Joe_H »

The recent update to Core_22 uses a newer glibc than the one available in CentOS 7. See this topic: viewtopic.php?f=74&t=37598.

They are looking into it; a fixed Core_22 version that uses the glibc available on CentOS 7 may be out in the near future.
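To confirm which glibc a machine actually has, a small Python sketch like this one can query it through `os.confstr("CS_GNU_LIBC_VERSION")`, which returns a string such as `glibc 2.17` on CentOS 7. The minimum version Core_22 needs is not stated in this thread, so the `required` value passed in is an assumption you would have to supply; the helper names are hypothetical.

```python
import os
import re


def parse_version(text):
    """Turn a version string such as '2.17' into a (major, minor) tuple."""
    match = re.match(r"(\d+)\.(\d+)", text.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def system_glibc_version():
    """Return the running glibc version as (major, minor), or None if unknown."""
    try:
        # e.g. "glibc 2.17" on CentOS 7; unavailable on non-glibc systems
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        return None
    if not value or not value.startswith("glibc "):
        return None
    return parse_version(value.split()[1])


def glibc_at_least(required, current=None):
    """True if the current (or detected) glibc is >= the required (major, minor)."""
    if current is None:
        current = system_glibc_version()
    return current is not None and current >= required
```

For example, `glibc_at_least((2, 18), current=(2, 17))` is `False`. Here `(2, 18)` is only a placeholder meaning "newer than CentOS 7's 2.17", not the real Core_22 requirement.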

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: NVIDIA GPU units now failing on CENTOS7

Post by toTOW »

This issue will be fixed in the upcoming v0.0.19 of Core_22.

To avoid future issues like this, it would be a good idea to update to a distribution that provides more frequent updates.
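To check whether a client has picked up the fixed core, one option is to read the core version from the `Running FahCore:` lines in the log. The sample log shows it in the core directory name (`.../22-0.0.18/Core_22.fah/FahCore_22`). The Python sketch below assumes that naming pattern holds generally, which is inferred from this one log and not documented here. The function names are hypothetical.

```python
import re

# Inferred from the sample log path: .../64bit/22-0.0.18/Core_22.fah/FahCore_22
CORE_DIR_RE = re.compile(
    r"/(?P<core>[0-9a-fA-F]{2})-(?P<version>\d+(?:\.\d+)+)/Core_"
)


def core_versions(log_text):
    """Return the set of (core id, version tuple) pairs seen in 'Running FahCore' lines."""
    found = set()
    for line in log_text.splitlines():
        if "Running FahCore:" not in line:
            continue
        match = CORE_DIR_RE.search(line)
        if match:
            version = tuple(int(part) for part in match.group("version").split("."))
            found.add((match.group("core").lower(), version))
    return found


def needs_update(log_text, core="22", fixed=(0, 0, 19)):
    """True if the log shows the given core running at a version older than `fixed`."""
    return any(c == core.lower() and v < fixed for c, v in core_versions(log_text))
```

Against the log in the first post, `needs_update` would return `True` because the client ran Core_22 0.0.18. Once the client downloads 0.0.19, the same check should return `False`.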

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.