no more cuda computing

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

promeneur
Posts: 198
Joined: Tue Aug 07, 2012 11:59 am
Hardware configuration: openSUSE Tumbleweed, x86_64,Asrock B760M-HDV/M.2 D4, Intel Core i3-12100, 16 GB, Intel UHD Graphics 730, NVIDIA GeForce GT 1030, Edup-Love EP-9651GS Wi-Fi Bluetooth, multicard reader USB 3.0 startech.com 35fcreadbu3, Epson XP 7100, Headset Bluetooth 3.0 Philips SHQ7300

no more cuda computing

Post by promeneur »

Today, after a week without any problems, the NVIDIA GPU slot was disabled and the iGPU slot disappeared.

Here is the log:


*********************** Log Started 2021-11-23T07:01:42Z ***********************
07:01:42:******************************* libFAH ********************************
07:01:42:           Date: Oct 20 2020
07:01:42:           Time: 20:36:41
07:01:42:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
07:01:42:         Branch: master
07:01:42:       Compiler: GNU 4.9.4
07:01:42:        Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections
07:01:42:                 -O3 -funroll-loops
07:01:42:       Platform: linux2 5.8.0-1-amd64
07:01:42:           Bits: 64
07:01:42:           Mode: Release
07:01:42:****************************** FAHClient ******************************
07:01:42:        Version: 7.6.21
07:01:42:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:01:42:      Copyright: 2020 foldingathome.org
07:01:42:       Homepage: https://foldingathome.org/
07:01:42:           Date: Oct 20 2020
07:01:42:           Time: 20:38:59
07:01:42:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
07:01:42:         Branch: master
07:01:42:       Compiler: GNU 4.9.4
07:01:42:        Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections
07:01:42:                 -O3 -funroll-loops
07:01:42:       Platform: linux2 5.8.0-1-amd64
07:01:42:           Bits: 64
07:01:42:           Mode: Release
07:01:42:           Args: /etc/fahclient/config.xml
07:01:42:                 --pid-file=/run/fahclient/fahclient.pid
07:01:42:         Config: /etc/fahclient/config.xml
07:01:42:******************************** CBang ********************************
07:01:42:           Date: Oct 20 2020
07:01:42:           Time: 18:38:01
07:01:42:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
07:01:42:         Branch: master
07:01:42:       Compiler: GNU 4.9.4
07:01:42:        Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections
07:01:42:                 -O3 -funroll-loops -fPIC
07:01:42:       Platform: linux2 5.8.0-1-amd64
07:01:42:           Bits: 64
07:01:42:           Mode: Release
07:01:42:******************************* System ********************************
07:01:42:            CPU: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
07:01:42:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
07:01:42:           CPUs: 4
07:01:42:         Memory: 15.50GiB
07:01:42:    Free Memory: 14.59GiB
07:01:42:        Threads: POSIX_THREADS
07:01:42:     OS Version: 5.3
07:01:42:    Has Battery: false
07:01:42:     On Battery: false
07:01:42:     UTC Offset: 1
07:01:42:            PID: 1875
07:01:42:            CWD: /var/lib/fahclient
07:01:42:             OS: Linux 5.3.18-59.34-default x86_64
07:01:42:        OS Arch: AMD64
07:01:42:           GPUs: 1
07:01:42:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:1
07:01:42:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:3.5 Driver:11.4
07:01:42:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:3.0 Driver:470.86
07:01:42:OpenCL Device 1: Platform:1 Device:0 Bus:NA Slot:NA Compute:2.0 Driver:1.3
07:01:42:***********************************************************************
07:01:42:<config>
07:01:42:  <!-- Network -->
07:01:42:  <proxy v=':8080'/>
07:01:42:
07:01:42:  <!-- Slot Control -->
07:01:42:  <power v='FULL'/>
07:01:42:
07:01:42:  <!-- User Information -->
07:01:42:  <passkey v='*****'/>
07:01:42:  <team v='51'/>
07:01:42:  <user v='philippe_roubach'/>
07:01:42:
07:01:42:  <!-- Folding Slots -->
07:01:42:  <slot id='0' type='CPU'/>
07:01:42:  <slot id='2' type='GPU'>
07:01:42:    <pci-bus v='1'/>
07:01:42:    <pci-slot v='0'/>
07:01:42:  </slot>
07:01:42:  <slot id='1' type='GPU'>
07:01:42:    <gpu-beta v='True'/>
07:01:42:    <opencl-index v='0'/>
07:01:42:    <paused v='true'/>
07:01:42:    <pci-bus v='0'/>
07:01:42:    <pci-slot v='2'/>
07:01:42:  </slot>
07:01:42:</config>
07:01:42:Trying to access database...
07:01:42:Successfully acquired database lock
07:01:42:ERROR:GPU with PCI bus 0 and slot 2 not found.: Deleting folding slot.
07:01:42:FS00:Initialized folding slot 00: cpu:3
07:01:42:WARNING:FS02:Disabling beta GPU slot 02: gpu:1:0.  Beta GPUs can be tested for no points by setting ``gpu-beta=true`` in the configuration.
07:01:42:WARNING:FS01:``opencl-index`` 0 did not match GPU
07:01:42:WARNING:FS01:No CUDA or OpenCL 1.2+ support detected for GPU slot 01: gpu:-1:-1.  Disabling.
07:01:42:WARNING:WU03:No longer matches Slot 2's configuration and there are no other matching slots, dumping
07:01:42:WU03:FS02:Sending unit results: id:03 state:SEND error:DUMPED project:16608 run:128 clone:1 gen:109 core:0x22 unit:0x000000010000006d000040e000000080
07:01:42:WU03:FS02:Connecting to 66.170.111.50:8080
07:01:42:WU01:FS00:Starting
07:01:42:WARNING:WU01:FS00:Changed SMP threads from 2 to 3 this can cause some work units to fail
07:01:42:WARNING:WU01:FS00:AS lowered CPUs from 3 to 2
07:01:42:WARNING:WU03:FS02:WorkServer connection failed on port 8080 trying 80
07:01:42:WU03:FS02:Connecting to 66.170.111.50:80
07:01:42:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8 -dir 01 -suffix 01 -version 706 -lifeline 1875 -checkpoint 15 -np 2
07:01:42:WU01:FS00:Started FahCore on PID 1975
07:01:42:WU01:FS00:Core PID:1979
07:01:42:WU01:FS00:FahCore 0xa8 started
07:01:43:WU01:FS00:0xa8:*********************** Log Started 2021-11-23T07:01:42Z ***********************
07:01:43:WU01:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
07:01:43:WU01:FS00:0xa8:       Core: Gromacs
07:01:43:WU01:FS00:0xa8:       Type: 0xa8
07:01:43:WU01:FS00:0xa8:    Version: 0.0.12
07:01:43:WU01:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:01:43:WU01:FS00:0xa8:  Copyright: 2020 foldingathome.org
07:01:43:WU01:FS00:0xa8:   Homepage: https://foldingathome.org/
07:01:43:WU01:FS00:0xa8:       Date: Jan 16 2021
07:01:43:WU01:FS00:0xa8:       Time: 19:24:44
07:01:43:WU01:FS00:0xa8:   Compiler: GNU 8.3.0
07:01:43:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
07:01:43:WU01:FS00:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
07:01:43:WU01:FS00:0xa8:   Platform: linux2 4.15.0-128-generic
07:01:43:WU01:FS00:0xa8:       Bits: 64
07:01:43:WU01:FS00:0xa8:       Mode: Release
07:01:43:WU01:FS00:0xa8:       SIMD: avx2_256
07:01:43:WU01:FS00:0xa8:     OpenMP: ON
07:01:43:WU01:FS00:0xa8:       CUDA: OFF
07:01:43:WU01:FS00:0xa8:       Args: -dir 01 -suffix 01 -version 706 -lifeline 1975 -checkpoint 15 -np 2
07:01:43:WU01:FS00:0xa8:************************************ libFAH ************************************
07:01:43:WU01:FS00:0xa8:       Date: Jan 16 2021
07:01:43:WU01:FS00:0xa8:       Time: 19:21:38
07:01:43:WU01:FS00:0xa8:   Compiler: GNU 8.3.0
07:01:43:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
07:01:43:WU01:FS00:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
07:01:43:WU01:FS00:0xa8:   Platform: linux2 4.15.0-128-generic
07:01:43:WU01:FS00:0xa8:       Bits: 64
07:01:43:WU01:FS00:0xa8:       Mode: Release
07:01:43:WU01:FS00:0xa8:************************************ CBang *************************************
07:01:43:WU01:FS00:0xa8:       Date: Jan 16 2021
07:01:43:WU01:FS00:0xa8:       Time: 19:21:24
07:01:43:WU01:FS00:0xa8:   Compiler: GNU 8.3.0
07:01:43:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
07:01:43:WU01:FS00:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
07:01:43:WU01:FS00:0xa8:   Platform: linux2 4.15.0-128-generic
07:01:43:WU01:FS00:0xa8:       Bits: 64
07:01:43:WU01:FS00:0xa8:       Mode: Release
07:01:43:WU01:FS00:0xa8:************************************ System ************************************
07:01:43:WU01:FS00:0xa8:        CPU: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
07:01:43:WU01:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
07:01:43:WU01:FS00:0xa8:       CPUs: 4
07:01:43:WU01:FS00:0xa8:     Memory: 15.50GiB
07:01:43:WU01:FS00:0xa8:Free Memory: 14.33GiB
07:01:43:WU01:FS00:0xa8:    Threads: POSIX_THREADS
07:01:43:WU01:FS00:0xa8: OS Version: 5.3
07:01:43:WU01:FS00:0xa8:Has Battery: false
07:01:43:WU01:FS00:0xa8: On Battery: false
07:01:43:WU01:FS00:0xa8: UTC Offset: 1
07:01:43:WU01:FS00:0xa8:        PID: 1979
07:01:43:WU01:FS00:0xa8:        CWD: /var/lib/fahclient/work
07:01:43:WU01:FS00:0xa8:********************************************************************************
07:01:43:WU01:FS00:0xa8:Project: 18210 (Run 21210, Clone 0, Gen 31)
07:01:43:WU01:FS00:0xa8:Unit: 0x00000000000000000000000000000000
07:01:43:WU01:FS00:0xa8:Digital signatures verified
07:01:43:WU01:FS00:0xa8:Calling: mdrun -c frame31.gro -s frame31.tpr -x frame31.xtc -cpi state.cpt -cpt 15 -nt 2 -ntmpi 1
07:01:43:WU01:FS00:0xa8:Steps: first=3875000 total=4000000
07:01:44:WARNING:WU03:FS02:Exception: Failed to send results to work server: Failed to connect to 66.170.111.50:80: Network is unreachable
07:01:44:WU03:FS02:Trying to send results to collection server
07:01:44:WU03:FS02:Connecting to 128.252.203.13:8080
07:01:44:WARNING:WU03:FS02:WorkServer connection failed on port 8080 trying 80
07:01:44:WU03:FS02:Connecting to 128.252.203.13:80
07:01:45:ERROR:WU03:FS02:Exception: Failed to connect to 128.252.203.13:80: Network is unreachable
07:01:45:WU03:FS02:Sending unit results: id:03 state:SEND error:DUMPED project:16608 run:128 clone:1 gen:109 core:0x22 unit:0x000000010000006d000040e000000080
07:01:45:WU03:FS02:Connecting to 66.170.111.50:8080
07:01:45:WARNING:WU03:FS02:WorkServer connection failed on port 8080 trying 80
07:01:45:WU03:FS02:Connecting to 66.170.111.50:80
07:01:46:WARNING:WU03:FS02:Exception: Failed to send results to work server: Failed to connect to 66.170.111.50:80: Network is unreachable
07:01:46:WU03:FS02:Trying to send results to collection server
07:01:46:WU03:FS02:Connecting to 128.252.203.13:8080
07:01:46:WARNING:WU03:FS02:WorkServer connection failed on port 8080 trying 80
07:01:46:WU03:FS02:Connecting to 128.252.203.13:80
07:01:47:ERROR:WU03:FS02:Exception: Failed to connect to 128.252.203.13:80: Network is unreachable
07:01:49:WU01:FS00:0xa8:Completed 102222 out of 125000 steps (81%)
07:02:43:Removing old file 'configs/config-20211117-130008.xml'
07:02:43:Saving configuration to /etc/fahclient/config.xml
07:02:43:<config>
07:02:43:  <!-- Network -->
07:02:43:  <proxy v=':8080'/>
07:02:43:
07:02:43:  <!-- Slot Control -->
07:02:43:  <power v='FULL'/>
07:02:43:
07:02:43:  <!-- User Information -->
07:02:43:  <passkey v='*****'/>
07:02:43:  <team v='51'/>
07:02:43:  <user v='philippe_roubach'/>
07:02:43:
07:02:43:  <!-- Folding Slots -->
07:02:43:  <slot id='0' type='CPU'/>
07:02:43:  <slot id='2' type='GPU'>
07:02:43:    <pci-bus v='1'/>
07:02:43:    <pci-slot v='0'/>
07:02:43:  </slot>
07:02:43:</config>
07:02:46:WU03:FS02:Sending unit results: id:03 state:SEND error:DUMPED project:16608 run:128 clone:1 gen:109 core:0x22 unit:0x000000010000006d000040e000000080
07:02:46:WU03:FS02:Connecting to 66.170.111.50:8080
07:02:46:WU03:FS02:Server responded WORK_ACK (400)
07:02:46:WU03:FS02:Cleaning up
07:05:10:WU01:FS00:0xa8:Completed 102500 out of 125000 steps (82%)
07:20:14:WU01:FS00:0xa8:Completed 103750 out of 125000 steps (83%)
07:36:09:WU01:FS00:0xa8:Completed 105000 out of 125000 steps (84%)
07:53:04:WU01:FS00:0xa8:Completed 106250 out of 125000 steps (85%)
08:08:05:WU01:FS00:0xa8:Completed 107500 out of 125000 steps (86%)
08:23:19:WU01:FS00:0xa8:Completed 108750 out of 125000 steps (87%)
08:38:25:WU01:FS00:0xa8:Completed 110000 out of 125000 steps (88%)
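
For anyone triaging a log like the one above, a minimal Python sketch (not part of FAHClient) that keeps only the WARNING and ERROR lines may help. The regular expression assumes the `HH:MM:SS:LEVEL:message` layout seen in this log; `find_problems` and `SAMPLE` are names made up for illustration.

```python
import re

# Assumed layout, taken from the log above, e.g.
# "07:01:42:ERROR:GPU with PCI bus 0 and slot 2 not found.: Deleting folding slot."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WARNING|ERROR):(.*)$")


def find_problems(log_text):
    """Return (time, level, message) tuples for WARNING and ERROR lines."""
    problems = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            problems.append(match.groups())
    return problems


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
07:01:42:Successfully acquired database lock
07:01:42:ERROR:GPU with PCI bus 0 and slot 2 not found.: Deleting folding slot.
07:01:42:FS00:Initialized folding slot 00: cpu:3
07:01:42:WARNING:FS01:No CUDA or OpenCL 1.2+ support detected for GPU slot 01: gpu:-1:-1.  Disabling.
"""

if __name__ == "__main__":
    for time_stamp, level, message in find_problems(SAMPLE):
        print(f"{time_stamp} {level:7} {message}")
```

Run against the full log, this would put the slot-related errors and the "Network is unreachable" failures next to each other, which makes it easier to see that both happened at startup.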

promeneur

Re: no more cuda computing

Post by promeneur »

OK.

I deleted the NVIDIA slot and restarted FAHClient. After that, the problem was gone: the iGPU (Intel) and dGPU (NVIDIA) slots were both detected correctly and came back.
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: no more cuda computing

Post by toTOW »

Make sure you have access to the Internet when you start the client. If the client fails to download GPUs.txt at startup, all GPUs will be disabled until it can download it again.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
promeneur

Re: no more cuda computing

Post by promeneur »

toTOW wrote: If the client fails to download GPUs.txt at startup, all GPUs will be disabled until it can download it again.
That is not the case: the CPU slot is not disabled.
toTOW wrote: Make sure you have access to the Internet when you start the client.
Users have no setting that guarantees Internet access is up when FAHClient starts.

I assume we would have to modify the systemd service file for FAHClient so that it waits for Internet access before starting. That is too technical for most users.

I think Folding@home should supply such a service file.
promeneur

Re: no more cuda computing

Post by promeneur »

In the fahclient service file, I found this:

[Unit]
Description=Folding@Home V7 Client
Documentation=https://foldingathome.org/support/faq/i ... des/linux/
After=syslog.target network-online.target remote-fs.target
Wants=network-online.target

I guess this means FAHClient waits until the network is up.

So I conclude that this is not a network problem. Am I right?
toTOW

Re: no more cuda computing

Post by toTOW »

promeneur wrote:
toTOW wrote: If the client fails to download GPUs.txt at startup, all GPUs will be disabled until it can download it again.
That is not the case: the CPU slot is not disabled.
The file is called GPUs.txt ... :roll:
promeneur wrote:In the fahclient service file, I found this:

[Unit]
Description=Folding@Home V7 Client
Documentation=https://foldingathome.org/support/faq/i ... des/linux/
After=syslog.target network-online.target remote-fs.target
Wants=network-online.target

I guess this means FAHClient waits until the network is up.

So I conclude that this is not a network problem. Am I right?
Just because the network manager service has started does not mean the link to the Internet is operational. It might still be waiting for user action (such as logging in to a portal) or for the wireless network to come into range.
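
To illustrate the difference between "network service started" and "Internet reachable", here is a minimal Python sketch of an active check that retries until a TCP connection succeeds. It assumes a successful connection to a chosen host is a good enough signal. `can_connect` and `wait_for_internet` are hypothetical helper names, and the host, port, attempt count and delay are placeholders, not values FAHClient itself uses.

```python
import socket
import time


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, timeouts and "Network is unreachable".
        return False


def wait_for_internet(host, port=443, attempts=30, delay=10.0,
                      probe=can_connect, sleep=time.sleep):
    """Probe host:port up to `attempts` times, sleeping `delay` seconds
    between tries. Return True on the first success, False otherwise."""
    for attempt in range(1, attempts + 1):
        if probe(host, port):
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

A small wrapper that calls `wait_for_internet("foldingathome.org")` and exits non-zero on failure could in principle be run through `ExecStartPre=` in a systemd drop-in for the service. That setup is an untested suggestion, not something Folding@home ships.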