OpenMM bug in NVIDIA CUDA part --> BAD_WORK_UNIT on Linux

Many GPU problems seem to revolve around specific driver versions. Although NVIDIA has its own support structure, you can often learn from information reported by other folders.


steffenmoser
Posts: 6
Joined: Fri Nov 07, 2014 10:10 am


Post by steffenmoser »

Hi all,

For a few days now, I have been encountering a problem running Folding@Home on Linux (OpenSUSE Leap 15.2). My system is an old dual-socket AMD Opteron 4386 workstation driving one NVIDIA GP104 GPU [GeForce GTX 1080]. Most probably triggered by an OpenSUSE update, NVIDIA's "nvcc" compiler now halts with an error when compiling OpenMM's CUDA kernel. The most interesting lines are the following:


22:01:13:WU00:FS01:0x22:Failed to create CUDA context:
22:01:13:WU00:FS01:0x22:Error launching CUDA compiler: 256
22:01:13:WU00:FS01:0x22:/tmp/openmmTempKernel0x31093790_4120.cu(1003): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
22:01:13:WU00:FS01:0x22:
22:01:13:WU00:FS01:0x22:/tmp/openmmTempKernel0x31093790_4120.cu(1004): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
22:01:13:WU00:FS01:0x22:
22:01:13:WU00:FS01:0x22:2 errors detected in the compilation of "/tmp/openmmTempKernel0x31093790_4120.cu".
The full log output is at the end of this post. My system consists of the following software components:
  • Linux distribution: OpenSUSE Leap 15.2, fully patched
  • Linux kernel: 5.3.18-lp152.63-default
  • NVIDIA drivers: NVIDIA-SMI 460.56, Driver Version: 460.56, CUDA Version: 11.2
  • NVCC version: Cuda compilation tools, release 11.2, V11.2.142
  • GCC version: 7.5.0
Downgrading the CUDA tools to 11.1 or 11.0 does not fix it; I see the same error. I have not yet found out whether downgrading GCC helps, as the version I use is the oldest one that ships with the distribution. The error seems to be related to a particular combination of OpenMM and GCC versions, since it has been reported in the OpenMM community here: https://github.com/openmm/openmm/issues/2648.

Does anybody have an idea how I could re-enable GPU folding on my rig? Is anyone else seeing this error?

Thank you very much in advance for any helpful comment!

Kind regards,
Steffen


Here you'll find the full log:


steffen@blackbird:~/dnet/fah$ ./FAHClient 
21:58:09:Read GPUs.txt
21:58:10:******************************* libFAH ********************************
21:58:10:           Date: Oct 20 2020
21:58:10:           Time: 20:36:41
21:58:10:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
21:58:10:         Branch: master
21:58:10:       Compiler: GNU 4.9.4
21:58:10:        Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections
21:58:10:                 -O3 -funroll-loops
21:58:10:       Platform: linux2 5.8.0-1-amd64
21:58:10:           Bits: 64
21:58:10:           Mode: Release
21:58:10:****************************** FAHClient ******************************
21:58:10:        Version: 7.6.21
21:58:10:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:58:10:      Copyright: 2020 foldingathome.org
21:58:10:       Homepage: https://foldingathome.org/
21:58:10:           Date: Oct 20 2020
21:58:10:           Time: 20:38:59
21:58:10:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
21:58:10:         Branch: master
21:58:10:       Compiler: GNU 4.9.4
21:58:10:        Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections
21:58:10:                 -O3 -funroll-loops
21:58:10:       Platform: linux2 5.8.0-1-amd64
21:58:10:           Bits: 64
21:58:10:           Mode: Release
21:58:10:         Config: /home/steffen/dnet/fah/config.xml
21:58:10:******************************** CBang ********************************
21:58:10:           Date: Oct 20 2020
21:58:10:           Time: 18:38:01
21:58:10:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
21:58:10:         Branch: master
21:58:10:       Compiler: GNU 4.9.4
21:58:10:        Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections
21:58:10:                 -O3 -funroll-loops -fPIC
21:58:10:       Platform: linux2 5.8.0-1-amd64
21:58:10:           Bits: 64
21:58:10:           Mode: Release
21:58:10:******************************* System ********************************
21:58:10:            CPU: AMD Opteron(tm) Processor 4386
21:58:10:         CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
21:58:10:           CPUs: 16
21:58:10:         Memory: 125.84GiB
21:58:10:    Free Memory: 2.00GiB
21:58:10:        Threads: POSIX_THREADS
21:58:10:     OS Version: 5.3
21:58:10:    Has Battery: false
21:58:10:     On Battery: false
21:58:10:     UTC Offset: 1
21:58:10:            PID: 31699
21:58:10:            CWD: /home/steffen/dnet/fah
21:58:10:             OS: Linux 5.3.18-lp152.63-default x86_64
21:58:10:        OS Arch: AMD64
21:58:10:           GPUs: 2
21:58:10:          GPU 0: Bus:3 Slot:0 Func:0 AMD:5 R575A [Radeon R7 250/HD 7700]
21:58:10:          GPU 1: Bus:66 Slot:0 Func:0 NVIDIA:8 GP104 [GeForce GTX 1080] 8873
21:58:10:  CUDA Device 0: Platform:0 Device:0 Bus:66 Slot:0 Compute:6.1 Driver:11.2
21:58:10:OpenCL Device 0: Platform:0 Device:0 Bus:66 Slot:0 Compute:1.2 Driver:460.56
21:58:10:***********************************************************************
21:58:10:<config>
21:58:10:  <!-- Slot Control -->
21:58:10:  <power v='FULL'/>
21:58:10:
21:58:10:  <!-- User Information -->

21:58:10:
21:58:10:  <!-- Folding Slots -->
21:58:10:  <slot id='1' type='GPU'>
21:58:10:    <client-type v='beta'/>
21:58:10:    <max-packet-size v='big'/>
21:58:10:    <pci-bus v='66'/>
21:58:10:    <pci-slot v='0'/>
21:58:10:  </slot>
21:58:10:  <slot id='0' type='CPU'>
21:58:10:    <client-type v='beta'/>
21:58:10:    <cpus v='14'/>
21:58:10:    <max-packet-size v='big'/>
21:58:10:  </slot>
21:58:10:  <slot id='2' type='GPU'>
21:58:10:    <pci-bus v='3'/>
21:58:10:    <pci-slot v='0'/>
21:58:10:  </slot>
21:58:10:</config>
21:58:10:Trying to access database...
21:58:10:Successfully acquired database lock
21:58:10:FS01:Set client configured
21:58:10:FS01:Initialized folding slot 01: gpu:66:0 GP104 [GeForce GTX 1080] 8873
21:58:10:FS00:Initialized folding slot 00: cpu:14
21:58:10:WARNING:FS02:No CUDA or OpenCL 1.2+ support detected for GPU slot 02: gpu:3:0 R575A [Radeon R7 250/HD 7700].  Disabling.
21:58:10:WU00:FS01:Connecting to assign1.foldingathome.org:80
21:58:10:WU01:FS00:Connecting to assign1.foldingathome.org:80
21:58:11:WU00:FS01:Connecting to assign1.foldingathome.org:80
21:58:11:WU01:FS00:Connecting to assign1.foldingathome.org:80
21:58:12:WU00:FS01:Assigned to work server 128.174.73.74
21:58:12:WU00:FS01:Requesting new work unit for slot 01: gpu:66:0 GP104 [GeForce GTX 1080] 8873 from 128.174.73.74
21:58:12:WU00:FS01:Connecting to 128.174.73.74:8080
21:58:12:WU01:FS00:Assigned to work server 128.252.203.2
21:58:12:WU01:FS00:Requesting new work unit for slot 00: cpu:14 from 128.252.203.2
21:58:12:WU01:FS00:Connecting to 128.252.203.2:8080
21:58:13:WU01:FS00:Downloading 5.73MiB
21:58:15:WU01:FS00:Download complete
21:58:15:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:17225 run:0 clone:96 gen:9 core:0xa7 unit:0x0000000a80fccb026040320ce6098ea0
21:58:16:WU01:FS00:Starting
21:58:16:WU01:FS00:Running FahCore: /home/steffen/dnet/fah/FAHCoreWrapper /home/steffen/dnet/fah/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 31699 -checkpoint 15 -np 14
21:58:16:WU01:FS00:Started FahCore on PID 31898
21:58:16:WU01:FS00:Core PID:31902
21:58:16:WU01:FS00:FahCore 0xa7 started
21:58:16:WU01:FS00:0xa7:*********************** Log Started 2021-03-06T21:58:16Z ***********************
21:58:16:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
21:58:16:WU01:FS00:0xa7:       Type: 0xa7
21:58:16:WU01:FS00:0xa7:       Core: Gromacs
21:58:16:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 31898 -checkpoint 15 -np
21:58:16:WU01:FS00:0xa7:             14
21:58:16:WU01:FS00:0xa7:************************************ CBang *************************************
21:58:16:WU01:FS00:0xa7:       Date: Nov 27 2019
21:58:16:WU01:FS00:0xa7:       Time: 11:26:54
21:58:16:WU01:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
21:58:16:WU01:FS00:0xa7:     Branch: master
21:58:16:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
21:58:16:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
21:58:16:WU01:FS00:0xa7:             -fno-pie -fPIC
21:58:16:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
21:58:16:WU01:FS00:0xa7:       Bits: 64
21:58:16:WU01:FS00:0xa7:       Mode: Release
21:58:16:WU01:FS00:0xa7:************************************ System ************************************
21:58:16:WU01:FS00:0xa7:        CPU: AMD Opteron(tm) Processor 4386
21:58:16:WU01:FS00:0xa7:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
21:58:16:WU01:FS00:0xa7:       CPUs: 16
21:58:16:WU01:FS00:0xa7:     Memory: 125.84GiB
21:58:16:WU01:FS00:0xa7:Free Memory: 1.96GiB
21:58:16:WU01:FS00:0xa7:    Threads: POSIX_THREADS
21:58:16:WU01:FS00:0xa7: OS Version: 5.3
21:58:16:WU01:FS00:0xa7:Has Battery: false
21:58:16:WU01:FS00:0xa7: On Battery: false
21:58:16:WU01:FS00:0xa7: UTC Offset: 1
21:58:16:WU01:FS00:0xa7:        PID: 31902
21:58:16:WU01:FS00:0xa7:        CWD: /home/steffen/dnet/fah/work
21:58:16:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
21:58:16:WU01:FS00:0xa7:    Version: 0.0.19
21:58:16:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:58:16:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
21:58:16:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
21:58:16:WU01:FS00:0xa7:       Date: Nov 26 2019
21:58:16:WU01:FS00:0xa7:       Time: 00:41:42
21:58:16:WU01:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
21:58:16:WU01:FS00:0xa7:     Branch: master
21:58:16:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
21:58:16:WU01:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
21:58:16:WU01:FS00:0xa7:             -fno-pie
21:58:16:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
21:58:16:WU01:FS00:0xa7:       Bits: 64
21:58:16:WU01:FS00:0xa7:       Mode: Release
21:58:16:WU01:FS00:0xa7:************************************ Build *************************************
21:58:16:WU01:FS00:0xa7:       SIMD: avx_256
21:58:16:WU01:FS00:0xa7:********************************************************************************
21:58:16:WU01:FS00:0xa7:Project: 17225 (Run 0, Clone 96, Gen 9)
21:58:16:WU01:FS00:0xa7:Unit: 0x0000000a80fccb026040320ce6098ea0
21:58:16:WU01:FS00:0xa7:Reading tar file core.xml
21:58:16:WU01:FS00:0xa7:Reading tar file frame9.tpr
21:58:16:WU01:FS00:0xa7:Digital signatures verified
21:58:16:WU01:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
21:58:16:WU01:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3
21:58:16:WU01:FS00:0xa7:Calling: mdrun -s frame9.tpr -o frame9.trr -x frame9.xtc -cpt 15 -nt 12
21:58:16:WU01:FS00:0xa7:Steps: first=1125000 total=125000
21:58:18:WU01:FS00:0xa7:Completed 1 out of 125000 steps (0%)
21:58:23:WU00:FS01:Downloading 10.83MiB
21:58:28:WU00:FS01:Download complete
21:58:28:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:17714 run:16 clone:0 gen:23 core:0x22 unit:0x00000000000000170000453200000010
21:58:29:WU00:FS01:Starting
21:58:29:WU00:FS01:Running FahCore: /home/steffen/dnet/fah/FAHCoreWrapper /home/steffen/dnet/fah/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 31699 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
21:58:29:WU00:FS01:Started FahCore on PID 32097
21:58:29:WU00:FS01:Core PID:32101
21:58:29:WU00:FS01:FahCore 0x22 started
21:58:29:WU00:FS01:0x22:*********************** Log Started 2021-03-06T21:58:29Z ***********************
21:58:29:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
21:58:29:WU00:FS01:0x22:       Core: Core22
21:58:29:WU00:FS01:0x22:       Type: 0x22
21:58:29:WU00:FS01:0x22:    Version: 0.0.13
21:58:29:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:58:29:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
21:58:29:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
21:58:29:WU00:FS01:0x22:       Date: Sep 19 2020
21:58:29:WU00:FS01:0x22:       Time: 01:10:35
21:58:29:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
21:58:29:WU00:FS01:0x22:     Branch: core22-0.0.13
21:58:29:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
21:58:29:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
21:58:29:WU00:FS01:0x22:             -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
21:58:29:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
21:58:29:WU00:FS01:0x22:       Bits: 64
21:58:29:WU00:FS01:0x22:       Mode: Release
21:58:29:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
21:58:29:WU00:FS01:0x22:             <peastman@stanford.edu>
21:58:29:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 32097 -checkpoint 15
21:58:29:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
21:58:29:WU00:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100
21:58:29:WU00:FS01:0x22:************************************ libFAH ************************************
21:58:29:WU00:FS01:0x22:       Date: Sep 15 2020
21:58:29:WU00:FS01:0x22:       Time: 05:14:43
21:58:29:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
21:58:29:WU00:FS01:0x22:     Branch: HEAD
21:58:29:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
21:58:29:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
21:58:29:WU00:FS01:0x22:             -funroll-loops
21:58:29:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
21:58:29:WU00:FS01:0x22:       Bits: 64
21:58:29:WU00:FS01:0x22:       Mode: Release
21:58:29:WU00:FS01:0x22:************************************ CBang *************************************
21:58:29:WU00:FS01:0x22:       Date: Sep 15 2020
21:58:29:WU00:FS01:0x22:       Time: 05:11:04
21:58:29:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
21:58:29:WU00:FS01:0x22:     Branch: HEAD
21:58:29:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
21:58:29:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
21:58:29:WU00:FS01:0x22:             -funroll-loops -fPIC
21:58:29:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
21:58:29:WU00:FS01:0x22:       Bits: 64
21:58:29:WU00:FS01:0x22:       Mode: Release
21:58:29:WU00:FS01:0x22:************************************ System ************************************
21:58:29:WU00:FS01:0x22:        CPU: AMD Opteron(tm) Processor 4386
21:58:29:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
21:58:29:WU00:FS01:0x22:       CPUs: 16
21:58:29:WU00:FS01:0x22:     Memory: 125.84GiB
21:58:29:WU00:FS01:0x22:Free Memory: 1.69GiB
21:58:29:WU00:FS01:0x22:    Threads: POSIX_THREADS
21:58:29:WU00:FS01:0x22: OS Version: 5.3
21:58:29:WU00:FS01:0x22:Has Battery: false
21:58:29:WU00:FS01:0x22: On Battery: false
21:58:29:WU00:FS01:0x22: UTC Offset: 1
21:58:29:WU00:FS01:0x22:        PID: 32101
21:58:29:WU00:FS01:0x22:        CWD: /home/steffen/dnet/fah/work
21:58:29:WU00:FS01:0x22:************************************ OpenMM ************************************
21:58:29:WU00:FS01:0x22:   Revision: 189320d0
21:58:29:WU00:FS01:0x22:********************************************************************************
21:58:29:WU00:FS01:0x22:Project: 17714 (Run 16, Clone 0, Gen 23)
21:58:29:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
21:58:29:WU00:FS01:0x22:Reading tar file core.xml
21:58:29:WU00:FS01:0x22:Reading tar file integrator.xml
21:58:29:WU00:FS01:0x22:Reading tar file state.xml
21:58:29:WU00:FS01:0x22:Reading tar file system.xml
21:58:30:WU00:FS01:0x22:Digital signatures verified
21:58:30:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
21:58:30:WU00:FS01:0x22:Version 0.0.13
21:58:30:WU00:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
21:58:30:WU00:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
21:58:30:WU00:FS01:0x22:  XTC frame write interval: 25000 steps (2.5%) [40 total]
21:58:30:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
21:58:30:WU00:FS01:0x22:There are 4 platforms available.
21:58:30:WU00:FS01:0x22:Platform 0: Reference
21:58:30:WU00:FS01:0x22:Platform 1: CPU
21:58:30:WU00:FS01:0x22:Platform 2: OpenCL
21:58:30:WU00:FS01:0x22:  opencl-device 0 specified
21:58:30:WU00:FS01:0x22:Platform 3: CUDA
21:58:30:WU00:FS01:0x22:  cuda-device 0 specified
21:58:43:WU00:FS01:0x22:Attempting to create CUDA context:
21:58:43:WU00:FS01:0x22:  Configuring platform CUDA
21:58:45:WU00:FS01:0x22:Failed to create CUDA context:
21:58:45:WU00:FS01:0x22:Error launching CUDA compiler: 256
21:58:45:WU00:FS01:0x22:/tmp/openmmTempKernel0x2b2a77b0_32101.cu(1003): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
21:58:45:WU00:FS01:0x22:
21:58:45:WU00:FS01:0x22:/tmp/openmmTempKernel0x2b2a77b0_32101.cu(1004): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
21:58:45:WU00:FS01:0x22:
21:58:45:WU00:FS01:0x22:2 errors detected in the compilation of "/tmp/openmmTempKernel0x2b2a77b0_32101.cu".
21:58:45:WU00:FS01:0x22:Attempting to create OpenCL context:
21:58:45:WU00:FS01:0x22:  Configuring platform OpenCL
21:58:45:WU00:FS01:0x22:Failed to create OpenCL context:
21:58:45:WU00:FS01:0x22:This Integrator is already bound to a context
21:58:45:WU00:FS01:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
21:58:45:WU00:FS01:0x22:Saving result file ../logfile_01.txt
21:58:45:WU00:FS01:0x22:Saving result file science.log
21:58:45:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
21:58:46:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
21:58:46:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:17714 run:16 clone:0 gen:23 core:0x22 unit:0x00000000000000170000453200000010
21:58:46:WU00:FS01:Uploading 2.79KiB to 128.174.73.74
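
For anyone hitting the same failure, the few lines that matter in a log this long can be pulled out with a small script. This is a minimal Python sketch, not part of FAHClient; the regular expressions assume the exact wording of the FAHClient 7.6.21 / Core22 0.0.13 log above and may need adjusting for other versions. It collects the CUDA devices the client detected, the nvcc errors reported for the temporary OpenMM kernel, and the result codes returned by the core.

```python
import re
from typing import Dict, List

# Patterns for the log lines discussed in this thread. They assume the wording
# printed by FAHClient 7.6.21 / Core22 0.0.13; other versions may differ.
CUDA_DEVICE = re.compile(r"CUDA Device (\d+):.*Compute:([\d.]+) Driver:([\d.]+)")
NVCC_ERROR = re.compile(r"openmmTempKernel\S*\.cu\((\d+)\): error: (.*)")
CORE_RESULT = re.compile(r"FahCore returned: (\w+)")


def summarize_log(text: str) -> Dict[str, List]:
    """Collect CUDA devices, nvcc kernel errors and core exit codes from a FAHClient log."""
    summary: Dict[str, List] = {"cuda_devices": [], "nvcc_errors": [], "core_results": []}
    for line in text.splitlines():
        if m := CUDA_DEVICE.search(line):
            # e.g. "CUDA Device 0: ... Compute:6.1 Driver:11.2"
            summary["cuda_devices"].append(
                {"index": int(m.group(1)), "compute": m.group(2), "driver": m.group(3)}
            )
        elif m := NVCC_ERROR.search(line):
            # (line number in the generated .cu file, nvcc message)
            summary["nvcc_errors"].append((int(m.group(1)), m.group(2)))
        elif m := CORE_RESULT.search(line):
            # e.g. "FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
            summary["core_results"].append(m.group(1))
    return summary
```

Fed the full log above, it should report one CUDA device with compute capability 6.1 and driver 11.2, the two `fmin` errors at lines 1003 and 1004 of the generated kernel, and `BAD_WORK_UNIT` as the core result.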
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.


Post by bruce »

Uninstall the CUDA toolkit and nvcc package. The only part of the NVIDIA software that's needed is the video driver package for your GPU. (GPU drivers are customized at install time based on the particular type of GPU(s) they detect.

FAHCore_22 downloads a portion of the developer's toolkit that can run on all GPUs newer than Fermi. When the developer toolkit is installed, it conflicts with the parts that FAHCore_22 installs. If you are, in fact, a CUDA developer, some custom changes are needed.)
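
To check whether a developer toolkit is still visible after uninstalling, a Python sketch like the one below may help. It is only a diagnostic under stated assumptions: this thread does not establish which of these locations or variables Core22 actually picks up, so the script just reports the usual traces a toolkit install leaves (`nvcc` on `PATH`, the `CUDA_HOME`/`CUDA_PATH`/`CUDA_ROOT` variables, CUDA directories in `LD_LIBRARY_PATH`, and the common `/usr/local/cuda` prefix). Run it as the user that starts FAHClient, since child processes normally inherit their parent's environment.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, List, Optional

# Variables commonly set by CUDA toolkit installs. Whether Core22 reads any of
# them is an assumption; they are only reported here.
CUDA_ENV_VARS = ("CUDA_HOME", "CUDA_PATH", "CUDA_ROOT")
# Typical prefix of NVIDIA's Linux toolkit packages (assumption for your distro).
DEFAULT_PREFIX = Path("/usr/local/cuda")


def find_cuda_toolkit(env: Optional[Dict[str, str]] = None) -> List[str]:
    """Return human-readable hints that a CUDA developer toolkit is visible."""
    env = dict(os.environ) if env is None else env
    hints: List[str] = []
    # An empty PATH makes shutil.which return None instead of searching defaults.
    nvcc = shutil.which("nvcc", path=env.get("PATH", ""))
    if nvcc:
        hints.append(f"nvcc on PATH: {nvcc}")
    for var in CUDA_ENV_VARS:
        if env.get(var):
            hints.append(f"{var}={env[var]}")
    for entry in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if "cuda" in entry.lower():
            hints.append(f"LD_LIBRARY_PATH entry: {entry}")
    if DEFAULT_PREFIX.exists():
        hints.append(f"toolkit directory present: {DEFAULT_PREFIX}")
    return hints


if __name__ == "__main__":
    for hint in find_cuda_toolkit() or ["no CUDA toolkit traces found"]:
        print(hint)
```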
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA


Post by JimboPalmer »

Welcome to Folding@Home!

If you have an app that requires the CUDA toolkit, you can set F@H to use OpenCL instead, for about 25% fewer points.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
steffenmoser
Posts: 6
Joined: Fri Nov 07, 2014 10:10 am


Post by steffenmoser »

bruce wrote:Uninstall the CUDA toolkit and nvcc package. The only part of the NVIDIA software that's needed is the video driver package for your GPU. (GPU drivers are customized at install time based on the particular type of GPU(s) they detect.

FAHCore_22 downloads a portion of the developer's toolkit that can run on all GPUs newer than Fermi. When the developer toolkit is installed, it conflicts with the parts that FAHCore_22 installs.
Thank you very much; this hint fixed it!
bruce wrote:If you are, in fact, a CUDA developer, some custom changes are needed.)
In fact, I am, but I'm not developing in CUDA at the moment. That's why I find it curious that the CUDA developer framework never interfered with FAH before; it has always been installed on this rig. Maybe the update changed some path or environment settings.

So I think the best approach is to install the CUDA developer framework into my home directory the next time I need to develop.

Kind regards,
Steffen
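
Keeping the toolkit in a home directory presumably only helps if it also stays out of the environment FAHClient starts with. One way to do that, sketched below in Python, is to build the development environment per command instead of exporting it from a login script. The install location `~/cuda-11.2` and its `bin`/`lib64` layout are hypothetical, and setting `CUDA_HOME` is a common convention rather than anything FAH requires.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, Optional, Sequence

# Hypothetical location of a toolkit installed under the user's home directory.
CUDA_PREFIX = Path.home() / "cuda-11.2"


def cuda_dev_env(prefix: Path = CUDA_PREFIX, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the toolkit prepended to PATH and LD_LIBRARY_PATH.

    The current process environment (and therefore a FAHClient started from a
    normal shell) is left untouched; only commands launched with the returned
    dict see the toolkit.
    """
    env = dict(os.environ if base is None else base)
    for var, sub in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib64")):
        parts = [str(prefix / sub)]
        if env.get(var):
            parts.append(env[var])
        env[var] = os.pathsep.join(parts)
    env["CUDA_HOME"] = str(prefix)  # common convention, assumed here
    return env


def run_with_cuda(cmd: Sequence[str], prefix: Path = CUDA_PREFIX) -> int:
    """Run one development command (e.g. nvcc) with the home-directory toolkit visible."""
    return subprocess.run(list(cmd), env=cuda_dev_env(prefix)).returncode
```

For example, `run_with_cuda(["nvcc", "--version"])` would use the home-directory nvcc, while FAHClient keeps seeing no toolkit at all. Whether this alone avoids the conflict with Core22 was not confirmed in this thread.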
steffenmoser
Posts: 6
Joined: Fri Nov 07, 2014 10:10 am


Post by steffenmoser »

JimboPalmer wrote:Welcome to Folding@Home!
Thank you. As far as I know, I've been folding since 2011. :-)
JimboPalmer wrote:If you have an app that requires the CUDA toolkit, you can set F@H to use OpenCL instead, for about 25% fewer points.
O.K., that should also be an option. Thank you very much. If I need to do CUDA development, I'll test whether moving the whole NVIDIA developer framework to my user's (or another account's) home directory actually fixes it. If not, I'll go the way you mentioned.

Kind regards,
Steffen
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA


Post by JimboPalmer »

JimboPalmer wrote:If you have an app that requires the CUDA toolkit, you can set F@H to use OpenCL instead, for about 25% fewer points.
steffenmoser wrote:O.K., that should also be an option.
ajm tells us: You can disable CUDA in the Expert options:
FAHControl -> Configure -> Expert -> Click Add under Extra Core Options -> -disable-cuda -> OK -> Save

I hope it is not needed, but it is good to have ways to customize F@H.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA


Post by Joe_H »

steffenmoser wrote:In fact, I am, but I'm not developing in CUDA at the moment. That's why I find it curious that the CUDA developer framework never interfered with FAH before; it has always been installed on this rig. Maybe the update changed some path or environment settings.
As I understand it, the developers of the Core_22 folding core have been in contact with NVIDIA about this; the core should be using its own copy of the runtime, not the one provided by the dev kit. The last I saw, it was not clear whether the problem was in the OpenMM code compiled into the core or something related to the dev kit installation.

To support the range of GPUs from Kepler to current models with CUDA, they had to use specific versions of the runtime. Possibly what you had installed was compatible, and then the update changed it to an incompatible version.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3