Another BAD_WORK_UNIT (114 = 0x72) - why?

Moderators: Site Moderators, FAHC Science Team

kwthom
Posts: 29
Joined: Sun Mar 29, 2020 11:06 pm
Location: Jaynes Station, AZ

Another BAD_WORK_UNIT (114 = 0x72) - why?

Post by kwthom »

Potential stability issues?

This has happened a few times; I finally caught the log the last time it happened - after my computer crashed & rebooted:

Thoughts?

Thanks!

Code:

*********************** Log Started 2020-04-29T19:21:19Z ***********************
19:21:55:FS01:Unpaused
19:21:56:WU00:FS01:Starting
19:21:56:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\kwtho\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 5532 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:21:56:WU00:FS01:Started FahCore on PID 11208
19:21:56:WU00:FS01:Core PID:11232
19:21:56:WU00:FS01:FahCore 0x22 started
19:21:56:WU00:FS01:0x22:*********************** Log Started 2020-04-29T19:21:56Z ***********************
19:21:56:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
19:21:56:WU00:FS01:0x22:       Type: 0x22
19:21:56:WU00:FS01:0x22:       Core: Core22
19:21:56:WU00:FS01:0x22:    Website: https://foldingathome.org/
19:21:56:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
19:21:56:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
19:21:56:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
19:21:56:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 11208 -checkpoint 15
19:21:56:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:21:56:WU00:FS01:0x22:     Config: <none>
19:21:56:WU00:FS01:0x22:************************************ Build *************************************
19:21:56:WU00:FS01:0x22:    Version: 0.0.2
19:21:56:WU00:FS01:0x22:       Date: Dec 6 2019
19:21:56:WU00:FS01:0x22:       Time: 21:30:31
19:21:56:WU00:FS01:0x22: Repository: Git
19:21:56:WU00:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
19:21:56:WU00:FS01:0x22:     Branch: HEAD
19:21:56:WU00:FS01:0x22:   Compiler: Visual C++ 2008
19:21:56:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
19:21:56:WU00:FS01:0x22:   Platform: win32 10
19:21:56:WU00:FS01:0x22:       Bits: 64
19:21:56:WU00:FS01:0x22:       Mode: Release
19:21:56:WU00:FS01:0x22:************************************ System ************************************
19:21:56:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
19:21:56:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
19:21:56:WU00:FS01:0x22:       CPUs: 6
19:21:56:WU00:FS01:0x22:     Memory: 15.93GiB
19:21:56:WU00:FS01:0x22:Free Memory: 13.10GiB
19:21:56:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
19:21:56:WU00:FS01:0x22: OS Version: 6.2
19:21:56:WU00:FS01:0x22:Has Battery: false
19:21:56:WU00:FS01:0x22: On Battery: false
19:21:56:WU00:FS01:0x22: UTC Offset: -7
19:21:56:WU00:FS01:0x22:        PID: 11232
19:21:56:WU00:FS01:0x22:        CWD: C:\Users\kwtho\AppData\Roaming\FAHClient\work
19:21:56:WU00:FS01:0x22:         OS: Windows 10 Pro
19:21:56:WU00:FS01:0x22:    OS Arch: AMD64
19:21:56:WU00:FS01:0x22:********************************************************************************
19:21:56:WU00:FS01:0x22:Project: 16435 (Run 2194, Clone 1, Gen 2)
19:21:56:WU00:FS01:0x22:Unit: 0x0000000303854c135e9a4ef8ab85c093
19:21:56:WU00:FS01:0x22:Digital signatures verified
19:21:56:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:21:56:WU00:FS01:0x22:Version 0.0.2
19:21:56:WU00:FS01:0x22:  Found a checkpoint file
19:22:04:WU00:FS01:0x22:ERROR:Guru Meditation #0.3153f6969d0b62 (7.7) '00/01/stepsDone'
19:22:04:WU00:FS01:0x22:WARNING:Unexpected exit() call
19:22:04:WU00:FS01:0x22:WARNING:Unexpected exit from science code
19:22:04:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
19:22:04:WU00:FS01:0x22:Saving result file checkpointState.xml
19:22:04:WU00:FS01:0x22:Saving result file checkpt.crc
19:22:04:WU00:FS01:0x22:Saving result file positions.xtc
19:22:04:WU00:FS01:0x22:Saving result file science.log
19:22:04:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
19:22:05:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
19:22:05:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:16435 run:2194 clone:1 gen:2 core:0x22 unit:0x0000000303854c135e9a4ef8ab85c093
19:22:05:WU00:FS01:Uploading 57.13MiB to 3.133.76.19
19:22:05:WU00:FS01:Connecting to 3.133.76.19:8080
19:22:05:WU01:FS01:Connecting to 65.254.110.245:80
19:22:06:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
19:22:06:WU01:FS01:Connecting to 18.218.241.186:80
19:22:07:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
19:22:07:WU01:FS01:Connecting to 65.254.110.245:80
19:22:07:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
19:22:07:WU01:FS01:Connecting to 18.218.241.186:80
19:22:07:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
19:22:07:ERROR:WU01:FS01:Exception: Could not get an assignment
19:22:08:WU01:FS01:Connecting to 65.254.110.245:80
19:22:08:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
19:22:08:WU01:FS01:Connecting to 18.218.241.186:80
19:22:08:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
19:22:08:WU01:FS01:Connecting to 65.254.110.245:80
19:22:08:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
19:22:08:WU01:FS01:Connecting to 18.218.241.186:80
19:22:09:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
19:22:09:ERROR:WU01:FS01:Exception: Could not get an assignment
19:22:11:WU00:FS01:Upload 6.02%
19:23:29:WU00:FS01:Upload 98.24%
19:23:30:WU00:FS01:Upload complete
19:23:30:WU00:FS01:Server responded WORK_ACK (400)
19:23:30:WU00:FS01:Cleaning up
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Post by Neil-B »

Up front: there are a number of issues currently in play with this project and those servers, which may or may not relate to the issue you are having. The team has been working on them for a number of days and has yet to resolve them.

However, the type of failure I am seeing (and I am not a GPU specialist) is similar to cases where people have had one of a variety of issues with their GPUs. Hopefully one of the GPU folders will step in and advise.

Could you repost your log, including the first couple of hundred lines? There may be some clues in the configuration settings displayed there. Since someone is likely to ask, I'll mention it now: is your GPU running at stock speeds, stock OC speeds, or bespoke OC speeds?
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
kwthom
Posts: 29
Joined: Sun Mar 29, 2020 11:06 pm
Location: Jaynes Station, AZ

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Post by kwthom »

I've paused GPU processing for the last ~24 hours, so all it's got at the moment is CPU efforts.

I'll need to revisit in a few hours for the GPU details, but I've been under-clocking it for the last week, due to ambient heat issues (it's now hot in the desert southwest...)

Thanks!
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Post by Neil-B »

Can you please send some of your heat over to the UK? … Cold, wet, miserable at the moment, so typical British weather I suppose … even with my server toasting my toes it feels sub-optimal in the office.

Hope someone manages to help and get this sorted for you.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Post by PantherX »

Can you please post the first 100 lines of your log file so we can see what hardware the client has detected and how it's configured?

Also, have you configured an exception for the client files in your anti-virus/anti-malware/anti-spyware/anti-ransomware software?
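If pasting the whole header is awkward, the hardware-detection lines can be pulled out with a few lines of Python. This is only a rough sketch: the function name `summarize_header` and the `KEYS` selection are made up for illustration, and it assumes the header lines follow the `HH:MM:SS:   Key: value` layout that the client log normally shows.

```python
import re

# Header fields worth quoting when asking for help (an illustrative choice).
KEYS = ("Version", "OS", "OS Arch", "CPU", "CPUs", "GPUs", "GPU 0",
        "CUDA", "OpenCL Device 0")

# Matches lines such as "21:02:59:        Version: 7.6.9".
HEADER = re.compile(r"^\d{2}:\d{2}:\d{2}:\s*([^:]+?):\s*(.*)$")

def summarize_header(lines, limit=100):
    """Return {key: value} for the selected fields in the first `limit` lines."""
    summary = {}
    for line in lines[:limit]:
        match = HEADER.match(line.rstrip("\n"))
        if match:
            key = match.group(1).strip()
            if key in KEYS:
                # Keep the first occurrence only; later sections repeat some keys.
                summary.setdefault(key, match.group(2))
    return summary
```

Pass it the lines of the log file, for example the result of `readlines()` on the open file. Wrapped continuation lines (such as the second half of a long GPU name) are not joined, so the full header is still the better thing to post when possible.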
kwthom
Posts: 29
Joined: Sun Mar 29, 2020 11:06 pm
Location: Jaynes Station, AZ

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Post by kwthom »

PantherX wrote: Can you please post the first 100 lines of your log file so we can see what hardware the client has detected and how it's configured?

Code:

*********************** Log Started 2020-04-30T21:02:59Z ***********************
21:02:59:****************************** FAHClient ******************************
21:02:59:        Version: 7.6.9
21:02:59:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:02:59:      Copyright: 2020 foldingathome.org
21:02:59:       Homepage: https://foldingathome.org/
21:02:59:           Date: Apr 17 2020
21:02:59:           Time: 11:13:06
21:02:59:       Revision: 398c2b17fa535e0cc6c9d10856b2154c32771646
21:02:59:         Branch: master
21:02:59:       Compiler: Visual C++ 2008
21:02:59:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:02:59:       Platform: win32 10
21:02:59:           Bits: 32
21:02:59:           Mode: Release
21:02:59:           Args: --open-web-control
21:02:59:         Config: C:\Users\kwtho\AppData\Roaming\FAHClient\config.xml
21:02:59:******************************** CBang ********************************
21:02:59:           Date: Apr 17 2020
21:02:59:           Time: 11:10:09
21:02:59:       Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
21:02:59:         Branch: master
21:02:59:       Compiler: Visual C++ 2008
21:02:59:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:02:59:       Platform: win32 10
21:02:59:           Bits: 32
21:02:59:           Mode: Release
21:02:59:******************************* System ********************************
21:02:59:            CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
21:02:59:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
21:02:59:           CPUs: 6
21:02:59:         Memory: 15.93GiB
21:02:59:    Free Memory: 12.05GiB
21:02:59:        Threads: WINDOWS_THREADS
21:02:59:     OS Version: 6.2
21:02:59:    Has Battery: false
21:02:59:     On Battery: false
21:02:59:     UTC Offset: -7
21:02:59:            PID: 6296
21:02:59:            CWD: C:\Users\kwtho\AppData\Roaming\FAHClient
21:02:59:             OS: Windows 10 Enterprise
21:02:59:        OS Arch: AMD64
21:02:59:           GPUs: 1
21:02:59:          GPU 0: Bus:1 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
21:02:59:                 470/480/570/580/590]
21:02:59:           CUDA: Not detected: Failed to open dynamic library 'nvcuda.dll': The
21:02:59:                 specified module could not be found.
21:02:59:
21:02:59:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:3004.8
21:02:59:  Win32 Service: false
21:02:59:******************************* libFAH ********************************
21:02:59:           Date: Apr 15 2020
21:02:59:           Time: 14:53:14
21:02:59:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
21:02:59:         Branch: master
21:02:59:       Compiler: Visual C++ 2008
21:02:59:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:02:59:       Platform: win32 10
21:02:59:           Bits: 32
21:02:59:           Mode: Release
21:02:59:***********************************************************************
21:02:59:<config>
21:02:59:  <!-- Folding Slot Configuration -->
21:02:59:  <cause v='HIGH_PRIORITY'/>
21:02:59:
21:02:59:  <!-- Network -->
21:02:59:  <proxy v=':8080'/>
21:02:59:
21:02:59:  <!-- Slot Control -->
21:02:59:  <pause-on-start v='true'/>
21:02:59:
21:02:59:  <!-- User Information -->
21:02:59:  <passkey v='*****'/>
21:02:59:  <team v='35780'/>
21:02:59:  <user v='kwthom'/>
21:02:59:
21:02:59:  <!-- Folding Slots -->
21:02:59:  <slot id='0' type='CPU'/>
21:02:59:  <slot id='1' type='GPU'>
21:02:59:    <paused v='true'/>
21:02:59:  </slot>
21:02:59:</config>
21:02:59:Trying to access database...
21:02:59:Successfully acquired database lock
21:02:59:Enabled folding slot 00: PAUSED cpu:4 (by user)
21:02:59:Enabled folding slot 01: PAUSED gpu:0:Ellesmere XT [Radeon RX 470/480/570/580/590] (by user)
21:03:00:3:127.0.0.1:New Web session
PantherX wrote: Also, have you configured an exception for the client files in your anti-virus/anti-malware/anti-spyware/anti-ransomware software?
Haven't had the necessity to do so, since CPU WU's are functioning without issue.

If you'd be kind enough to point me in the direction of where I'd find the preferred settings for F@H, then I'll confirm my settings.
Joe_H
Site Admin
Posts: 7883
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Post by Joe_H »

Generally we recommend that the F@h data directory be excluded from scanning by anti-virus software, since the random binary data can trigger false positives. In your case that would be C:\Users\kwtho\AppData\Roaming\FAHClient. Excluding it keeps the scanner from opening those files and blocking the client from using them.
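One quick, non-authoritative way to spot files that something else is holding is to try opening everything under the data directory for reading. The sketch below assumes nothing about the client itself; `unreadable_files` is a made-up helper name, and you would pass it your own data directory.

```python
import os

def unreadable_files(root):
    """Return (path, error) pairs for files under `root` that cannot be read."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Reading a single byte is enough to trigger a sharing or
                # permission error if another process has the file locked.
                with open(path, "rb") as handle:
                    handle.read(1)
            except OSError as exc:
                problems.append((path, exc))
    return problems
```

An empty result only means nothing was blocked at that moment. A scanner may hold a file for a fraction of a second, so this can easily miss the problem, and setting the exclusion remains the actual fix.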

The Guru Meditation error in the first log usually indicates a problem opening a file, in this case one connected with the checkpoint. The file was either locked, corrupted, or missing. One other thing that can cause this is a shutdown at the same moment a checkpoint is being written. Windows sometimes does not give running applications enough time to finish exiting, so the data ends up not being fully written.
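To see how often this pattern shows up, the log can be scanned for each "FahCore returned" line together with the file named in the preceding Guru Meditation error. This is a rough sketch that assumes the two line formats look like the ones in the first log of this thread; `find_core_returns` and the dictionary keys are invented for illustration.

```python
import re

# e.g. "19:22:05:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
RETURNED = re.compile(
    r"(WU\d+):(FS\d+):FahCore returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)")
# e.g. "...:WU00:FS01:0x22:ERROR:Guru Meditation #0.31... (7.7) '00/01/stepsDone'"
GURU = re.compile(
    r"(WU\d+):(FS\d+):0x[0-9a-fA-F]+:ERROR:Guru Meditation \S+ \(\S+\) '([^']*)'")

def find_core_returns(lines):
    """Pair each core return code with the last Guru Meditation file, if any."""
    last_file = {}   # (work unit, slot) -> file named in the latest Guru error
    results = []
    for line in lines:
        guru = GURU.search(line)
        if guru:
            last_file[(guru.group(1), guru.group(2))] = guru.group(3)
            continue
        returned = RETURNED.search(line)
        if returned:
            wu, slot, status, decimal, _hex = returned.groups()
            results.append({
                "wu": wu,
                "slot": slot,
                "status": status,
                "code": int(decimal),   # 114 is the same value as 0x72
                "file": last_file.pop((wu, slot), None),
            })
    return results
```

Run against the first log above, this should report one BAD_WORK_UNIT entry for WU00 on FS01 with the file '00/01/stepsDone'. Entries whose "file" is None had no Guru Meditation line before the core exited.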

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
kwthom
Posts: 29
Joined: Sun Mar 29, 2020 11:06 pm
Location: Jaynes Station, AZ

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Post by kwthom »

I *really* think it's a GPU card setting issue I'll need to resolve.

EDIT: Exclusion set as described above; I'll release the pause I have on my GPU when it cools down a bit (36°C - 97°F) today. :biggrin:
kwthom
Posts: 29
Joined: Sun Mar 29, 2020 11:06 pm
Location: Jaynes Station, AZ

Re: Another BAD_WORK_UNIT (114 = 0x72) - why?

Post by kwthom »

UPDATE: Let it run thru the evening, then paused at the end of the WU; zero issues. I did find this morning that my screen was set to turn off after 5 hours; changed to 'never'. Since I shut off the monitors with the front panel switches when not in use, no big change.

When it ran thru the evening, I actually had the log tab of F@H active, so that means there was always something refreshing on the screen. That, along with a slight tweak of under-volting the card *may* be the solutions on my end.

Card is holding ~68°C while under load at this early hour of the morning. Current WU has four more hours to complete - this will be another good test of my settings.