
Project: 10019 (Run 1869, Clone 1, Gen 49)

Posted: Fri Oct 22, 2010 1:22 pm
by Fahrenheit451
Hi,
Today I checked my running clients (with HFM.net) and found that one of my CPU clients (Systray version 6.23) had stopped working. The strange behavior started at [22:11:35] Completed 284700 out of 499375 steps (57%)

I quit all my running clients (two v6.23 CPU systray clients and a v6.23 GPU console client) in order to restart Windows, but the b4 core of the "bad" CPU client kept running (wasting CPU time for nothing) until I restarted Windows.

Then I checked the FAH-Log and found this message: [11:55:37] Folding@home Core Shutdown: BAD_WORK_UNIT.


[12:21:16] + Attempting to send results [October 21 12:21:16 UTC]
[12:21:24] + Results successfully sent
[12:21:24] Thank you for your contribution to Folding@Home.
[12:21:24] + Number of Units Completed: 420

[12:21:28] - Preparing to get new work unit...
[12:21:28] + Attempting to get work packet
[12:21:28] - Connecting to assignment server
[12:21:29] - Successful: assigned to (129.74.85.15).
[12:21:29] + News From Folding@Home: Welcome to Folding@Home
[12:21:29] Loaded queue successfully.
[12:21:31] + Closed connections
[12:21:31] 
[12:21:31] + Processing work unit
[12:21:31] Core required: FahCore_b4.exe
[12:21:31] Core found.
[12:21:31] Working on queue slot 00 [October 21 12:21:31 UTC]
[12:21:31] + Working ...
[12:21:33] *********************** Log Started 21/Oct/2010 12:21:33 ***********************
[12:21:33] ************************** ProtoMol Folding@Home Core **************************
[12:21:33]   Version: 25
[12:21:33]      Type: 180
[12:21:33]      Core: ProtoMol
[12:21:33]   Website: http://folding.stanford.edu/
[12:21:33] Copyright: (c) 2009 Stanford University
[12:21:33]    Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[12:21:33]      Args: -dir work/ -suffix 00 -cpu 98 -checkpoint 15 -lifeline 1272 -version
[12:21:33]            623
[12:21:33] ************************************ Build *************************************
[12:21:33]      Date: May 18 2010
[12:21:33]      Time: 23:43:52
[12:21:33]  Revision: 1819
[12:21:33]  Compiler: Intel(R) C++ MSVC 1500 mode 1110
[12:21:33]   Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[12:21:33]            /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[12:21:33]   Defines: _CRT_SECURE_NO_WARNINGS NDEBUG HAVE_GEEKINFO BOOST_ALL_NO_LIB
[12:21:33]            XML_STATIC HAVE_EXPAT HAVE_OPENSSL HAVE_LIBFAH HAVE_SIMTK_LAPACK
[12:21:33]  Platform: Windows XP
[12:21:33]      Bits: 32
[12:21:33]      Mode: Release
[12:21:33] ************************************ System ************************************
[12:21:33]        OS: Microsoft Windows Vista Ultimate
[12:21:33]       CPU: Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
[12:21:33]    CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
[12:21:33]      CPUs: 2 Logical, 1 Physical
[12:21:33]    Memory: 2.00 GB
[12:21:33]   Threads: Windows
[12:21:33] ********************************************************************************
[12:21:33] Project: 10019 (Run 1869, Clone 1, Gen 49)
[12:21:33] Unit: 0x000000640001329c4bb101150000aad8
[12:21:33] User: 0x00000000000000000000000000000000
[12:21:33] Machine: 3
[12:21:33] Reading tar file par_all27_prot_lipid.inp
[12:21:33] Reading tar file scpismQuartic.inp
[12:21:33] Reading tar file ww.pdb
[12:21:33] Reading tar file ww.psf
[12:21:33] Reading tar file checkpt
[12:21:33] Reading tar file ww.5130.pos
[12:21:33] Reading tar file ww.5130.vel
[12:21:33] Reading tar file protomol.conf
[12:21:33] Reading tar file core.xml
[12:21:33] Digital signatures verified
[12:21:34] Completed 0 out of 499375 steps (0%)
[12:21:34] WARNING: Exception: 0: Could not bind socket to 127.0.0.1:52753: No error
[12:30:11] Completed 5000 out of 499375 steps (1%)
[12:38:37] Completed 10000 out of 499375 steps (2%)
[12:47:01] Completed 15000 out of 499375 steps (3%)
[12:55:28] Completed 20000 out of 499375 steps (4%)
[13:04:04] Completed 25000 out of 499375 steps (5%)
[13:12:22] Completed 30000 out of 499375 steps (6%)
[13:20:46] Completed 35000 out of 499375 steps (7%)
[13:29:04] Completed 40000 out of 499375 steps (8%)
[13:37:19] Completed 45000 out of 499375 steps (9%)
[13:45:49] Completed 50000 out of 499375 steps (10%)
[13:54:19] Completed 55000 out of 499375 steps (11%)
[14:02:52] Completed 60000 out of 499375 steps (12%)
[14:11:14] Completed 65000 out of 499375 steps (13%)
[14:19:30] Completed 70000 out of 499375 steps (14%)
[14:27:28] Completed 74900 out of 499375 steps (14%)
[14:35:38] Completed 79900 out of 499375 steps (16%)
[14:44:06] Completed 84900 out of 499375 steps (17%)
[14:49:15] + Working...
[14:52:13] Completed 89900 out of 499375 steps (18%)
[15:00:11] Completed 94900 out of 499375 steps (19%)
[15:08:17] Completed 99900 out of 499375 steps (20%)
[15:16:27] Completed 104900 out of 499375 steps (21%)
[15:24:42] Completed 109900 out of 499375 steps (22%)
[15:32:49] Completed 114900 out of 499375 steps (23%)
[15:41:04] Completed 119900 out of 499375 steps (24%)
[15:49:12] Completed 124900 out of 499375 steps (25%)
[15:57:26] Completed 129900 out of 499375 steps (26%)
[16:05:38] Completed 134900 out of 499375 steps (27%)
[16:13:49] Completed 139900 out of 499375 steps (28%)
[16:22:02] Completed 144800 out of 499375 steps (28%)
[16:30:13] Completed 149800 out of 499375 steps (29%)
[16:38:39] Completed 154800 out of 499375 steps (30%)
[16:47:06] Completed 159800 out of 499375 steps (32%)
[16:55:25] Completed 164800 out of 499375 steps (33%)
[17:04:15] Completed 169800 out of 499375 steps (34%)
[17:13:31] Completed 174800 out of 499375 steps (35%)
[17:21:55] Completed 179800 out of 499375 steps (36%)
[17:30:22] Completed 184800 out of 499375 steps (37%)
[17:39:21] Completed 189800 out of 499375 steps (38%)
[17:48:36] Completed 194800 out of 499375 steps (39%)
[17:57:38] Completed 199800 out of 499375 steps (40%)
[18:06:24] Completed 204800 out of 499375 steps (41%)
[18:06:41] CoreStatus = 62 (98)
[18:06:41] + Restarting core (settings changed) 
[18:06:41] 
[18:06:41] + Processing work unit
[18:06:41] Core required: FahCore_b4.exe
[18:06:41] Core found.
[18:06:41] Working on queue slot 00 [October 21 18:06:41 UTC]
[18:06:41] + Working ...
[18:06:43] *********************** Log Started 21/Oct/2010 18:06:43 ***********************
[18:06:43] ************************** ProtoMol Folding@Home Core **************************
[18:06:43]   Version: 25
[18:06:43]      Type: 180
[18:06:43]      Core: ProtoMol
[18:06:43]   Website: http://folding.stanford.edu/
[18:06:43] Copyright: (c) 2009 Stanford University
[18:06:43]    Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[18:06:43]      Args: -dir work/ -suffix 00 -cpu 98 -checkpoint 6 -notermcheck -lifeline
[18:06:43]            1272 -version 623
[18:06:43] ************************************ Build *************************************
[18:06:43]      Date: May 18 2010
[18:06:43]      Time: 23:43:52
[18:06:43]  Revision: 1819
[18:06:43]  Compiler: Intel(R) C++ MSVC 1500 mode 1110
[18:06:43]   Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[18:06:43]            /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[18:06:43]   Defines: _CRT_SECURE_NO_WARNINGS NDEBUG HAVE_GEEKINFO BOOST_ALL_NO_LIB
[18:06:43]            XML_STATIC HAVE_EXPAT HAVE_OPENSSL HAVE_LIBFAH HAVE_SIMTK_LAPACK
[18:06:43]  Platform: Windows XP
[18:06:43]      Bits: 32
[18:06:43]      Mode: Release
[18:06:43] ************************************ System ************************************
[18:06:43]        OS: Microsoft Windows Vista Ultimate
[18:06:43]       CPU: Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
[18:06:43]    CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
[18:06:43]      CPUs: 2 Logical, 1 Physical
[18:06:43]    Memory: 2.00 GB
[18:06:43]   Threads: Windows
[18:06:43] ********************************************************************************
[18:06:43] Project: 10019 (Run 1869, Clone 1, Gen 49)
[18:06:43] Unit: 0x000000640001329c4bb101150000aad8
[18:06:43] User: 0x00000000000000000000000000000000
[18:06:43] Machine: 3
[18:06:43] Digital signatures verified
[18:06:44] WARNING: Exception: 0: Could not bind socket to 127.0.0.1:52753: No error
[18:06:44] Completed 197800 out of 499375 steps (39%)
[18:10:09] Completed 199800 out of 499375 steps (40%)
[18:19:01] Completed 204800 out of 499375 steps (41%)
[18:27:55] Completed 209800 out of 499375 steps (42%)
[18:35:56] Completed 214700 out of 499375 steps (42%)
[18:44:09] Completed 219700 out of 499375 steps (43%)
[18:52:25] Completed 224700 out of 499375 steps (44%)
[19:00:46] Completed 229700 out of 499375 steps (45%)
[19:09:23] Completed 234700 out of 499375 steps (46%)
[19:17:41] Completed 239700 out of 499375 steps (48%)
[19:25:59] Completed 244700 out of 499375 steps (49%)
[19:34:32] Completed 249700 out of 499375 steps (50%)
[19:44:31] Completed 254700 out of 499375 steps (51%)
[19:52:45] Completed 259700 out of 499375 steps (52%)

Folding@Home Client Shutdown.


--- Opening Log file [October 21 21:26:34 UTC] 


# Windows CPU Systray Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\USERNAME\AppData\Roaming\Folding@home-x86-Core2
Arguments: -advmethods 

[21:26:34] - Ask before connecting: No
[21:26:34] - User name: superduper4711 (Team 0)
[21:26:34] - User ID: 
[21:26:34] - Machine ID: 3
[21:26:34] 
[21:26:34] Loaded queue successfully.
[21:26:34] Initialization complete
[21:26:34] 
[21:26:34] + Processing work unit
[21:26:34] Core required: FahCore_b4.exe
[21:26:34] Core found.
[21:26:34] Working on queue slot 00 [October 21 21:26:34 UTC]
[21:26:34] + Working ...
[21:26:37] *********************** Log Started 21/Oct/2010 21:26:37 ***********************
[21:26:37] ************************** ProtoMol Folding@Home Core **************************
[21:26:37]   Version: 25
[21:26:37]      Type: 180
[21:26:37]      Core: ProtoMol
[21:26:37]   Website: http://folding.stanford.edu/
[21:26:37] Copyright: (c) 2009 Stanford University
[21:26:37]    Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[21:26:37]      Args: -dir work/ -suffix 00 -cpu 98 -checkpoint 6 -lifeline 5876 -version
[21:26:37]            623
[21:26:37] ************************************ Build *************************************
[21:26:37]      Date: May 18 2010
[21:26:37]      Time: 23:43:52
[21:26:37]  Revision: 1819
[21:26:37]  Compiler: Intel(R) C++ MSVC 1500 mode 1110
[21:26:37]   Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[21:26:37]            /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[21:26:37]   Defines: _CRT_SECURE_NO_WARNINGS NDEBUG HAVE_GEEKINFO BOOST_ALL_NO_LIB
[21:26:37]            XML_STATIC HAVE_EXPAT HAVE_OPENSSL HAVE_LIBFAH HAVE_SIMTK_LAPACK
[21:26:37]  Platform: Windows XP
[21:26:37]      Bits: 32
[21:26:37]      Mode: Release
[21:26:37] ************************************ System ************************************
[21:26:37]        OS: Microsoft Windows Vista Ultimate
[21:26:37]       CPU: Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
[21:26:37]    CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
[21:26:37]      CPUs: 2 Logical, 1 Physical
[21:26:37]    Memory: 2.00 GB
[21:26:37]   Threads: Windows
[21:26:37] ********************************************************************************
[21:26:37] Project: 10019 (Run 1869, Clone 1, Gen 49)
[21:26:37] Unit: 0x000000640001329c4bb101150000aad8
[21:26:37] User: 0x00000000000000000000000000000000
[21:26:37] Machine: 3
[21:26:37] Digital signatures verified
[21:26:39] WARNING: Exception: 0: Could not bind socket to 127.0.0.1:52753: No error
[21:26:39] Completed 258100 out of 499375 steps (51%)
[21:29:19] Completed 259700 out of 499375 steps (52%)
[21:37:42] Completed 264700 out of 499375 steps (53%)
[21:46:07] Completed 269700 out of 499375 steps (54%)
[21:54:48] Completed 274700 out of 499375 steps (55%)
[22:03:23] Completed 279700 out of 499375 steps (56%)
[22:11:35] Completed 284700 out of 499375 steps (57%)
[22:19:34] Completed 289600 out of 499375 steps (57%)
[03:26:34] + Working...
[09:26:34] + Working...

Folding@Home Client Shutdown.


--- Opening Log file [October 22 11:54:23 UTC] 


# Windows CPU Systray Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\USERNAME\AppData\Roaming\Folding@home-x86-Core2
Arguments: -advmethods 

[11:54:23] - Ask before connecting: No
[11:54:23] - User name: superduper4711 (Team 0)
[11:54:23] - User ID: 
[11:54:23] - Machine ID: 3
[11:54:23] 
[11:54:23] Loaded queue successfully.
[11:54:23] Initialization complete
[11:54:23] 
[11:54:23] + Processing work unit
[11:54:23] Core required: FahCore_b4.exe
[11:54:23] Core found.
[11:54:23] Working on queue slot 00 [October 22 11:54:23 UTC]
[11:54:23] + Working ...
[11:54:25] *********************** Log Started 22/Oct/2010 11:54:25 ***********************
[11:54:25] ************************** ProtoMol Folding@Home Core **************************
[11:54:25]   Version: 25
[11:54:25]      Type: 180
[11:54:25]      Core: ProtoMol
[11:54:25]   Website: http://folding.stanford.edu/
[11:54:25] Copyright: (c) 2009 Stanford University
[11:54:25]    Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[11:54:25]      Args: -dir work/ -suffix 00 -cpu 98 -checkpoint 6 -lifeline 3904 -version
[11:54:25]            623
[11:54:25] ************************************ Build *************************************
[11:54:25]      Date: May 18 2010
[11:54:25]      Time: 23:43:52
[11:54:25]  Revision: 1819
[11:54:25]  Compiler: Intel(R) C++ MSVC 1500 mode 1110
[11:54:25]   Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[11:54:25]            /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[11:54:25]   Defines: _CRT_SECURE_NO_WARNINGS NDEBUG HAVE_GEEKINFO BOOST_ALL_NO_LIB
[11:54:25]            XML_STATIC HAVE_EXPAT HAVE_OPENSSL HAVE_LIBFAH HAVE_SIMTK_LAPACK
[11:54:25]  Platform: Windows XP
[11:54:25]      Bits: 32
[11:54:25]      Mode: Release
[11:54:25] ************************************ System ************************************
[11:54:25]        OS: Microsoft Windows Vista Ultimate
[11:54:25]       CPU: Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
[11:54:25]    CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
[11:54:25]      CPUs: 2 Logical, 1 Physical
[11:54:25]    Memory: 2.00 GB
[11:54:25]   Threads: Windows
[11:54:25] ********************************************************************************
[11:54:25] Project: 10019 (Run 1869, Clone 1, Gen 49)
[11:54:25] Unit: 0x000000640001329c4bb101150000aad8
[11:54:25] User: 0x00000000000000000000000000000000
[11:54:25] Machine: 3
[11:54:25] Digital signatures verified
[11:54:26] WARNING: Exception: 0: Could not bind socket to 127.0.0.1:52753: No error
[11:54:26] Completed 294500 out of 499375 steps (58%)
[11:54:35] Completed 294600 out of 499375 steps (58%)
[11:55:33] ERROR: ProtoMol ERROR: Corrupt DCD file. Size is 2986428, should be >= 2993184.
[11:55:33] Saving result file logfile_00.txt
[11:55:33] Saving result file checkpt
[11:55:33] Saving result file checkpt.crc
[11:55:33] Saving result file log.txt
[11:55:36] Saving result file protomol.conf
[11:55:36] Saving result file ww.5179.pos
[11:55:36] Saving result file ww.5179.vel
[11:55:36] Saving result file ww.dcd
[11:55:37] WARNING: While cleaning up: 0: Failed to remove directory '00': boost::filesystem::remove: Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird: "00\ww.dcd" [English: The process cannot access the file because it is being used by another process.]
[11:55:37] Folding@home Core Shutdown: BAD_WORK_UNIT
[11:55:40] CoreStatus = 72 (114)
[11:55:40] Sending work to server
[11:55:40] Project: 10019 (Run 1869, Clone 1, Gen 49)
[11:55:40] - Read packet limit of 540015616... Set to 524286976.


[11:55:40] + Attempting to send results [October 22 11:55:40 UTC]
[11:56:18] + Results successfully sent
[11:56:18] Thank you for your contribution to Folding@Home.
[11:56:25] - Preparing to get new work unit...
[11:56:25] + Attempting to get work packet
[11:56:25] - Connecting to assignment server
[11:56:26] - Successful: assigned to (129.74.85.15).
[11:56:26] + News From Folding@Home: Welcome to Folding@Home
[11:56:26] Loaded queue successfully.
[11:56:27] + Closed connections
[11:56:32] 
[11:56:32] + Processing work unit
[11:56:32] Core required: FahCore_b4.exe
[11:56:32] Core found.
[11:56:32] Working on queue slot 01 [October 22 11:56:32 UTC]
[11:56:32] + Working ...
[11:56:35] *********************** Log Started 22/Oct/2010 11:56:35 ***********************
[11:56:35] ************************** ProtoMol Folding@Home Core **************************
[11:56:35]   Version: 25
[11:56:35]      Type: 180
[11:56:35]      Core: ProtoMol
[11:56:35]   Website: http://folding.stanford.edu/
[11:56:35] Copyright: (c) 2009 Stanford University
[11:56:35]    Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[11:56:35]      Args: -dir work/ -suffix 01 -cpu 98 -checkpoint 6 -lifeline 3904 -version
[11:56:35]            623
[11:56:35] ************************************ Build *************************************
[11:56:35]      Date: May 18 2010
[11:56:35]      Time: 23:43:52
[11:56:35]  Revision: 1819
[11:56:35]  Compiler: Intel(R) C++ MSVC 1500 mode 1110
[11:56:35]   Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[11:56:35]            /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[11:56:35]   Defines: _CRT_SECURE_NO_WARNINGS NDEBUG HAVE_GEEKINFO BOOST_ALL_NO_LIB
[11:56:35]            XML_STATIC HAVE_EXPAT HAVE_OPENSSL HAVE_LIBFAH HAVE_SIMTK_LAPACK
[11:56:35]  Platform: Windows XP
[11:56:35]      Bits: 32
[11:56:35]      Mode: Release
[11:56:35] ************************************ System ************************************
[11:56:35]        OS: Microsoft Windows Vista Ultimate
[11:56:35]       CPU: Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
[11:56:35]    CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
[11:56:35]      CPUs: 2 Logical, 1 Physical
[11:56:35]    Memory: 2.00 GB
[11:56:35]   Threads: Windows
[11:56:35] ********************************************************************************
[11:56:35] Project: 10056 (Run 955, Clone 0, Gen 16)
[11:56:35] Unit: 0x0000001a0001329c4c61916c00003a6c
[11:56:35] User: 0x00000000000000000000000000000000
[11:56:35] Machine: 3
[11:56:35] Reading tar file par_all27_prot_lipid.inp
[11:56:35] Reading tar file scpismQuartic.inp
[11:56:35] Reading tar file ww.pdb
[11:56:35] Reading tar file ww.psf
[11:56:35] Reading tar file checkpt
[11:56:35] Reading tar file ww.1278.pos
[11:56:35] Reading tar file ww.1278.vel
[11:56:35] Reading tar file protomol.conf
[11:56:35] Reading tar file core.xml
[11:56:35] Digital signatures verified
[11:56:35] Completed 0 out of 499375 steps (0%)
[11:56:35] WARNING: Exception: 0: Could not bind socket to 127.0.0.1:52753: No error
[12:05:37] Completed 5000 out of 499375 steps (1%)
[12:14:51] Completed 10000 out of 499375 steps (2%)
As you can see, I restarted the client a few times, but this should not be a problem. Yesterday I only added my passkey to the client and shortened the checkpoint interval from 15 to 6 minutes ([18:06:41] + Restarting core (settings changed)).

Is this error caused by the WU or by a random PC error?
My PC specs can be found in this thread: http://foldingforum.org/viewtopic.php?f ... hilit=5765 (page 3)
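
The stall after [22:19:34] in the log above (no further "Completed ..." lines, only "+ Working..." hours later) is the kind of thing that can be spotted mechanically. Below is a minimal Python sketch, assuming the line format shown in this log. The name `find_stalls` and the 30-minute threshold are illustrative choices, not part of the FAH client or HFM.net.

```python
import re

# "[hh:mm:ss] text" as written by the v6 client; the log carries no date.
STAMP = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s?(.*)$")
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")


def find_stalls(lines, max_gap_s=1800):
    """Return (last_progress_stamp, next_stamp, gap_seconds) for each stall.

    A stall is reported when the first timestamped line that follows a
    progress line arrives more than max_gap_s seconds later.
    """
    stalls = []
    last = None  # (seconds since midnight, stamp text) of the last progress line
    for line in lines:
        m = STAMP.match(line)
        if m is None:
            # Un-timestamped markers such as "Folding@Home Client Shutdown."
            # indicate a deliberate stop, so the pause after them is ignored.
            if "Client Shutdown" in line or "Opening Log file" in line:
                last = None
            continue
        h, mi, s, text = m.groups()
        now = int(h) * 3600 + int(mi) * 60 + int(s)
        stamp = f"{h}:{mi}:{s}"
        if last is not None:
            # Wrap at midnight; gaps of 24 hours or more cannot be recovered
            # from time-of-day stamps alone.
            gap = (now - last[0]) % 86400
            if gap > max_gap_s:
                stalls.append((last[1], stamp, gap))
                last = None  # report each stall only once
        if PROGRESS.search(text):
            last = (now, stamp)
    return stalls


sample = [
    "[22:11:35] Completed 284700 out of 499375 steps (57%)",
    "[22:19:34] Completed 289600 out of 499375 steps (57%)",
    "[03:26:34] + Working...",
    "[09:26:34] + Working...",
    "",
    "Folding@Home Client Shutdown.",
]
print(find_stalls(sample))
```

Run on the excerpt above, this reports a single gap of 18420 seconds (5 h 07 min) between the last progress line and the next "+ Working..." line.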

Re: Project: 10019 (Run 1869, Clone 1, Gen 49)

Posted: Fri Oct 22, 2010 4:48 pm
by sortofageek
Thanks for your report. I'm sorry to hear you had trouble with this one. The WU itself wasn't bad, however, as someone else was able to complete it successfully.

Re: Project: 10019 (Run 1869, Clone 1, Gen 49)

Posted: Fri Oct 22, 2010 7:26 pm
by Fahrenheit451
What is the exact definition of a bad work unit?
In my case the client itself labeled it as BAD_WORK_UNIT in the log file.

Re: Project: 10019 (Run 1869, Clone 1, Gen 49)

Posted: Fri Oct 22, 2010 7:46 pm
by sortofageek
When I say "bad work unit," I mean a WU which cannot be completed by anyone. In such a case, the mods have the ability to stop the WU from being reassigned. We tend to wait until at least three different folders are unable to complete that WU before marking it bad. If one person can complete it, it isn't a bad WU in the sense we are using here in this forum.

It obviously, however, did prove to be a bad WU for you on that machine.
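
The rule of thumb described above can be written down as a small decision function. This is only a sketch of the stated practice ("at least three different folders", "if one person can complete it, it isn't a bad WU"). The function name, the report format and the default threshold are hypothetical and do not describe any actual moderator tool.

```python
def is_bad_wu(reports, min_failures=3):
    """Decide whether a WU would count as "bad" in the forum's sense.

    reports: iterable of (folder_name, completed) pairs for one
    project/run/clone/gen. Both the shape of this data and the
    default threshold are assumptions made for illustration.
    """
    failed_folders = set()
    for folder, completed in reports:
        if completed:
            # A single successful completion clears the WU.
            return False
        failed_folders.add(folder)
    # Repeated failures by the same folder count only once.
    return len(failed_folders) >= min_failures


# The WU from this thread: one failure, one success elsewhere.
print(is_bad_wu([("Fahrenheit451", False), ("someone else", True)]))
```

Counting distinct folders rather than raw failures reflects the point that one unstable machine failing repeatedly says little about the WU itself.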

Re: Project: 10019 (Run 1869, Clone 1, Gen 49)

Posted: Fri Oct 22, 2010 9:57 pm
by Fahrenheit451
Thx for the explanation.
If I understand you right you require this BAD_WORK_UNIT message from three different folders for the same project (R,C,G) to mark it as bad WU. It's only because I never had this issue before.
Has somebody finished this WU successfully?
Is it possible to reassign this specific WU to me? I really would like to know if this was a random PC failure or if the error is reproducible on my PC.

Re: Project: 10019 (Run 1869, Clone 1, Gen 49)

Posted: Fri Oct 22, 2010 11:45 pm
by bruce
Fahrenheit451 wrote: Thx for the explanation.
If I understand you right you require this BAD_WORK_UNIT message from three different folders for the same project (R,C,G) to mark it as bad WU. It's only because I never had this issue before.
Has somebody finished this WU successfully?
Is it possible to reassign this specific WU to me? I really would like to know if this was a random PC failure or if the error is reproducible on my PC.
sortofageek wrote: The WU itself wasn't bad, however, as someone else was able to complete it successfully.
There's no way to assign a specific WU to anyone, especially when the WU has already been completed.

There are a variety of possible errors (including BAD_WORK_UNIT) which more or less mean that the analysis has detected data that has been corrupted in some way. The problem here is clearly that "in some way" covers a lot of territory. I'd look seriously for anything that might cause an error in your system including overclocking, heat, outright hardware failures, etc.
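
In the log from the first post, the corruption that was detected is the file-size check in "Corrupt DCD file. Size is 2986428, should be >= 2993184." The short Python sketch below works through that arithmetic. It assumes the common CHARMM-style DCD frame layout (X, Y and Z arrays of 4-byte floats, each wrapped in 4-byte record markers, no unit-cell block). Whether the b4 core writes exactly this layout is an assumption, and the atom count is not stated anywhere in the log, so this is a plausibility check and not a diagnosis.

```python
ACTUAL_SIZE = 2986428   # bytes, from the ERROR line in the log
EXPECTED_MIN = 2993184  # bytes, from the same ERROR line


def dcd_frame_bytes(n_atoms):
    """Bytes per frame under the assumed layout: three coordinate
    records, each a 4-byte marker + n_atoms floats + a 4-byte marker."""
    return 3 * (4 + 4 * n_atoms + 4)


def atoms_for_frame_bytes(frame_bytes):
    """Invert dcd_frame_bytes; return None if the size does not fit."""
    per_axis, rem = divmod(frame_bytes, 3)
    if rem != 0 or per_axis <= 8 or (per_axis - 8) % 4 != 0:
        return None
    return (per_axis - 8) // 4


shortfall = EXPECTED_MIN - ACTUAL_SIZE
print(shortfall, atoms_for_frame_bytes(shortfall))  # prints: 6756 561
```

Under these assumptions the file is short by exactly one frame of a 561-atom system, which would fit a trajectory write that was cut off (for example when the core hung or was killed) rather than faulty input data.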

From the perspective of the FAH servers, they cannot assume that every PC is stable and reliable, so when there's an error, the same WU has to be retried under different conditions. If that succeeds, science marches onward, and it's left to the owner of the machine to work out why somebody else could complete that PRCG and you couldn't. FAH has an excellent record of discarding data that contains errors, so science isn't hampered significantly, though fixing any bad node is a good thing (as I'm sure you'll agree).

An extremely rare failure probably isn't worth trying to find and fix. Something that happens regularly is. If you want a safety margin, you could reduce your clock rate or try less aggressive memory timings (if you've tweaked those), but the bottom line is that it's up to you.

Re: Project: 10019 (Run 1869, Clone 1, Gen 49)

Posted: Sat Oct 23, 2010 12:43 am
by sortofageek
Fahrenheit451 wrote: ...
If I understand you right you require this BAD_WORK_UNIT message from three different folders for the same project (R,C,G) to mark it as bad WU.
...
No, not necessarily. As Bruce mentioned above, there can be all kinds of reasons why people can't complete a WU. They may get that error message or they may get a different error message. The key isn't the error message. The key is that at least three different machine/client setups get that WU and cannot complete it.

Fahrenheit451 wrote: ...
Has somebody finished this WU successfully?

Yes, that is what I said above: "The WU itself wasn't bad, however, as someone else was able to complete it successfully."

Re: Project: 10019 (Run 1869, Clone 1, Gen 49)

Posted: Sat Oct 23, 2010 9:05 am
by Fahrenheit451
Thx for your replies. They help me to understand how FAH works.