
Project: 6517 (Run 12, Clone 213, Gen 13)

Posted: Tue Mar 15, 2011 2:41 pm
by djohnston
What is the deal with all these bad work units?

Code: Select all

[11:13:02] Project: 6517 (Run 12, Clone 213, Gen 13)
[11:13:02] 
[11:13:02] Assembly optimizations on if available.
[11:13:02] Entering M.D.
[11:13:08] Protein: 1CFC_B_13 in water
[11:13:08] 
[11:13:08] Writing local files
[11:13:08] Extra SSE boost OK.
[11:13:08] Writing local files
[11:13:08] Completed 0 out of 250000 steps  (0%)
[11:21:30] Writing local files
[11:21:30] Completed 2500 out of 250000 steps  (1%)
[11:29:49] Writing local files
.....................................................
[16:38:21] Completed 97500 out of 250000 steps  (39%)
[16:46:51] Writing local files
[16:46:51] Completed 100000 out of 250000 steps  (40%)
[16:50:35] CoreStatus = 0 (0)
[16:50:35] Client-core communications error: ERROR 0x0
[16:50:35] Deleting current work unit & continuing...
[16:51:28] - Preparing to get new work unit...
[16:51:28] + Attempting to get work packet
[16:51:28] - Connecting to assignment server
[16:51:28] - Successful: assigned to (171.64.65.62).
[16:51:28] + News From Folding@Home: Welcome to Folding@Home
[16:51:29] Loaded queue successfully.
[16:51:31] + Closed connections
[16:51:36] 
[16:51:36] + Processing work unit
[16:51:36] Core required: FahCore_78.exe
[16:51:36] Core found.
[16:51:36] Working on Unit 04 [March 14 16:51:36]
[16:51:36] + Working ...
[16:51:36] 
[16:51:36] *------------------------------*
[16:51:36] Folding@Home Gromacs Core
[16:51:36] Version 1.90 (March 8, 2006)
[16:51:36] 
[16:51:36] Preparing to commence simulation
[16:51:36] - Looking at optimizations...
[16:51:36] - Created dyn
[16:51:36] - Files status OK
[16:51:37] - Expanded 571838 -> 2846941 (decompressed 497.8 percent)
[16:51:37] - Starting from initial work packet
[16:51:37] 
[16:51:37] Project: 6517 (Run 12, Clone 213, Gen 13)
[16:51:37] 
[16:51:37] Assembly optimizations on if available.
[16:51:37] Entering M.D.
[16:51:43] Protein: 1CFC_B_13 in water
[16:51:43] 
[16:51:43] Writing local files
[16:51:43] Extra SSE boost OK.
[16:51:43] Writing local files
[16:51:43] Completed 0 out of 250000 steps  (0%)
[17:00:07] Writing local files
[17:00:07] Completed 2500 out of 250000 steps  (1%)
[17:08:59] Writing local files
.....................................................
[22:26:20] Completed 97500 out of 250000 steps  (39%)
[22:36:32] Writing local files
[22:36:32] Completed 100000 out of 250000 steps  (40%)
[22:40:28] CoreStatus = 0 (0)
[22:40:28] Client-core communications error: ERROR 0x0
[22:40:28] Deleting current work unit & continuing...
[22:41:20] - Preparing to get new work unit...
[22:41:20] + Attempting to get work packet
[22:41:20] - Connecting to assignment server
[22:41:20] - Successful: assigned to (171.64.65.62).
[22:41:20] + News From Folding@Home: Welcome to Folding@Home
[22:41:20] Loaded queue successfully.
[22:41:22] + Closed connections
[22:41:27] 
[22:41:27] + Processing work unit
[22:41:27] Core required: FahCore_78.exe
[22:41:27] Core found.
[22:41:27] Working on Unit 05 [March 14 22:41:27]
[22:41:27] + Working ...
[22:41:28] 
[22:41:28] *------------------------------*
[22:41:28] Folding@Home Gromacs Core
[22:41:28] Version 1.90 (March 8, 2006)
[22:41:28] 
[22:41:28] Preparing to commence simulation
[22:41:28] - Looking at optimizations...
[22:41:28] - Created dyn
[22:41:28] - Files status OK
[22:41:28] - Expanded 571838 -> 2846941 (decompressed 497.8 percent)
[22:41:28] - Starting from initial work packet
[22:41:28] 
[22:41:28] Project: 6517 (Run 12, Clone 213, Gen 13)
[22:41:28] 
[22:41:28] Assembly optimizations on if available.
[22:41:28] Entering M.D.
[22:41:34] Protein: 1CFC_B_13 in water
[22:41:34] 
[22:41:34] Writing local files
[22:41:34] Extra SSE boost OK.
[22:41:35] Writing local files
[22:41:35] Completed 0 out of 250000 steps  (0%)
[22:50:11] Writing local files
[22:50:11] Completed 2500 out of 250000 steps  (1%)
[22:58:32] Writing local files
.....................................................
[04:32:23] Completed 97500 out of 250000 steps  (39%)
[04:41:56] Writing local files
[04:41:56] Completed 100000 out of 250000 steps  (40%)
[04:45:54] CoreStatus = 0 (0)
[04:45:54] Client-core communications error: ERROR 0x0
[04:45:54] - Attempting to download new core...
[04:45:54] + Downloading new core: FahCore_78.exe
[04:45:54] + 10240 bytes downloaded
[04:45:54] + 20480 bytes downloaded
.......................................................
[04:45:57] + 1126400 bytes downloaded
[04:45:57] + 1134407 bytes downloaded
[04:45:57] Verifying core Core_78.fah...
[04:45:57] Signature is VALID
[04:45:57] 
[04:45:57] Trying to unzip core FahCore_78.exe
[04:45:57] Decompressed FahCore_78.exe (3435296 bytes) successfully
[04:45:57] + Core successfully engaged
[04:46:07] Deleting current work unit & continuing...
[04:46:59] - Preparing to get new work unit...
[04:46:59] + Attempting to get work packet
[04:46:59] - Connecting to assignment server
[04:47:00] - Successful: assigned to (171.64.65.62).
[04:47:00] + News From Folding@Home: Welcome to Folding@Home
[04:47:00] Loaded queue successfully.
[04:47:02] + Closed connections
[04:47:07] 
[04:47:07] + Processing work unit
[04:47:07] Core required: FahCore_78.exe
[04:47:07] Core found.
[04:47:07] Working on Unit 06 [March 15 04:47:07]
[04:47:07] + Working ...
[04:47:07] 
[04:47:07] *------------------------------*
[04:47:07] Folding@Home Gromacs Core
[04:47:07] Version 1.90 (March 8, 2006)
[04:47:07] 
[04:47:07] Preparing to commence simulation
[04:47:07] - Looking at optimizations...
[04:47:07] - Created dyn
[04:47:07] - Files status OK
[04:47:08] - Expanded 571838 -> 2846941 (decompressed 497.8 percent)
[04:47:08] - Starting from initial work packet
[04:47:08] 
[04:47:08] Project: 6517 (Run 12, Clone 213, Gen 13)
[04:47:08] 
[04:47:08] Assembly optimizations on if available.
[04:47:08] Entering M.D.
[04:47:14] Protein: 1CFC_B_13 in water
[04:47:14] 
[04:47:14] Writing local files
[04:47:14] Extra SSE boost OK.
[04:47:14] Writing local files
[04:47:14] Completed 0 out of 250000 steps  (0%)
[04:56:01] Writing local files
[04:56:01] Completed 2500 out of 250000 steps  (1%)
[05:04:30] Writing local files
.....................................................
[10:15:25] Completed 97500 out of 250000 steps  (39%)
[10:23:45] Writing local files
[10:23:46] Completed 100000 out of 250000 steps  (40%)
[10:27:30] CoreStatus = 0 (0)
[10:27:30] Client-core communications error: ERROR 0x0
[10:27:30] Deleting current work unit & continuing...
[10:28:21] - Preparing to get new work unit...
[10:28:21] + Attempting to get work packet
[10:28:21] - Connecting to assignment server
[10:28:22] - Successful: assigned to (171.64.65.62).
[10:28:22] + News From Folding@Home: Welcome to Folding@Home
[10:28:22] Loaded queue successfully.
[10:28:24] + Closed connections
[10:28:29] 
[10:28:29] + Processing work unit
[10:28:29] Core required: FahCore_78.exe
[10:28:29] Core found.
[10:28:29] Working on Unit 07 [March 15 10:28:29]
[10:28:29] + Working ...
[10:28:29] 
[10:28:29] *------------------------------*
[10:28:29] Folding@Home Gromacs Core
[10:28:29] Version 1.90 (March 8, 2006)
[10:28:29] 
[10:28:29] Preparing to commence simulation
[10:28:29] - Looking at optimizations...
[10:28:29] - Created dyn
[10:28:29] - Files status OK
[10:28:30] - Expanded 571838 -> 2846941 (decompressed 497.8 percent)
[10:28:30] - Starting from initial work packet
[10:28:30] 
[10:28:30] Project: 6517 (Run 12, Clone 213, Gen 13)
[10:28:30] 
[10:28:30] Assembly optimizations on if available.
[10:28:30] Entering M.D.
[10:28:36] Protein: 1CFC_B_13 in water
[10:28:36] 
[10:28:36] Writing local files
[10:28:36] Extra SSE boost OK.
[10:28:36] Writing local files
[10:28:36] Completed 0 out of 250000 steps  (0%)
[10:37:00] Writing local files
[10:37:00] Completed 2500 out of 250000 steps  (1%)
[10:45:21] Writing local files
[10:45:21] Completed 5000 out of 250000 steps  (2%)
[10:53:40] Writing local files
....................................................
Mod Edit: Added Code Tags - PantherX

Re: Project: 6517 (Run 12, Clone 213, Gen 13)

Posted: Tue Mar 15, 2011 11:02 pm
by bruce
Apparently you've already figured out that there's only one WU involved. When a WU is deleted, whether intentionally (which is strongly discouraged) or by the software (as in: "Client-core communications error: ERROR 0x0 ... Deleting current work unit & continuing..."), no failure record is returned to the server, so it gets automatically reassigned.

I'll mark this topic for follow-up.

If you have not already moved on to another topic, you'll need to delete it manually (in spite of my statement about "strongly discouraged"):
* Stop the client
* Delete queue.dat and the \work folder
* Reconfigure the client to use a different Machine ID
* Restart
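
For anyone who would rather script the file-removal step, here is a minimal Python sketch. It is only an illustration: `client_dir` is a hypothetical path to wherever your client is installed, the function name is made up, and it assumes the client has already been stopped. Changing the Machine ID is still done through the client's own configuration and is not covered here.

```python
import shutil
from pathlib import Path


def clear_work_unit(client_dir):
    """Remove queue.dat and the work folder from a stopped client's directory.

    client_dir is a hypothetical path; point it at your own client folder.
    Returns the list of paths that were actually removed.
    """
    base = Path(client_dir)
    removed = []

    # queue.dat holds the client's record of assigned work units.
    queue_file = base / "queue.dat"
    if queue_file.is_file():
        queue_file.unlink()
        removed.append(queue_file)

    # The work folder holds the files for the work unit in progress.
    work_dir = base / "work"
    if work_dir.is_dir():
        shutil.rmtree(work_dir)
        removed.append(work_dir)

    return removed
```

Running it a second time on the same folder simply returns an empty list, since there is nothing left to delete.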

Re: Project: 6517 (Run 12, Clone 213, Gen 13)

Posted: Wed Mar 16, 2011 12:36 am
by PantherX
If you want to know why bruce recommended manually deleting the WU, please read about the Bad WU section 2 -> viewtopic.php?f=19&t=16526

Re: Project: 6517 (Run 12, Clone 213, Gen 13)

Posted: Wed Mar 16, 2011 5:21 pm
by djohnston
What I meant by "all these bad work units" is how many I've gotten lately.

I never delete a WU without a valid reason. In this particular case, the WU kept getting a client-core communications error at 40%. Although the text indicates the program was deleting that WU and getting another, it never did. The program continued in that loop five times. At that point, I determined it would be better to stop the client, delete work files, get another WU and report this particular one.

I know deleting unprocessed WUs is strongly discouraged. But if a WU keeps erroring and looping at the same point, without ever getting another WU to process, I don't see how letting that program loop continue would serve any good purpose. I'm doing my best to contribute as much as I can to this fine project.
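
The "keeps erroring at the same point" check can be done mechanically on the log rather than by eye. A rough Python sketch, assuming the log lines look like the ones posted above; the function names and the `min_repeats` threshold are my own inventions, not anything the client provides:

```python
import re

STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
ERROR_TEXT = "Client-core communications error"


def failure_points(log_lines):
    """Return the last completed step count seen before each core error."""
    points = []
    last_step = None
    for line in log_lines:
        match = STEP_RE.search(line)
        if match:
            last_step = int(match.group(1))
        elif ERROR_TEXT in line:
            # Record where the run had got to when the error was logged.
            points.append(last_step)
            last_step = None
    return points


def is_stuck(points, min_repeats=2):
    """True if at least min_repeats failures occurred, all at the same step.

    min_repeats is an arbitrary threshold chosen for this sketch.
    """
    return len(points) >= min_repeats and len(set(points)) == 1
```

Feed it the lines of the client log (for example via `open(path).read().splitlines()`); if `is_stuck` comes back true, the failures are all landing on the same step.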

Re: Project: 6517 (Run 12, Clone 213, Gen 13)

Posted: Wed Mar 16, 2011 9:57 pm
by gwildperson
As far as the client was concerned, it was deleting the WU and getting a new one. It doesn't know that the server reissued the same WU.

Note the messages

Code: Select all

[16:51:36] Working on Unit 04 [March 14 16:51:36]
[22:41:27] Working on Unit 05 [March 14 22:41:27]
[04:47:07] Working on Unit 06 [March 15 04:47:07]
[10:28:29] Working on Unit 07 [March 15 10:28:29]
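
To spot this kind of reissue in a log, you can pair each "Working on Unit NN" line with the "Project: ..." line that follows it. A small Python sketch, assuming the log format shown in this thread; the function name is hypothetical:

```python
import re
from collections import defaultdict

UNIT_RE = re.compile(r"Working on Unit (\d+)")
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def unit_numbers_by_work_unit(log_lines):
    """Map each (project, run, clone, gen) to the unit numbers it ran under."""
    units = defaultdict(list)
    current_unit = None
    for line in log_lines:
        unit = UNIT_RE.search(line)
        if unit:
            current_unit = unit.group(1)
            continue
        project = PROJECT_RE.search(line)
        if project and current_unit is not None:
            key = tuple(int(g) for g in project.groups())
            units[key].append(current_unit)
            # Wait for the next "Working on Unit" line before pairing again.
            current_unit = None
    return dict(units)
```

Any key that ends up with more than one unit number is a WU the client believed was new each time but the server had handed out again.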

Re: Project: 6517 (Run 12, Clone 213, Gen 13)

Posted: Sun Mar 20, 2011 11:11 pm
by PantherX
There were multiple failures so I have marked it as a bad WU. Thanks for your report.