
Project: 2665 (Run 2, Clone 593, Gen 53)

Posted: Fri Dec 05, 2008 4:34 am
by Drewpy
SMP 6.22 beta2 for MPICH (64-bit)

Project: 2665 (Run 2, Clone 593, Gen 53)
Failed at 47%

Code:

[22:59:42] Project: 2665 (Run 2, Clone 593, Gen 53)
[22:59:42] 
[22:59:43] Assembly optimizations on if available.
[22:59:43] Entering M.D.
[23:00:00]  on if available.
[23:00:00] Entering M.D.
[23:00:06] Rejecting checkpoint
[23:00:08] Protein: HGG with glycosylations
[23:00:08] Writing local files
[23:00:15] Extra SSE boost OK.
[23:00:16] Writing local files
[23:00:16] Completed 0 out of 250000 steps  (0 percent)
[23:16:35] Writing local files
[23:16:35] Completed 2500 out of 250000 steps  (1 percent)
[23:33:04] Writing local files
[23:33:04] Completed 5000 out of 250000 steps  (2 percent)
[23:53:21] Writing local files
[23:53:21] Completed 7500 out of 250000 steps  (3 percent)
...
[14:58:46] Writing local files
[14:58:46] Completed 115000 out of 250000 steps  (46 percent)
[15:19:43] Writing local files
[15:19:44] Completed 117500 out of 250000 steps  (47 percent)
[15:27:00] Warning:  long 1-4 interactions
[15:27:00] Quit 101 - NaN detected: (ener[20])
[15:27:00] 
[15:27:00] Simulation instability has been encountered. The run has entered a
[15:27:00]   state from which no further progress can be made.
[15:27:00] This may be the correct result of the simulation, however if you
[15:27:00]   often see other project units terminating early like this
[15:27:00]   too, you may wish to check the stability of your computer (issues
[15:27:00]   such as high temperature, overclocking, etc.).
[15:27:00] Going to send back what have done.
[15:27:00] logfile size: 97216
[15:27:00] - Writing 97766 bytes of core data to disk...
[15:27:00]   ... Done.
[15:27:00] - Failed to delete work/wudata_01.arc
[15:27:00] Warning:  check for stray files
[15:29:00] 
[15:29:00] Folding@home Core Shutdown: EARLY_UNIT_END
[15:29:00] 
[15:29:00] Folding@home Core Shutdown: EARLY_UNIT_END
[15:29:05] CoreStatus = 7B (123)
[15:29:05] Client-core communications error: ERROR 0x7b
[15:29:05] This is a sign of more serious problems, shutting down.

Prior to this, Project: 2665 (Run 2, Clone 296, Gen 61) failed at 14%, 5 times in a row, before I got the new WU (P2665 R2 C593 G53).

Please help, I think I'll be frustrated soon. Is this just bad luck with bad WUs, or do I have a more serious problem with my computer's stability?

Re: Project: 2665 (Run 2, Clone 593, Gen 53)

Posted: Fri Dec 05, 2008 7:50 am
by bruce
Project: 2665 (Run 2, Clone 593, Gen 53): with an error 0x7B it's almost impossible to tell, since that's an unknown error and the client doesn't upload an error report.

Project: 2665 (Run 2, Clone 296, Gen 61), on the other hand, must have a different error, because there are several reports of EUEs at various degrees of completion, so it's probably a bad WU. What was the error message?

Re: Project: 2665 (Run 2, Clone 593, Gen 53)

Posted: Fri Dec 05, 2008 11:56 am
by toTOW
Upgrade your client to 6.23: viewtopic.php?f=46&t=6642

It should handle EUEs better.

Re: Project: 2665 (Run 2, Clone 593, Gen 53)

Posted: Fri Dec 05, 2008 8:34 pm
by Drewpy
bruce wrote: Project: 2665 (Run 2, Clone 296, Gen 61), on the other hand, must have a different error, because there are several reports of EUEs at various degrees of completion, so it's probably a bad WU. What was the error message?
Nope, all the same error message 0x7b.

Usually they failed with the error message shown below, but one instance failed with a message identical to the one for P2665 R2 C593 G53 above.

Code:

[08:33:16] Writing local files
[08:33:16] Completed 35000 out of 250000 steps  (14 percent)
[08:40:10] Gromacs cannot continue further.
[08:40:10] Going to send back what have done.
[08:40:10] logfile size: 35356
[08:40:10] - Writing 35892 bytes of core data to disk...
[08:40:10]   ... Done.
[08:40:10] - Failed to delete work/wudata_01.sas
[08:40:10] - Failed to delete work/wudata_01.goe
[08:40:10] Warning:  check for stray files
[08:40:10] 
[08:40:10] Folding@home Core Shutdown: EARLY_UNIT_END
[08:40:10] 
[08:40:10] Folding@home Core Shutdown: EARLY_UNIT_END
[08:40:13] CoreStatus = 7B (123)
[08:40:13] Client-core communications error: ERROR 0x7b
[08:40:13] This is a sign of more serious problems, shutting down.
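
For anyone comparing several failed WUs, here is a minimal Python sketch that pulls the last completed step count, the core shutdown reason, and the CoreStatus code out of a log like the ones in this thread. The function name `summarize` and the regular expressions are illustrative assumptions based only on the line formats visible above; they are not part of the Folding@home client or any official tool.

```python
import re

# A few lines copied from the first log in this thread.
LOG = """\
[15:19:44] Completed 117500 out of 250000 steps  (47 percent)
[15:27:00] Quit 101 - NaN detected: (ener[20])
[15:29:00] Folding@home Core Shutdown: EARLY_UNIT_END
[15:29:05] CoreStatus = 7B (123)
"""

# Patterns are assumptions derived from the log lines shown in this thread.
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
SHUTDOWN_RE = re.compile(r"Core Shutdown: (\w+)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def summarize(log_text):
    """Return the last progress report, shutdown reason, and core status."""
    summary = {
        "steps_done": None,
        "steps_total": None,
        "shutdown": None,
        "core_status": None,
    }
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            # Later progress lines overwrite earlier ones.
            summary["steps_done"] = int(match.group(1))
            summary["steps_total"] = int(match.group(2))
        match = SHUTDOWN_RE.search(line)
        if match:
            summary["shutdown"] = match.group(1)
        match = STATUS_RE.search(line)
        if match:
            # The log prints the status in hex, then in decimal.
            summary["core_status"] = int(match.group(1), 16)
    return summary


if __name__ == "__main__":
    result = summarize(LOG)
    percent = result["steps_done"] * 100 // result["steps_total"]
    print(result["shutdown"], hex(result["core_status"]), f"{percent}%")
```

Running the same function over each failed WU's log makes it easy to see whether the failures share a CoreStatus code and at what percentage each one stopped.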

Re: Project: 2665 (Run 2, Clone 593, Gen 53)

Posted: Fri Dec 05, 2008 8:42 pm
by Drewpy
toTOW wrote: Upgrade your client to 6.23: viewtopic.php?f=46&t=6642

It should handle EUEs better.
Are you sure? That file says "Win32-x86". As I mentioned in the first line of the OP, I am using the 64-bit client (and running under Windows Vista). The naming of this file seems to indicate that it is for 32-bit Windows on the x86 ISA...

Re: Project: 2665 (Run 2, Clone 593, Gen 53)

Posted: Fri Dec 05, 2008 11:56 pm
by toTOW
There's no 64-bit version of the client... What is the name of your current client executable?

Re: Project: 2665 (Run 2, Clone 593, Gen 53)

Posted: Sat Dec 06, 2008 10:43 pm
by Drewpy
Client: Windows Vista SMP client console
Version: 6.22 beta2 for MPICH (32-bit or 64-bit)

So it's only the DEINO vs. MPICH code that determines whether you can run on a 64-bit OS? And the client code remains the same?

It would be MUCH clearer to state that it is the operating system that has to be 64-bit for MPICH. The project downloads page puts the 64-bit designation under the "version #" column, hence my interpretation that I was running a 64-bit client. I'll go ahead and update to 6.23.

Re: Project: 2665 (Run 2, Clone 593, Gen 53)

Posted: Fri Dec 19, 2008 3:59 am
by bruce
The MPICH library can support either a 32-bit or a 64-bit OS. The DEINO library is only available for 32-bit.