
Project: 2653 (Run 21, Clone 89, Gen 118)

Posted: Sun Jun 28, 2009 10:19 pm
by anko1
Simulation instability (Quit 101 - XTC error) at 90 percent. The client couldn't read the results file, so nothing was sent back. [I checked my work folder and there's no results file for slot 08.]

Code:

[12:05:36] + Closed connections
[12:05:36] 
[12:05:36] + Processing work unit
[12:05:36] Work type a1 not eligible for variable processors
[12:05:36] Core required: FahCore_a1.exe
[12:05:36] Core found.
[12:05:36] Using generic mpiexec calls
[12:05:36] Working on queue slot 08 [June 27 12:05:36 UTC]
[12:05:36] + Working ...
[12:05:36] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 15 -verbose -lifeline 6460 -version 623'

[12:05:37] 
[12:05:37] *------------------------------*
[12:05:37] Folding@Home Gromacs SMP Core
[12:05:37] Version 1.74 (March 10, 2007)
[12:05:37] 
[12:05:37] Preparing to commence simulation
[12:05:37] - Ensuring status. Please wait.
[12:05:39] - Starting from initial work packet
[12:05:39] 
[12:05:39] Project: 2653 (Run 21, Clone 89, Gen 118)
[12:05:39] 
[12:05:40] Assembly optimizations on if available.
[12:05:40] Entering M.D.
[12:05:58] percent)
[12:05:58] - Starting from initial work packet
[12:05:58] 
[12:05:58] Project: 2653 (Run 21, Clone 89, Gen 118)
[12:05:58] 
[12:06:00] Entering M.D.
[12:06:06] Rejecting checkpoint
[12:06:07] Protein: Protein in POPC
[12:06:07] Writing local files
[12:06:08] Extra SSE boost OK.
[12:06:08] Writing local files
[12:06:09] Completed 0 out of 500000 steps  (0 percent)
[12:19:00] Writing local files
[12:19:01] Completed 5000 out of 500000 steps  (1 percent)
                         {snip}
[08:00:06] Timered checkpoint triggered.
[08:00:24] Writing local files
[08:00:24] Completed 450000 out of 500000 steps  (90 percent)
[08:00:24] Quit 101 - XTC error
[08:00:24] Simulation instability has been encountered. The run has entered a
[08:00:24]   state from which no further progress can be made.
[08:00:24] This may be the correct result of the simulation, however if you
[08:00:24]   often see other project units terminating early like this
[08:00:24]   too, you may wish to check the stability of your computer (issues
[08:00:24]   such as high temperature, overclocking, etc.).
[08:00:24] Going to send back what have done.
[08:00:24] logfile size: 8784
[08:00:24] - Could not open results file
[08:00:24] - Failed to delete work/wudata_08.sas
[08:00:24] - Failed to delete work/wudata_08.goe
[08:00:24] Warning:  check for stray files
[08:00:24] 
[08:00:24] Folding@home Core Shutdown: EARLY_UNIT_END
[08:00:24] 
[08:00:24] Folding@home Core Shutdown: EARLY_UNIT_END
[08:00:29] CoreStatus = 7B (123)
[08:00:29] Sending work to server
[08:00:29] Project: 2653 (Run 21, Clone 89, Gen 118)
[08:00:29] - Error: Could not get length of results file work/wuresults_08.dat
[08:00:29] - Error: Could not read unit 08 file. Removing from queue.
[08:00:29] Trying to send all finished work units
[08:00:29] + No unsent completed units remaining.
[08:00:29] - Preparing to get new work unit...

Re: Project: 2653 (Run 21, Clone 89, Gen 118)

Posted: Fri Jul 03, 2009 1:03 pm
by susato
Thanks for the report, anko1.

Re: Project: 2653 (Run 21, Clone 89, Gen 118)

Posted: Sat Jul 04, 2009 2:50 am
by bruce
If past experience is any guide, XTC errors seem to be associated with a full disk or some similar condition (like interference from an AV program) that prevents FAH from reading and writing whatever it wants in its work directory.
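
For anyone who wants to rule out the two conditions mentioned above, here is a minimal sketch of a check, not anything the FAH client itself does. The function name `check_work_dir` and the `min_free_bytes` threshold are made up for illustration, and the 100 MB default is an arbitrary assumption. It only tests free space and whether a small file can be written and read back in the folder; it cannot detect every kind of AV interference.

```python
import os
import shutil
import tempfile


def check_work_dir(work_dir, min_free_bytes=100 * 1024 * 1024):
    """Return a list of problems found in work_dir (empty list if none).

    Hypothetical helper: the name and the threshold are illustrative only.
    """
    if not os.path.isdir(work_dir):
        return [f"{work_dir} is not a directory"]

    problems = []

    # Condition 1: the disk holding the work folder is (nearly) full.
    free = shutil.disk_usage(work_dir).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the disk")

    # Condition 2: something prevents writing and reading in the folder.
    try:
        with tempfile.NamedTemporaryFile(dir=work_dir) as probe:
            probe.write(b"probe")
            probe.flush()
            probe.seek(0)
            if probe.read() != b"probe":
                problems.append("read-back of probe file did not match")
    except OSError as exc:
        problems.append(f"cannot write a probe file: {exc}")

    return problems


if __name__ == "__main__":
    # "work" is the folder name seen in the log above (-dir work/).
    for problem in check_work_dir("work"):
        print(problem)
```

Run it from the client's install folder while the client is stopped. An empty result only means these two simple checks passed.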

Re: Project: 2653 (Run 21, Clone 89, Gen 118)

Posted: Sat Jul 04, 2009 3:07 am
by anko1
Thanks for the input, Bruce. It's been a while since the error, so I may not recall correctly, but I think there were some errors on the following units too. When I checked the work folder, there were a lot of stray files hanging around (files left from other units), so that may have been it. I cleaned it up and things have been going well since.
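
A rough sketch of how stray files like these could be listed before cleaning up by hand. It assumes work-folder files end in `_NN.ext`, where `NN` is the two-digit queue slot, which is inferred only from the names in the log above (`wudata_08.sas`, `wudata_08.goe`, `wuresults_08.dat`). The function name `find_stray_files` and the `active_slots` argument are hypothetical, and the script only reports files; it deletes nothing.

```python
import os
import re

# Assumed naming pattern: "<name>_<two-digit slot>.<extension>".
SLOT_RE = re.compile(r"_(\d{2})\.[^.]+$")


def find_stray_files(work_dir, active_slots):
    """Return sorted file names whose slot number is not in active_slots.

    Files that do not match the assumed pattern are ignored.
    """
    active = {f"{int(slot):02d}" for slot in active_slots}
    stray = []
    for name in sorted(os.listdir(work_dir)):
        match = SLOT_RE.search(name)
        if match and match.group(1) not in active:
            stray.append(name)
    return stray


if __name__ == "__main__":
    # Example: slot 9 is the unit currently being worked on.
    for name in find_stray_files("work", active_slots=[9]):
        print("possible stray file:", name)
```

Check which slot is actually in use (the "Working on queue slot" line in the log) before removing anything it lists.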