Project 6059: (Run 0, Clone 33, Gen 5)

Aardvark
Posts: 143
Joined: Sat Jul 12, 2008 4:22 pm
Location: Team MacResource

Post by Aardvark »

Another "early fail" WU. This one failed at <1%, and it was able to report back to Stanford.

Log File follows:

[08:35:27] + Processing work unit
[08:35:27] Core required: FahCore_a3.exe
[08:35:27] Core found.
[08:35:27] Working on queue slot 05 [April 23 08:35:27 UTC]
[08:35:27] + Working ...
[08:35:27] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 2 -checkpoint 15 -verbose -lifeline 5262 -version 629'

[08:35:27] 
[08:35:27] *------------------------------*
[08:35:27] Folding@Home Gromacs SMP Core
[08:35:27] Version 2.17 (Mar 7 2010)
[08:35:27] 
[08:35:27] Preparing to commence simulation
[08:35:27] - Ensuring status. Please wait.
[08:35:36] - Looking at optimizations...
[08:35:36] - Working with standard loops on this execution.
[08:35:36] - Created dyn
[08:35:36] - Files status OK
[08:35:37] - Expanded 1767592 -> 2254429 (decompressed 127.5 percent)
[08:35:37] Called DecompressByteArray: compressed_data_size=1767592 data_size=2254429, decompressed_data_size=2254429 diff=0
[08:35:37] - Digital signature verified
[08:35:37] 
[08:35:37] Project: 6059 (Run 0, Clone 33, Gen 5)
[08:35:37] 
[08:35:37] Entering M.D.
Starting 2 threads
NNODES=2, MYRANK=0, HOSTNAME=thread #0
NNODES=2, MYRANK=1, HOSTNAME=thread #1
Reading file work/wudata_05.tpr, VERSION 4.0.99_development_20090605 (single precision)
Making 1D domain decomposition 2 x 1 x 1
starting mdrun 'Mutant_scan'
3000000 steps,   6000.0 ps (continuing from step 2500000,   5000.0 ps).
[08:35:44] Completed 0 out of 500000 steps  (0%)

-------------------------------------------------------
Program mdrun, VERSION 4.0.99-dev-20100305
Source code file: /Users/kasson/a3_devnew/gromacs/src/mdlib/pme.c, line: 563

Fatal error:
6 particles communicated to PME node 0 are more than a cell length out of the domain decomposition cell of their charge group in dimension x
For more information and tips for trouble shooting please check the GROMACS website at
http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

[08:56:49] mdrun returned 255
[08:56:49] Going to send back what have done -- stepsTotalG=500000
[08:56:49] Work fraction=0.0091 steps=500000.
[08:56:53] logfile size=12522 infoLength=12522 edr=0 trr=25
[08:56:53] logfile size: 12522 info=12522 bed=0 hdr=25
[08:56:53] - Writing 13060 bytes of core data to disk...
[08:56:53]   ... Done.
[08:56:53] 
[08:56:53] Folding@home Core Shutdown: UNSTABLE_MACHINE
[08:56:53] CoreStatus = 7A (122)
[08:56:53] Sending work to server
[08:56:53] Project: 6059 (Run 0, Clone 33, Gen 5)


[08:56:53] + Attempting to send results [April 23 08:56:53 UTC]
[08:56:53] - Reading file work/wuresults_05.dat from core
[08:56:53]   (Read 13060 bytes from disk)
[08:56:54] > Press "c" to connect to the server to upload results

What is past is prologue!
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: Project 6059: (Run 0, Clone 33, Gen 5 )

Post by codysluder »

Your log doesn't show that it uploaded the results. Did you press "c"?
Aardvark
Posts: 143
Joined: Sat Jul 12, 2008 4:22 pm
Location: Team MacResource

Re: Project 6059: (Run 0, Clone 33, Gen 5 )

Post by Aardvark »

@codysluder:

I'm sorry if my Log Snippet was slightly incomplete. I did press "c" and the morbid remains of that WU were returned to Stanford.

I checked the Statistics this morning and found that I had been credited with 4 (that's FOUR) points.

Not bad, eh what??????
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 6059: (Run 0, Clone 33, Gen 5 )

Post by bruce »

Right:

Hi Aardvark (team 48057),
Your WU (P6059 R0 C33 G5) was added to the stats database on 2010-04-23 02:06:04 for 4.37 points of credit.

Considering you spent about 20 minutes processing that WU, that's still about 315 PPD for results that were most likely unusable, scientifically speaking.

I'm sure that everybody agrees that problems like this need fixing, whether it's a bad WU, some bad software somewhere, or some bad hardware somewhere, but the 315 PPD is probably fair.
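
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the PPD estimate. It assumes PPD is simply the credit scaled up to a 24-hour day. The function names `ppd` and `elapsed_seconds` are made up for illustration and are not part of any FAH tool. The 4.37 points and the roughly 20 minutes come from the posts above, and the two timestamps are the first and last core lines of the posted log.

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60

def ppd(points: float, seconds: float) -> float:
    """Scale the credit earned in `seconds` up to a full day (assumed definition)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return points * SECONDS_PER_DAY / seconds

def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two same-day HH:MM:SS log timestamps (hypothetical helper)."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# 4.37 points for "about 20 minutes", as stated in the thread.
rounded = ppd(4.37, 20 * 60)
# Same credit, but using the timestamps from the posted log.
from_log = ppd(4.37, elapsed_seconds("08:35:27", "08:56:53"))

print(f"{rounded:.1f} PPD using 20 minutes")
print(f"{from_log:.1f} PPD using the log timestamps")
```

With the rounded 20 minutes this gives about 315 PPD. Using the 21 minutes 26 seconds between the log timestamps gives about 294 PPD, so the quoted figure is slightly generous but in the right range.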
Aardvark
Posts: 143
Joined: Sat Jul 12, 2008 4:22 pm
Location: Team MacResource

Re: Project 6059: (Run 0, Clone 33, Gen 5 )

Post by Aardvark »

@bruce;

Your statement does contain a certain universal wisdom.

However, the active FOLDERS among us would like an early solution to this problem.

I do recognize the effort you have made in moderating the FUROR that exists within the FAH effort.

PLEASE do not get the wrong message from my postings. I think we want the same outcome......
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project 6059: (Run 0, Clone 33, Gen 5 )

Post by kasson »

Unfortunately, it's not a work unit problem. We get these almost exclusively from Macs. We're aware of the problem and are looking for a solution. It's likely a GROMACS bug somewhere.
Aardvark
Posts: 143
Joined: Sat Jul 12, 2008 4:22 pm
Location: Team MacResource

Re: Project 6059: (Run 0, Clone 33, Gen 5 )

Post by Aardvark »

@bruce;

Sorry about the delay in reply but I have been pondering the "fairness" of your diagnosis. I feel the following applies:

It has been my experience that approximately half of the "Early Fail" WUs I encounter on OS X 10.6.3 with the v2.17 A3 core are unable to write to the Work Files in a manner that allows for ANY return of information to Stanford. Lacking any measurement, I assume this holds for everyone in this portion of the Folding Community. If we can go forward with all these assumptions, the PPD estimate should be divided by 2. That brings us to a 157.5 PPD result. Is anyone comfortable with that number on an ongoing basis???
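
The halving above can be written as an expected value. This is only a sketch under the post's own assumptions: roughly half of the early-fail WUs return nothing, and a WU that cannot be returned earns zero points. The function name `expected_ppd` and its parameters are invented for illustration.

```python
def expected_ppd(ppd_when_returned: float, returned_fraction: float) -> float:
    """Average PPD when only `returned_fraction` of failed WUs earn credit.

    Assumes unreturned WUs earn nothing (an assumption, not a measured figure).
    """
    if not 0.0 <= returned_fraction <= 1.0:
        raise ValueError("returned_fraction must be between 0 and 1")
    return ppd_when_returned * returned_fraction

# 315 PPD for a returned early-fail WU, with about half of them returnable.
estimate = expected_ppd(315.0, 0.5)
print(f"{estimate:.1f} PPD")
```

The 0.5 is a personal estimate from one machine, so the result is only as good as that guess.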

All I can offer is "Any Points are better than none". I just continue to feel that kind of acceptance should be a discomfort to Stanford/PG.

I have recently checked the Client Profile as reported in the "Official Statistics". It is rather obvious just where the OSX10.6.3/v2.17 crowd stands in the rankings and therefore resource allocation.
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: Project 6059: (Run 0, Clone 33, Gen 5 )

Post by codysluder »

I assure you that the discomfort at Stanford/PG is genuine. They've been trying very hard to figure out what changed in OS X that broke FAH. It was working fine in the older versions. If anybody has any suggestions about what they changed and how to make FAH work again, I'm sure they'd appreciate the help.