Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Tue Mar 23, 2010 2:23 pm
by padmavyuha
What's happening here? This is twice in a row now over the last couple of hours: fahcore_a3 runs, gets to around 2%, and then bails out with the same error. Is this a bug in the WUs themselves, or in the new fahcore_a3?
I see the GROMACS /Documentation/Errors page says (helpfully) that my system is blowing up. Can I 'deflate' it at my end, or is this a problem caused by fah/fahcore/the WU?
Code:
Last login: Tue Mar 23 12:14:04 on ttys000
Macintosh:~ yoxi$ /Users/yoxi/Applications/mym/fah.command ; exit;
COLUMNS=55;
LINES=6;
export COLUMNS LINES;
Using local directory for configuration
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
Using local directory for work files
2 cores detected
--- Opening Log file [March 23 12:38:27 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.29r1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/yoxi/Library/Folding@home
Executable: ./fah6
Arguments: -smp -local -verbosity 9 -oneunit -forceasm
[12:38:27] - Ask before connecting: No
[12:38:27] - User name: yoxi (Team 3007)
[12:38:27] - User ID: 80C71DC069F7846
[12:38:27] - Machine ID: 1
[12:38:27]
[12:38:28] Loaded queue successfully.
[12:38:28] - Preparing to get new work unit...
[12:38:28] Cleaning up work directory
[12:38:28] - Autosending finished units... [March 23 12:38:28 UTC]
[12:38:28] Trying to send all finished work units
[12:38:28] + No unsent completed units remaining.
[12:38:28] - Autosend completed
[12:38:28] + Attempting to get work packet
[12:38:28] Passkey found
[12:38:28] - Will indicate memory of 8192 MB
[12:38:28] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 10
[12:38:28] - Connecting to assignment server
[12:38:28] Connecting to http://assign.stanford.edu:8080/
[12:38:28] Posted data.
[12:38:28] Initial: ED82; - Successful: assigned to (130.237.232.140).
[12:38:28] + News From Folding@Home: Welcome to Folding@Home
[12:38:29] Loaded queue successfully.
[12:38:29] Sent data
[12:38:29] Connecting to http://130.237.232.140:8080/
[12:38:29] Posted data.
[12:38:29] Initial: 0000; - Receiving payload (expected size: 1799850)
[12:39:28] - Downloaded at ~29 kB/s
[12:39:28] - Averaged speed for that direction ~45 kB/s
[12:39:28] + Received work.
[12:39:28] + Closed connections
[12:39:28]
[12:39:28] + Processing work unit
[12:39:28] Core required: FahCore_a3.exe
[12:39:28] Core found.
[12:39:28] Working on queue slot 01 [March 23 12:39:28 UTC]
[12:39:28] + Working ...
[12:39:28] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 2 -checkpoint 30 -forceasm -verbose -lifeline 3912 -version 629'
[12:39:28]
[12:39:28] *------------------------------*
[12:39:28] Folding@Home Gromacs SMP Core
[12:39:28] Version 2.17 (Mar 7 2010)
[12:39:28]
[12:39:28] Preparing to commence simulation
[12:39:28] - Assembly optimizations manually forced on.
[12:39:28] - Not checking prior termination.
[12:39:28] - Expanded 1799338 -> 2396877 (decompressed 133.2 percent)
[12:39:28] Called DecompressByteArray: compressed_data_size=1799338 data_size=2396877, decompressed_data_size=2396877 diff=0
[12:39:28] - Digital signature verified
[12:39:28]
[12:39:28] Project: 6014 (Run 0, Clone 106, Gen 86)
[12:39:28]
[12:39:28] Assembly optimizations on if available.
[12:39:28] Entering M.D.
Starting 2 threads
NNODES=2, MYRANK=0, HOSTNAME=thread #0
NNODES=2, MYRANK=1, HOSTNAME=thread #1
Reading file work/wudata_01.tpr, VERSION 4.0.99_development_20090605 (single precision)
Note: tpx file_version 68, software version 70
Making 1D domain decomposition 2 x 1 x 1
starting mdrun 'Protein in POPC'
43500004 steps, 87000.0 ps (continuing from step 43000004, 86000.0 ps).
[12:39:35] Completed 0 out of 500000 steps (0%)
[12:50:41] Completed 5000 out of 500000 steps (1%)
[13:02:12] Completed 10000 out of 500000 steps (2%)
-------------------------------------------------------
Program mdrun, VERSION 4.0.99-dev-20100305
Source code file: /Users/kasson/a3_devnew/gromacs/src/mdlib/pme.c, line: 563
Fatal error:
16 particles communicated to PME node 1 are more than a cell length out of the domain decomposition cell of their charge group in dimension x
For more information and tips for trouble shooting please check the GROMACS website at
http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
[13:02:25] mdrun returned 255
[13:02:25] Going to send back what have done -- stepsTotalG=500000
[13:02:25] Work fraction=0.0202 steps=500000.
[13:02:25] CoreStatus = 0 (0)
[13:02:25] Sending work to server
[13:02:25] Project: 6014 (Run 0, Clone 106, Gen 86)
[13:02:25] - Error: Could not get length of results file work/wuresults_01.dat
[13:02:25] - Error: Could not read unit 01 file. Removing from queue.
[13:02:25] Trying to send all finished work units
[13:02:25] + No unsent completed units remaining.
[13:02:25] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[13:02:25] Cleaning up work directory
[13:02:25] ***** Got a SIGTERM signal (15)
[13:02:25] Killing all core threads
Folding@Home Client Shutdown.
logout
[Process completed]
Re: fahcore a3 breaks after 1-2%
Posted: Tue Mar 23, 2010 3:02 pm
by padmavyuha
...and again. There's no point running any more of these until it's clear why they're not working now.
Code:
Last login: Tue Mar 23 12:38:27 on ttys000
Macintosh:~ yoxi$ /Users/yoxi/Applications/mym/fah.command ; exit;
COLUMNS=55;
LINES=6;
export COLUMNS LINES;
Using local directory for configuration
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
Using local directory for work files
2 cores detected
--- Opening Log file [March 23 14:36:20 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.29r1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/yoxi/Library/Folding@home
Executable: ./fah6
Arguments: -smp -local -verbosity 9 -oneunit -forceasm
[14:36:20] - Ask before connecting: No
[14:36:20] - User name: yoxi (Team 3007)
[14:36:20] - User ID: 80C71DC069F7846
[14:36:20] - Machine ID: 1
[14:36:20]
[14:36:20] Loaded queue successfully.
[14:36:20] - Preparing to get new work unit...
[14:36:20] Cleaning up work directory
[14:36:20] - Autosending finished units... [March 23 14:36:20 UTC]
[14:36:20] Trying to send all finished work units
[14:36:20] + No unsent completed units remaining.
[14:36:20] - Autosend completed
[14:36:20] + Attempting to get work packet
[14:36:20] Passkey found
[14:36:20] - Will indicate memory of 8192 MB
[14:36:20] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 10
[14:36:20] - Connecting to assignment server
[14:36:20] Connecting to http://assign.stanford.edu:8080/
[14:36:21] Posted data.
[14:36:21] Initial: ED82; - Successful: assigned to (130.237.232.140).
[14:36:21] + News From Folding@Home: Welcome to Folding@Home
[14:36:21] Loaded queue successfully.
[14:36:21] Sent data
[14:36:21] Connecting to http://130.237.232.140:8080/
[14:36:22] Posted data.
[14:36:22] Initial: 0000; - Receiving payload (expected size: 1799850)
[14:37:19] - Downloaded at ~30 kB/s
[14:37:19] - Averaged speed for that direction ~42 kB/s
[14:37:19] + Received work.
[14:37:19] + Closed connections
[14:37:19]
[14:37:19] + Processing work unit
[14:37:19] Core required: FahCore_a3.exe
[14:37:19] Core found.
[14:37:19] Working on queue slot 02 [March 23 14:37:19 UTC]
[14:37:19] + Working ...
[14:37:19] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 02 -np 2 -checkpoint 30 -forceasm -verbose -lifeline 4168 -version 629'
[14:37:19]
[14:37:19] *------------------------------*
[14:37:19] Folding@Home Gromacs SMP Core
[14:37:19] Version 2.17 (Mar 7 2010)
[14:37:19]
[14:37:19] Preparing to commence simulation
[14:37:19] - Ensuring status. Please wait.
[14:37:28] - Assembly optimizations manually forced on.
[14:37:28] - Not checking prior termination.
[14:37:29] - Expanded 1799338 -> 2396877 (decompressed 133.2 percent)
[14:37:29] Called DecompressByteArray: compressed_data_size=1799338 data_size=2396877, decompressed_data_size=2396877 diff=0
[14:37:29] - Digital signature verified
[14:37:29]
[14:37:29] Project: 6014 (Run 0, Clone 106, Gen 86)
[14:37:29]
[14:37:29] Assembly optimizations on if available.
[14:37:29] Entering M.D.
Starting 2 threads
NNODES=2, MYRANK=1, HOSTNAME=thread #1
NNODES=2, MYRANK=0, HOSTNAME=thread #0
Reading file work/wudata_02.tpr, VERSION 4.0.99_development_20090605 (single precision)
Note: tpx file_version 68, software version 70
Making 1D domain decomposition 2 x 1 x 1
starting mdrun 'Protein in POPC'
43500004 steps, 87000.0 ps (continuing from step 43000004, 86000.0 ps).
[14:37:35] Completed 0 out of 500000 steps (0%)
[14:48:53] Completed 5000 out of 500000 steps (1%)
[15:00:09] Completed 10000 out of 500000 steps (2%)
-------------------------------------------------------
Program mdrun, VERSION 4.0.99-dev-20100305
Source code file: /Users/kasson/a3_devnew/gromacs/src/mdlib/pme.c, line: 563
Fatal error:
16 particles communicated to PME node 1 are more than a cell length out of the domain decomposition cell of their charge group in dimension x
For more information and tips for trouble shooting please check the GROMACS website at
http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
[15:00:21] mdrun returned 255
[15:00:21] Going to send back what have done -- stepsTotalG=500000
[15:00:21] Work fraction=0.0202 steps=500000.
[15:00:21] CoreStatus = 0 (0)
[15:00:21] Sending work to server
[15:00:21] Project: 6014 (Run 0, Clone 106, Gen 86)
[15:00:21] - Error: Could not get length of results file work/wuresults_02.dat
[15:00:21] - Error: Could not read unit 02 file. Removing from queue.
[15:00:21] Trying to send all finished work units
[15:00:21] + No unsent completed units remaining.
[15:00:21] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[15:00:21] Cleaning up work directory
[15:00:21] ***** Got a SIGTERM signal (15)
[15:00:21] Killing all core threads
Folding@Home Client Shutdown.
logout
[Process completed]
Re: fahcore a3 breaks after 1-2%
Posted: Tue Mar 23, 2010 3:17 pm
by bruce
padmavyuha wrote:...and again. There's no point running any more of these until it's clear why they're not working now.
Code:
[15:00:21] Project: 6014 (Run 0, Clone 106, Gen 86)
All of your reports are for the same WU. There's a reasonable chance that it's a bad WU and you just need to get past it and move on to something else. Nobody has successfully returned that WU yet.
I'm going to change the title and move this topic to the forum where reports of possible bad WUs are found.
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Tue Mar 23, 2010 3:18 pm
by P5-133XL
Both of these are the same WU, so you have a reproducible error, and that indicates a bad WU. Kasson may want a copy of the data files for this WU; I suggest that you PM him.
If you are not interested in that, you will need to force a new WU. You do that by deleting the work folder and the queue, and reconfiguring the client with a different MachineID so that, from the server's point of view, it is a totally different install.
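For what it's worth, here is a minimal Terminal sketch of those steps. It assumes the launch directory shown in the logs above and the v6 client's usual file names (queue.dat for the queue), and uses the client's -configonly flag to re-run configuration so you can enter a different Machine ID:
Code:
cd ~/Library/Folding@home
rm -rf work          # delete the work folder (in-progress WU data)
rm -f queue.dat      # delete the queue so the old assignment is forgotten
./fah6 -configonly   # re-run configuration and choose a different Machine ID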
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Tue Mar 23, 2010 3:57 pm
by padmavyuha
Oh, I didn't realise it was sending me the same WU over and over again. That's a bit weird. I'll archive my current work files, reset the system as you suggest, and PM Kasson.
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Tue Mar 23, 2010 4:11 pm
by padmavyuha
Now running with a fresh WU, and have PM'd Kasson.
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Tue Mar 23, 2010 4:13 pm
by P5-133XL
If you don't get a "Thank you ..." in the log, the server never received the WU back; when your machine then requests a new WU, the server looks up the last WU that was assigned and gives the same one back to you. It is designed that way to prevent people from cherry-picking WUs.
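A quick way to check is to search the log for successful uploads - a hedged example, assuming the v6 client's usual FAHlog.txt in the launch directory shown in these logs:
Code:
grep "Thank you" ~/Library/Folding@home/FAHlog.txt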
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Tue Mar 23, 2010 4:22 pm
by padmavyuha
Smart. I did actually start to muse (in a half-awake sort of way) on why the WUs I was getting all seemed to be around the same size, but I had no reason to suspect it was the same one over and over. This makes me feel better about the ones that crashed (in my other thread) after I hit Ctrl-C - presumably I ended up getting them again and completing them. Closure!
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Thu Mar 25, 2010 8:09 am
by padmavyuha
Another one bites the dust - Project: 6015 (Run 0, Clone 8, Gen 109).
I haven't had a reply from Kasson about the last one yet, by the way.
Code:
Last login: Wed Mar 24 11:08:42 on console
/Users/yoxi/Applications/mym/fah.command ; exit;
Macintosh:~ yoxi$ /Users/yoxi/Applications/mym/fah.command ; exit;
COLUMNS=55;
LINES=6;
export COLUMNS LINES;
Using local directory for configuration
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
Using local directory for work files
2 cores detected
--- Opening Log file [March 25 01:00:25 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.29r1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/yoxi/Library/Folding@home
Executable: ./fah6
Arguments: -smp -local -verbosity 9 -oneunit -forceasm
[01:00:25] - Ask before connecting: No
[01:00:25] - User name: yoxi (Team 3007)
[01:00:25] - User ID: 80C71DC069F7846
[01:00:25] - Machine ID: 2
[01:00:25]
[01:00:26] Loaded queue successfully.
[01:00:26] - Preparing to get new work unit...
[01:00:26] Cleaning up work directory
[01:00:26] + Attempting to get work packet
[01:00:26] Passkey found
[01:00:26] - Will indicate memory of 8192 MB
[01:00:26] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 10
[01:00:26] - Connecting to assignment server
[01:00:26] Connecting to http://assign.stanford.edu:8080/
[01:00:26] - Autosending finished units... [March 25 01:00:26 UTC]
[01:00:26] Trying to send all finished work units
[01:00:26] + No unsent completed units remaining.
[01:00:26] - Autosend completed
[01:00:27] Posted data.
[01:00:27] Initial: ED82; - Successful: assigned to (130.237.232.140).
[01:00:27] + News From Folding@Home: Welcome to Folding@Home
[01:00:27] Loaded queue successfully.
[01:00:27] Sent data
[01:00:27] Connecting to http://130.237.232.140:8080/
[01:00:29] Posted data.
[01:00:29] Initial: 0000; - Receiving payload (expected size: 1799333)
[01:00:47] - Downloaded at ~97 kB/s
[01:00:47] - Averaged speed for that direction ~56 kB/s
[01:00:47] + Received work.
[01:00:47] + Closed connections
[01:00:47]
[01:00:47] + Processing work unit
[01:00:47] Core required: FahCore_a3.exe
[01:00:47] Core found.
[01:00:47] Working on queue slot 02 [March 25 01:00:47 UTC]
[01:00:47] + Working ...
[01:00:47] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 02 -np 2 -checkpoint 30 -forceasm -verbose -lifeline 2422 -version 629'
[01:00:47]
[01:00:47] *------------------------------*
[01:00:47] Folding@Home Gromacs SMP Core
[01:00:47] Version 2.17 (Mar 7 2010)
[01:00:47]
[01:00:47] Preparing to commence simulation
[01:00:47] - Assembly optimizations manually forced on.
[01:00:47] - Not checking prior termination.
[01:00:47] - Expanded 1798821 -> 2392545 (decompressed 133.0 percent)
[01:00:47] Called DecompressByteArray: compressed_data_size=1798821 data_size=2392545, decompressed_data_size=2392545 diff=0
[01:00:47] - Digital signature verified
[01:00:47]
[01:00:47] Project: 6015 (Run 0, Clone 8, Gen 109)
[01:00:47]
[01:00:47] Assembly optimizations on if available.
[01:00:47] Entering M.D.
Starting 2 threads
NNODES=2, MYRANK=1, HOSTNAME=thread #1
NNODES=2, MYRANK=0, HOSTNAME=thread #0
Reading file work/wudata_02.tpr, VERSION 4.0.99_development_20090605 (single precision)
Note: tpx file_version 68, software version 70
Making 1D domain decomposition 2 x 1 x 1
starting mdrun 'Protein in POPC'
55000004 steps, 110000.0 ps (continuing from step 54500004, 109000.0 ps).
[01:00:54] Completed 0 out of 500000 steps (0%)
-------------------------------------------------------
Program mdrun, VERSION 4.0.99-dev-20100305
Source code file: /Users/kasson/a3_devnew/gromacs/src/mdlib/pme.c, line: 563
Fatal error:
8 particles communicated to PME node 1 are more than a cell length out of the domain decomposition cell of their charge group in dimension x
For more information and tips for trouble shooting please check the GROMACS website at
http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
[01:04:35] mdrun returned 255
[01:04:35] Going to send back what have done -- stepsTotalG=500000
[01:04:35] Work fraction=0.0035 steps=500000.
[01:04:35] CoreStatus = 0 (0)
[01:04:35] Sending work to server
[01:04:35] Project: 6015 (Run 0, Clone 8, Gen 109)
[01:04:35] - Error: Could not get length of results file work/wuresults_02.dat
[01:04:35] - Error: Could not read unit 02 file. Removing from queue.
[01:04:35] Trying to send all finished work units
[01:04:35] + No unsent completed units remaining.
[01:04:35] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[01:04:35] Cleaning up work directory
[01:04:35] + Attempting to get work packet
[01:04:35] Passkey found
[01:04:35] - Will indicate memory of 8192 MB
[01:04:35] - Connecting to assignment server
[01:04:35] Connecting to http://assign.stanford.edu:8080/
[01:04:35] ***** Got a SIGTERM signal (15)
[01:04:35] Killing all core threads
Folding@Home Client Shutdown.
logout
[Process completed]
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Thu Mar 25, 2010 9:59 am
by padmavyuha
...and another one - Project: 6015 (Run 0, Clone 30, Gen 108)
I deleted the fahcores as well this time, hence the new core download in the log below. I suspect there may be a problem with fahcore_a3 itself: these errors only began to appear after the most recent update to fahcore_a3, a couple of weeks ago (or whenever it was). I remember running fah6 and seeing a new version of fahcore_a3 being downloaded, anyway, and it's only since then that the errors have started appearing.
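For anyone wanting to force a core re-download the same way, a minimal sketch, assuming the launch directory from these logs; as the log below shows, the client fetches and re-verifies the core on its next start:
Code:
cd ~/Library/Folding@home
rm -f FahCore_a3.exe   # removing the core makes the client download and verify a fresh copy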
Code:
Last login: Thu Mar 25 08:12:31 on ttys000
Macintosh:~ yoxi$ /Users/yoxi/Applications/mym/fah.command ; exit;
COLUMNS=55;
LINES=6;
export COLUMNS LINES;
Using local directory for configuration
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
Using local directory for work files
2 cores detected
--- Opening Log file [March 25 08:13:37 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.29r1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/yoxi/Library/Folding@home
Executable: ./fah6
Arguments: -smp -local -verbosity 9 -oneunit -forceasm
[08:13:37] - Ask before connecting: No
[08:13:37] - User name: yoxi (Team 3007)
[08:13:37] - User ID: 80C71DC069F7846
[08:13:37] - Machine ID: 3
[08:13:37]
[08:13:37] Work directory not found. Creating...
[08:13:37] Could not open work queue, generating new queue...
[08:13:37] - Autosending finished units... [March 25 08:13:37 UTC]
[08:13:37] Trying to send all finished work units
[08:13:37] + No unsent completed units remaining.
[08:13:37] - Autosend completed
[08:13:37] - Preparing to get new work unit...
[08:13:37] Cleaning up work directory
[08:13:37] + Attempting to get work packet
[08:13:37] Passkey found
[08:13:37] - Will indicate memory of 8192 MB
[08:13:37] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 10
[08:13:37] - Connecting to assignment server
[08:13:37] Connecting to http://assign.stanford.edu:8080/
[08:13:38] Posted data.
[08:13:38] Initial: ED82; - Successful: assigned to (130.237.232.140).
[08:13:38] + News From Folding@Home: Welcome to Folding@Home
[08:13:38] Loaded queue successfully.
[08:13:38] Sent data
[08:13:38] Connecting to http://130.237.232.140:8080/
[08:13:39] Posted data.
[08:13:39] Initial: 0000; - Receiving payload (expected size: 1798679)
[08:14:35] - Downloaded at ~31 kB/s
[08:14:35] - Averaged speed for that direction ~31 kB/s
[08:14:35] + Received work.
[08:14:35] + Closed connections
[08:14:35]
[08:14:35] + Processing work unit
[08:14:35] Core required: FahCore_a3.exe
[08:14:35] Core not found.
[08:14:35] - Core is not present or corrupted.
[08:14:35] - Attempting to download new core...
[08:14:35] + Downloading new core: FahCore_a3.exe
[08:14:35] Downloading core (/~pande/OSX/x86/Core_a3.fah from www.stanford.edu)
[08:14:37] Initial: AFDE; + 10240 bytes downloaded
[...snip...]
[08:16:39] Initial: 5486; + 2125653 bytes downloaded
[08:16:40] Verifying core Core_a3.fah...
[08:16:40] Signature is VALID
[08:16:40]
[08:16:40] Trying to unzip core FahCore_a3.exe
[08:16:40] Decompressed FahCore_a3.exe (6392344 bytes) successfully
[08:16:40] + Core successfully engaged
[08:16:45]
[08:16:45] + Processing work unit
[08:16:45] Core required: FahCore_a3.exe
[08:16:45] Core found.
[08:16:45] Working on queue slot 01 [March 25 08:16:45 UTC]
[08:16:45] + Working ...
[08:16:45] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 2 -checkpoint 5 -forceasm -verbose -lifeline 3259 -version 629'
[08:16:45]
[08:16:45] *------------------------------*
[08:16:45] Folding@Home Gromacs SMP Core
[08:16:45] Version 2.17 (Mar 7 2010)
[08:16:45]
[08:16:45] Preparing to commence simulation
[08:16:45] - Assembly optimizations manually forced on.
[08:16:45] - Not checking prior termination.
[08:16:46] - Expanded 1798167 -> 2392545 (decompressed 133.0 percent)
[08:16:46] Called DecompressByteArray: compressed_data_size=1798167 data_size=2392545, decompressed_data_size=2392545 diff=0
[08:16:46] - Digital signature verified
[08:16:46]
[08:16:46] Project: 6015 (Run 0, Clone 30, Gen 108)
[08:16:46]
[08:16:46] Assembly optimizations on if available.
[08:16:46] Entering M.D.
Starting 2 threads
NNODES=2, MYRANK=1, HOSTNAME=thread #1
NNODES=2, MYRANK=0, HOSTNAME=thread #0
Reading file work/wudata_01.tpr, VERSION 4.0.99_development_20090605 (single precision)
Note: tpx file_version 68, software version 70
Making 1D domain decomposition 2 x 1 x 1
starting mdrun 'Protein in POPC'
54500004 steps, 109000.0 ps (continuing from step 54000004, 108000.0 ps).
[08:16:52] Completed 0 out of 500000 steps (0%)
[08:28:03] Completed 5000 out of 500000 steps (1%)
[08:39:16] Completed 10000 out of 500000 steps (2%)
[08:51:00] Completed 15000 out of 500000 steps (3%)
[09:02:02] Completed 20000 out of 500000 steps (4%)
[09:13:39] Completed 25000 out of 500000 steps (5%)
[09:25:33] Completed 30000 out of 500000 steps (6%)
[09:37:45] Completed 35000 out of 500000 steps (7%)
-------------------------------------------------------
Program mdrun, VERSION 4.0.99-dev-20100305
Source code file: /Users/kasson/a3_devnew/gromacs/src/mdlib/pme.c, line: 563
Fatal error:
3 particles communicated to PME node 1 are more than a cell length out of the domain decomposition cell of their charge group in dimension x
For more information and tips for trouble shooting please check the GROMACS website at
http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
[09:47:59] mdrun returned 255
[09:47:59] Going to send back what have done -- stepsTotalG=500000
[09:47:59] Work fraction=0.0794 steps=500000.
[09:48:03] logfile size=15619 infoLength=15619 edr=0 trr=25
[09:48:03] logfile size: 15619 info=15619 bed=0 hdr=25
[09:48:03] - Writing 16157 bytes of core data to disk...
[09:48:03] ... Done.
[09:48:03]
[09:48:03] Folding@home Core Shutdown: UNSTABLE_MACHINE
[09:48:03] CoreStatus = 7A (122)
[09:48:03] Sending work to server
[09:48:03] Project: 6015 (Run 0, Clone 30, Gen 108)
[09:48:03] + Attempting to send results [March 25 09:48:03 UTC]
[09:48:03] - Reading file work/wuresults_01.dat from core
[09:48:03] (Read 16157 bytes from disk)
[09:48:03] Connecting to http://130.237.232.140:8080/
[09:48:04] Posted data.
[09:48:04] Initial: 0000; - Uploaded at ~16 kB/s
[09:48:04] - Averaged speed for that direction ~16 kB/s
[09:48:04] + Results successfully sent
[09:48:04] Thank you for your contribution to Folding@Home.
[09:48:04] Trying to send all finished work units
[09:48:04] + No unsent completed units remaining.
[09:48:04] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[09:48:04] Cleaning up work directory
[09:48:04] + Attempting to get work packet
[09:48:04] Passkey found
[09:48:04] - Will indicate memory of 8192 MB
[09:48:04] - Connecting to assignment server
[09:48:04] Connecting to http://assign.stanford.edu:8080/
[09:48:04] ***** Got a SIGTERM signal (15)
[09:48:04] Killing all core threads
Folding@Home Client Shutdown.
logout
[Process completed]
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Thu Mar 25, 2010 10:51 am
by toTOW
Is overclocking involved here? Did you check your system stability (StressCPU and Memtest)?
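For the memory side, a hedged sketch, assuming the third-party memtest tool for OS X is installed (its documented usage, as far as I know, is memtest <MB|all> <passes>):
Code:
sudo memtest all 2   # test all available memory, two passes; quit other apps first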
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Thu Mar 25, 2010 3:07 pm
by padmavyuha
Just your basic dual-core iMac, no cheese, hold the overclocking. I've checked the memory, but I'll check it again. In the meantime, either there are suddenly some rogue WUs out there, or it's fahcore_a3 that's choking. All was working fine on the folding front until fahcore_a3 got updated. Of course, it could be coincidence and maybe something's up with my Mac, but I'm not having any other problems.
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Thu Mar 25, 2010 5:35 pm
by Aardvark
I am afraid that this problem is more serious than the discussion to date seems to accept. Over the past 7-10 days I have experienced almost exactly the same failure: roughly 4 or 5 a3 core WUs have failed with the same error report at the 1-7% stage of folding. Some of the WUs turned into EUEs, and the reports on those should have been received and noted by the servers at Stanford at the time of return. They have not all been from the same project, but they have all been a3 core projects.
I am running these on a 1.83 GHz Mac Mini under OS X 10.6.2. Nothing has changed recently with my rig other than the recent update of the a3 core to v2.17, though I did also recently go from OS X 10.6.1 to 10.6.2. I have not posted anything about this problem before because it will happen to two WUs in a row and then another a3 core WU will show up and run without a problem. I am not into any overclocking; this is a strictly "stock" Mac Mini rig.
I am not including any logs in this post since I am sending it from a different machine and don't have immediate access to the log files. If there is any interest in my providing them, let me know.
I sense this is more insidious and widespread than is being accepted. Someone should give it some "serious" attention.
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Thu Mar 25, 2010 8:31 pm
by padmavyuha
Well, I'm glad to know it's not just me.
Re: Project: 6014 (Run 0, Clone 106, Gen 86)
Posted: Fri Mar 26, 2010 12:44 am
by Aardvark
@padmavyuha:
You are not alone in this, and I suspect there are others beyond you and me. I agree with your musings about it probably being associated with the new v2.17 a3 core; the timing certainly seems to point in that direction.
It is important that word of this problem gets around, so that others can chime in and provide the "critical mass" necessary to get a significant PG response in time to save us from total collapse.....
If moving the thread again seems indicated, I hope one of the Moderators will do the deed.
Don't let the issue die. Unless it does just go away on its own.