Project 6040 Run 0 Clone 30 Gen 71


HendricksSA
Posts: 339
Joined: Fri Jun 26, 2009 4:34 am

Project 6040 Run 0 Clone 30 Gen 71

Post by HendricksSA

This is the first project 604x work unit to fail on this machine. I'm not sure whether error 7A (122) means it is my hardware, so I'm sending the log along in case it helps distinguish a problem on your end from one on mine. Fold on!


Launch directory: /media/linux_smp/fahSMP
Executable: ./fah6
Arguments: -smp -oneunit -verbosity 9 

[22:32:11] - Ask before connecting: No
[22:32:11] - User name: xxxxxx
[22:32:11] - User ID: xxxxxx
[22:32:11] - Machine ID: 1
[22:32:11] 
[22:32:11] Loaded queue successfully.
[22:32:11] - Preparing to get new work unit...
[22:32:11] Cleaning up work directory
[22:32:11] - Autosending finished units... [August 20 22:32:11 UTC]
[22:32:11] Trying to send all finished work units
[22:32:11] + No unsent completed units remaining.
[22:32:11] - Autosend completed
[22:32:18] + Attempting to get work packet
[22:32:18] Passkey found
[22:32:18] - Will indicate memory of 7994 MB
[22:32:18] - Connecting to assignment server
[22:32:18] Connecting to http://assign.stanford.edu:8080/
[22:32:18] Posted data.
[22:32:18] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[22:32:18] + News From Folding@Home: Welcome to Folding@Home
[22:32:18] Loaded queue successfully.
[22:32:18] Connecting to http://171.64.65.54:8080/
[22:32:21] Posted data.
[22:32:21] Initial: 0000; - Receiving payload (expected size: 7882826)
[22:32:26] - Downloaded at ~1539 kB/s
[22:32:26] - Averaged speed for that direction ~992 kB/s
[22:32:26] + Received work.
[22:32:26] + Closed connections
[22:32:26] 
[22:32:26] + Processing work unit
[22:32:26] Core required: FahCore_a3.exe
[22:32:26] Core found.
[22:32:26] Working on queue slot 07 [August 20 22:32:26 UTC]
[22:32:26] + Working ...
[22:32:26] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 07 -np 6 -checkpoint 15 -verbose -lifeline 2386 -version 629'

[22:32:26] 
[22:32:26] *------------------------------*
[22:32:26] Folding@Home Gromacs SMP Core
[22:32:26] Version 2.22 (June 10, 2010)
[22:32:26] 
[22:32:26] Preparing to commence simulation
[22:32:26] - Looking at optimizations...
[22:32:26] - Created dyn
[22:32:26] - Files status OK
[22:32:27] - Expanded 7882314 -> 10126021 (decompressed 128.4 percent)
[22:32:27] Called DecompressByteArray: compressed_data_size=7882314 data_size=10126021, decompressed_data_size=10126021 diff=0
[22:32:27] - Digital signature verified
[22:32:27] 
[22:32:27] Project: 6040 (Run 0, Clone 30, Gen 71)
[22:32:27] 
[22:32:27] Assembly optimizations on if available.
[22:32:27] Entering M.D.
Starting 6 threads
NNODES=6, MYRANK=5, HOSTNAME=thread #5
NNODES=6, MYRANK=1, HOSTNAME=thread #1
NNODES=6, MYRANK=3, HOSTNAME=thread #3
NNODES=6, MYRANK=2, HOSTNAME=thread #2
NNODES=6, MYRANK=4, HOSTNAME=thread #4
NNODES=6, MYRANK=0, HOSTNAME=thread #0
Reading file work/wudata_07.tpr, VERSION 4.0.99_development_20090605 (single precision)
Making 1D domain decomposition 6 x 1 x 1
starting mdrun '8817 system'
18000002 steps,  72000.0 ps (continuing from step 17750002,  71000.0 ps).
[22:32:35] Completed 0 out of 250000 steps  (0%)
[22:44:49] Completed 2500 out of 250000 steps  (1%)
[22:57:04] Completed 5000 out of 250000 steps  (2%)
[23:09:18] Completed 7500 out of 250000 steps  (3%)
[23:21:32] Completed 10000 out of 250000 steps  (4%)
[23:33:43] Completed 12500 out of 250000 steps  (5%)
[23:45:52] Completed 15000 out of 250000 steps  (6%)
[23:58:07] Completed 17500 out of 250000 steps  (7%)
[00:10:20] Completed 20000 out of 250000 steps  (8%)
[00:22:32] Completed 22500 out of 250000 steps  (9%)
[00:34:46] Completed 25000 out of 250000 steps  (10%)
[00:51:50] Completed 27500 out of 250000 steps  (11%)
[01:07:37] Completed 30000 out of 250000 steps  (12%)
[01:19:51] Completed 32500 out of 250000 steps  (13%)
[01:32:00] Completed 35000 out of 250000 steps  (14%)
[01:44:12] Completed 37500 out of 250000 steps  (15%)
[01:56:21] Completed 40000 out of 250000 steps  (16%)
[02:14:15] Completed 42500 out of 250000 steps  (17%)
[02:30:36] Completed 45000 out of 250000 steps  (18%)
[02:42:46] Completed 47500 out of 250000 steps  (19%)
[02:54:57] Completed 50000 out of 250000 steps  (20%)
[03:07:08] Completed 52500 out of 250000 steps  (21%)
[03:19:19] Completed 55000 out of 250000 steps  (22%)
[03:31:28] Completed 57500 out of 250000 steps  (23%)
[03:43:40] Completed 60000 out of 250000 steps  (24%)
[03:55:53] Completed 62500 out of 250000 steps  (25%)
[04:10:06] Completed 65000 out of 250000 steps  (26%)
[04:27:45] Completed 67500 out of 250000 steps  (27%)
[04:32:11] - Autosending finished units... [August 21 04:32:11 UTC]
[04:32:11] Trying to send all finished work units
[04:32:11] + No unsent completed units remaining.
[04:32:11] - Autosend completed
[04:45:33] Completed 70000 out of 250000 steps  (28%)
[04:58:25] Completed 72500 out of 250000 steps  (29%)
[05:10:39] Completed 75000 out of 250000 steps  (30%)
[05:22:56] Completed 77500 out of 250000 steps  (31%)
[05:35:14] Completed 80000 out of 250000 steps  (32%)
[05:52:00] Completed 82500 out of 250000 steps  (33%)
[06:10:02] Completed 85000 out of 250000 steps  (34%)
[06:25:03] Completed 87500 out of 250000 steps  (35%)
[06:41:51] Completed 90000 out of 250000 steps  (36%)
[06:54:52] Completed 92500 out of 250000 steps  (37%)
[07:07:13] Completed 95000 out of 250000 steps  (38%)
[07:19:34] Completed 97500 out of 250000 steps  (39%)
[07:31:51] Completed 100000 out of 250000 steps  (40%)
[07:47:54] Completed 102500 out of 250000 steps  (41%)

step 17853716: Water molecule starting at atom 60913 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.

step 17853718: Water molecule starting at atom 60913 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.

step 17853719: Water molecule starting at atom 288010 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.

step 17853719: Water molecule starting at atom 184933 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.

Step 17853720:
The charge group starting at atom 86614 moved than the distance allowed by the domain decomposition (2.499710) in direction X
distance out of cell -4298325.500000
Old coordinates:   12.578    2.688    0.652
New coordinates: -4298313.000    8.202    1.798
Old cell boundaries in direction X:   12.499   14.998
New cell boundaries in direction X:   12.499   14.998

-------------------------------------------------------
Program mdrun, VERSION 4.0.99-dev-20100610-b6a86-dirty
Source code file: /data0/FAHdev/a3_development/gromacs/src/mdlib/domdec.c, line: 4081

Fatal error:
A charge group moved too far between two domain decomposition steps
This usually means that your system is not well equilibrated
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

[07:53:52] mdrun returned 255
[07:53:52] Going to send back what have done -- stepsTotalG=250000
[07:53:52] Work fraction=171.1388 steps=250000.
[07:53:56] logfile size=81423 infoLength=81423 edr=25 trr=1
[07:53:56] logfile size: 81423 info=81423 bed=25 hdr=1
[07:53:56] - Writing 81961 bytes of core data to disk...
[07:53:57]   ... Done.
[07:58:03] 
[07:58:03] Folding@home Core Shutdown: UNSTABLE_MACHINE
[07:58:03] CoreStatus = 7A (122)
[07:58:03] Sending work to server
[07:58:03] Project: 6040 (Run 0, Clone 30, Gen 71)


[07:58:03] + Attempting to send results [August 21 07:58:03 UTC]
[07:58:03] - Reading file work/wuresults_07.dat from core
[07:58:03]   (Read 81961 bytes from disk)
[07:58:03] Connecting to http://171.64.65.54:8080/
[07:58:04] Posted data.
[07:58:04] Initial: 0000; - Uploaded at ~81 kB/s
[07:58:04] - Averaged speed for that direction ~197 kB/s
[07:58:04] + Results successfully sent
[07:58:04] Thank you for your contribution to Folding@Home.
[07:58:05] Trying to send all finished work units
[07:58:05] + No unsent completed units remaining.
[07:58:05] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[07:58:05] Cleaning up work directory
[07:58:05] ***** Got a SIGTERM signal (15)
[07:58:05] Killing all core threads
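A few numbers in the log above can be cross-checked against each other. The minimal Python sketch below uses only values copied from the log. It assumes that the "distance out of cell" figure is the new coordinate minus the cell boundary it crossed (the lower boundary, since the group moved in -X), and that lengths are in nm and times in ps as usual for GROMACS. The function names are illustrative, not part of GROMACS or the Folding@home core.

```python
# Cross-checks a few numbers from the log in this post.
# Assumption: "distance out of cell" = new coordinate minus the cell
# boundary that was crossed (lower boundary when moving in -X).

TOTAL_STEPS, TOTAL_PS = 18_000_002, 72000.0   # "18000002 steps, 72000.0 ps"
START_STEP, START_PS = 17_750_002, 71000.0    # "continuing from step 17750002, 71000.0 ps"
WU_STEPS = 250_000                            # "Completed ... out of 250000 steps"
FAIL_STEP = 17_853_720                        # step of the fatal domdec error


def timestep_fs() -> float:
    """Integration timestep implied by the step/time pairs, in femtoseconds."""
    return (TOTAL_PS - START_PS) / (TOTAL_STEPS - START_STEP) * 1000.0


def wu_fraction_at(step: int) -> float:
    """Fraction of this work unit completed when `step` was reached."""
    return (step - START_STEP) / WU_STEPS


def distance_out_of_cell(new_x: float, cell_lo: float, cell_hi: float) -> float:
    """Signed distance of a coordinate outside [cell_lo, cell_hi] (0 if inside)."""
    if new_x < cell_lo:
        return new_x - cell_lo
    if new_x > cell_hi:
        return new_x - cell_hi
    return 0.0


if __name__ == "__main__":
    old_x, new_x = 12.578, -4298313.000      # old/new X coordinates (nm)
    lo, hi = 12.499, 14.998                  # cell boundaries in X (nm)
    allowed = 2.499710                       # distance allowed by domdec (nm)

    print(f"timestep: {timestep_fs():.1f} fs")
    print(f"failed at {wu_fraction_at(FAIL_STEP):.1%} of the WU")
    print(f"distance out of cell: {distance_out_of_cell(new_x, lo, hi):.3f}")
    print(f"X displacement {abs(new_x - old_x):.3f} vs allowed {allowed}")
```

Run as a script, it reports a 4 fs timestep and a failure at about 41.5% of the WU, which matches the last progress line (41%). The computed distance out of cell, about -4298325.499 nm, agrees with the logged -4298325.5 to the precision of the printed coordinates, and the single-step jump of millions of nm in X is far beyond the 2.5 nm the domain decomposition allows, right after the SETTLE warnings.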
toTOW
Site Moderator
Posts: 6435
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project 6040 Run 0 Clone 30 Gen 71

Post by toTOW

You're the only one who has returned this WU so far...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
sortofageek
Site Admin
Posts: 3110
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Project 6040 Run 0 Clone 30 Gen 71

Post by sortofageek

Project: 6040 (Run 0, Clone 30, Gen 71) still has not been returned by anyone else. Usually that is a good sign for the WU itself, although we'll keep checking.

From what I can see, there is reason for you to check for instability if you keep having this sort of issue. If I were checking, I would look at overclocking, CPU, memory and disks, but instability can also come from a corrupted download or a momentary glitch, such as a conflict with other software. Those last possibilities are why I would probably just shrug and ignore it unless it keeps happening.
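To see whether this sort of failure keeps happening, one option is to scan the client's log files for the same messages that appear in this thread. The sketch below is a hypothetical helper (the script name `scan_fahlog.py` and the log file names such as `FAHlog.txt` are assumptions); its patterns are copied from the log posted here, and other core or client versions may word them differently.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Message patterns copied from the log in this thread; other core or client
# versions may word them differently (an assumption, not a spec).
PATTERNS = {
    "settle_warning": re.compile(r"can not be settled"),
    "domdec_error": re.compile(r"A charge group moved too far"),
    "unstable_machine": re.compile(r"Core Shutdown: UNSTABLE_MACHINE"),
}
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

WU = Tuple[str, ...]


def scan(lines: Iterable[str]) -> Tuple[Counter, List[WU]]:
    """Count instability messages and list WUs that ended with UNSTABLE_MACHINE."""
    counts: Counter = Counter()
    failed: List[WU] = []
    current: Optional[WU] = None  # last Project/Run/Clone/Gen seen in the log
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            current = m.groups()
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                if name == "unstable_machine" and current and current not in failed:
                    failed.append(current)
    return counts, failed


def main(paths: List[str]) -> None:
    if not paths:
        print("usage: scan_fahlog.py LOGFILE [LOGFILE ...]")
        return
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            counts, failed = scan(fh)
        print(f"{path}: {dict(counts)}")
        for proj, run, clone, gen in failed:
            print(f"  UNSTABLE_MACHINE on P{proj} R{run} C{clone} G{gen}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python3 scan_fahlog.py FAHlog.txt` would list each WU that ended with UNSTABLE_MACHINE. A single entry like P6040 R0 C30 G71 fits the "shrug and ignore it" case, while repeated entries across different projects would point more toward the hardware checks above.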