
Project: 2671 (Run 39, Clone 95, Gen 38) LINCS WARNING

Posted: Thu May 28, 2009 10:34 am
by bollix47
FYI

This WU died as indicated in the following log. It appears to carry on okay after a restart.


[06:34:31] Completed 152500 out of 250000 steps  (61%)
[06:38:55] Completed 155000 out of 250000 steps  (62%)
[06:43:22] Completed 157500 out of 250000 steps  (63%)

Step 9657860, time 19315.7 (ps)  LINCS WARNING
relative constraint deviation after LINCS:
rms 0.040056, max 1.179423 (between atoms 723 and 725)
bonds that rotated more than 90 degrees:
 atom 1 atom 2  angle  previous, current, constraint length
    723    725   90.0    0.1090   0.2376      0.1090
    726    727   90.0    0.1010   0.2059      0.1010

Step 9657861, time 19315.7 (ps)  LINCS WARNING
relative constraint deviation after LINCS:
rms 775.789356, max 21975.304688 (between atoms 723 and 725)
bonds that rotated more than 90 degrees:
 atom 1 atom 2  angle  previous, current, constraint length
[06:44:00] 
[06:44:00] Folding@home Core Shutdown: INTERRUPTED
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[07:28:22] - Autosending finished units... [May 28 07:28:22 UTC]
[07:28:22] Trying to send all finished work units
[07:28:22] + No unsent completed units remaining.
[07:28:22] - Autosend completed
[10:22:25] ***** Got an Activate signal (2)
[10:22:25] Killing all core threads

Folding@Home Client Shutdown.
bollix@XXXXXXX:~/fah/smp$ ./fah6

Note: Please read the license agreement (fah6 -license). Further 
use of this software requires that you have read and accepted this agreement.

8 cores detected


--- Opening Log file [May 28 10:22:59 UTC] 


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.24beta

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/bollix/fah/smp
Executable: ./fah6
Arguments: -smp 8 -verbosity 9 

[10:22:59] - Ask before connecting: No
[10:22:59] - User name: bollix47 (Team 39340)
[10:22:59] - User ID: XXXXXXXXXXXXXXXX
[10:22:59] - Machine ID: 1
[10:22:59] 
[10:23:00] Loaded queue successfully.
[10:23:00] - Autosending finished units... [May 28 10:23:00 UTC]
[10:23:00] Trying to send all finished work units
[10:23:00] + No unsent completed units remaining.
[10:23:00] - Autosend completed
[10:23:00] 
[10:23:00] + Processing work unit
[10:23:00] Core required: FahCore_a2.exe
[10:23:00] Core found.
[10:23:00] Working on queue slot 09 [May 28 10:23:00 UTC]
[10:23:00] + Working ...
[10:23:00] - Calling './mpiexec -np 8 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 09 -checkpoint 30 -verbose -lifeline 31185 -version 624'

[10:23:00] 
[10:23:00] *------------------------------*
[10:23:00] Folding@Home Gromacs SMP Core
[10:23:00] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[10:23:00] 
[10:23:00] Preparing to commence simulation
[10:23:00] - Ensuring status. Please wait.
[10:23:01] Called DecompressByteArray: compressed_data_size=4825702 data_size=24048925, decompressed_data_size=24048925 diff=0
[10:23:01] - Digital signature verified
[10:23:01] 
[10:23:01] Project: 2671 (Run - Digital signature verAssembly optimizations on if available.
[10:23:01] Entering Entering M.D.
[10:23:07] Using Gromacs checkpoints
[10:23:10]  M.D.
[10:23:16] Using Gromacs checkpoints
NNODES=8, MYRANK=0, HOSTNAME=XXXXXX
NNODES=8, MYRANK=4, HOSTNAME=XXXXXX
NNODES=8, MYRANK=1, HOSTNAME=XXXXXX
NNODES=8, MYRANK=2, HOSTNAME=XXXXXX
NNODES=8, MYRANK=3, HOSTNAME=XXXXXX
NNODES=8, MYRANK=5, HOSTNAME=XXXXXX
NNODES=8, MYRANK=7, HOSTNAME=XXXXXX
NNODES=8, MYRANK=6, HOSTNAME=XXXXXX
NODEID=2 argc=23
NODEID=3 argc=23
NODEID=0 argc=23
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_09.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=1 argc=23
NODEID=4 argc=23
NODEID=5 argc=23
NODEID=6 argc=23
NODEID=7 argc=23
Note: tpx file_version 48, software version 64

Reading checkpoint file work/wudata_09.cpt generated: Thu May 28 02:43:22 2009


NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 3D domain decomposition 2 x 2 x 2
starting mdrun '22908 system in water'
9750001 steps,  19500.0 ps (continuing from step 9657510,  19315.0 ps).
[10:23:19] d work/wudata_09.log
[10:23:20] Verified work/wudata_09.trr
[10:23:20] Verified work/wudata_09.xtc
[10:23:20] Verified work/wudata_09.edr
[10:23:20] Completed 157509 out of 250000 steps  (63%)
[10:27:45] Completed 160000 out of 250000 steps  (64%)



Re: Project: 2671 (Run 39, Clone 95, Gen 38) LINCS WARNING

Posted: Fri May 29, 2009 5:58 am
by susato
Good on ya for getting it up and running again. That's quite a rare event! Did you shut it down in any special way to give it a better chance of restarting?

Let us know if it runs into more trouble down the line.

Re: Project: 2671 (Run 39, Clone 95, Gen 38) LINCS WARNING

Posted: Fri May 29, 2009 7:12 am
by bollix47
I just used Ctrl-C to stop the client and then restarted it. The WU did finish normally and uploaded fine. :wink:

Any explanation as to what it means? :egeek:

Re: Project: 2671 (Run 39, Clone 95, Gen 38) LINCS WARNING

Posted: Fri May 29, 2009 5:58 pm
by bruce
bollix47 wrote: Any explanation as to what it means? :egeek:
The LINCS WARNING message comes from the Gromacs software, and it basically says that the simulated atoms have moved into an impossible position. When you get the error for the first time, it's impossible to tell whether that's because of some strange characteristics of the initial positions/velocities the atoms had when the WU was created from Gen 37, or because the hardware made a calculation error. The fact that the same thing didn't happen again after the restart tends to suggest that it was a hardware error, but that's not 100% certain, either.
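
To put a number on "impossible position", here is a rough sketch using the two bond rows from the first warning in the log above. It assumes the "relative constraint deviation" is |current - constraint| / constraint, which is my reading of the log columns and not something confirmed in this thread; the function name is made up for illustration.

```python
def relative_deviation(current, constraint):
    """Fractional deviation of a bond from its constrained length (assumed definition)."""
    return abs(current - constraint) / constraint

# (atom 1, atom 2, current length, constraint length) copied from the first warning
rows = [
    (723, 725, 0.2376, 0.1090),
    (726, 727, 0.2059, 0.1010),
]

for atom1, atom2, current, constraint in rows:
    dev = relative_deviation(current, constraint)
    print(f"atoms {atom1}-{atom2}: relative deviation {dev:.3f}")
```

The 723-725 row comes out at roughly 1.18, in line with the "max 1.179423" reported for that pair, i.e. the bond was more than twice its constrained length. One step later the log reports a max of 21975, so the run had clearly blown up by then.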

The usual guidelines are: If it happens rarely, ignore it. If it happens frequently, start looking for problems like overclocking / heat / power stability / etc.
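
If you want to keep an eye on how often it happens, something like the following could count the warnings in a saved log. This is only a sketch: the regex is fitted to the "Step ..., time ... (ps)  LINCS WARNING" lines shown in this thread, and the helper names and the `max_gap` cutoff for lumping back-to-back steps into one event are my own assumptions.

```python
import re

# Matches header lines such as:
# "Step 9657860, time 19315.7 (ps)  LINCS WARNING"
LINCS_RE = re.compile(
    r"^Step\s+(\d+),\s+time\s+([\d.]+)\s+\(ps\)\s+LINCS WARNING",
    re.MULTILINE,
)

def find_lincs_warnings(log_text):
    """Return (step, time_ps) for every LINCS WARNING header in the log text."""
    return [(int(step), float(time_ps)) for step, time_ps in LINCS_RE.findall(log_text)]

def count_events(warnings, max_gap=1):
    """Count warnings, treating steps within max_gap of the previous one as the same event."""
    events = 0
    previous_step = None
    for step, _ in warnings:
        if previous_step is None or step - previous_step > max_gap:
            events += 1
        previous_step = step
    return events

sample = """\
Step 9657860, time 19315.7 (ps)  LINCS WARNING
relative constraint deviation after LINCS:
rms 0.040056, max 1.179423 (between atoms 723 and 725)

Step 9657861, time 19315.7 (ps)  LINCS WARNING
relative constraint deviation after LINCS:
rms 775.789356, max 21975.304688 (between atoms 723 and 725)
"""

warnings = find_lincs_warnings(sample)
print(warnings)                 # the two warning headers from this WU
print(count_events(warnings))   # consecutive steps count as a single event
```

For the log in this thread the two warnings sit on consecutive steps, so they count as one event. What counts as "frequently" is left to you; the guideline above doesn't give a number.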