
Project: 2684 (Run 2, Clone 2, Gen 6)

Posted: Fri Jun 18, 2010 6:42 am
by DocJonz
Are these types of messages symptomatic of WU problems or of machine stability issues? (Reading about Gromacs particle errors would suggest it's the WU.)

Code:

[06:26:54] Project: 2684 (Run 2, Clone 2, Gen 6)
[06:26:54] 
[06:26:54] Assembly optimizations on if available.
[06:26:54] Entering M.D.
Starting 12 threads
NNODES=12, MYRANK=0, HOSTNAME=thread #0
NNODES=12, MYRANK=4, HOSTNAME=thread #4
NNODES=12, MYRANK=6, HOSTNAME=thread #6
NNODES=12, MYRANK=7, HOSTNAME=thread #7
NNODES=12, MYRANK=2, HOSTNAME=thread #2
NNODES=12, MYRANK=1, HOSTNAME=thread #1
NNODES=12, MYRANK=3, HOSTNAME=thread #3
NNODES=12, MYRANK=9, HOSTNAME=thread #9
NNODES=12, MYRANK=5, HOSTNAME=thread #5
NNODES=12, MYRANK=8, HOSTNAME=thread #8
Reading file work/wudata_01.tpr, VERSION 4.0.99_development_20090605 (single precision)
NNODES=12, MYRANK=10, HOSTNAME=thread #10
NNODES=12, MYRANK=11, HOSTNAME=thread #11
Making 1D domain decomposition 12 x 1 x 1
starting mdrun 'SINGLE VESICLE in water'
1750000 steps,   7000.0 ps (continuing from step 1500000,   6000.0 ps).
[06:27:06] Completed 0 out of 250000 steps  (0%)

step 1500001: Water molecule starting at atom 662814 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.

step 1500001: Water molecule starting at atom 101529 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.

-------------------------------------------------------
Program mdrun, VERSION 4.0.99-dev-20100610-b6a86-dirty
Source code file: /data0/FAHdev/a3_development/gromacs/src/mdlib/pme.c, line: 535

Fatal error:
2 particles communicated to PME node 10 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x
This usually means that your system is not well equilibrated
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

[06:27:08] mdrun returned 255
[06:27:08] Going to send back what have done -- stepsTotalG=250000
[06:27:08] Work fraction=1499912.7500 steps=250000.
[06:27:12] logfile size=12575 infoLength=12575 edr=25 trr=1
[06:27:12] logfile size: 12575 info=12575 bed=25 hdr=1
[06:27:12] - Writing 13113 bytes of core data to disk...
[06:27:13]   ... Done.
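The "starting mdrun" line in the log is enough to work out the timestep that the "reduce the timestep" hint refers to. A minimal sketch using only the four numbers printed there (1750000 steps, 7000.0 ps, continuing from step 1500000, 6000.0 ps); the helper name wu_summary is hypothetical. Note also that both SETTLE warnings are at step 1500001, i.e. the very first step after the restart.

```python
from fractions import Fraction

def wu_summary(total_steps, total_ps, start_step, start_ps):
    """Derive this WU's step count, simulated time and timestep from the
    'starting mdrun' banner. Fraction keeps the arithmetic exact."""
    dt_ps = Fraction(str(total_ps)) / total_steps   # ps per MD step
    # The restart point should sit on the same timestep grid.
    assert Fraction(str(start_ps)) == start_step * dt_ps
    wu_steps = total_steps - start_step             # steps this WU must run
    wu_ps = wu_steps * dt_ps                        # simulated time in this WU
    return wu_steps, wu_ps, dt_ps

wu_steps, wu_ps, dt_ps = wu_summary(1750000, 7000.0, 1500000, 6000.0)
print(wu_steps)             # 250000, matching "Completed 0 out of 250000 steps"
print(float(wu_ps))         # 1000.0 ps
print(float(dt_ps * 1000))  # 4.0 fs per step

# Halving the timestep to 2 fs would need twice as many steps for the same 1000 ps.
print(int(wu_ps / (dt_ps / 2)))  # 500000
```

Whether 4 fs is too large for this particular system is not something the log can answer; it only shows the value mdrun was using.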

Re: Project: 2684 (Run 2, Clone 2, Gen 6)

Posted: Fri Jun 18, 2010 7:08 am
by bruce
I saw the same problem reported elsewhere, and the Pande Group, in cooperation with Gromacs.org, is working on a fix for it. If the WU is actually blowing up, it's a WU problem, but my guess is that this is better described as a bug in the core than as either a bad WU or a symptom of hardware instability. (I don't understand all of the details, and I don't work with the code personally, so maybe I'm wrong.)

It's probably related to how the domain is decomposed. Whatever decides that the decomposition should be 12x1x1 would have much better luck with 3x2x2, since 12x1x1 slices the box into 12 thin slabs along x, the same dimension named in the error.
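One way to picture the difference between the two grids: the log only says the 12 domains were laid out as 12 x 1 x 1. The sketch below assumes a hypothetical cubic box of 24 nm and a 1.0 nm cut-off (neither value appears in the log) and compares how thick each domain cell is for every way of splitting 12 nodes into a 3D grid. The "prefer the grid whose thinnest cell is thickest" rule is a simplified heuristic for illustration, not the logic GROMACS actually uses to choose its grid.

```python
from itertools import product

# Hypothetical values for illustration only; the real box size and
# cut-off of project 2684 are not shown in the log.
BOX_NM = (24.0, 24.0, 24.0)   # box edge lengths along (x, y, z)
CUTOFF_NM = 1.0

def grids(n):
    """All (nx, ny, nz) with nx * ny * nz == n."""
    return [(nx, ny, n // (nx * ny))
            for nx, ny in product(range(1, n + 1), repeat=2)
            if n % (nx * ny) == 0]

def cell_widths(grid, box=BOX_NM):
    """Width of one domain-decomposition cell along each dimension."""
    return tuple(length / cells for length, cells in zip(box, grid))

def min_width_in_cutoffs(grid):
    """Thinnest cell dimension, expressed in multiples of the cut-off."""
    return min(cell_widths(grid)) / CUTOFF_NM

for g in [(12, 1, 1), (3, 2, 2)]:
    print(g, cell_widths(g), min_width_in_cutoffs(g))

# Simplified heuristic: maximise the thinnest cell width; ties are broken
# toward more cells along x (larger tuple wins).
best = max(grids(12), key=lambda g: (min_width_in_cutoffs(g), g))
print("most compact grid:", best)  # most compact grid: (3, 2, 2)
```

With these assumed numbers, 12 x 1 x 1 gives cells only 2 cut-offs thick in x, while 3 x 2 x 2 gives at least 8 cut-offs in every dimension.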

In any case, thanks for reporting it.