Project: 2684 (Run 2, Clone 2, Gen 6)


DocJonz
Posts: 248
Joined: Thu Dec 06, 2007 6:31 pm
Hardware configuration: Folding with: 4x RTX 4070Ti, 1x RTX 4080 Super
Location: United Kingdom


Post by DocJonz »

Are these types of messages symptomatic of WU problems or of machine stability issues? (What I've read about GROMACS particle errors suggests it's the WU.)

Code:

[06:26:54] Project: 2684 (Run 2, Clone 2, Gen 6)
[06:26:54] 
[06:26:54] Assembly optimizations on if available.
[06:26:54] Entering M.D.
Starting 12 threads
NNODES=12, MYRANK=0, HOSTNAME=thread #0
NNODES=12, MYRANK=4, HOSTNAME=thread #4
NNODES=12, MYRANK=6, HOSTNAME=thread #6
NNODES=12, MYRANK=7, HOSTNAME=thread #7
NNODES=12, MYRANK=2, HOSTNAME=thread #2
NNODES=12, MYRANK=1, HOSTNAME=thread #1
NNODES=12, MYRANK=3, HOSTNAME=thread #3
NNODES=12, MYRANK=9, HOSTNAME=thread #9
NNODES=12, MYRANK=5, HOSTNAME=thread #5
NNODES=12, MYRANK=8, HOSTNAME=thread #8
Reading file work/wudata_01.tpr, VERSION 4.0.99_development_20090605 (single precision)
NNODES=12, MYRANK=10, HOSTNAME=thread #10
NNODES=12, MYRANK=11, HOSTNAME=thread #11
Making 1D domain decomposition 12 x 1 x 1
starting mdrun 'SINGLE VESICLE in water'
1750000 steps,   7000.0 ps (continuing from step 1500000,   6000.0 ps).
[06:27:06] Completed 0 out of 250000 steps  (0%)

step 1500001: Water molecule starting at atom 662814 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.

step 1500001: Water molecule starting at atom 101529 can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.

-------------------------------------------------------
Program mdrun, VERSION 4.0.99-dev-20100610-b6a86-dirty
Source code file: /data0/FAHdev/a3_development/gromacs/src/mdlib/pme.c, line: 535

Fatal error:
2 particles communicated to PME node 10 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x
This usually means that your system is not well equilibrated
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

[06:27:08] mdrun returned 255
[06:27:08] Going to send back what have done -- stepsTotalG=250000
[06:27:08] Work fraction=1499912.7500 steps=250000.
[06:27:12] logfile size=12575 infoLength=12575 edr=25 trr=1
[06:27:12] logfile size: 12575 info=12575 bed=25 hdr=1
[06:27:12] - Writing 13113 bytes of core data to disk...
[06:27:13]   ... Done.
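The step counts in the log are self-consistent, and they also reveal the time step, which is useful context for the SETTLE warning's advice to "reduce the timestep if appropriate". A minimal sketch of that arithmetic, using only the numbers printed in the mdrun output above (the variable names are just labels for this example):

```python
# Figures copied from the mdrun log above.
total_steps = 1_750_000   # "1750000 steps"
total_ps = 7000.0         # "7000.0 ps"
start_step = 1_500_000    # "continuing from step 1500000"
start_ps = 6000.0         # "6000.0 ps"

# Steps this generation has to run; matches "out of 250000 steps".
steps_this_gen = total_steps - start_step

# Time step implied by the totals, converted from ps to fs.
dt_fs = total_ps * 1000.0 / total_steps

# Simulated time covered by this generation, in ps.
ps_this_gen = total_ps - start_ps

print(steps_this_gen, dt_fs, ps_this_gen)
```

The same 4 fs step falls out of the starting point (6000 ps over 1,500,000 steps), so the WU is running at a constant time step; the crash happens at step 1500001, the very first step after resuming.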
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2684 (Run 2, Clone 2, Gen 6)

Post by bruce »

I saw the same problem reported elsewhere, and the Pande Group, in cooperation with Gromacs.org, is working on a fix for it. If the WU is actually blowing up, it's a WU problem, but my guess is that this is better described as a bug in the core than as either a bad WU or a symptom of hardware instability. (I don't understand all of the details, nor do I work with the code personally, so maybe I'm wrong.)

It's probably related to how the domain is decomposed. Whatever is deciding that the decomposition should be 12x1x1 would have much better luck if it used 3x2x2.
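The 12x1x1 vs 3x2x2 point can be made a little more concrete. One rough way to compare grids (an illustrative heuristic only, not how mdrun actually chooses its decomposition) is to look at the thinnest cell edge each grid leaves, measured in cut-off lengths, since the fatal error is phrased in terms of the cut-off and the cell boundary. This sketch assumes a hypothetical cubic 20 nm box and a 1.0 nm cut-off; the thread gives neither value, and `grids` and `min_cell_width` are made-up helper names.

```python
from itertools import product

# Hypothetical inputs: the real box size and cut-off for project 2684
# are not given in the thread.
BOX_NM = (20.0, 20.0, 20.0)
CUTOFF_NM = 1.0
N_RANKS = 12  # "Starting 12 threads" in the log


def grids(n):
    """Every (nx, ny, nz) grid whose cell count is exactly n."""
    return [(nx, ny, n // (nx * ny))
            for nx, ny in product(range(1, n + 1), repeat=2)
            if n % (nx * ny) == 0]


def min_cell_width(grid, box):
    """Thinnest edge (nm) of one cell when the box is split by grid."""
    return min(length / cells for length, cells in zip(box, grid))


if __name__ == "__main__":
    for grid in [(12, 1, 1), (3, 2, 2)]:
        width = min_cell_width(grid, BOX_NM)
        print(grid, f"thinnest edge {width:.2f} nm = "
                    f"{width / CUTOFF_NM:.2f} cut-offs")
    # Widest thinnest-edge achievable with any 12-cell grid.
    best = max(min_cell_width(g, BOX_NM) for g in grids(N_RANKS))
    print("widest achievable thinnest edge:", f"{best:.2f} nm")
```

With these assumed numbers, 12x1x1 slabs are only about 1.7 cut-offs thick, while 3x2x2 cells are about 6.7 cut-offs across at their thinnest, the best any 12-cell grid can do here. For a cubic box every permutation of 3x2x2 scores the same, so `max` identifies the best width rather than a unique grid.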

In any case, thanks for reporting it.