Project: 2671 (Run 37, Clone 79, Gen 78) - NaN detected

Posted: Tue Sep 08, 2009 7:47 am
by SGirbau
Hello,
my Linux SMP client has loaded this same WU twice, and it failed the same way both times:

[07:34:33] 
[07:34:33] *------------------------------*
[07:34:33] Folding@Home Gromacs SMP Core
[07:34:33] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[07:34:33] 
[07:34:33] Preparing to commence simulation
[07:34:33] - Ensuring status. Please wait.
[07:34:42] - Looking at optimizations...
[07:34:42] - Working with standard loops on this execution.
[07:34:42] - Files status OK
[07:34:43] - Expanded 1513330 -> 24038109 (decompressed 1588.4 percent)
[07:34:43] Called DecompressByteArray: compressed_data_size=1513330 data_size=24038109, decompressed_data_size=24038109 diff=0
[07:34:43] - Digital signature verified
[07:34:43] 
[07:34:43] Project: 2671 (Run 37, Clone 79, Gen 78)
[07:34:43] 
[07:34:43] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=bartleby
NNODES=4, MYRANK=2, HOSTNAME=bartleby
NNODES=4, MYRANK=3, HOSTNAME=bartleby
NNODES=4, MYRANK=0, HOSTNAME=bartleby
NODEID=0 argc=20
NODEID=1 argc=20
NODEID=2 argc=20
NODEID=3 argc=20
Reading file work/wudata_06.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22908 system in water'
19750002 steps,  39500.0 ps (continuing from step 19500002,  39000.0 ps).

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: md.c, line: 2169

Fatal error:
NaN detected at step 19500002

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 2, will try to stop all the nodes
Halting parallel program mdrun on CPU 2 out of 4

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: md.c, line: 2169

Fatal error:
NaN detected at step 19500002

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 3, will try to stop all the nodes
Halting parallel program mdrun on CPU 3 out of 4

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: md.c, line: 2169

Fatal error:
NaN detected at step 19500002

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

gcq#0: Thanx for Using GROMACS - Have a Nice Day

application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: md.c, line: 2169

Fatal error:
NaN detected at step 19500002

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 1, will try to stop all the nodes
Halting parallel program mdrun on CPU 1 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

Thank you

Re: Project: 2671 (Run 37, Clone 79, Gen 78) - NaN detected

Posted: Tue Sep 08, 2009 9:18 am
by parkut
This is a known bad work unit: viewtopic.php?f=19&t=11098&start=0