Project: 2676 (Run 0, Clone 14, Gen 77) seg fault

Moderators: Site Moderators, FAHC Science Team

alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Project: 2676 (Run 0, Clone 14, Gen 77) seg fault

Post by alpha754293 »

Code:

[21:27:03] 
[21:27:03] *------------------------------*
[21:27:03] Folding@Home Gromacs SMP Core
[21:27:03] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[21:27:03] 
[21:27:03] Preparing to commence simulation
[21:27:03] - Ensuring status. Please wait.
[21:27:13] - Looking at optimizations...
[21:27:13] - Working with standard loops on this execution.
[21:27:13] - Files status OK
[21:27:14] - Expanded 4858048 -> 24076137 (decompressed 495.5 percent)
[21:27:14] Called DecompressByteArray: compressed_data_size=4858048 data_size=24076137, decompressed_data_size=24076137 diff=0
[21:27:14] - Digital signature verified
[21:27:14] 
[21:27:14] Project: 2676 (Run 0, Clone 14, Gen 77)
[21:27:14] 
[21:27:14] Entering M.D.
[21:27:20] Using Gromacs checkpoints
NNODES=4, MYRANK=2, HOSTNAME=computenode
NNODES=4, MYRANK=0, HOSTNAME=computenode
NNODES=4, MYRANK=3, HOSTNAME=computenode
NNODES=4, MYRANK=1, HOSTNAME=computenode
NODEID=0 argc=23
NODEID=1 argc=23
NODEID=2 argc=23
NODEID=3 argc=23
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_03.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 64

Reading checkpoint file work/wudata_03.cpt generated: Tue May  5 17:25:22 2009


NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '23109 system in water'
19500002 steps,  39000.0 ps (continuing from step 19250010,  38500.0 ps).
[21:27:24] Resuming from checkpoint
[21:27:24] Verified work/wudata_03.log
[21:27:24] Verified work/wudata_03.trr
[21:27:24] Verified work/wudata_03.xtc
[21:27:24] Verified work/wudata_03.edr
[21:27:25] Completed 8 out of 250000 steps  (0%)
[21:41:14] Completed 2500 out of 250000 steps  (1%)
[21:55:02] Completed 5000 out of 250000 steps  (2%)
[22:08:55] Completed 7500 out of 250000 steps  (3%)
[22:22:45] Completed 10000 out of 250000 steps  (4%)
[22:36:33] Completed 12500 out of 250000 steps  (5%)
[22:50:22] Completed 15000 out of 250000 steps  (6%)
[23:04:10] Completed 17500 out of 250000 steps  (7%)
[23:17:32] Completed 20000 out of 250000 steps  (8%)
[23:31:21] Completed 22500 out of 250000 steps  (9%)
[23:45:11] Completed 25000 out of 250000 steps  (10%)
[23:59:00] Completed 27500 out of 250000 steps  (11%)
[00:08:35] Completed 30000 out of 250000 steps  (12%)
[00:17:26] Completed 32500 out of 250000 steps  (13%)
[00:26:18] Completed 35000 out of 250000 steps  (14%)
[00:35:08] Completed 37500 out of 250000 steps  (15%)
[00:43:59] Completed 40000 out of 250000 steps  (16%)
[00:52:50] Completed 42500 out of 250000 steps  (17%)
[01:01:41] Completed 45000 out of 250000 steps  (18%)
[01:10:32] Completed 47500 out of 250000 steps  (19%)
[01:19:24] Completed 50000 out of 250000 steps  (20%)
[01:28:15] Completed 52500 out of 250000 steps  (21%)
[01:37:06] Completed 55000 out of 250000 steps  (22%)
[01:45:57] Completed 57500 out of 250000 steps  (23%)
[01:54:48] Completed 60000 out of 250000 steps  (24%)
[02:03:39] Completed 62500 out of 250000 steps  (25%)
[02:12:30] Completed 65000 out of 250000 steps  (26%)
[02:21:21] Completed 67500 out of 250000 steps  (27%)
[02:30:11] Completed 70000 out of 250000 steps  (28%)
[02:39:02] Completed 72500 out of 250000 steps  (29%)
[02:47:53] Completed 75000 out of 250000 steps  (30%)
[02:56:44] Completed 77500 out of 250000 steps  (31%)
[03:05:35] Completed 80000 out of 250000 steps  (32%)
[03:14:25] Completed 82500 out of 250000 steps  (33%)
[03:23:16] Completed 85000 out of 250000 steps  (34%)
[03:27:03] - Autosending finished units... [May 6 03:27:03 UTC]
[03:27:03] Trying to send all finished work units
[03:27:03] + No unsent completed units remaining.
[03:27:03] - Autosend completed
[03:32:07] Completed 87500 out of 250000 steps  (35%)
[03:40:57] Completed 90000 out of 250000 steps  (36%)
[03:49:48] Completed 92500 out of 250000 steps  (37%)
[03:58:38] Completed 95000 out of 250000 steps  (38%)
[04:07:29] Completed 97500 out of 250000 steps  (39%)
[04:16:20] Completed 100000 out of 250000 steps  (40%)
[04:25:10] Completed 102500 out of 250000 steps  (41%)
[04:34:01] Completed 105000 out of 250000 steps  (42%)
[04:42:52] Completed 107500 out of 250000 steps  (43%)
[04:51:43] Completed 110000 out of 250000 steps  (44%)
[05:00:34] Completed 112500 out of 250000 steps  (45%)
[05:09:24] Completed 115000 out of 250000 steps  (46%)
[05:18:15] Completed 117500 out of 250000 steps  (47%)
[05:27:06] Completed 120000 out of 250000 steps  (48%)
[05:35:56] Completed 122500 out of 250000 steps  (49%)
[05:44:48] Completed 125000 out of 250000 steps  (50%)
[05:53:38] Completed 127500 out of 250000 steps  (51%)
[06:02:29] Completed 130000 out of 250000 steps  (52%)
[06:11:20] Completed 132500 out of 250000 steps  (53%)
[06:20:11] Completed 135000 out of 250000 steps  (54%)
[06:29:02] Completed 137500 out of 250000 steps  (55%)
[06:37:52] Completed 140000 out of 250000 steps  (56%)
[06:46:43] Completed 142500 out of 250000 steps  (57%)
[06:55:34] Completed 145000 out of 250000 steps  (58%)
[07:04:24] Completed 147500 out of 250000 steps  (59%)
[07:13:16] Completed 150000 out of 250000 steps  (60%)
[07:22:06] Completed 152500 out of 250000 steps  (61%)
[07:30:57] Completed 155000 out of 250000 steps  (62%)
[07:39:48] Completed 157500 out of 250000 steps  (63%)
[07:48:39] Completed 160000 out of 250000 steps  (64%)
[07:57:29] Completed 162500 out of 250000 steps  (65%)
[08:06:20] Completed 165000 out of 250000 steps  (66%)
[08:15:10] Completed 167500 out of 250000 steps  (67%)
[08:24:01] Completed 170000 out of 250000 steps  (68%)
[08:32:51] Completed 172500 out of 250000 steps  (69%)
[08:41:41] Completed 175000 out of 250000 steps  (70%)
[08:50:32] Completed 177500 out of 250000 steps  (71%)
[08:59:23] Completed 180000 out of 250000 steps  (72%)
[09:08:15] Completed 182500 out of 250000 steps  (73%)
[09:17:06] Completed 185000 out of 250000 steps  (74%)
[09:25:57] Completed 187500 out of 250000 steps  (75%)
[09:27:03] - Autosending finished units... [May 6 09:27:03 UTC]
[09:27:03] Trying to send all finished work units
[09:27:03] + No unsent completed units remaining.
[09:27:03] - Autosend completed
[09:34:48] Completed 190000 out of 250000 steps  (76%)
[09:43:40] Completed 192500 out of 250000 steps  (77%)
[09:52:33] Completed 195000 out of 250000 steps  (78%)
[10:01:27] Completed 197500 out of 250000 steps  (79%)
[10:10:20] Completed 200000 out of 250000 steps  (80%)
[10:19:14] Completed 202500 out of 250000 steps  (81%)
[10:28:07] Completed 205000 out of 250000 steps  (82%)
[10:37:00] Completed 207500 out of 250000 steps  (83%)
[10:45:53] Completed 210000 out of 250000 steps  (84%)
[10:54:44] Completed 212500 out of 250000 steps  (85%)
[11:03:36] Completed 215000 out of 250000 steps  (86%)
[11:12:27] Completed 217500 out of 250000 steps  (87%)
[11:21:18] Completed 220000 out of 250000 steps  (88%)

t = 38944.182 ps: Water molecule starting at atom 65176 can not be settled.
Check for bad contacts and/or reduce the timestep.
[11:28:43] 
[11:28:43] Folding@home Core Shutdown: INTERRUPTED
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[cli_1]: aborting job:
Fatal error in MPI_Sendrecv: Error message texts are not available
[cli_2]: aborting job:
Fatal error in MPI_Sendrecv: Error message texts are not available
[0]0:Return code = 102
[0]1:Return code = 1
[0]2:Return code = 1
[0]3:Return code = 0, signaled with Segmentation fault
[11:28:47] CoreStatus = 66 (102)
[11:28:47] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[11:28:47] Killing all core threads

Folding@Home Client Shutdown.
Restarting the client...

Edit: the restart failed.

Code:

[15:46:11] 
[15:46:11] *------------------------------*
[15:46:11] Folding@Home Gromacs SMP Core
[15:46:11] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[15:46:11] 
[15:46:11] Preparing to commence simulation
[15:46:11] - Ensuring status. Please wait.
[15:46:12] Called DecompressByteArray: compressed_data_size=4858048 data_size=24076137, decompressed_data_size=24076137 diff=0
[15:46:12] - Digital signature verified
[15:46:12] 
[15:46:12] Project: 2676 (Run 0, Clone 14, Gen 77)
[15:46:12] 
[15:46:12] Assembly optimizations on if available.
[15:46:12] Entering M.D.
[15:46:18] Using Gromacs checkpoints
[15:46:21] 
[15:46:22] Entering M.D.
[15:46:28] Using Gromacs checkpoints
NNODES=4, MYRANK=3, HOSTNAME=computenode
NNODES=4, MYRANK=0, HOSTNAME=computenode
NNODES=4, MYRANK=2, HOSTNAME=computenode
NNODES=4, MYRANK=1, HOSTNAME=computenode
NODEID=0 argc=23
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


NODEID=2 argc=23
                                :-)  mdrun  (-:

NODEID=3 argc=23
Reading file work/wudata_03.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=1 argc=23
Note: tpx file_version 48, software version 64

Reading checkpoint file work/wudata_03.cpt generated: Wed May  6 07:21:20 2009


NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '23109 system in water'
19500002 steps,  39000.0 ps (continuing from step 19470010,  38940.0 ps).

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: md.c, line: 910

Fatal error:
Checkpoint error on step 0

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day
: No such process
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[15:46:31] _03.log has changed since last checkpoint
[cli_1]: aborting job:
Fatal error in MPI_Sendrecv: Error message texts are not available
(The MPI_Sendrecv abort message above appeared twice.)

The WU was lost and reassigned; restarting from scratch. :(

What was wrong with the WU to begin with?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2676 (Run 0, Clone 14, Gen 77) seg fault

Post by bruce »

alpha754293 wrote: What was wrong with the WU to begin with?
Nothing. That WU has been successfully completed by other people.
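
For anyone trying to place the failure within the WU, here is a rough Python sketch of the arithmetic using only numbers copied from the two logs above. It is not part of the client or of GROMACS. The helper name `crash_summary` is made up for illustration, and the idea that generation g starts at global step g * 250000 is an assumption (it is consistent with the "continuing from step 19250010" line, but the logs do not state it).

```python
# All constants are copied from the two logs in the first post.
WU_STEPS = 250_000                      # "out of 250000 steps"
GENERATION = 77                         # "Gen 77"
FIRST_START = (19_250_010, 38_500.0)    # first run: (step, ps) it continued from
RESTART_START = (19_470_010, 38_940.0)  # restart: (step, ps) it continued from
CRASH_TIME_PS = 38_944.182              # "t = 38944.182 ps: Water molecule ..."


def crash_summary():
    """Hypothetical helper: locate the SETTLE failure in step units."""
    # Timestep from the two (step, time) pairs. The log prints times
    # rounded to 0.1 ps, so this is an estimate of the timestep.
    d_steps = RESTART_START[0] - FIRST_START[0]
    d_time = RESTART_START[1] - FIRST_START[1]
    dt_ps = d_time / d_steps

    # Global step at which the water molecule could not be settled.
    crash_step = round(CRASH_TIME_PS / dt_ps)

    # ASSUMPTION: generation g begins at global step g * WU_STEPS.
    steps_into_wu = crash_step - GENERATION * WU_STEPS

    return {
        "dt_ps": dt_ps,
        "crash_step": crash_step,
        "steps_into_wu": steps_into_wu,
        "percent_of_wu": 100.0 * steps_into_wu / WU_STEPS,
        "steps_after_restart_checkpoint": crash_step - RESTART_START[0],
    }


if __name__ == "__main__":
    for key, value in crash_summary().items():
        print(f"{key}: {value}")
```

Under those assumptions the failure falls between the 88% and 89% progress marks, roughly two thousand steps after the checkpoint that the restart tried to resume from.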