Bad WU:
Reading file work/wudata_01.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: domdec.c, line: 6276
Fatal error:
There is no domain decomposition for 4 nodes that is compatible with the given box and a minimum cell size of 13.2024 nm
Change the number of nodes or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4
gcq#0: Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
[04:16:47] CoreStatus = FF (255)
[04:16:47] Sending work to server
[04:16:47] Project: 2669 (Run 6, Clone 194, Gen 186)
[04:16:47] - Error: Could not get length of results file work/wuresults_01.dat
[04:16:47] - Error: Could not read unit 01 file. Removing from queue.
[04:16:47] Trying to send all finished work units
[04:16:47] + No unsent completed units remaining.
[04:16:47] + -oneunit flag given and have now finished a unit. Exiting.
[04:16:47] - Preparing to get new work unit...
[04:16:47] Cleaning up work directory
[04:16:47] ***** Got a SIGTERM signal (15)
[04:16:47] Killing all core threads
Folding@Home Client Shutdown.
[04:16:47] + Attempting to get work packet
[04:16:47] - Will indicate memory of 2001 MB
Project: 2669 (Run 6, Clone 194, Gen 186)
Re: Project: 2669 (Run 6, Clone 194, Gen 186)
I've had the exact same problem.
Please retire this bad WU.
I had to trash the work folder, client, and unit info and restart with Machine ID 2 to get something else...
Plus 4+ hours of wasted electricity.
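For what it's worth, the fatal error is a pure geometry check: with 4 MPI ranks, mdrun needs a grid of cells whose product is 4 and whose every edge is at least the minimum cell size it reports (13.2024 nm here). A minimal sketch of that constraint — not the actual domdec.c logic, and with hypothetical box dimensions, since the WU's real box is not in the log:

```python
from itertools import product

def feasible_grids(nnodes, box, min_cell):
    """Return the (nx, ny, nz) grids where every cell edge >= min_cell.

    This mirrors the constraint mdrun reports, not GROMACS's real
    decomposition code: each of the nnodes ranks gets a cell of size
    box[0]/nx x box[1]/ny x box[2]/nz, and no edge may be smaller
    than the minimum cell size.
    """
    grids = []
    for nx, ny, nz in product(range(1, nnodes + 1), repeat=3):
        if nx * ny * nz != nnodes:
            continue
        if all(b / n >= min_cell for b, n in zip(box, (nx, ny, nz))):
            grids.append((nx, ny, nz))
    return grids

# Hypothetical elongated box (nm) where only a 1 x 1 x 4 split fits:
print(feasible_grids(4, (15.0, 15.0, 60.0), 13.2024))  # [(1, 1, 4)]
```

For this WU no grid of 4 cells satisfies the constraint, so mdrun aborts before step 0. That is a property of the .tpr itself, which is why every retry fails identically until a different WU is downloaded.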
Code:
[21:53:47] Completed 250000 out of 250000 steps (100%)
Writing final coordinates.
Average load imbalance: 2.0 %
Part of the total run time spent waiting due to load imbalance: 1.1 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Z 0 %
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 107822.620 107822.620 100.0
1d05h57:02
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 119.789 5.037 0.401 59.901
gcq#0: Thanx for Using GROMACS - Have a Nice Day
[21:53:50] DynamicWrapper: Finished Work Unit: sleep=10000
[21:54:00]
[21:54:00] Finished Work Unit:
[21:54:00] - Reading up to 21124656 from "work/wudata_08.trr": Read 21124656
[21:54:00] trr file hash check passed.
[21:54:00] - Reading up to 4526196 from "work/wudata_08.xtc": Read 4526196
[21:54:00] xtc file hash check passed.
[21:54:00] edr file hash check passed.
[21:54:00] logfile size: 194345
[21:54:00] Leaving Run
[21:54:02] - Writing 25994981 bytes of core data to disk...
[21:54:05] ... Done.
Attempting to use an MPI routine after finalizing MPICH
[21:54:06] - Shutting down core
[21:54:06]
[21:54:06] Folding@home Core Shutdown: FINISHED_UNIT
[21:57:23] CoreStatus = 64 (100)
[21:57:23] Unit 8 finished with 58 percent of time to deadline remaining.
[21:57:23] Updated performance fraction: 0.578036
[21:57:23] Sending work to server
[21:57:23] Project: 2669 (Run 14, Clone 106, Gen 167)
[21:57:23] + Attempting to send results [October 2 21:57:23 UTC]
[21:57:23] - Reading file work/wuresults_08.dat from core
[21:57:23] (Read 25994981 bytes from disk)
[21:57:23] Connecting to http://171.64.65.56:8080/
[22:01:55] Posted data.
[22:01:55] Initial: 0000; - Uploaded at ~91 kB/s
[22:02:00] - Averaged speed for that direction ~90 kB/s
[22:02:00] + Results successfully sent
[22:02:00] Thank you for your contribution to Folding@Home.
[22:02:00] + Number of Units Completed: 8
[22:02:00] - Warning: Could not delete all work unit files (8): Core file absent
[22:02:00] Trying to send all finished work units
[22:02:00] + No unsent completed units remaining.
[22:02:00] - Preparing to get new work unit...
[22:02:00] + Attempting to get work packet
[22:02:00] - Will indicate memory of 1666 MB
[22:02:00] - Connecting to assignment server
[22:02:00] Connecting to http://assign.stanford.edu:8080/
[22:02:01] Posted data.
[22:02:01] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[22:02:01] + News From Folding@Home: Welcome to Folding@Home
[22:02:01] Loaded queue successfully.
[22:02:01] Connecting to http://171.64.65.56:8080/
[22:02:06] Posted data.
[22:02:06] Initial: 0000; - Receiving payload (expected size: 4841078)
[22:02:18] - Downloaded at ~393 kB/s
[22:02:18] - Averaged speed for that direction ~412 kB/s
[22:02:18] + Received work.
[22:02:18] Trying to send all finished work units
[22:02:18] + No unsent completed units remaining.
[22:02:18] + Closed connections
[22:02:18]
[22:02:18] + Processing work unit
[22:02:18] At least 4 processors must be requested.
[22:02:18] Core required: FahCore_a2.exe
[22:02:18] Core found.
[22:02:18] Working on queue slot 09 [October 2 22:02:18 UTC]
[22:02:18] + Working ...
[22:02:18] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 09 -checkpoint 10 -verbose -lifeline 3729 -version 624'
[22:02:18]
[22:02:18] *------------------------------*
[22:02:18] Folding@Home Gromacs SMP Core
[22:02:18] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[22:02:18]
[22:02:18] Preparing to commence simulation
[22:02:18] - Ensuring status. Please wait.
[22:02:19] Called DecompressByteArray: compressed_data_size=4840566 data_size=23982741, decompressed_data_size=23982741 diff=0
[22:02:19] - Digital signature verified
[22:02:19]
[22:02:19] Project: 2669 (Run 6, Clone 194, Gen 186)
[22:02:19]
[22:02:19] Assembly optimizations on if available.
[22:02:19] Entering M.D.
[22:02:29] Project: 2669 (Run 6, Clone 194, Gen 186)
[22:02:29]
[22:02:29] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NODEID=0 argc=20
NODEID=1 argc=20
Reading file work/wudata_09.tpr, VERSION 3.3.99_development_20070618 (single precision)
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NODEID=2 argc=20
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NODEID=3 argc=20
Note: tpx file_version 48, software version 68
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: domdec.c, line: 6276
Fatal error:
There is no domain decomposition for 4 nodes that is compatible with the given box and a minimum cell size of 13.2024 nm
Change the number of nodes or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
gcq#0: Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
[22:02:45] CoreStatus = FF (255)
[22:02:45] Sending work to server
[22:02:45] Project: 2669 (Run 6, Clone 194, Gen 186)
[22:02:45] - Error: Could not get length of results file work/wuresults_09.dat
[22:02:45] - Error: Could not read unit 09 file. Removing from queue.
[22:02:45] Trying to send all finished work units
[22:02:45] + No unsent completed units remaining.
[22:02:45] - Preparing to get new work unit...
[22:02:45] + Attempting to get work packet
[22:02:45] - Will indicate memory of 1666 MB
[22:02:45] - Connecting to assignment server
[22:02:45] Connecting to http://assign.stanford.edu:8080/
[22:02:46] Posted data.
[22:02:46] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[22:02:46] + News From Folding@Home: Welcome to Folding@Home
[22:02:46] Loaded queue successfully.
[22:02:46] Connecting to http://171.64.65.56:8080/
[22:02:51] Posted data.
[22:02:51] Initial: 0000; - Receiving payload (expected size: 4841078)
[22:03:04] - Downloaded at ~363 kB/s
[22:03:04] - Averaged speed for that direction ~402 kB/s
[22:03:04] + Received work.
[22:03:04] Trying to send all finished work units
[22:03:04] + No unsent completed units remaining.
[22:03:04] + Closed connections
[22:03:09]
[22:03:09] + Processing work unit
[22:03:09] At least 4 processors must be requested.
[22:03:09] Core required: FahCore_a2.exe
[22:03:09] Core found.
[22:03:09] Working on queue slot 00 [October 2 22:03:09 UTC]
[22:03:09] + Working ...
[22:03:09] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 00 -checkpoint 10 -verbose -lifeline 3729 -version 624'
[22:03:09]
[22:03:09] *------------------------------*
[22:03:09] Folding@Home Gromacs SMP Core
[22:03:09] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[22:03:09]
[22:03:09] Preparing to commence simulation
[22:03:09] - Ensuring status. Please wait.
[22:03:18] - Looking at optimizations...
[22:03:18] - Working with standard loops on this execution.
[22:03:18] - Files status OK
[22:03:20] - Expanded 4840566 -> 23982741 (decompressed 495.4 percent)
[22:03:20] Called DecompressByteArray: compressed_data_size=4840566 data_size=23982741, decompressed_data_size=23982741 diff=0
[22:03:21] - Digital signature verified
[22:03:21]
[22:03:21] Project: 2669 (Run 6, Clone 194, Gen 186)
[22:03:21]
[22:03:21] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NODEID=0 argc=20
NODEID=1 argc=20
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NODEID=2 argc=20
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NODEID=3 argc=20
Reading file work/wudata_00.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: domdec.c, line: 6276
Fatal error:
There is no domain decomposition for 4 nodes that is compatible with the given box and a minimum cell size of 13.2024 nm
Change the number of nodes or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
gcq#0: Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
[22:03:37] CoreStatus = FF (255)
[22:03:37] Sending work to server
[22:03:37] Project: 2669 (Run 6, Clone 194, Gen 186)
[22:03:37] - Error: Could not get length of results file work/wuresults_00.dat
[22:03:37] - Error: Could not read unit 00 file. Removing from queue.
[22:03:37] Trying to send all finished work units
[22:03:37] + No unsent completed units remaining.
[22:03:37] - Preparing to get new work unit...
[22:03:37] + Attempting to get work packet
[22:03:37] - Will indicate memory of 1666 MB
[22:03:37] - Connecting to assignment server
[22:03:37] Connecting to http://assign.stanford.edu:8080/
[22:03:37] Posted data.
[22:03:37] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[22:03:37] + News From Folding@Home: Welcome to Folding@Home
[22:03:38] Loaded queue successfully.
[22:03:38] Connecting to http://171.64.65.56:8080/
[22:03:44] Posted data.
[22:03:44] Initial: 0000; - Receiving payload (expected size: 4841078)
[22:03:59] - Downloaded at ~315 kB/s
[22:03:59] - Averaged speed for that direction ~385 kB/s
[22:03:59] + Received work.
[22:03:59] Trying to send all finished work units
[22:03:59] + No unsent completed units remaining.
[22:03:59] + Closed connections
[22:04:04]
[22:04:04] + Processing work unit
[22:04:04] At least 4 processors must be requested.
[22:04:04] Core required: FahCore_a2.exe
[22:04:04] Core found.
[22:04:04] Working on queue slot 01 [October 2 22:04:04 UTC]
[22:04:04] + Working ...
[22:04:04] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 10 -verbose -lifeline 3729 -version 624'
[22:04:04]
[22:04:04] *------------------------------*
[22:04:04] Folding@Home Gromacs SMP Core
[22:04:04] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[22:04:04]
[22:04:04] Preparing to commence simulation
[22:04:04] - Ensuring status. Please wait.
[22:04:14] - Looking at optimizations...
[22:04:14] - Working with standard loops on this execution.
[22:04:14] - Files status OK
[22:04:16] - Expanded 4840566 -> 23982741 (decompressed 495.4 percent)
[22:04:16] Called DecompressByteArray: compressed_data_size=4840566 data_size=23982741, decompressed_data_size=23982741 diff=0
[22:04:16] - Digital signature verified
[22:04:16]
[22:04:16] Project: 2669 (Run 6, Clone 194, Gen 186)
[22:04:16]
[22:04:16] Entering M.D.
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NODEID=0 argc=20
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NODEID=1 argc=20
Reading file work/wudata_01.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=2 argc=20
NODEID=3 argc=20
Note: tpx file_version 48, software version 68
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: domdec.c, line: 6276
Fatal error:
There is no domain decomposition for 4 nodes that is compatible with the given box and a minimum cell size of 13.2024 nm
Change the number of nodes or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
gcq#0: Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[22:04:32] CoreStatus = FF (255)
[22:04:32] Sending work to server
[22:04:32] Project: 2669 (Run 6, Clone 194, Gen 186)
[22:04:32] - Error: Could not get length of results file work/wuresults_01.dat
[22:04:32] - Error: Could not read unit 01 file. Removing from queue.
[22:04:32] Trying to send all finished work units
[22:04:32] + No unsent completed units remaining.
[22:04:32] - Preparing to get new work unit...
[22:04:32] + Attempting to get work packet
[22:04:32] - Will indicate memory of 1666 MB
[22:04:32] - Connecting to assignment server
[22:04:32] Connecting to http://assign.stanford.edu:8080/
[22:04:33] Posted data.
[22:04:33] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[22:04:33] + News From Folding@Home: Welcome to Folding@Home
[22:04:33] Loaded queue successfully.
[22:04:33] Connecting to http://171.64.65.56:8080/
[22:04:38] Posted data.
[22:04:38] Initial: 0000; - Receiving payload (expected size: 4841078)
[22:04:51] - Downloaded at ~363 kB/s
[22:04:51] - Averaged speed for that direction ~380 kB/s
[22:04:51] + Received work.
[22:04:51] Trying to send all finished work units
[22:04:51] + No unsent completed units remaining.
[22:04:51] + Closed connections
[22:04:56]
[22:04:56] + Processing work unit
[22:04:56] At least 4 processors must be requested.
[22:04:56] Core required: FahCore_a2.exe
[22:04:56] Core found.
[22:04:56] Working on queue slot 02 [October 2 22:04:56 UTC]
[22:04:56] + Working ...
[22:04:56] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 10 -verbose -lifeline 3729 -version 624'
[22:04:56]
[22:04:56] *------------------------------*
[22:04:56] Folding@Home Gromacs SMP Core
[22:04:56] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[22:04:56]
[22:04:56] Preparing to commence simulation
[22:04:56] - Ensuring status. Please wait.
[22:05:05] - Looking at optimizations...
[22:05:05] - Working with standard loops on this execution.
[22:05:05] - Files status OK
[22:05:07] - Expanded 4840566 -> 23982741 (decompressed 495.4 percent)
[22:05:07] Called DecompressByteArray: compressed_data_size=4840566 data_size=23982741, decompressed_data_size=23982741 diff=0
[22:05:08] - Digital signature verified
[22:05:08]
[22:05:08] Project: 2669 (Run 6, Clone 194, Gen 186)
[22:05:08]
[22:05:08] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NODEID=0 argc=20
NODEID=2 argc=20
NODEID=3 argc=20
Reading file work/wudata_02.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=1 argc=20
Note: tpx file_version 48, software version 68
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
[22:05:20]
[22:05:20] Folding@home Core Shutdown: INTERRUPTED
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[22:05:24] CoreStatus = 66 (102)
[22:05:24] + Shutdown requested by user. Exiting.
[22:05:24] ***** Got a SIGTERM signal (15)
[22:05:24] Killing all core threads
Folding@Home Client Shutdown.
cormiem@ubuntu:~/folding$ ./fah6 -smp -verbosity 9
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
2 cores detected
--- Opening Log file [October 3 02:19:08 UTC]
# Linux SMP Console Edition ###################################################
###############################################################################
Folding@Home Client Version 6.24beta
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/cormiem/folding
Executable: ./fah6
Arguments: -smp -verbosity 9
[02:19:08] - Ask before connecting: No
[02:19:08] - User name: Martin_T7250 (Team 96377)
[02:19:08] - User ID: 2FC465931CC94671
[02:19:08] - Machine ID: 1
[02:19:08]
[02:19:08] Loaded queue successfully.
[02:19:08]
[02:19:08] + Processing work unit
[02:19:08] At least 4 processors must be requested.
[02:19:08] Core required: FahCore_a2.exe
[02:19:08] Core found.
[02:19:08] - Autosending finished units... [October 3 02:19:08 UTC]
[02:19:08] Trying to send all finished work units
[02:19:08] + No unsent completed units remaining.
[02:19:08] - Autosend completed
[02:19:08] Working on queue slot 02 [October 3 02:19:08 UTC]
[02:19:08] + Working ...
[02:19:08] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 10 -verbose -lifeline 4656 -version 624'
[02:19:08]
[02:19:08] *------------------------------*
[02:19:08] Folding@Home Gromacs SMP Core
[02:19:08] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[02:19:08]
[02:19:08] Preparing to commence simulation
[02:19:08] - Ensuring status. Please wait.
[02:19:09] Called DecompressByteArray: compressed_data_size=4840566 data_size=23982741, decompressed_data_size=23982741 diff=0
[02:19:10] - Digital signature verified
[02:19:10]
[02:19:10] Project: 2669 (Run 6, Clone 194, Gen 186)
[02:19:10]
[02:19:10] Assembly optimizations on if available.
[02:19:10] Entering M.D.
[02:19:20] Project: 2669 (Run 6, Clone 194, Gen 186)
[02:19:20]
[02:19:20] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NODEID=0 argc=20
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NODEID=2 argc=20
NODEID=3 argc=20
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NODEID=1 argc=20
Reading file work/wudata_02.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: domdec.c, line: 6276
Fatal error:
There is no domain decomposition for 4 nodes that is compatible with the given box and a minimum cell size of 13.2024 nm
Change the number of nodes or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4
gcq#0: Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
[02:19:36] CoreStatus = FF (255)
[02:19:36] Sending work to server
[02:19:36] Project: 2669 (Run 6, Clone 194, Gen 186)
[02:19:36] - Error: Could not get length of results file work/wuresults_02.dat
[02:19:36] - Error: Could not read unit 02 file. Removing from queue.
[02:19:36] Trying to send all finished work units
[02:19:36] + No unsent completed units remaining.
[02:19:36] - Preparing to get new work unit...
[02:19:36] + Attempting to get work packet
[02:19:36] - Will indicate memory of 1666 MB
[02:19:36] - Connecting to assignment server
[02:19:36] Connecting to http://assign.stanford.edu:8080/
[02:19:36] Posted data.
[02:19:36] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[02:19:36] + News From Folding@Home: Welcome to Folding@Home
[02:19:36] Loaded queue successfully.
[02:19:36] Connecting to http://171.64.65.56:8080/
[02:19:42] Posted data.
[02:19:42] Initial: 0000; - Receiving payload (expected size: 4841078)
[02:20:04] - Downloaded at ~214 kB/s
[02:20:04] - Averaged speed for that direction ~347 kB/s
[02:20:04] + Received work.
[02:20:04] Trying to send all finished work units
[02:20:04] + No unsent completed units remaining.
[02:20:04] + Closed connections
[02:20:09]
[02:20:09] + Processing work unit
[02:20:09] At least 4 processors must be requested.
[02:20:09] Core required: FahCore_a2.exe
[02:20:09] Core found.
[02:20:09] Working on queue slot 03 [October 3 02:20:09 UTC]
[02:20:09] + Working ...
[02:20:09] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 03 -checkpoint 10 -verbose -lifeline 4656 -version 624'
[02:20:09]
[02:20:09] *------------------------------*
[02:20:09] Folding@Home Gromacs SMP Core
[02:20:09] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[02:20:09]
[02:20:09] Preparing to commence simulation
[02:20:09] - Ensuring status. Please wait.
[02:20:18] - Looking at optimizations...
[02:20:18] - Working with standard loops on this execution.
[02:20:18] - Files status OK
[02:20:20] - Expanded 4840566 -> 23982741 (decompressed 495.4 percent)
[02:20:20] Called DecompressByteArray: compressed_data_size=4840566 data_size=23982741, decompressed_data_size=23982741 diff=0
[02:20:21] - Digital signature verified
[02:20:21]
[02:20:21] Project: 2669 (Run 6, Clone 194, Gen 186)
[02:20:21]
[02:20:21] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NODEID=0 argc=20
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NODEID=1 argc=20
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NODEID=2 argc=20
NODEID=3 argc=20
Reading file work/wudata_03.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: domdec.c, line: 6276
Fatal error:
There is no domain decomposition for 4 nodes that is compatible with the given box and a minimum cell size of 13.2024 nm
Change the number of nodes or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4
gcq#0: Thanx for Using GROMACS - Have a Nice Day
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
[02:20:37] CoreStatus = FF (255)
[02:20:37] Sending work to server
[02:20:37] Project: 2669 (Run 6, Clone 194, Gen 186)
[02:20:37] - Error: Could not get length of results file work/wuresults_03.dat
[02:20:37] - Error: Could not read unit 03 file. Removing from queue.
[02:20:37] Trying to send all finished work units
[02:20:37] + No unsent completed units remaining.
[02:20:37] - Preparing to get new work unit...
[02:20:37] + Attempting to get work packet
[02:20:37] - Will indicate memory of 1666 MB
[02:20:37] - Connecting to assignment server
[02:20:37] Connecting to http://assign.stanford.edu:8080/
[02:20:37] Posted data.
[02:20:37] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[02:20:37] + News From Folding@Home: Welcome to Folding@Home
[02:20:37] Loaded queue successfully.
[02:20:37] Connecting to http://171.64.65.56:8080/
[02:20:42] Posted data.
[02:20:42] Initial: 0000; - Receiving payload (expected size: 4841078)
[02:20:52] - Downloaded at ~472 kB/s
[02:20:52] - Averaged speed for that direction ~372 kB/s
[02:20:52] + Received work.
[02:20:52] Trying to send all finished work units
[02:20:52] + No unsent completed units remaining.
[02:20:52] + Closed connections
[02:20:57]
[02:20:57] + Processing work unit
[02:20:57] At least 4 processors must be requested.
[02:20:57] Core required: FahCore_a2.exe
[02:20:57] Core found.
[02:20:57] Working on queue slot 04 [October 3 02:20:57 UTC]
[02:20:57] + Working ...
[02:20:57] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 04 -checkpoint 10 -verbose -lifeline 4656 -version 624'
[02:20:57]
[02:20:57] *------------------------------*
[02:20:57] Folding@Home Gromacs SMP Core
[02:20:57] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[02:20:57]
[02:20:57] Preparing to commence simulation
[02:20:57] - Ensuring status. Please wait.
[02:21:07] - Looking at optimizations...
[02:21:07] - Working with standard loops on this execution.
[02:21:07] - Files status OK
[02:21:09] - Expanded 4840566 -> 23982741 (decompressed 495.4 percent)
[02:21:09] Called DecompressByteArray: compressed_data_size=4840566 data_size=23982741, decompressed_data_size=23982741 diff=0
[02:21:09] - Digital signature verified
[02:21:09]
[02:21:09] Project: 2669 (Run 6, Clone 194, Gen 186)
[02:21:09]
[02:21:09] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NODEID=0 argc=20
NODEID=1 argc=20
NODEID=2 argc=20
NODEID=3 argc=20
Reading file work/wudata_04.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68
[02:21:21]
[02:21:21] Folding@home Core Shutdown: INTERRUPTED
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
[02:21:25] CoreStatus = 66 (102)
[02:21:25] + Shutdown requested by user. Exiting.
[02:21:25] ***** Got a SIGTERM signal (15)
[02:21:25] Killing all core threads
Folding@Home Client Shutdown.
cormiem@ubuntu:~/folding$ ./fah6 -smp -verbosity 9 -config
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
Folding@Home User Configuration
2 cores detected
--- Opening Log file [October 3 02:23:01 UTC]
# Linux SMP Console Edition ###################################################
###############################################################################
Folding@Home Client Version 6.24beta
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/cormiem/folding
Executable: ./fah6
Arguments: -smp -verbosity 9 -config
[02:23:01] - Ask before connecting: No
[02:23:01] - User name: Martin_T7250 (Team 96377)
[02:23:01] - User ID: 2FC465931CC94671
[02:23:01] - Machine ID: 1
[02:23:01]
[02:23:01] Configuring Folding@Home...
User name [Martin_T7250]?
Team Number [96377]?
Passkey [57e049eedfed136d24eaf2498d86a422]?
Ask before fetching/sending work (no/yes) [no]?
Use proxy (yes/no) [no]?
Acceptable size of work assignment and work result packets (bigger units
may have large memory demands) -- 'small' is <5MB, 'normal' is <10MB, and
'big' is >10MB (small/normal/big) [big]?
Change advanced options (yes/no) [no]? y
Core Priority (idle/low) [idle]?
Disable highly optimized assembly code (no/yes) [no]?
Interval, in minutes, between checkpoints (3-30) [10]?
Memory, in MB, to indicate (1977 available) [1666]?
Set -advmethods flag always, requesting new advanced
scientific cores and/or work units if available (no/yes) [yes]?
Ignore any deadline information (mainly useful if
system clock frequently has errors) (no/yes) [no]?
Machine ID (1-16) [1]? 2
The following options require you to restart the client before they take effect
Disable CPU affinity lock (no/yes) [no]?
Additional client parameters []?
IP address to bind core to (for viewer) []?
[02:23:19] - Ask before connecting: No
[02:23:19] - User name: Martin_T7250 (Team 96377)
[02:23:19] - User ID: 2FC465931CC94671
[02:23:19] - Machine ID: 2
[02:23:19]
[02:23:19] Work directory not found. Creating...
[02:23:19] Could not open work queue, generating new queue...
[02:23:19] - Autosending finished units... [October 3 02:23:19 UTC]
[02:23:19] Trying to send all finished work units
[02:23:19] + No unsent completed units remaining.
[02:23:19] - Autosend completed
[02:23:19] - Preparing to get new work unit...
[02:23:19] + Attempting to get work packet
[02:23:19] - Will indicate memory of 1666 MB
[02:23:19] - Connecting to assignment server
[02:23:19] Connecting to http://assign.stanford.edu:8080/
[02:23:20] Posted data.
[02:23:20] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[02:23:20] + News From Folding@Home: Welcome to Folding@Home
[02:23:20] Loaded queue successfully.
[02:23:20] Connecting to http://171.64.65.56:8080/
[02:23:25] Posted data.
[02:23:25] Initial: 0000; - Receiving payload (expected size: 4840736)
[02:23:35] - Downloaded at ~472 kB/s
[02:23:35] - Averaged speed for that direction ~472 kB/s
[02:23:35] + Received work.
[02:23:35] + Closed connections
[02:23:35]
[02:23:35] + Processing work unit
[02:23:35] At least 4 processors must be requested.
[02:23:35] Core required: FahCore_a2.exe
[02:23:35] Core found.
[02:23:35] Working on queue slot 01 [October 3 02:23:35 UTC]
[02:23:35] + Working ...
[02:23:35] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 10 -verbose -lifeline 5082 -version 624'
[02:23:35]
[02:23:35] *------------------------------*
[02:23:35] Folding@Home Gromacs SMP Core
[02:23:35] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[02:23:35]
[02:23:35] Preparing to commence simulation
[02:23:35] - Ensuring status. Please wait.
[02:23:36] Called DecompressByteArray: compressed_data_size=4840224 data_size=23983821, decompressed_data_size=23983821 diff=0
[02:23:36] - Digital signature verified
[02:23:36]
[02:23:36] Project: 2669 (Run 1, Clone 82, Gen 153)
[02:23:36]
[02:23:37] Assembly optimizations on if available.
[02:23:37] Entering M.D.
[02:23:47] Project: 2669 (Run 1, Clone 82, Gen 153)
[02:23:47]
[02:23:47] Entering M.D.
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NODEID=0 argc=20
NODEID=1 argc=20
NODEID=2 argc=20
NODEID=3 argc=20
Reading file work/wudata_01.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68
NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp
Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22884 system'
38500004 steps, 77000.0 ps (continuing from step 38250004, 76500.0 ps).
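For anyone wondering why the core in the first post aborts with "There is no domain decomposition for 4 nodes": each decomposition cell must be at least as wide as the minimum cell size (13.2024 nm here), so the box edges cap how many ranks can be used. A minimal sketch of that constraint, using a hypothetical 14 nm cubic box (the real mdrun check is more involved and is tunable via -rdd and -dds):

```python
import math

def max_dd_cells(box_lengths_nm, min_cell_nm):
    # A box edge of length L can hold at most floor(L / min_cell_nm)
    # domain-decomposition cells along that axis (simplified model;
    # the actual mdrun logic applies further constraints).
    return [math.floor(L / min_cell_nm) for L in box_lengths_nm]

# Hypothetical 14 nm cubic box with the 13.2024 nm minimum cell size
# from the error message: only one cell fits along each axis, so at
# most 1 x 1 x 1 = 1 rank fits and a 4-node run has to abort.
cells = max_dd_cells([14.0, 14.0, 14.0], 13.2024)
print(cells)                           # [1, 1, 1]
print(cells[0] * cells[1] * cells[2])  # 1
```

By the same arithmetic, the "1 x 1 x 4" decomposition in the log above needs one box edge of at least 4 x 13.2024 ≈ 52.8 nm.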
Project: 2669 (Run 6, Clone 194, Gen 186)
WU fails before first 1%
I tried restarting this WU several times. It failed the same way until the client downloaded a new WU. The new one is running fine.
Reading up to 4496772 from "work/wudata_09.xtc": Read 4496772
[08:13:38] xtc file hash check passed.
[08:13:38] edr file hash check passed.
[08:13:38] logfile size: 189627
[08:13:38] Leaving Run
[08:13:42] - Writing 25962567 bytes of core data to disk...
[08:13:43] ... Done.
[08:13:47] - Shutting down core
[08:13:47]
[08:13:47] Folding@home Core Shutdown: FINISHED_UNIT
[08:17:02] CoreStatus = 64 (100)
[08:17:02] Unit 9 finished with 64 percent of time to deadline remaining.
[08:17:02] Updated performance fraction: 0.639373
[08:17:02] Sending work to server
[08:17:02] + Attempting to send results
[08:17:02] - Reading file work/wuresults_09.dat from core
[08:17:02] (Read 25962567 bytes from disk)
[08:17:02] Connecting to http://171.64.65.56:8080/
[08:20:20] Posted data.
[08:20:20] Initial: 0000; - Uploaded at ~127 kB/s
[08:20:21] - Averaged speed for that direction ~125 kB/s
[08:20:21] + Results successfully sent
[08:20:21] Thank you for your contribution to Folding@Home.
[08:20:21] + Number of Units Completed: 88
[08:20:26] - Warning: Could not delete all work unit files (9): Core file absent
[08:20:26] Trying to send all finished work units
[08:20:26] + No unsent completed units remaining.
[08:20:26] - Preparing to get new work unit...
[08:20:26] + Attempting to get work packet
[08:20:26] - Connecting to assignment server
[08:20:26] Connecting to http://assign.stanford.edu:8080/
[08:20:26] Posted data.
[08:20:26] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[08:20:26] + News From Folding@Home: Welcome to Folding@Home
[08:20:26] Loaded queue successfully.
[08:20:26] Connecting to http://171.64.65.56:8080/
[08:20:32] Posted data.
[08:20:32] Initial: 0000; - Receiving payload (expected size: 4841078)
[08:20:40] - Downloaded at ~590 kB/s
[08:20:40] - Averaged speed for that direction ~590 kB/s
[08:20:40] + Received work.
[08:20:40] Trying to send all finished work units
[08:20:40] + No unsent completed units remaining.
[08:20:40] + Closed connections
[08:20:40]
[08:20:40] + Processing work unit
[08:20:40] Core required: FahCore_a2.exe
[08:20:40] Core found.
[08:20:40] Working on Unit 00 [December 26 08:20:40]
[08:20:40] + Working ...
[08:20:40] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 00 -checkpoint 15 -verbose -lifeline 2471 -version 602'
[08:20:40]
[08:20:40] *------------------------------*
[08:20:40] Folding@Home Gromacs SMP Core
[08:20:40] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[08:20:40]
[08:20:40] Preparing to commence simulation
[08:20:40] - Ensuring status. Please wait.
[08:20:41] Called DecompressByteArray: compressed_data_size=4840566 data_size=23982741, decompressed_data_size=23982741 diff=0
[08:20:41] - Digital signature verified
[08:20:41]
[08:20:41] Project: 2669 (Run 6, Clone 194, Gen 186)
[08:20:41]
[08:20:41] Assembly optimizations on if available.
[08:20:41] Entering M.D.
[08:20:51] Project: 2669 (Run 6, Clone 194, Gen 186)
[08:20:51]
[08:20:51] Entering M.D.
[08:21:01]
[08:21:01] Folding@home Core Shutdown: INTERRUPTED
[08:21:06] CoreStatus = 66 (102)
[08:21:06] + Shutdown requested by user. Exiting.
[08:21:06] ***** Got a SIGTERM signal (15)
[08:21:06] Killing all core threads
Folding@Home Client Shutdown.
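A side note on the CoreStatus lines: the client prints the core's exit status as a hex byte with its decimal value in parentheses, which is why FINISHED_UNIT appears as 64 (100) and INTERRUPTED as 66 (102) above, while the bad WU at the top of the thread died with FF (255). A quick check of that pairing (the meaning attached to each code is taken only from these logs, not from client documentation):

```python
# The client logs CoreStatus as hex followed by decimal in parentheses;
# the pairs seen in this thread line up exactly with base-16 conversion.
statuses = {
    "64": "FINISHED_UNIT",   # logged as "CoreStatus = 64 (100)"
    "66": "INTERRUPTED",     # logged as "CoreStatus = 66 (102)"
    "FF": "failed bad WU",   # logged as "CoreStatus = FF (255)"
}
for hex_code, meaning in statuses.items():
    print(f"{hex_code} -> {int(hex_code, 16)} ({meaning})")
```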
Re: Project: 2669 (Run 6, Clone 194, Gen 186)
I put this WU on the list to be suspended.
Posting FAH's log:
How to provide enough info to get helpful support.