Project: 2669 (Run 1, Clone 138, Gen 21)


Flathead74
Posts: 266
Joined: Sun Dec 02, 2007 6:08 pm
Location: Central New York


Post by Flathead74 »

Please note the expected payload size in the log (1142751 bytes).
Shouldn't it be somewhere around 4834423?

I think this is a bad work unit.
It fails immediately, as the console FAHlog output in this post shows.


[23:01:31] Posted data.
[23:01:31] Initial: 0000; - Receiving payload (expected size: 1142751)
[23:01:33] - Downloaded at ~557 kB/s
[23:01:33] - Averaged speed for that direction ~490 kB/s
[23:01:33] + Received work.
[23:01:33] Trying to send all finished work units
[23:01:33] + No unsent completed units remaining.
[23:01:33] + Closed connections
[23:01:38] 
[23:01:38] + Processing work unit
[23:01:38] Core required: FahCore_a2.exe
[23:01:38] Core found.
[23:01:38] Working on queue slot 09 [July 23 23:01:38 UTC]
[23:01:38] + Working ...
[23:01:38] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 09 -checkpoint 15 -forceasm -verbose -lifeline 27876 -version 624'

[23:01:39] 
[23:01:39] *------------------------------*
[23:01:39] Folding@Home Gromacs SMP Core
[23:01:39] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[23:01:39] 
[23:01:39] Preparing to commence simulation
[23:01:39] - Ensuring status. Please wait.
[23:01:48] - Assembly optimizations manually forced on.
[23:01:48] - Not checking prior termination.
[23:01:49] - Expanded 1142239 -> 17887233 (decompressed 1565.9 percent)
[23:01:49] Called DecompressByteArray: compressed_data_size=1142239 data_size=17887233, decompressed_data_size=17887233 diff=0
[23:01:49] - Digital signature verified
[23:01:49] 
[23:01:49] Project: 2669 (Run 1, Clone 138, Gen 21)
[23:01:49] 
[23:01:49] Assembly optimizations on if available.
[23:01:49] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=p31b3
NNODES=4, MYRANK=2, HOSTNAME=p31b3
NNODES=4, MYRANK=1, HOSTNAME=p31b3
NODEID=0 argc=22
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090425  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             NNODES=4, MYRANK=3, HOSTNAME=p31b3
Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

NODEID=2 argc=22
Reading file work/wudata_09.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=1 argc=22
NODEID=3 argc=22


-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090425
Source code file: smalloc.c, line: 147

Fatal error:
Not enough memory. Failed to calloc 4037994351 elements of size 4 for block->index
(called from file tpxio.c, line 1180)
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day
: Cannot allocate memory
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[0]0:Return code = 255
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[23:02:03] CoreStatus = FF (255)
[23:02:03] Sending work to server
[23:02:03] Project: 2669 (Run 1, Clone 138, Gen 21)
[23:02:03] - Error: Could not get length of results file work/wuresults_09.dat
[23:02:03] - Error: Could not read unit 09 file. Removing from queue.
[23:02:03] Trying to send all finished work units
[23:02:03] + No unsent completed units remaining.
[23:02:03] - Preparing to get new work unit...
[23:02:03] + Attempting to get work packet
[23:02:03] - Will indicate memory of 1014 MB
[23:02:03] - Connecting to assignment server
[23:02:03] Connecting to http://assign.stanford.edu:8080/
[23:02:04] Posted data.
[23:02:04] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[23:02:04] + News From Folding@Home: Welcome to Folding@Home
[23:02:04] Loaded queue successfully.
[23:02:04] Connecting to http://171.64.65.56:8080/
[23:02:08] Posted data.
[23:02:08] Initial: 0000; - Receiving payload (expected size: 1142751)
[23:02:11] - Downloaded at ~371 kB/s
[23:02:11] - Averaged speed for that direction ~466 kB/s
[23:02:11] + Received work.
[23:02:11] Trying to send all finished work units
[23:02:11] + No unsent completed units remaining.
[23:02:11] + Closed connections
[23:02:16] 
[23:02:16] + Processing work unit
[23:02:16] Core required: FahCore_a2.exe
[23:02:16] Core found.
[23:02:16] Working on queue slot 00 [July 23 23:02:16 UTC]
[23:02:16] + Working ...
[23:02:16] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 00 -checkpoint 15 -forceasm -verbose -lifeline 27876 -version 624'

[23:02:16] 
[23:02:16] *------------------------------*
[23:02:16] Folding@Home Gromacs SMP Core
[23:02:16] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[23:02:16] 
[23:02:16] Preparing to commence simulation
[23:02:16] - Ensuring status. Please wait.
[23:02:26] - Assembly optimizations manually forced on.
[23:02:26] - Not checking prior termination.
[23:02:26] - Expanded 1142239 -> 17887233 (decompressed 1565.9 percent)
[23:02:26] Called DecompressByteArray: compressed_data_size=1142239 data_size=17887233, decompressed_data_size=17887233 diff=0
[23:02:26] - Digital signature verified
[23:02:26] 
[23:02:26] Project: 2669 (Run 1, Clone 138, Gen 21)
[23:02:26] 
[23:02:26] Assembly optimizations on if available.
[23:02:26] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=p31b3
NNODES=4, MYRANK=3, HOSTNAME=p31b3
NNODES=4, MYRANK=1, HOSTNAME=p31b3
NNODES=4, MYRANK=2, HOSTNAME=p31b3
NODEID=0 argc=22
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090425  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_00.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=1 argc=22
NODEID=2 argc=22
NODEID=3 argc=22

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090425
Source code file: smalloc.c, line: 147

Fatal error:
Not enough memory. Failed to calloc 3609319279 elements of size 4 for block->index
(called from file tpxio.c, line 1180)
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2669 (Run 1, Clone 138, Gen 21)

Post by bruce »

The Pande Group is aware of this problem. Hopefully they'll find a way to prevent it from happening -- perhaps by identifying the bad WU before it is issued.