Project: 2675 (Run 0, Clone 178, Gen 4)

Moderators: Site Moderators, FAHC Science Team

^w^ing
Posts: 136
Joined: Fri Mar 07, 2008 7:29 pm
Hardware configuration: C2D E6400 2.13 GHz @ 3.2 GHz
Asus EN8800GTS 640 (G80) @ 660/792/1700 running the 6.23 w/ core11 v1.19
forceware 260.89
Asus P5N-E SLi
2GB 800MHz DDRII (2xCorsair TwinX 512MB)
WinXP 32 SP3
Location: Prague

Project: 2675 (Run 0, Clone 178, Gen 4)

Post by ^w^ing »

Failed twice in a row; the second time it broke the client.

Code:

[09:34:20] - Autosending finished units... [August 26 09:34:20 UTC]
[09:34:20] Trying to send all finished work units
[09:34:20] + No unsent completed units remaining.
[09:34:20] - Autosend completed
[09:34:20] - Preparing to get new work unit...
[09:34:20] + Attempting to get work packet
[09:34:20] - Will indicate memory of 752 MB
[09:34:20] - Connecting to assignment server
[09:34:20] Connecting to http://assign.stanford.edu:8080/
[09:34:22] Posted data.
[09:34:22] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[09:34:22] + News From Folding@Home: Welcome to Folding@Home
[09:34:22] Loaded queue successfully.
[09:34:22] Connecting to http://171.64.65.56:8080/
[09:34:25] Posted data.
[09:34:25] Initial: 0000; - Receiving payload (expected size: 3985424)
[09:35:08] - Downloaded at ~90 kB/s
[09:35:08] - Averaged speed for that direction ~86 kB/s
[09:35:08] + Received work.
[09:35:09] + Closed connections
[09:35:09] 
[09:35:09] + Processing work unit
[09:35:09] At least 4 processors must be requested.Core required: FahCore_a2.exe
[09:35:09] Core found.
[09:35:09] Working on queue slot 02 [August 26 09:35:09 UTC]
[09:35:09] + Working ...
[09:35:09] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 30 -forceasm -verbose -lifeline 4539 -version 624'

[09:35:09] 
[09:35:09] *------------------------------*
[09:35:09] Folding@Home Gromacs SMP Core
[09:35:09] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[09:35:09] 
[09:35:09] Preparing to commence simulation
[09:35:09] - Ensuring status. Please wait.
[09:35:18] - Assembly optimizations manually forced on.
[09:35:18] - Not checking prior termination.
[09:35:20] - Expanded 3984912 -> 16935197 (decompressed 424.9 percent)
[09:35:20] Called DecompressByteArray: compressed_data_size=3984912 data_size=16935197, decompressed_data_size=16935197 diff=0
[09:35:20] - Digital signature verified
[09:35:20] 
[09:35:20] Project: 2675 (Run 0, Clone 178, Gen 4)
[09:35:20] 
[09:35:20] Assembly optimizations on if available.
[09:35:20] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NODEID=1 argc=22
NODEID=0 argc=22
NODEID=2 argc=22
NODEID=3 argc=22
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090425  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_02.tpr, VERSION 3.3.99_development_20070618 (single precision)

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090425
Source code file: symtab.c, line: 108

Fatal error:
symtab get_symtab_handle 2612 not found
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[0]0:Return code = 255
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[09:35:32] CoreStatus = FF (255)
[09:35:32] Sending work to server
[09:35:32] Project: 2675 (Run 0, Clone 178, Gen 4)
[09:35:32] - Error: Could not get length of results file work/wuresults_02.dat
[09:35:32] - Error: Could not read unit 02 file. Removing from queue.
[09:35:32] Trying to send all finished work units
[09:35:32] + No unsent completed units remaining.
[09:35:32] - Preparing to get new work unit...
[09:35:32] + Attempting to get work packet
[09:35:32] - Will indicate memory of 752 MB
[09:35:32] - Connecting to assignment server
[09:35:32] Connecting to http://assign.stanford.edu:8080/
[09:35:32] Posted data.
[09:35:32] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[09:35:32] + News From Folding@Home: Welcome to Folding@Home
[09:35:33] Loaded queue successfully.
[09:35:33] Connecting to http://171.64.65.56:8080/
[09:35:36] Posted data.
[09:35:36] Initial: 0000; - Receiving payload (expected size: 3985424)
[09:36:23] - Downloaded at ~82 kB/s
[09:36:23] - Averaged speed for that direction ~85 kB/s
[09:36:23] + Received work.
[09:36:23] Trying to send all finished work units
[09:36:23] + No unsent completed units remaining.
[09:36:23] + Closed connections
[09:36:28] 
[09:36:28] + Processing work unit
[09:36:28] At least 4 processors must be requested.Core required: FahCore_a2.exe
[09:36:28] Core found.
[09:36:30] Working on queue slot 03 [August 26 09:36:30 UTC]
[09:36:30] + Working ...
[09:36:30] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 03 -checkpoint 30 -forceasm -verbose -lifeline 4539 -version 624'

[09:36:30] 
[09:36:30] *------------------------------*
[09:36:30] Folding@Home Gromacs SMP Core
[09:36:30] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[09:36:30] 
[09:36:30] Preparing to commence simulation
[09:36:30] - Ensuring status. Please wait.
[09:36:39] - Assembly optimizations manually forced on.
[09:36:39] - Not checking prior termination.
[09:36:40] - Expanded 3984912 -> 16935197 (decompressed 424.9 percent)
[09:36:41] Called DecompressByteArray: compressed_data_size=3984912 data_size=16935197, decompressed_data_size=16935197 diff=0
[09:36:41] - Digital signature verified
[09:36:41] 
[09:36:41] Project: 2675 (Run 0, Clone 178, Gen 4)
[09:36:41] 
[09:36:41] Assembly optimizations on if available.
[09:36:41] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
NODEID=0 argc=22
NODEID=1 argc=22
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090425  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_03.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=2 argc=22
NODEID=3 argc=22

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090425
Source code file: symtab.c, line: 108

Fatal error:
symtab get_symtab_handle 2612 not found
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
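
For anyone wondering what that fatal error actually means: a .tpr work unit stores every string (atom names, residue names, and so on) once in a symbol table, and the rest of the topology refers to them by integer handle. Handle 2612 isn't in this unit's table, so the file is internally inconsistent and the core can only abort, which is why it fails instantly on every machine it lands on. A minimal sketch of the failure mode in C (illustrative only, not the actual GROMACS symtab.c; the simplified t_symtab layout and the lookup_symbol name are mine):

Code:

/* Illustrative sketch of the symtab failure mode -- not the real
 * GROMACS symtab.c. A .tpr stores each string once; the topology
 * refers to strings by integer handle. A handle outside the table
 * means the file itself is bad, so no client setting can fix it. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int    nr;      /* number of strings in the table */
    char **names;   /* the stored strings */
} t_symtab;         /* simplified; assumed layout */

static char *lookup_symbol(const t_symtab *symtab, int handle)
{
    if (handle < 0 || handle >= symtab->nr) {
        fprintf(stderr, "Fatal error:\n"
                        "symtab get_symtab_handle %d not found\n", handle);
        /* in the parallel core this path ends in MPI_Abort(-1),
         * which the client then logs as CoreStatus = FF (255) */
        exit(255);
    }
    return symtab->names[handle];
}

int main(void)
{
    char *one_name = "SOL";
    t_symtab symtab = { 1, &one_name };   /* table with a single entry */
    lookup_symbol(&symtab, 2612);         /* reproduces the log message */
    return 0;
}
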
I have to change the machine ID of the Linux client more often than I change my socks these days.
djchandler
Posts: 60
Joined: Mon Aug 03, 2009 6:43 pm
Hardware configuration: AMD Ryzen 7 5700G with Radeon Graphics 3.80 GHz
16.0 GB
HP Pavilion Desktop model TP01-2xxx without discrete video card
Windows 11 Pro
Native client running 1 slot, 8 cores

Re: Project: 2675 (Run 0, Clone 178, Gen 4)

Post by djchandler »

This unit was assigned 6 times consecutively to my notfred client (VMware 3.0, Windows 7 RTM x64, AMD-V enabled). In each instance, this error occurred:

Code:

[10:31:47] Called DecompressByteArray: compressed_data_size=3984912 data_size=16935197, decompressed_data_size=16935197 diff=0
[10:31:48] - Digital signature verified
[10:31:48] 
[10:31:48] Project: 2675 (Run 0, Clone 178, Gen 4)
[10:31:48] 
[10:31:48] Assembly optimizations on if available.
[10:31:48] Entering M.D.
[10:31:55] Multi-core optimizations on
[10:31:59] CoreStatus = FF (255)
[10:31:59] Client-core communications error: ERROR 0xff
This is running on an AMD X2 7750. I don't know whether this is a two-core issue or something else. I see the unit has been around for a while awaiting completion, but only two donors have reported it, over three months apart. Defective, perhaps?
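
The CoreStatus = FF is just the MPI_Abort(MPI_COMM_WORLD, -1) from ^w^ing's log coming back around: a process exit status keeps only its low 8 bits, so -1 truncates to 255 (0xFF). A quick demonstration of the arithmetic in C (nothing FAH-specific):

Code:

/* Why MPI_Abort(..., -1) is reported as CoreStatus = FF (255):
 * exit statuses keep only the low 8 bits of the code. */
#include <stdio.h>

int main(void)
{
    int code = -1;                               /* value passed to MPI_Abort */
    unsigned char status = (unsigned char)code;  /* low 8 bits: 0xFF */
    printf("exit status = %u (0x%02X)\n", status, status);  /* prints 255 (0xFF) */
    return 0;
}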

The A2 core was re-downloaded twice during this series. The notfred client remains solid and had no problems prior to this; it has since been assigned, and is proceeding on, a different A2-core WU.
Folding for Cures
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2675 (Run 0, Clone 178, Gen 4)

Post by bruce »

djchandler wrote: I see the unit has been around for a while awaiting completion, but only two donors have reported it, over three months apart. Defective, perhaps?
That is strange... an instant error and only two reports? I guess some people don't check their clients very often. (Instant EUEs don't lower folks' PPD enough, so those folks don't even bother to look.)

I'll make sure it gets stopped.