P2669 R7/C4/G185 folding using only one core

BrokenWolf
Posts: 126
Joined: Sat Aug 02, 2008 3:08 am

Post by BrokenWolf »

Hey y'all. Guess what? That's right, boys and girls. I (well, one of Noah's machines I set up for him) got a 2669, R7/C4/G185, that was downloaded almost 12 hours ago, and it is folding on one core only. I am not going to do the screenshots again. One FahCore_a2.exe has been running for the last 719 minutes. The other 3 FahCore_a2.exe processes come up every once in a while with only a few seconds of CPU time.

FahMon shows it will take another 5d 15h 23mn to finish. It will not make the deadline of 2d 11h 55mn.
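As a rough cross-check of that estimate, here is a minimal Python sketch that extrapolates linearly from the first and last "Completed ... steps" lines for this WU in the terminal log in this post. This is not how FahMon computes its figure; the function names are made up for illustration, and the deadline is assumed to mean 2d 11h 55mn remaining, counted from the last log line.

```python
import re
from datetime import datetime, timedelta

# Matches client progress lines such as:
# [17:59:56] Completed 20000 out of 250000 steps  (8%)
FRAME_RE = re.compile(r"\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps")

def parse_frames(log_text):
    """Return (time, steps_done, steps_total) for each progress line."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        frames.append((t, int(m.group(2)), int(m.group(3))))
    return frames

def estimate_remaining(frames):
    """Linear extrapolation from the first to the last frame.

    Assumes all timestamps fall on the same day (no midnight rollover),
    which holds for the lines used here.
    """
    t0, s0, total = frames[0]
    t1, s1, _ = frames[-1]
    seconds_per_step = (t1 - t0).total_seconds() / (s1 - s0)
    return timedelta(seconds=seconds_per_step * (total - s1))

# First and last progress lines of the 2669 WU, copied from the log.
log = """
[06:06:27] Completed 0 out of 250000 steps  (0%)
[17:59:56] Completed 20000 out of 250000 steps  (8%)
"""

frames = parse_frames(log)
remaining = estimate_remaining(frames)

# Assumption: the deadline figure is time left until the deadline.
deadline = timedelta(days=2, hours=11, minutes=55)

print("Estimated time remaining:", remaining)
print("Misses deadline:", remaining > deadline)
```

The extrapolation comes out at a bit under 5d 17h, in the same ballpark as the FahMon figure and more than double the time left before the deadline.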

Bruce, we can add the 2669 to the list of projects having this problem as well. Do we know if someone could look at 171.64.65.56 and see if there are any more of these funky/small WUs waiting to be assigned?

From the terminal window:

[04:46:43] Completed 237500 out of 250000 steps  (95%)
[05:01:38] Completed 240000 out of 250000 steps  (96%)
[05:15:02] - Autosending finished units... [August 11 05:15:02 UTC]
[05:15:02] Trying to send all finished work units
[05:15:02] + No unsent completed units remaining.
[05:15:02] - Autosend completed
[05:16:35] Completed 242500 out of 250000 steps  (97%)
[05:31:31] Completed 245000 out of 250000 steps  (98%)
[05:46:31] Completed 247500 out of 250000 steps  (99%)

Writing final coordinates.
[06:01:29] Completed 250000 out of 250000 steps  (100%)

 Average load imbalance: 5.3 %
 Part of the total run time spent waiting due to load imbalance: 3.5 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Z 0 %


        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:  89679.000  89679.000    100.0
                       1d00h54:39
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    144.606      6.073      0.482     49.821

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[06:01:30] DynamicWrapper: Finished Work Unit: sleep=10000
[06:01:40]
[06:01:40] Finished Work Unit:
[06:01:40] - Reading up to 21168720 from "work/wudata_06.trr": Read 21168720
[06:01:40] trr file hash check passed.
[06:01:40] - Reading up to 27132440 from "work/wudata_06.xtc": Read 27132440
[06:01:40] xtc file hash check passed.
[06:01:40] edr file hash check passed.
[06:01:40] logfile size: 181474
[06:01:40] Leaving Run
[06:01:40] - Writing 48627386 bytes of core data to disk...
[06:01:41]   ... Done.
[06:01:44] - Shutting down core
[06:01:44]
[06:01:44] Folding@home Core Shutdown: FINISHED_UNIT
Error encountered before initializing MPICH
[06:05:02] CoreStatus = 64 (100)
[06:05:02] Unit 6 finished with 65 percent of time to deadline remaining.
[06:05:02] Updated performance fraction: 0.655339
[06:05:02] Sending work to server
[06:05:02] Project: 2677 (Run 3, Clone 28, Gen 34)


[06:05:02] + Attempting to send results [August 11 06:05:02 UTC]
[06:05:02] - Reading file work/wuresults_06.dat from core
[06:05:02]   (Read 48627386 bytes from disk)
[06:05:02] Connecting to http://171.64.65.56:8080/
[06:05:28] Posted data.
[06:05:28] Initial: 0000; - Uploaded at ~1319 kB/s
[06:05:38] - Averaged speed for that direction ~1062 kB/s
[06:05:38] + Results successfully sent
[06:05:38] Thank you for your contribution to Folding@Home.
[06:05:38] + Number of Units Completed: 58

[06:05:39] - Warning: Could not delete all work unit files (6): Core file absent[06:05:39] Trying to send all finished work units
[06:05:39] + No unsent completed units remaining.
[06:05:39] - Preparing to get new work unit...
[06:05:39] + Attempting to get work packet
[06:05:39] - Will indicate memory of 1505 MB
[06:05:39] - Connecting to assignment server
[06:05:39] Connecting to http://assign.stanford.edu:8080/
[06:05:40] Posted data.
[06:05:40] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[06:05:40] + News From Folding@Home: Welcome to Folding@Home
[06:05:40] Loaded queue successfully.
[06:05:40] Connecting to http://171.64.65.56:8080/
[06:05:46] Posted data.
[06:05:46] Initial: 0000; - Receiving payload (expected size: 1509777)
[06:05:49] - Downloaded at ~491 kB/s
[06:05:49] - Averaged speed for that direction ~929 kB/s
[06:05:49] + Received work.
[06:05:49] Trying to send all finished work units
[06:05:49] + No unsent completed units remaining.
[06:05:49] + Closed connections
[06:05:49]
[06:05:49] + Processing work unit
[06:05:49] At least 4 processors must be requested.Core required: FahCore_a2.exe[06:05:49] Core found.
[06:05:49] Working on queue slot 07 [August 11 06:05:49 UTC]
[06:05:49] + Working ...
[06:05:49] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 07 -priority 96 -checkpoint 15 -verbose -lifeline 8230 -version 624'

[06:05:49]
[06:05:49] *------------------------------*
[06:05:49] Folding@Home Gromacs SMP Core
[06:05:49] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[06:05:49]
[06:05:49] Preparing to commence simulation
[06:05:49] - Ensuring status. Please wait.
[06:05:49] Called DecompressByteArray: compressed_data_size=1509265 data_size=23977801, decompressed_data_size=23977801 diff=0
[06:05:50] - Digital signature verified
[06:05:50]
[06:05:50] Project: 2669 (Run 7, Clone 4, Gen 185)
[06:05:50]
[06:05:50] Assembly optimizations on if available.
[06:05:50] Entering M.D.
[06:05:59] (Run 7, Clone 4, Gen 185)
[06:05:59]
[06:05:59] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=RHEL4N23.lab1.com
NNODES=4, MYRANK=1, HOSTNAME=RHEL4N23.lab1.com
NNODES=4, MYRANK=2, HOSTNAME=RHEL4N23.lab1.com
NNODES=4, MYRANK=3, HOSTNAME=RHEL4N23.lab1.com
NODEID=0 argc=22
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090425  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_07.tpr, VERSION 3.3.99_development_20070618 (single precision)
NODEID=1 argc=22
NODEID=2 argc=22
NODEID=3 argc=22
Note: tpx file_version 48, software version 65

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22869 system'
46500000 steps,  93000.0 ps (continuing from step 46250000,  92500.0 ps).
[06:06:27] Completed 0 out of 250000 steps  (0%)
[07:35:07] Completed 2500 out of 250000 steps  (1%)
[09:04:20] Completed 5000 out of 250000 steps  (2%)
[10:33:32] Completed 7500 out of 250000 steps  (3%)
[11:15:03] - Autosending finished units... [August 11 11:15:03 UTC]
[11:15:03] Trying to send all finished work units
[11:15:03] + No unsent completed units remaining.
[11:15:03] - Autosend completed
[12:02:48] Completed 10000 out of 250000 steps  (4%)
[13:32:06] Completed 12500 out of 250000 steps  (5%)
[15:01:22] Completed 15000 out of 250000 steps  (6%)
[16:30:41] Completed 17500 out of 250000 steps  (7%)
[17:15:03] - Autosending finished units... [August 11 17:15:03 UTC]
[17:15:03] Trying to send all finished work units
[17:15:03] + No unsent completed units remaining.
[17:15:03] - Autosend completed
[17:59:56] Completed 20000 out of 250000 steps  (8%)
There is something fishy going on here.

Have a good day.

BrokenWolf
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: P2669 R7/C4/G185 folding using only one core

Post by bruce »

I've reported this WU along with two other discussions of similar topics. The Pande Group was already aware of the problem and is working toward a fix.