Project 3065 (Run 2, Clone 257, Gen 11)

klasseng
Posts: 126
Joined: Thu Dec 27, 2007 6:08 am
Hardware configuration: System 1: Mac Studio, M1 Max,
System 2: Mac Mini, M2
Location: Canada

Post by klasseng »

Running on a 3 GHz octo-core Mac Pro.
Just got this WU and it's displaying unusual behaviour:

[16:38:56] Project: 306- Starting from initial work packet
[16:38:56] 
[16:38:56] Project: 3065 (Run 2, Clone 257, Gen 11)
[16:38:56] 
[16:38:56] Entering M.D.
NNODES=4, MYRANK=2, HOSTNAME=8Core.local
NNODES=4, MYRANK=0, HOSTNAME=8Core.local
NNODES=4, MYRANK=1, HOSTNAME=8Core.local
NNODES=4, MYRANK=3, HOSTNAME=8Core.local
NODEID=3 argc=15
NODEID=0 argc=15
NODEID=1 argc=15
NODEID=2 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

[16:39:02] cpfilenamepfilename: 
[16:39:02] Rejecting checkpoint
starting mdrun '66728 p3065_lambda5_99sb_big'
2500000 steps,   5000.0 ps.

[16:39:03] Protein: 66728 p3065_lambda5Extra SSE boost OK.
[16:39:03] oost OK.
[16:39:03] 
[16:39:03] Extra SSE boost OK.
[16:39:03] Writing local files
[16:39:03] Completed 0 out of 2500000 steps  (0 percent)
[16:59:03] Timered checkpoint triggered.
[17:19:03] Timered checkpoint triggered.
[17:39:05] Timered checkpoint triggered.
unitinfo.txt reports that it's still at 0% after an hour.

Activity Monitor shows that it is only sporadically using CPU time: the four instances of FahCore_a1.exe are jumping between 0% and 25% CPU utilization (most cores run steady at around 95%-96%). There's lots of idle time on the system.
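For anyone who wants to check for this kind of stall without watching the log by hand, here is a rough Python sketch. It assumes the log lines look like the ones pasted above (a `[HH:MM:SS]` stamp, and progress reported as `Completed N out of M steps`). The function names and the 30-minute threshold are my own illustrative choices, not anything the client provides.

```python
import re
from datetime import datetime, timedelta

# Line formats are taken from the log excerpt in this post.
STAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]")
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")


def minutes_since_progress(log_text):
    """Minutes between the last 'Completed ...' line and the last stamped line.

    Returns None when the log contains no progress line at all.
    """
    day_offset = timedelta(0)
    previous = None
    last_progress = None
    latest = None
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if not match:
            continue  # unstamped core output (NNODES=..., banner text, etc.)
        clock = datetime.strptime(match.group(1), "%H:%M:%S")
        # The log has no dates, so treat a backwards jump as passing midnight.
        if previous is not None and clock < previous:
            day_offset += timedelta(days=1)
        previous = clock
        latest = clock + day_offset
        if PROGRESS.search(line):
            last_progress = latest
    if last_progress is None:
        return None
    return (latest - last_progress).total_seconds() / 60


def looks_stalled(log_text, threshold_minutes=30):
    """Hypothetical rule of thumb: no new progress line for threshold_minutes."""
    gap = minutes_since_progress(log_text)
    return gap is not None and gap >= threshold_minutes
```

Feeding it the log text above should flag the unit as stalled, since the only progress line is the 0 percent one and the checkpoints keep arriving for an hour after it. A healthy unit with slow frames could trip the same check, so the threshold would need adjusting per project.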
peace,
Grant
susato
Site Moderator
Posts: 511
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: Project 3065 (Run 2, Clone 257, Gen 11)

Post by susato »

Hi Grant - Thanks for reporting this strange problem. Funny, I have a p3064 (r3, c192, g21) on a Mini, which was doing something similar this morning. It just quit working some time after the 13th frame, while still in 'running' status. The cores and client were all visible in Activity Monitor but using 0% CPU. I stopped the WU (cores and client all quit immediately) and restarted it. It came back up smoothly and is now cranking along well, though it will probably miss the deadline after spending around 24 hours stalled.

Try stopping and restarting yours, monitoring the client and core utilization in Activity Monitor as you do so. I'd be interested to see whether the simple restart is enough to get your work unit back on track.

No one else has turned in results, partial or complete, for this unit, and the preceding generation was submitted for credit early this morning, meaning that you are the first recipient of this generation of the unit.
klasseng

Re: Project 3065 (Run 2, Clone 257, Gen 11)

Post by klasseng »

I didn't leave the WU running at reduced output very long; there was no way it was going to complete on time at that rate.

So, as you suggested, I stopped the WU and restarted it. It started up at full speed and has now completed 31%.

Will let you know how it ends.
peace,
Grant
klasseng

Re: Project 3065 (Run 2, Clone 257, Gen 11)

Post by klasseng »

It seemed to complete and upload the result OK.
peace,
Grant