Merged problems with projects 6903/6904, Part 1

Moderators: Site Moderators, FAHC Science Team

Amaruk
Posts: 254
Joined: Fri Jun 20, 2008 3:57 am
Location: Watching from the Woods

Project: 6903 (Run 2, Clone 13, Gen 39)

Post by Amaruk »

Thanks for the heads-up, KMac. :)

Next one also bad, third loading now...

Code: Select all


--- Opening Log file [February 9 17:37:19 UTC] 


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/ogre/folding
Executable: ./fah6
Arguments: -configonly -smp -verbosity 9 -bigadv 

[17:37:19] - Ask before connecting: No
[17:37:19] - User name: Amaruk (Team 50625)
[17:37:19] - User ID: 17BC7707xxxxxxxx
[17:37:19] - Machine ID: 1
[17:37:19] 
[17:37:19] Configuring Folding@Home...


[17:37:45] - Ask before connecting: No
[17:37:45] - User name: Amaruk (Team 50625)
[17:37:45] - User ID: 17BC7707xxxxxxxx
[17:37:45] - Machine ID: 2
[17:37:45] 
[17:37:45] -configonly flag given, so exiting.


--- Opening Log file [February 9 17:39:30 UTC] 


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/ogre/folding
Executable: ./fah6
Arguments: -oneunit -smp -verbosity 9 -bigadv 

[17:39:30] - Ask before connecting: No
[17:39:30] - User name: Amaruk (Team 50625)
[17:39:30] - User ID: 17BC7707xxxxxxxx
[17:39:30] - Machine ID: 2
[17:39:30] 
[17:39:30] Work directory not found. Creating...
[17:39:30] Could not open work queue, generating new queue...
[17:39:30] - Preparing to get new work unit...
[17:39:30] - Autosending finished units... [February 9 17:39:30 UTC]
[17:39:30] Cleaning up work directory
[17:39:30] Trying to send all finished work units
[17:39:30] + No unsent completed units remaining.
[17:39:30] - Autosend completed
[17:39:30] + Attempting to get work packet
[17:39:30] Passkey found
[17:39:30] - Will indicate memory of 32233 MB
[17:39:30] - Connecting to assignment server
[17:39:30] Connecting to http://assign.stanford.edu:8080/
[17:39:30] Posted data.
[17:39:30] Initial: ED82; - Successful: assigned to (130.237.232.237).
[17:39:30] + News From Folding@Home: Welcome to Folding@Home
[17:39:30] Loaded queue successfully.
[17:39:30] Sent data
[17:39:30] Connecting to http://130.237.232.237:8080/
[17:39:42] Posted data.
[17:39:42] Initial: 0000; - Receiving payload (expected size: 46513362)
[17:42:17] - Downloaded at ~293 kB/s
[17:42:17] - Averaged speed for that direction ~293 kB/s
[17:42:17] + Received work.
[17:42:17] + Closed connections
[17:42:17] 
[17:42:17] + Processing work unit
[17:42:17] Core required: FahCore_a5.exe
[17:42:17] Core found.
[17:42:17] Working on queue slot 01 [February 9 17:42:17 UTC]
[17:42:17] + Working ...
[17:42:17] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 01 -np 48 -checkpoint 15 -verbose -lifeline 3801 -version 634'

[17:42:17] 
[17:42:17] *------------------------------*
[17:42:17] Folding@Home Gromacs SMP Core
[17:42:17] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[17:42:17] 
[17:42:17] Preparing to commence simulation
[17:42:17] - Looking at optimizations...
[17:42:17] - Created dyn
[17:42:17] - Files status OK
[17:42:23] - Expanded 46512850 -> 71846524 (decompressed 62.1 percent)
[17:42:23] Called DecompressByteArray: compressed_data_size=46512850 data_size=71846524, decompressed_data_size=71846524 diff=0
[17:42:24] - Digital signature verified
[17:42:24] 
[17:42:24] Project: 6903 (Run 2, Clone 13, Gen 39)
[17:42:24] 
[17:42:24] Assembly optimizations on if available.
[17:42:24] Entering M.D.
[17:42:33] Mapping NT from 48 to 48 
[17:42:39] Completed 0 out of 10000000 steps  (0%)
[17:45:54] ***** Got an Activate signal (2)
[17:45:55] Killing all core threads

Folding@Home Client Shutdown.
From terminal: 'WARNING: This run will generate roughly 7425 Mb of data' :shock:


~edit~

Third WU OK.

Note to mods: for convenience, post heading contains bad WU info.
KMac
Posts: 31
Joined: Thu Feb 17, 2011 6:50 pm

Re: Merged problems with projects 6903/6904

Post by KMac »

Project: 6903 (Run 8, Clone 1, Gen 68)

Same thing. Is this being looked into?
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

It was, but the Stanford crew has been quiet for the last few days, so nobody knows the answer for sure. My guess would be yes, though.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
Amaruk
Posts: 254
Joined: Fri Jun 20, 2008 3:57 am
Location: Watching from the Woods

Project: 6903 (Run 0, Clone 6, Gen 59)

Post by Amaruk »

Hopefully you're right, Grandpa_01.

Got another one today. That's three bad out of the last five...

Code: Select all

[20:27:52] Connecting to http://130.237.232.237:8080/
[20:28:04] Posted data.
[20:28:04] Initial: 0000; - Receiving payload (expected size: 45233539)
[20:30:58] - Downloaded at ~253 kB/s
[20:30:58] - Averaged speed for that direction ~273 kB/s
[20:30:58] + Received work.
[20:30:58] + Closed connections

<SNIP>

[20:31:05] Project: 6903 (Run 0, Clone 6, Gen 59)
[20:31:05] 
[20:31:05] Assembly optimizations on if available.
[20:31:05] Entering M.D.
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

Reading file work/wudata_02.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
[20:31:14] Mapping NT from 48 to 48 
Starting 48 threads
Making 2D domain decomposition 8 x 6 x 1

WARNING: This run will generate roughly 11012 Mb of data

starting mdrun 'Overlay'
15000000 steps,  60000.0 ps.
[20:31:19] Completed 0 out of 15000000 steps  (0%)
^C[20:36:10] ***** Got an Activate signal (2)
[20:36:10] Killing all core threads

Folding@Home Client Shutdown.
ogre@ogre:~/folding$ 

Received the second INT/TERM signal, stopping at the next step



Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

Just an update: they are still working on it.
kasson wrote: I'm afraid it's just the ~30 that we've identified. We're in the process of trying to "rehabilitate" them, but the current work server software doesn't let us nuke the WU's the way we used to. (If I did that, it would keep trying to assign empty WU packets, giving the 512-byte download + missing file issue.) We apologize for the problems and are working to fix the WU's as best we can.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

I do not know whether there is anything we can do to help get rid of these WUs, but I downloaded Project: 6903 (Run 6, Clone 2, Gen 75) and it has 19,000,000 steps. It ran for 1:15:00 before I stopped it and did not complete 1%; normally it takes 40:00 per %. I have saved it to my desktop just in case there is anything I can do to help, which I doubt, but if there is anything we can do as end users, let us know. This one started from step 18,500,000 and says it has 500,000 steps for this WU. Normal is 250,000 steps, so this is double size. Would it do any good to move it to my 4P and see if it would complete before the deadline?

Code: Select all

[04:19:51] - Successful: assigned to (130.237.232.237).
[04:19:51] + News From Folding@Home: Welcome to Folding@Home
[04:19:51] Loaded queue successfully.
[04:21:12] + Closed connections
[04:21:12] 
[04:21:12] + Processing work unit
[04:21:12] Core required: FahCore_a5.exe
[04:21:12] Core found.
[04:21:12] Working on queue slot 04 [February 11 04:21:12 UTC]
[04:21:12] + Working ...
[04:21:12] 
[04:21:12] *------------------------------*
[04:21:12] Folding@Home Gromacs SMP Core
[04:21:12] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[04:21:12] 
[04:21:12] Preparing to commence simulation
[04:21:12] - Looking at optimizations...
[04:21:12] - Created dyn
[04:21:12] - Files status OK
[04:21:15] - Expanded 57241998 -> 71846524 (decompressed 50.4 percent)
[04:21:15] Called DecompressByteArray: compressed_data_size=57241998 data_size=71846524, decompressed_data_size=71846524 diff=0
[04:21:16] - Digital signature verified
[04:21:16] 
[04:21:16] Project: 6903 (Run 6, Clone 2, Gen 75)
[04:21:16] 
[04:21:16] Assembly optimizations on if available.
[04:21:16] Entering M.D.
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

Reading file work/wudata_04.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
[04:21:23] Mapping NT from 12 to 12 
Starting 12 threads
Making 1D domain decomposition 12 x 1 x 1
starting mdrun 'Overlay'
19000000 steps,  76000.0 ps (continuing from step 18500000,  74000.0 ps).
[04:21:26] Completed 0 out of 500000 steps  (0%)
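
For anyone who wants to check these numbers without doing the math by hand, here is a minimal Python sketch that reads the mdrun summary line and works out how many steps are left in the WU and the timestep. The regular expression and the name `remaining_steps` are my own and assume the step number is written as plain digits; none of this is part of the client or the core.

```python
import re

# Pattern for the mdrun summary line as it appears in the logs in this thread.
# The optional group covers runs that resume from an earlier step.
MDRUN_LINE = re.compile(
    r"(?P<total>\d+) steps,\s+(?P<ps>[\d.]+) ps"
    r"(?: \(continuing from step (?P<start>\d+),?\s+[\d.]+ ps\))?"
)

def remaining_steps(line):
    """Return (steps left in this WU, timestep in fs), or None if no match."""
    m = MDRUN_LINE.search(line)
    if m is None:
        return None
    total = int(m.group("total"))
    start = int(m.group("start") or 0)  # 0 when the run starts from scratch
    timestep_fs = float(m.group("ps")) / total * 1000.0
    return total - start, timestep_fs

line = "19000000 steps,  76000.0 ps (continuing from step 18500000,  74000.0 ps)."
steps_left, dt_fs = remaining_steps(line)
print(steps_left, round(dt_fs, 3))
```

For the line above this gives 500,000 steps left, which agrees with the "Completed 0 out of 500000 steps" line the client prints.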



Last edited by Grandpa_01 on Sat Feb 11, 2012 6:05 am, edited 1 time in total.
harlam357
Posts: 222
Joined: Fri Jun 27, 2008 11:03 pm
Location: Alabama - USA

Re: Merged problems with projects 6903/6904

Post by harlam357 »

Died after completion with a file I/O error. :(

Code: Select all

[20:10:13] + Processing work unit
[20:10:13] Core required: FahCore_a5.exe
[20:10:13] Core found.
[20:10:13] Working on queue slot 09 [February 8 20:10:13 UTC]
[20:10:13] + Working ...
[20:10:13] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 09 -np 48 -checkpoint 30 -verbose -lifeline 4522 -version 634'

[20:10:14] 
[20:10:14] *------------------------------*
[20:10:14] Folding@Home Gromacs SMP Core
[20:10:14] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[20:10:14] 
[20:10:14] Preparing to commence simulation
[20:10:14] - Looking at optimizations...
[20:10:14] - Created dyn
[20:10:14] - Files status OK
[20:10:21] - Expanded 57244135 -> 71846524 (decompressed 50.4 percent)
[20:10:21] Called DecompressByteArray: compressed_data_size=57244135 data_size=71846524, decompressed_data_size=71846524 diff=0
[20:10:22] - Digital signature verified
[20:10:22] 
[20:10:22] Project: 6903 (Run 8, Clone 1, Gen 68)
[20:10:22] 
[20:10:22] Assembly optimizations on if available.
[20:10:22] Entering M.D.
[20:10:31] Mapping NT from 48 to 48 
[20:10:37] Completed 0 out of 500000 steps  (0%)
[20:40:36] Completed 5000 out of 500000 steps  (1%)
[21:10:36] Completed 10000 out of 500000 steps  (2%)
[21:40:36] Completed 15000 out of 500000 steps  (3%)
[22:10:39] Completed 20000 out of 500000 steps  (4%)
[22:40:31] Completed 25000 out of 500000 steps  (5%)
[23:08:29] Completed 30000 out of 500000 steps  (6%)
[23:36:25] Completed 35000 out of 500000 steps  (7%)
[00:04:22] Completed 40000 out of 500000 steps  (8%)
[00:32:20] Completed 45000 out of 500000 steps  (9%)
[01:00:17] Completed 50000 out of 500000 steps  (10%)
[01:03:47] - Autosending finished units... [February 9 01:03:47 UTC]
[01:03:47] Trying to send all finished work units
[01:03:47] + No unsent completed units remaining.
[01:03:47] - Autosend completed
[01:28:19] Completed 55000 out of 500000 steps  (11%)
[01:56:15] Completed 60000 out of 500000 steps  (12%)
[02:24:11] Completed 65000 out of 500000 steps  (13%)
[02:52:07] Completed 70000 out of 500000 steps  (14%)
[03:20:04] Completed 75000 out of 500000 steps  (15%)
[03:48:00] Completed 80000 out of 500000 steps  (16%)
[04:15:56] Completed 85000 out of 500000 steps  (17%)
[04:43:52] Completed 90000 out of 500000 steps  (18%)
[05:11:47] Completed 95000 out of 500000 steps  (19%)
[05:39:38] Completed 100000 out of 500000 steps  (20%)
[06:07:37] Completed 105000 out of 500000 steps  (21%)
[06:35:35] Completed 110000 out of 500000 steps  (22%)
[07:03:33] Completed 115000 out of 500000 steps  (23%)
[07:03:47] - Autosending finished units... [February 9 07:03:47 UTC]
[07:03:47] Trying to send all finished work units
[07:03:47] + No unsent completed units remaining.
[07:03:47] - Autosend completed
[07:31:30] Completed 120000 out of 500000 steps  (24%)
[07:59:26] Completed 125000 out of 500000 steps  (25%)
[08:27:23] Completed 130000 out of 500000 steps  (26%)
[08:55:19] Completed 135000 out of 500000 steps  (27%)
[09:23:16] Completed 140000 out of 500000 steps  (28%)
[09:51:12] Completed 145000 out of 500000 steps  (29%)
[10:19:09] Completed 150000 out of 500000 steps  (30%)
[10:47:06] Completed 155000 out of 500000 steps  (31%)
[11:15:03] Completed 160000 out of 500000 steps  (32%)
[11:43:00] Completed 165000 out of 500000 steps  (33%)
[12:10:58] Completed 170000 out of 500000 steps  (34%)
[12:38:49] Completed 175000 out of 500000 steps  (35%)
[13:03:47] - Autosending finished units... [February 9 13:03:47 UTC]
[13:03:47] Trying to send all finished work units
[13:03:47] + No unsent completed units remaining.
[13:03:47] - Autosend completed
[13:06:47] Completed 180000 out of 500000 steps  (36%)
[13:34:44] Completed 185000 out of 500000 steps  (37%)
[14:02:45] Completed 190000 out of 500000 steps  (38%)
[14:31:05] Completed 195000 out of 500000 steps  (39%)
[14:59:02] Completed 200000 out of 500000 steps  (40%)
[15:26:59] Completed 205000 out of 500000 steps  (41%)
[15:54:56] Completed 210000 out of 500000 steps  (42%)
[16:22:55] Completed 215000 out of 500000 steps  (43%)
[16:50:53] Completed 220000 out of 500000 steps  (44%)
[17:18:51] Completed 225000 out of 500000 steps  (45%)
[17:46:53] Completed 230000 out of 500000 steps  (46%)
[18:14:52] Completed 235000 out of 500000 steps  (47%)
[18:42:52] Completed 240000 out of 500000 steps  (48%)
[19:03:47] - Autosending finished units... [February 9 19:03:47 UTC]
[19:03:47] Trying to send all finished work units
[19:03:47] + No unsent completed units remaining.
[19:03:47] - Autosend completed
[19:10:51] Completed 245000 out of 500000 steps  (49%)
[19:38:44] Completed 250000 out of 500000 steps  (50%)
[20:06:58] Completed 255000 out of 500000 steps  (51%)
[20:34:59] Completed 260000 out of 500000 steps  (52%)
[21:03:00] Completed 265000 out of 500000 steps  (53%)
[21:31:00] Completed 270000 out of 500000 steps  (54%)
[21:58:59] Completed 275000 out of 500000 steps  (55%)
[22:26:59] Completed 280000 out of 500000 steps  (56%)
[22:54:57] Completed 285000 out of 500000 steps  (57%)
[23:22:54] Completed 290000 out of 500000 steps  (58%)
[23:50:52] Completed 295000 out of 500000 steps  (59%)
[00:18:49] Completed 300000 out of 500000 steps  (60%)
[00:46:47] Completed 305000 out of 500000 steps  (61%)
[01:03:47] - Autosending finished units... [February 10 01:03:47 UTC]
[01:03:47] Trying to send all finished work units
[01:03:47] + No unsent completed units remaining.
[01:03:47] - Autosend completed
[01:14:46] Completed 310000 out of 500000 steps  (62%)
[01:42:44] Completed 315000 out of 500000 steps  (63%)
[02:10:36] Completed 320000 out of 500000 steps  (64%)
[02:38:34] Completed 325000 out of 500000 steps  (65%)
[03:06:32] Completed 330000 out of 500000 steps  (66%)
[03:34:30] Completed 335000 out of 500000 steps  (67%)
[04:02:28] Completed 340000 out of 500000 steps  (68%)
[04:30:26] Completed 345000 out of 500000 steps  (69%)
[04:58:27] Completed 350000 out of 500000 steps  (70%)
[05:26:27] Completed 355000 out of 500000 steps  (71%)
[05:54:28] Completed 360000 out of 500000 steps  (72%)
[06:22:28] Completed 365000 out of 500000 steps  (73%)
[06:50:28] Completed 370000 out of 500000 steps  (74%)
[07:03:47] - Autosending finished units... [February 10 07:03:47 UTC]
[07:03:47] Trying to send all finished work units
[07:03:47] + No unsent completed units remaining.
[07:03:47] - Autosend completed
[07:18:28] Completed 375000 out of 500000 steps  (75%)
[07:46:28] Completed 380000 out of 500000 steps  (76%)
[08:14:29] Completed 385000 out of 500000 steps  (77%)
[08:42:29] Completed 390000 out of 500000 steps  (78%)
[09:10:21] Completed 395000 out of 500000 steps  (79%)
[09:38:21] Completed 400000 out of 500000 steps  (80%)
[10:06:20] Completed 405000 out of 500000 steps  (81%)
[10:34:19] Completed 410000 out of 500000 steps  (82%)
[11:02:16] Completed 415000 out of 500000 steps  (83%)
[11:30:13] Completed 420000 out of 500000 steps  (84%)
[11:58:10] Completed 425000 out of 500000 steps  (85%)
[12:26:08] Completed 430000 out of 500000 steps  (86%)
[12:54:05] Completed 435000 out of 500000 steps  (87%)
[13:03:47] - Autosending finished units... [February 10 13:03:47 UTC]
[13:03:47] Trying to send all finished work units
[13:03:47] + No unsent completed units remaining.
[13:03:47] - Autosend completed
[13:22:03] Completed 440000 out of 500000 steps  (88%)
[13:50:25] Completed 445000 out of 500000 steps  (89%)
[14:18:52] Completed 450000 out of 500000 steps  (90%)
[14:46:51] Completed 455000 out of 500000 steps  (91%)
[15:14:49] Completed 460000 out of 500000 steps  (92%)
[15:42:49] Completed 465000 out of 500000 steps  (93%)
[16:10:49] Completed 470000 out of 500000 steps  (94%)
[16:38:43] Completed 475000 out of 500000 steps  (95%)
[17:06:45] Completed 480000 out of 500000 steps  (96%)
[17:34:47] Completed 485000 out of 500000 steps  (97%)
[18:02:49] Completed 490000 out of 500000 steps  (98%)
[18:30:51] Completed 495000 out of 500000 steps  (99%)
[18:58:53] Completed 500000 out of 500000 steps  (100%)
[18:59:18] DynamicWrapper: Finished Work Unit: sleep=10000
[18:59:28] 
[18:59:28] Finished Work Unit:
[18:59:28] - Reading up to 182433744 from "work/wudata_09.trr": Read 182433744
[18:59:30] trr file hash check passed.
[18:59:30] - Reading up to 207642120 from "work/wudata_09.xtc": Read 207642120
[18:59:35] xtc file hash check passed.
[18:59:35] edr file hash check passed.
[18:59:35] logfile size: 389127
[18:59:35] Leaving Run
[18:59:38] - Writing 390808983 bytes of core data to disk...
[19:01:44] Done: 390808471 -> 378419194 (compressed to 8.9 percent)
[19:01:44] - Compressed data size (378419194) exceeds limit. 
[19:01:44] - Error: Could not write out results to file
[19:01:44] - Shutting down core
[19:01:44] 
[19:01:44] Folding@home Core Shutdown: FILE_IO_ERROR
[19:01:44] CoreStatus = 75 (117)
[19:01:44] Error opening or reading from a file.
[19:01:44] Deleting current work unit & continuing...
[19:02:33] Trying to send all finished work units
[19:02:33] + No unsent completed units remaining.
[19:02:33] - Preparing to get new work unit...
[19:02:33] Cleaning up work directory
[19:02:33] + Attempting to get work packet
[19:02:33] Passkey found
[19:02:33] - Will indicate memory of 32232 MB
[19:02:33] - Connecting to assignment server
[19:02:33] Connecting to http://assign.stanford.edu:8080/
[19:02:34] Posted data.
[19:02:34] Initial: ED82; - Successful: assigned to (130.237.232.237).
[19:02:34] + News From Folding@Home: Welcome to Folding@Home
[19:02:34] Loaded queue successfully.
[19:02:34] Sent data
[19:02:34] Connecting to http://130.237.232.237:8080/
[19:02:47] Posted data.
[19:02:47] Initial: 0000; - Receiving payload (expected size: 57244647)
[19:03:47] - Autosending finished units... [February 10 19:03:47 UTC]
[19:03:47] Trying to send all finished work units
[19:03:47] + No unsent completed units remaining.
[19:03:47] - Autosend completed
[19:05:23] - Downloaded at ~358 kB/s
[19:05:23] - Averaged speed for that direction ~608 kB/s
[19:05:23] + Received work.
[19:05:23] + Closed connections
[19:05:28] 
[19:05:28] + Processing work unit
[19:05:28] Core required: FahCore_a5.exe
[19:05:28] Core found.
[19:05:28] Working on queue slot 00 [February 10 19:05:28 UTC]
[19:05:28] + Working ...
[19:05:28] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 00 -np 48 -checkpoint 30 -verbose -lifeline 4522 -version 634'

[19:05:28] 
[19:05:28] *------------------------------*
[19:05:28] Folding@Home Gromacs SMP Core
[19:05:28] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[19:05:28] 
[19:05:28] Preparing to commence simulation
[19:05:28] - Looking at optimizations...
[19:05:28] - Created dyn
[19:05:28] - Files status OK
[19:05:35] - Expanded 57244135 -> 71846524 (decompressed 50.4 percent)
[19:05:35] Called DecompressByteArray: compressed_data_size=57244135 data_size=71846524, decompressed_data_size=71846524 diff=0
[19:05:36] - Digital signature verified
[19:05:36] 
[19:05:36] Project: 6903 (Run 8, Clone 1, Gen 68)
[19:05:36] 
[19:05:36] Assembly optimizations on if available.
[19:05:36] Entering M.D.
[19:05:45] Mapping NT from 48 to 48 
[19:05:51] Completed 0 out of 500000 steps  (0%)
[19:36:03] Completed 5000 out of 500000 steps  (1%)
[19:48:45] ***** Got an Activate signal (2)
[19:48:45] Killing all core threads

Folding@Home Client Shutdown.
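
The frame times in a log like this can be pulled out with a few lines of Python. This is only a sketch: the names `COMPLETED` and `frame_times` are mine, and it assumes each frame takes less than 24 hours, because the log timestamps carry no date and a negative difference is treated as a midnight rollover.

```python
import re
from datetime import timedelta

# Matches the client's progress lines, e.g.
# "[23:22:54] Completed 290000 out of 500000 steps  (58%)"
COMPLETED = re.compile(
    r"\[(\d\d):(\d\d):(\d\d)\] Completed \d+ out of \d+ steps\s+\((\d+)%\)"
)

def frame_times(log_lines):
    """Return the time between consecutive percent lines as timedeltas."""
    stamps = []
    for line in log_lines:
        m = COMPLETED.search(line)
        if m:
            h, mi, s, pct = map(int, m.groups())
            stamps.append((pct, timedelta(hours=h, minutes=mi, seconds=s)))
    deltas = []
    for (p0, t0), (p1, t1) in zip(stamps, stamps[1:]):
        if p1 != p0 + 1:
            continue  # skip restarts or gaps in the log
        d = t1 - t0
        if d < timedelta(0):
            d += timedelta(days=1)  # timestamp wrapped past midnight
        deltas.append(d)
    return deltas

sample = [
    "[23:22:54] Completed 290000 out of 500000 steps  (58%)",
    "[23:50:52] Completed 295000 out of 500000 steps  (59%)",
    "[00:18:49] Completed 300000 out of 500000 steps  (60%)",
]
tpf = frame_times(sample)
mean = sum(tpf, timedelta(0)) / len(tpf)
print(mean)
```

On the three sample lines the mean comes out just under 28 minutes per frame, which is in line with the roughly 47 hours the unit took from 0% to 100% before the write error.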
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

Not only did it die, but it gave you the same WU to do again. :wink:
bollix47
Posts: 2950
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Merged problems with projects 6903/6904

Post by bollix47 »

Here's another with a high number of steps, taking twice the normal TPF:

Project: 6903 (Run 5, Clone 18, Gen 45)

Code: Select all

[15:03:24] - Preparing to get new work unit...
[15:03:24] Cleaning up work directory
[15:03:25] + Attempting to get work packet
[15:03:25] Passkey found
[15:03:25] - Will indicate memory of 12032 MB
[15:03:25] - Connecting to assignment server
[15:03:25] Connecting to http://assign.stanford.edu:8080/
[15:03:25] Posted data.
[15:03:25] Initial: ED82; - Successful: assigned to (130.237.232.237).
[15:03:25] + News From Folding@Home: Welcome to Folding@Home
[15:03:25] Loaded queue successfully.
[15:03:25] Sent data
[15:03:25] Connecting to http://130.237.232.237:8080/
[15:03:38] Posted data.
[15:03:38] Initial: 0000; - Receiving payload (expected size: 57251557)
[15:04:51] - Downloaded at ~765 kB/s
[15:04:51] - Averaged speed for that direction ~824 kB/s
[15:04:51] + Received work.
[15:04:51] Trying to send all finished work units
[15:04:51] + No unsent completed units remaining.
[15:04:51] + Closed connections
[15:04:51] 


[15:04:51] + Processing work unit
[15:04:51] Core required: FahCore_a5.exe
[15:04:51] Core found.
[15:04:51] Working on queue slot 08 [February 11 15:04:51 UTC]
[15:04:51] + Working ...
[15:04:51] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 08 -np 24 -priority 96 -checkpoint 30 -verbose -lifeline 1734 -version 634'

[15:04:51] 
[15:04:51] *------------------------------*
[15:04:51] Folding@Home Gromacs SMP Core
[15:04:51] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[15:04:51] 
[15:04:51] Preparing to commence simulation
[15:04:51] - Looking at optimizations...
[15:04:51] - Created dyn
[15:04:51] - Files status OK
[15:04:56] - Expanded 57251045 -> 71846524 (decompressed 50.4 percent)
[15:04:56] Called DecompressByteArray: compressed_data_size=57251045 data_size=71846524, decompressed_data_size=71846524 diff=0
[15:04:56] - Digital signature verified
[15:04:56] 
[15:04:56] Project: 6903 (Run 5, Clone 18, Gen 45)
[15:04:56] 
[15:04:57] Assembly optimizations on if available.
[15:04:57] Entering M.D.
[15:05:04] Mapping NT from 24 to 24 
[15:05:09] Completed 0 out of 500000 steps  (0%)
[16:08:58] Completed 5000 out of 500000 steps  (1%)
Patriot
Posts: 76
Joined: Sat Aug 16, 2008 2:04 pm

Re: Merged problems with projects 6903/6904

Post by Patriot »

Times were close to normal for me, so I didn't notice until it finished and gave me an error before submitting...

Code: Select all

[08:04:03] Project: 6903 (Run 2, Clone 5, Gen 66)
[08:04:03]
[08:04:03] Assembly optimizations on if available.
[08:04:03] Entering M.D.
[08:04:11] Mapping NT from 48 to 48
[08:04:15] Completed 0 out of 500000 steps  (0%)
3.0charlie
Posts: 13
Joined: Wed Jul 29, 2009 4:34 pm

Re: Merged problems with projects 6903/6904

Post by 3.0charlie »

Grandpa_01 wrote: I do not know whether there is anything we can do to help get rid of these WUs, but I downloaded Project: 6903 (Run 6, Clone 2, Gen 75) and it has 19,000,000 steps. It ran for 1:15:00 before I stopped it and did not complete 1%; normally it takes 40:00 per %. I have saved it to my desktop just in case there is anything I can do to help, which I doubt, but if there is anything we can do as end users, let us know. This one started from step 18,500,000 and says it has 500,000 steps for this WU. Normal is 250,000 steps, so this is double size. Would it do any good to move it to my 4P and see if it would complete before the deadline?
My 4P is actually folding the exact same WU (23% completed); HFM shows a TPF of 28 min 41 sec. Still 19,000,000 steps.

So I should dump it?
Folding for Hardware Canucks
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

Yes, you should dump it. As near as I can tell, people who have tried to run them can run them successfully, but the results do not send, so it is just a waste of time to run them. Until somebody tells us differently, dumping them is what I am recommending. The 19,000,000 steps is not what is important; it is the next line ([04:21:26] Completed 0 out of 500000 steps (0%)). Where it says 500,000, it should be 250,000 steps. That particular WU is twice the size it should be.
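
If you would rather not eyeball the log each time, the check described above can be sketched in Python. This is a rough example and not an official tool: the function name `find_oversized` and the `EXPECTED_STEPS` table are my own, and the 250,000 figure for 6903 is simply the normal value quoted in this thread.

```python
import re

PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
FIRST_FRAME = re.compile(r"Completed 0 out of (\d+) steps")

# Expected step count per project, taken from this thread; other projects
# would need their own values.
EXPECTED_STEPS = {6903: 250000}

def find_oversized(log_lines, expected=EXPECTED_STEPS):
    """Yield (project, run, clone, gen, steps) for WUs with an unexpected step count."""
    current = None
    for line in log_lines:
        m = PROJECT.search(line)
        if m:
            current = tuple(int(x) for x in m.groups())
            continue
        m = FIRST_FRAME.search(line)
        if m and current is not None:
            steps = int(m.group(1))
            want = expected.get(current[0])
            if want is not None and steps != want:
                yield current + (steps,)
            current = None  # wait for the next Project line

log = [
    "[04:21:16] Project: 6903 (Run 6, Clone 2, Gen 75)",
    "[04:21:16] Entering M.D.",
    "[04:21:26] Completed 0 out of 500000 steps  (0%)",
]
for wu in find_oversized(log):
    print(wu)
```

Run against the three log lines shown, it reports Run 6, Clone 2, Gen 75 with 500,000 steps. A WU that shows the normal 250,000 steps is not reported.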
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Merged problems with projects 6903/6904

Post by bruce »

The problem with dumping them is that the AS just sends them to somebody else. They really need to be removed from circulation and/or fixed. Dumping them just shifts the problem to someone else.

Dr. Kasson said he had fixed a number of them. Since some are obviously still in circulation, we need to figure out the best way to deal with them.

I don't understand why Kasson's procedure didn't work. When WUs are stopped, there is always some delay -- some WUs are already being worked on, and perhaps they still get reassigned to others without a check on whether they've been flagged for stopping. The servers are designed to make sure every WU is completed by somebody, and I guess that sometimes works just too well.
MtM
Posts: 1579
Joined: Fri Jun 27, 2008 2:20 pm
Hardware configuration: Q6600 - 8gb - p5q deluxe - gtx275 - hd4350 ( not folding ) win7 x64 - smp:4 - gpu slot
E6600 - 4gb - p5wdh deluxe - 9600gt - 9600gso - win7 x64 - smp:2 - 2 gpu slots
E2160 - 2gb - ?? - onboard gpu - win7 x32 - 2 uniprocessor slots
T5450 - 4gb - ?? - 8600M GT 512 ( DDR2 ) - win7 x64 - smp:2 - gpu slot
Location: The Netherlands

Re: Merged problems with projects 6903/6904

Post by MtM »

It seems probable that new problem WUs keep getting generated and added to the work servers. If so, the only ways to stop them are (a) pull the project, find out why work units generate new generations with the wrong step count, and prevent that from happening, or (b) add logic to the server that checks WUs before sending them out (this doesn't stop bad work units from being generated or existing, but it will prevent them from being assigned to people).
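
To make option (b) concrete, here is a purely hypothetical Python sketch. I have no knowledge of how the real work server stores or assigns WUs, so the `WorkUnit` record, the `split_queue` function, and the expected-steps table are all invented for illustration. The idea is only that a failed check holds a WU back instead of letting it be reassigned.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkUnit:  # hypothetical record, not the real work server's schema
    project: int
    run: int
    clone: int
    gen: int
    steps: int

def split_queue(queue, expected_steps):
    """Separate WUs that pass the step-count check from those that fail it."""
    assignable, quarantined = [], []
    for wu in queue:
        want = expected_steps.get(wu.project)
        if want is None or wu.steps == want:
            assignable.append(wu)
        else:
            quarantined.append(wu)  # held back instead of being reassigned
    return assignable, quarantined

# Step counts as reported in this thread; the table itself is an assumption.
expected = {6903: 250000, 6904: 250000}
queue = [
    WorkUnit(6903, 5, 18, 45, 500000),  # reported above with 500000 steps
    WorkUnit(6904, 0, 5, 82, 250000),   # reported as a normal-size WU
]
ok, held = split_queue(queue, expected)
print([w.project for w in ok], [w.project for w in held])
```

Projects with no entry in the table pass through untouched, so a check like this would only affect projects someone has explicitly listed.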
3.0charlie
Posts: 13
Joined: Wed Jul 29, 2009 4:34 pm

Re: Merged problems with projects 6903/6904

Post by 3.0charlie »

Grandpa_01 wrote: Yes, you should dump it. As near as I can tell, people who have tried to run them can run them successfully, but the results do not send, so it is just a waste of time to run them. Until somebody tells us differently, dumping them is what I am recommending. The 19,000,000 steps is not what is important; it is the next line ([04:21:26] Completed 0 out of 500000 steps (0%)). Where it says 500,000, it should be 250,000 steps. That particular WU is twice the size it should be.
That's what I did, and the 6903 was indeed a 500,000-step WU. The next one in line is a 6904 (R0, C5, G82), which has 250,000 steps.
Folding for Hardware Canucks