Project: 6013 (Run 0, Clone 56, Gen 136)

Moderators: Site Moderators, FAHC Science Team

ChasR
Posts: 402
Joined: Sun Dec 02, 2007 5:36 am
Location: Atlanta, GA

Post by ChasR »

The subject WU runs very slowly and will not make the deadline on a Q9450 @ 3.4 GHz, Win XP x64. This WU has been assigned to me multiple times with the same result.

Code:

[00:45:03] *------------------------------*
[00:45:03] Folding@Home Gromacs SMP Core
[00:45:03] Version 2.19 (Mar 12, 2010)
[00:45:03] 
[00:45:03] Preparing to commence simulation
[00:45:03] - Looking at optimizations...
[00:45:03] - Created dyn
[00:45:03] - Files status OK
[00:45:04] - Expanded 979406 -> 10427873 (decompressed 1064.7 percent)
[00:45:04] Called DecompressByteArray: compressed_data_size=979406 data_size=10427873, decompressed_data_size=10427873 diff=0
[00:45:04] - Digital signature verified
[00:45:04] 
[00:45:04] Project: 6013 (Run 0, Clone 56, Gen 136)
[00:45:04] 
[00:45:04] Assembly optimizations on if available.
[00:45:04] Entering M.D.
[00:45:27] Completed 0 out of 250000 steps  (0%)
[01:46:10] Completed 2500 out of 250000 steps  (1%)
[02:04:08] - Autosending finished units... [June 10 02:04:08 UTC]
[02:04:08] Trying to send all finished work units
[02:04:08] + No unsent completed units remaining.
[02:04:08] - Autosend completed
[02:41:13] Completed 5000 out of 250000 steps  (2%)
[03:34:43] Completed 7500 out of 250000 steps  (3%)
[04:28:04] Completed 10000 out of 250000 steps  (4%)
[05:21:36] Completed 12500 out of 250000 steps  (5%)
[06:14:57] Completed 15000 out of 250000 steps  (6%)
[07:08:21] Completed 17500 out of 250000 steps  (7%)
[08:01:47] Completed 20000 out of 250000 steps  (8%)
[08:04:08] - Autosending finished units... [June 10 08:04:08 UTC]
[08:04:08] Trying to send all finished work units
[08:04:08] + No unsent completed units remaining.
[08:04:08] - Autosend completed
[08:55:16] Completed 22500 out of 250000 steps  (9%)
[09:48:49] Completed 25000 out of 250000 steps  (10%)
[10:45:47] Completed 27500 out of 250000 steps  (11%)
[11:43:53] Completed 30000 out of 250000 steps  (12%)
[12:37:28] Completed 32500 out of 250000 steps  (13%)
[13:34:29] Completed 35000 out of 250000 steps  (14%)
[14:04:08] - Autosending finished units... [June 10 14:04:08 UTC]
[14:04:08] Trying to send all finished work units
[14:04:08] + No unsent completed units remaining.
[14:04:08] - Autosend completed
[14:32:20] Completed 37500 out of 250000 steps  (15%)
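
The progress lines above are enough for a rough projection of how long this WU would take to finish. Here is a minimal sketch in Python, not anything the client itself does. The function names are made up for illustration, the sample is three lines copied from the log above, and it assumes the bare [HH:MM:SS] stamps never skip more than 24 hours between progress lines.

```python
import re

# Matches client progress lines such as:
# [01:46:10] Completed 2500 out of 250000 steps  (1%)
PROGRESS = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)


def parse_progress(log_text):
    """Return (elapsed_seconds, steps_done, total_steps) for each progress line.

    The log carries no dates, so a timestamp smaller than the previous one
    is treated as a midnight rollover (an assumption, see lead-in).
    """
    points = []
    day_offset = 0
    previous = None
    for line in log_text.splitlines():
        match = PROGRESS.search(line)
        if not match:
            continue
        hours, minutes, seconds, done, total = map(int, match.groups())
        stamp = hours * 3600 + minutes * 60 + seconds
        if previous is not None and stamp < previous:
            day_offset += 86400
        previous = stamp
        points.append((stamp + day_offset, done, total))
    return points


def estimate_total_seconds(points):
    """Linear projection of total run time from the first and last progress points."""
    (t0, d0, total), (t1, d1, _) = points[0], points[-1]
    if d1 == d0:
        raise ValueError("need at least two progress lines with different step counts")
    return (t1 - t0) * total / (d1 - d0)


# Three lines copied from the log above.
SAMPLE = """\
[00:45:27] Completed 0 out of 250000 steps  (0%)
[01:46:10] Completed 2500 out of 250000 steps  (1%)
[14:32:20] Completed 37500 out of 250000 steps  (15%)
"""

estimate = estimate_total_seconds(parse_progress(SAMPLE))
print(f"per 1%: {estimate / 100 / 60:.1f} min, total: {estimate / 86400:.2f} days")
```

On these lines the projection works out to roughly 55 minutes per 1% and about 3.8 days for the whole WU. The project deadline is not quoted in this thread, so the comparison against it is left to the reader.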
 
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project: 6013 (Run 0, Clone 56, Gen 136)

Post by kasson »

It's hard to say what's making the WU slow (a bad WU vs. an unusual but interesting one). But if it won't make the deadline on a machine that usually finishes that project with time to spare, delete it and move on.
susato
Site Moderator
Posts: 512
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: Project: 6013 (Run 0, Clone 56, Gen 136)

Post by susato »

A similar problem was reported by a team member running a Linux quad (Dell Optiplex 755, stock clocks). Not only did frame times exceed 1 hour, but the unit also crashed repeatedly and was reassigned without reporting partial results back to Stanford. Here's a typical FAHlog.txt:

Code:

[14:35:27] + Number of Units Completed: 1044

[14:35:27] Trying to send all finished work units
[14:35:27] + No unsent completed units remaining.
[14:35:27] - Preparing to get new work unit...
[14:35:27] Cleaning up work directory
[14:35:27] + Attempting to get work packet
[14:35:27] Passkey found
[14:35:27] - Will indicate memory of 1500 MB
[14:35:27] - Connecting to assignment server
[14:35:27] Connecting to http://assign.stanford.edu:8080/
[14:35:28] Posted data.
[14:35:28] Initial: ED82; - Successful: assigned to (130.237.232.140).
[14:35:28] + News From Folding@Home: Welcome to Folding@Home
[14:35:28] Loaded queue successfully.
[14:35:28] Connecting to http://130.237.232.140:8080/
[14:35:32] Posted data.
[14:35:32] Initial: 0000; - Receiving payload (expected size: 979918)
[14:35:35] - Downloaded at ~318 kB/s
[14:35:35] - Averaged speed for that direction ~426 kB/s
[14:35:35] + Received work.
[14:35:35] Trying to send all finished work units
[14:35:35] + No unsent completed units remaining.
[14:35:35] + Closed connections
[14:35:35] 
[14:35:35] + Processing work unit
[14:35:35] Core required: FahCore_a3.exe
[14:35:35] Core found.
[14:35:35] Working on queue slot 04 [June 10 14:35:35 UTC]
[14:35:35] + Working ...
[14:35:35] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 04 -np 4 -checkpoint 15 -verbose -lifeline 31368 -version 629'

[14:35:35] 
[14:35:35] *------------------------------*
[14:35:35] Folding@Home Gromacs SMP Core
[14:35:35] Version 2.19 (March 6, 2010)
[14:35:35] 
[14:35:35] Preparing to commence simulation
[14:35:35] - Looking at optimizations...
[14:35:35] - Created dyn
[14:35:35] - Files status OK
[14:35:36] - Expanded 979406 -> 10427873 (decompressed 1064.7 percent)
[14:35:36] Called DecompressByteArray: compressed_data_size=979406 data_size=10427873, decompressed_data_size=10427873 diff=0
[14:35:36] - Digital signature verified
[14:35:36] 
[14:35:36] Project: 6013 (Run 0, Clone 56, Gen 136)
[14:35:36] 
[14:35:36] Assembly optimizations on if available.
[14:35:36] Entering M.D.
[14:35:57] Completed 0 out of 250000 steps  (0%)
[15:36:54] Completed 2500 out of 250000 steps  (1%)
[16:37:51] Completed 5000 out of 250000 steps  (2%)
[17:38:47] Completed 7500 out of 250000 steps  (3%)
[18:39:44] Completed 10000 out of 250000 steps  (4%)
[18:58:49] - Autosending finished units... [June 10 18:58:49 UTC]
[18:58:49] Trying to send all finished work units
[18:58:49] + No unsent completed units remaining.
[18:58:49] - Autosend completed
[19:40:41] Completed 12500 out of 250000 steps  (5%)
[20:03:58] CoreStatus = 89 (137)
[20:03:58] Client-core communications error: ERROR 0x89
[20:03:58] Deleting current work unit & continuing...

Previous generations of this WU had average completion times of around 0.5 days. Gen 135 of this Project/Run/Clone was completed on April 29. The individual who finally completed Gen 136 took 6.27 days of run time and finished it on June 9. Wow - that's 40 days and 40 nights of folding attempts.

Gen 137 was assigned around 30 hours ago and hasn't returned yet; it may prove to be similar.

I think it's a piece of good luck that someone completed this WU at all! :) There's definitely something odd about this WU. Here's hoping that it falls into the unusual and interesting category after all the trouble it's caused.
susato
Site Moderator
Posts: 512
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: Project: 6013 (Run 0, Clone 56, Gen 136)

Post by susato »

Excuse the double post - as of this morning, this troubled WU is still being reassigned. Note the June 11 timestamp on the FAHlog.txt.

The donor is running Linux on a C2Q machine at stock clocks. Knowing the history of this WU, the donor deleted it as soon as he recognized it.

FWIW, the next generation of this WU (Project: 6013 (Run 0, Clone 56, Gen 137)) was probably assigned between 16:10 and 17:10 on June 9, 2010, based on the completion record for Gen 136:
Donator: M.... Team: 4....
CPUId: 0018XXXXXXXXXXXX4C
Credit: 380 Credit Time: 2010-06-09 17:06:20
Entered into logs at: 2010-06-09 17:05:58
WU assigned to donor at: 2010-06-03 10:31:18
Days taken to complete WU: 6.27
Error code: 0
These usually complete in around 12 hours, but 40 hours after the estimated assignment time, it's still not back.
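
The 6.27-day figure and the 40-hour wait can be checked from the timestamps quoted above. This is a rough sketch in Python. The two Gen 136 timestamps are copied from the record, while the 16:10 to 17:10 window for Gen 137 is only the estimate given in this post, so anything derived from it is an assumption too.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Gen 136: values copied from the completion record above.
assigned = datetime.strptime("2010-06-03 10:31:18", FMT)
credited = datetime.strptime("2010-06-09 17:06:20", FMT)
days_taken = (credited - assigned).total_seconds() / 86400
print(f"Gen 136 took {days_taken:.2f} days")

# Gen 137: estimated assignment window from this post (an assumption).
window_start = datetime(2010, 6, 9, 16, 10)
window_end = datetime(2010, 6, 9, 17, 10)
wait = timedelta(hours=40)
print("40 h after assignment falls between",
      window_start + wait, "and", window_end + wait)
```

Forty hours after that window lands on the morning of June 11, which matches the date on the log below.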

Here's the FAHlog.txt from this morning for the reassigned unit:

Code:

[07:23:44] + Attempting to send results [June 11 07:23:44 UTC]
[07:23:44] - Reading file work/wuresults_05.dat from core
[07:23:44]   (Read 3792104 bytes from disk)
[07:23:44] Connecting to http://171.64.65.54:8080/
[07:23:51] Posted data.
[07:23:51] Initial: 0000; - Uploaded at ~463 kB/s
[07:23:52] - Averaged speed for that direction ~397 kB/s
[07:23:52] + Results successfully sent
[07:23:52] Thank you for your contribution to Folding@Home.
[07:23:52] + Number of Units Completed: 1045

[07:23:52] Trying to send all finished work units
[07:23:52] + No unsent completed units remaining.
[07:23:52] - Preparing to get new work unit...
[07:23:52] Cleaning up work directory
[07:23:52] + Attempting to get work packet
[07:23:52] Passkey found
[07:23:52] - Will indicate memory of 1500 MB
[07:23:52] - Connecting to assignment server
[07:23:52] Connecting to http://assign.stanford.edu:8080/
[07:23:53] Posted data.
[07:23:53] Initial: ED82; - Successful: assigned to (130.237.232.140).
[07:23:53] + News From Folding@Home: Welcome to Folding@Home
[07:23:53] Loaded queue successfully.
[07:23:53] Connecting to http://130.237.232.140:8080/
[07:23:57] Posted data.
[07:23:57] Initial: 0000; - Receiving payload (expected size: 979918)
[07:24:00] - Downloaded at ~318 kB/s
[07:24:00] - Averaged speed for that direction ~428 kB/s
[07:24:00] + Received work.
[07:24:00] Trying to send all finished work units
[07:24:00] + No unsent completed units remaining.
[07:24:00] + Closed connections
[07:24:00]
[07:24:00] + Processing work unit
[07:24:00] Core required: FahCore_a3.exe
[07:24:00] Core found.
[07:24:00] Working on queue slot 06 [June 11 07:24:00 UTC]
[07:24:00] + Working ...
[07:24:00] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 06 -np 4 -checkpoint 15 -verbose -lifeline 31368 -version 629'

[07:24:00]
[07:24:00] *------------------------------*
[07:24:00] Folding@Home Gromacs SMP Core
[07:24:00] Version 2.19 (March 6, 2010)
[07:24:00]
[07:24:00] Preparing to commence simulation
[07:24:00] - Looking at optimizations...
[07:24:00] - Created dyn
[07:24:00] - Files status OK
[07:24:01] - Expanded 979406 -> 10427873 (decompressed 1064.7 percent)
[07:24:01] Called DecompressByteArray: compressed_data_size=979406 data_size=10427873, decompressed_data_size=10427873 diff=0
[07:24:01] - Digital signature verified
[07:24:01]
[07:24:01] Project: 6013 (Run 0, Clone 56, Gen 136)
[07:24:01]
[07:24:01] Assembly optimizations on if available.
[07:24:01] Entering M.D.
[07:24:22] Completed 0 out of 250000 steps  (0%)
[08:10:51] CoreStatus = 89 (137)
[08:10:51] Client-core communications error: ERROR 0x89
[08:10:51] Deleting current work unit & continuing...
Blasphemous Cannibal
Posts: 27
Joined: Wed Oct 28, 2009 11:20 pm

Re: Project: 6013 (Run 0, Clone 56, Gen 136)

Post by Blasphemous Cannibal »

I was assigned this WU early this morning GMT. I binned it. I don't like any of the 'bigger' a3 WUs (6040 & 6041 also) in the first place, never mind one as broken as this.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6013 (Run 0, Clone 56, Gen 136)

Post by bruce »

I've reported this as a bad WU.