Project: 5013 (Run 2, Clone 825, Gen 2)

Moderators: Site Moderators, FAHC Science Team

Post Reply
^w^ing
Posts: 136
Joined: Fri Mar 07, 2008 7:29 pm
Hardware configuration: C2D E6400 2.13 GHz @ 3.2 GHz
Asus EN8800GTS 640 (G80) @ 660/792/1700 running the 6.23 w/ core11 v1.19
forceware 260.89
Asus P5N-E SLi
2GB 800MHz DDRII (2xCorsair TwinX 512MB)
WinXP 32 SP3
Location: Prague

Project: 5013 (Run 2, Clone 825, Gen 2)

Post by ^w^ing »

ok so I have seen a few reports of this in here but afaik this is the first time it occured to me. I didnt see any competent answer to whether it is fine for the WU to start in the middle or not. I included the previous two WUs in the log so that you can see there was not any problem before this.

Code: Select all

[05:32:46] Trying to send all finished work units
[05:32:46] + No unsent completed units remaining.
[05:32:46] + Closed connections
[05:32:46] 
[05:32:46] + Processing work unit
[05:32:46] Core required: FahCore_11.exe
[05:32:46] Core found.
[05:32:46] Working on queue slot 02 [January 5 05:32:46 UTC]
[05:32:46] + Working ...
[05:32:46] - Calling '.\FahCore_11.exe -dir work/ -suffix 02 -priority 96 -checkpoint 30 -forceasm -verbose -lifeline 2744 -version 623'

[05:32:46] 
[05:32:46] *------------------------------*
[05:32:46] Folding@Home GPU Core - Beta
[05:32:46] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[05:32:46] 
[05:32:46] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[05:32:46] Build host: amoeba
[05:32:46] Board Type: Nvidia
[05:32:46] Core      : 
[05:32:46] Preparing to commence simulation
[05:32:46] - Assembly optimizations manually forced on.
[05:32:46] - Not checking prior termination.
[05:32:46] - Expanded 44195 -> 244433 (decompressed 553.0 percent)
[05:32:46] Called DecompressByteArray: compressed_data_size=44195 data_size=244433, decompressed_data_size=244433 diff=0
[05:32:46] - Digital signature verified
[05:32:46] 
[05:32:46] Project: 5016 (Run 6, Clone 638, Gen 11)
[05:32:46] 
[05:32:46] Assembly optimizations on if available.
[05:32:46] Entering M.D.
[05:32:53] Working on 576 p5005_supervillin_e1
[05:32:53] Client config found, loading data.
[05:32:53] Starting GUI Server
[05:34:17] Completed 1%
[05:35:41] Completed 2%
[05:37:05] Completed 3%
[05:38:29] Completed 4%
[05:39:53] Completed 5%
[05:41:17] Completed 6%
[05:42:41] Completed 7%
[05:44:05] Completed 8%
[05:45:29] Completed 9%
[05:46:53] Completed 10%
[05:48:17] Completed 11%
[05:49:41] Completed 12%
[05:51:05] Completed 13%
[05:52:29] Completed 14%
[05:53:53] Completed 15%
[05:55:17] Completed 16%
[05:56:41] Completed 17%
[05:58:05] Completed 18%
[05:59:29] Completed 19%
[06:00:53] Completed 20%
[06:02:17] Completed 21%
[06:03:41] Completed 22%
[06:05:05] Completed 23%
[06:06:29] Completed 24%
[06:07:53] Completed 25%
[06:09:17] Completed 26%
[06:10:41] Completed 27%
[06:12:05] Completed 28%
[06:13:29] Completed 29%
[06:14:53] Completed 30%
[06:16:17] Completed 31%
[06:17:41] Completed 32%
[06:19:05] Completed 33%
[06:20:29] Completed 34%
[06:21:53] Completed 35%
[06:23:17] Completed 36%
[06:24:41] Completed 37%
[06:26:05] Completed 38%
[06:27:29] Completed 39%
[06:28:53] Completed 40%
[06:30:17] Completed 41%
[06:31:41] Completed 42%
[06:33:05] Completed 43%
[06:34:29] Completed 44%
[06:35:53] Completed 45%
[06:37:17] Completed 46%
[06:38:41] Completed 47%
[06:40:05] Completed 48%
[06:41:29] Completed 49%
[06:42:53] Completed 50%
[06:44:17] Completed 51%
[06:45:41] Completed 52%
[06:47:05] Completed 53%
[06:48:29] Completed 54%
[06:49:53] Completed 55%
[06:51:17] Completed 56%
[06:52:41] Completed 57%
[06:54:05] Completed 58%
[06:55:29] Completed 59%
[06:56:53] Completed 60%
[06:58:17] Completed 61%
[06:59:41] Completed 62%
[07:01:05] Completed 63%
[07:02:29] Completed 64%
[07:03:53] Completed 65%
[07:05:18] Completed 66%
[07:06:42] Completed 67%
[07:08:06] Completed 68%
[07:09:30] Completed 69%
[07:10:54] Completed 70%
[07:12:17] Completed 71%
[07:13:41] Completed 72%
[07:15:05] Completed 73%
[07:16:29] Completed 74%
[07:17:53] Completed 75%
[07:19:17] Completed 76%
[07:20:41] Completed 77%
[07:22:05] Completed 78%
[07:23:29] Completed 79%
[07:24:53] Completed 80%
[07:26:17] Completed 81%
[07:27:42] Completed 82%
[07:29:06] Completed 83%
[07:30:30] Completed 84%
[07:31:56] Completed 85%
[07:33:22] Completed 86%
[07:34:49] Completed 87%
[07:36:13] Completed 88%
[07:37:37] Completed 89%
[07:39:01] Completed 90%
[07:40:25] Completed 91%
[07:41:50] Completed 92%
[07:43:15] Completed 93%
[07:44:41] Completed 94%
[07:46:08] Completed 95%
[07:47:35] Completed 96%
[07:49:01] Completed 97%
[07:50:27] Completed 98%
[07:51:54] Completed 99%
[07:53:20] Completed 100%
[07:53:20] Successful run
[07:53:20] DynamicWrapper: Finished Work Unit: sleep=10000
[07:53:30] Reserved 1126276 bytes for xtc file; Cosm status=0
[07:53:30] Allocated 1126276 bytes for xtc file
[07:53:30] - Reading up to 1126276 from "work/wudata_02.xtc": Read 1126276
[07:53:30] Read 1126276 bytes from xtc file; available packet space=785304188
[07:53:30] xtc file hash check passed.
[07:53:30] Reserved 34800 34800 785304188 bytes for arc file=<work/wudata_02.trr> Cosm status=0
[07:53:30] Allocated 34800 bytes for arc file
[07:53:30] - Reading up to 34800 from "work/wudata_02.trr": Read 34800
[07:53:30] Read 34800 bytes from arc file; available packet space=785269388
[07:53:30] trr file hash check passed.
[07:53:30] Allocated 560 bytes for edr file
[07:53:30] Read bedfile
[07:53:30] edr file hash check passed.
[07:53:30] Allocated 129960 bytes for logfile
[07:53:30] Read logfile
[07:53:30] GuardedRun: success in DynamicWrapper
[07:53:30] GuardedRun: done
[07:53:30] Run: GuardedRun completed.
[07:53:31] - Writing 1292108 bytes of core data to disk...
[07:53:32] Done: 1291596 -> 1154070 (compressed to 89.3 percent)
[07:53:32]   ... Done.
[07:53:32] - Shutting down core 
[07:53:32] 
[07:53:32] Folding@home Core Shutdown: FINISHED_UNIT
[07:53:35] CoreStatus = 64 (100)
[07:53:35] Unit 2 finished with 97 percent of time to deadline remaining.
[07:53:35] Updated performance fraction: 0.966790
[07:53:35] Sending work to server
[07:53:35] Project: 5016 (Run 6, Clone 638, Gen 11)
[07:53:35] - Read packet limit of 540015616... Set to 524286976.


[07:53:35] + Attempting to send results [January 5 07:53:35 UTC]
[07:53:35] - Reading file work/wuresults_02.dat from core
[07:53:35]   (Read 1154582 bytes from disk)
[07:53:35] Connecting to http://171.64.65.20:8080/
[07:53:40] Posted data.
[07:53:40] Initial: 0000; - Uploaded at ~225 kB/s
[07:53:40] - Averaged speed for that direction ~213 kB/s
[07:53:40] + Results successfully sent
[07:53:40] Thank you for your contribution to Folding@Home.
[07:53:40] + Number of Units Completed: 1676

[07:53:44] Trying to send all finished work units
[07:53:44] + No unsent completed units remaining.
[07:53:44] - Preparing to get new work unit...
[07:53:44] + Attempting to get work packet
[07:53:44] - Will indicate memory of 2046 MB
[07:53:44] - Connecting to assignment server
[07:53:44] Connecting to http://assign-GPU.stanford.edu:8080/
[07:53:45] Posted data.
[07:53:45] Initial: 40AB; - Successful: assigned to (171.64.65.20).
[07:53:45] + News From Folding@Home: GPU folding beta
[07:53:45] Loaded queue successfully.
[07:53:45] Connecting to http://171.64.65.20:8080/
[07:53:46] Posted data.
[07:53:46] Initial: 0000; - Receiving payload (expected size: 44538)
[07:53:47] - Downloaded at ~43 kB/s
[07:53:47] - Averaged speed for that direction ~51 kB/s
[07:53:47] + Received work.
[07:53:47] Trying to send all finished work units
[07:53:47] + No unsent completed units remaining.
[07:53:47] + Closed connections
[07:53:47] 
[07:53:47] + Processing work unit
[07:53:47] Core required: FahCore_11.exe
[07:53:47] Core found.
[07:53:47] Working on queue slot 03 [January 5 07:53:47 UTC]
[07:53:47] + Working ...
[07:53:47] - Calling '.\FahCore_11.exe -dir work/ -suffix 03 -priority 96 -checkpoint 30 -forceasm -verbose -lifeline 2744 -version 623'

[07:53:47] 
[07:53:47] *------------------------------*
[07:53:47] Folding@Home GPU Core - Beta
[07:53:47] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[07:53:47] 
[07:53:47] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[07:53:47] Build host: amoeba
[07:53:47] Board Type: Nvidia
[07:53:47] Core      : 
[07:53:47] Preparing to commence simulation
[07:53:47] - Assembly optimizations manually forced on.
[07:53:47] - Not checking prior termination.
[07:53:47] - Expanded 44026 -> 244433 (decompressed 555.2 percent)
[07:53:47] Called DecompressByteArray: compressed_data_size=44026 data_size=244433, decompressed_data_size=244433 diff=0
[07:53:47] - Digital signature verified
[07:53:47] 
[07:53:47] Project: 5015 (Run 1, Clone 702, Gen 24)
[07:53:47] 
[07:53:47] Assembly optimizations on if available.
[07:53:47] Entering M.D.
[07:53:53] Working on 576 p5005_supervillin_e1
[07:53:54] Client config found, loading data.
[07:53:54] Starting GUI Server
[07:55:20] Completed 1%
[07:56:47] Completed 2%
[07:58:14] Completed 3%
[07:59:40] Completed 4%
[08:01:07] Completed 5%
[08:02:34] Completed 6%
[08:04:00] Completed 7%
[08:05:24] Completed 8%
[08:06:51] Completed 9%
[08:08:17] Completed 10%
[08:09:43] Completed 11%
[08:11:10] Completed 12%
[08:12:37] Completed 13%
[08:14:03] Completed 14%
[08:15:30] Completed 15%
[08:16:56] Completed 16%
[08:18:23] Completed 17%
[08:19:50] Completed 18%
[08:21:17] Completed 19%
[08:22:43] Completed 20%
[08:24:10] Completed 21%
[08:25:37] Completed 22%
[08:26:42] - Autosending finished units... [January 5 08:26:42 UTC]
[08:26:42] Trying to send all finished work units
[08:26:42] + No unsent completed units remaining.
[08:26:42] - Autosend completed
[08:27:04] Completed 23%
[08:28:30] Completed 24%
[08:29:57] Completed 25%
[08:31:24] Completed 26%
[08:32:50] Completed 27%
[08:34:17] Completed 28%
[08:35:43] Completed 29%
[08:37:10] Completed 30%
[08:38:37] Completed 31%
[08:40:04] Completed 32%
[08:41:30] Completed 33%
[08:42:57] Completed 34%
[08:44:23] Completed 35%
[08:45:50] Completed 36%
[08:47:16] Completed 37%
[08:48:43] Completed 38%
[08:50:09] Completed 39%
[08:51:36] Completed 40%
[08:53:03] Completed 41%
[08:54:29] Completed 42%
[08:55:56] Completed 43%
[08:57:22] Completed 44%
[08:58:49] Completed 45%
[09:00:16] Completed 46%
[09:01:43] Completed 47%
[09:03:09] Completed 48%
[09:04:36] Completed 49%
[09:06:01] Completed 50%
[09:07:27] Completed 51%
[09:08:54] Completed 52%
[09:10:22] Completed 53%
[09:11:49] Completed 54%
[09:13:17] Completed 55%
[09:14:44] Completed 56%
[09:16:10] Completed 57%
[09:17:36] Completed 58%
[09:19:04] Completed 59%
[09:20:31] Completed 60%
[09:21:57] Completed 61%
[09:23:21] Completed 62%
[09:24:49] Completed 63%
[09:26:16] Completed 64%
[09:27:44] Completed 65%
[09:29:11] Completed 66%
[09:30:38] Completed 67%
[09:32:05] Completed 68%
[09:33:33] Completed 69%
[09:35:00] Completed 70%
[09:36:27] Completed 71%
[09:37:54] Completed 72%
[09:39:21] Completed 73%
[09:40:49] Completed 74%
[09:42:15] Completed 75%
[09:43:42] Completed 76%
[09:45:08] Completed 77%
[09:46:35] Completed 78%
[09:48:02] Completed 79%
[09:49:30] Completed 80%
[09:50:57] Completed 81%
[09:52:23] Completed 82%
[09:53:48] Completed 83%
[09:55:15] Completed 84%
[09:56:40] Completed 85%
[09:58:05] Completed 86%
[09:59:31] Completed 87%
[10:00:57] Completed 88%
[10:02:22] Completed 89%
[10:03:48] Completed 90%
[10:05:15] Completed 91%
[10:06:42] Completed 92%
[10:08:09] Completed 93%
[10:09:36] Completed 94%
[10:11:03] Completed 95%
[10:12:30] Completed 96%
[10:13:57] Completed 97%
[10:15:24] Completed 98%
[10:16:51] Completed 99%
[10:18:18] Completed 100%
[10:18:18] Successful run
[10:18:18] DynamicWrapper: Finished Work Unit: sleep=10000
[10:18:28] Reserved 1128084 bytes for xtc file; Cosm status=0
[10:18:28] Allocated 1128084 bytes for xtc file
[10:18:28] - Reading up to 1128084 from "work/wudata_03.xtc": Read 1128084
[10:18:28] Read 1128084 bytes from xtc file; available packet space=785302380
[10:18:28] xtc file hash check passed.
[10:18:28] Reserved 34800 34800 785302380 bytes for arc file=<work/wudata_03.trr> Cosm status=0
[10:18:28] Allocated 34800 bytes for arc file
[10:18:28] - Reading up to 34800 from "work/wudata_03.trr": Read 34800
[10:18:28] Read 34800 bytes from arc file; available packet space=785267580
[10:18:28] trr file hash check passed.
[10:18:28] Allocated 560 bytes for edr file
[10:18:28] Read bedfile
[10:18:28] edr file hash check passed.
[10:18:28] Allocated 130364 bytes for logfile
[10:18:28] Read logfile
[10:18:28] GuardedRun: success in DynamicWrapper
[10:18:28] GuardedRun: done
[10:18:28] Run: GuardedRun completed.
[10:18:33] - Writing 1294320 bytes of core data to disk...
[10:18:33] Done: 1293808 -> 1155750 (compressed to 89.3 percent)
[10:18:33]   ... Done.
[10:18:33] - Shutting down core 
[10:18:33] 
[10:18:33] Folding@home Core Shutdown: FINISHED_UNIT
[10:18:35] CoreStatus = 64 (100)
[10:18:35] Unit 3 finished with 97 percent of time to deadline remaining.
[10:18:35] Updated performance fraction: 0.966728
[10:18:35] Sending work to server
[10:18:35] Project: 5015 (Run 1, Clone 702, Gen 24)
[10:18:35] - Read packet limit of 540015616... Set to 524286976.


[10:18:35] + Attempting to send results [January 5 10:18:35 UTC]
[10:18:35] - Reading file work/wuresults_03.dat from core
[10:18:35]   (Read 1156262 bytes from disk)
[10:18:35] Connecting to http://171.64.65.20:8080/
[10:18:40] Posted data.
[10:18:40] Initial: 0000; - Uploaded at ~188 kB/s
[10:18:41] - Averaged speed for that direction ~208 kB/s
[10:18:41] + Results successfully sent
[10:18:41] Thank you for your contribution to Folding@Home.
[10:18:41] + Number of Units Completed: 1677

[10:18:45] Trying to send all finished work units
[10:18:45] + No unsent completed units remaining.
[10:18:45] - Preparing to get new work unit...
[10:18:45] + Attempting to get work packet
[10:18:45] - Will indicate memory of 2046 MB
[10:18:45] - Connecting to assignment server
[10:18:45] Connecting to http://assign-GPU.stanford.edu:8080/
[10:18:46] Posted data.
[10:18:46] Initial: 40AB; - Successful: assigned to (171.64.65.20).
[10:18:46] + News From Folding@Home: GPU folding beta
[10:18:46] Loaded queue successfully.
[10:18:46] Connecting to http://171.64.65.20:8080/
[10:18:46] Posted data.
[10:18:46] Initial: 0000; - Receiving payload (expected size: 44493)
[10:18:47] - Downloaded at ~43 kB/s
[10:18:47] - Averaged speed for that direction ~50 kB/s
[10:18:47] + Received work.
[10:18:47] Trying to send all finished work units
[10:18:47] + No unsent completed units remaining.
[10:18:47] + Closed connections
[10:18:47] 
[10:18:47] + Processing work unit
[10:18:47] Core required: FahCore_11.exe
[10:18:47] Core found.
[10:18:47] Working on queue slot 04 [January 5 10:18:47 UTC]
[10:18:47] + Working ...
[10:18:47] - Calling '.\FahCore_11.exe -dir work/ -suffix 04 -priority 96 -checkpoint 30 -forceasm -verbose -lifeline 2744 -version 623'

[10:18:47] 
[10:18:47] *------------------------------*
[10:18:47] Folding@Home GPU Core - Beta
[10:18:47] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[10:18:47] 
[10:18:47] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[10:18:47] Build host: amoeba
[10:18:47] Board Type: Nvidia
[10:18:47] Core      : 
[10:18:47] Preparing to commence simulation
[10:18:47] - Assembly optimizations manually forced on.
[10:18:47] - Not checking prior termination.
[10:18:47] - Expanded 43981 -> 244433 (decompressed 555.7 percent)
[10:18:47] Called DecompressByteArray: compressed_data_size=43981 data_size=244433, decompressed_data_size=244433 diff=0
[10:18:47] - Digital signature verified
[10:18:47] 
[10:18:47] Project: 5013 (Run 2, Clone 825, Gen 2)
[10:18:47] 
[10:18:47] Assembly optimizations on if available.
[10:18:47] Entering M.D.
[10:18:53] Will resume from checkpoint file
[10:19:02] Working on 576 p5005_supervillin_e1
[10:19:02] Client config found, loading data.
[10:19:02] Starting GUI Server
[10:19:02] Resuming from checkpoint
[10:19:02] Verified work/wudata_04.log
[10:19:02] Verified work/wudata_04.edr
[10:19:02] Verified work/wudata_04.trr
[10:19:02] Verified work/wudata_04.xtc
[10:19:02] Completed 63%
[10:20:20] Completed 64%
[10:21:45] Completed 65%
[10:23:11] Completed 66%
[10:24:37] Completed 67%
But I have to say one more thing, I backed the unit up, deleted all the work files except for the wudata_04.dat, and when I restarted the client, the WU started from 0 percent as it should in the first place imo.
So whats the deal with that?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 5013 (Run 2, Clone 825, Gen 2)

Post by bruce »

The current theory is that if certain types of EUEs happen, the checkpoint data is not erased from the work folder, and 10 WUs later, a new WU may mistakenly use a checkpoint that really belongs to some other WU. This is clearly not good.

If the theory is correct, you can manually prevent this from happening by looking in the work folder. Almost all file names contain a sequence number *_00* through *_09* You should never delete a WURESULTS_0* file and you should never delete files with the sequence number of the active WU, but those with the other 9 values can be discarded. (e.g. - Look for the line Working on queue slot 08 and avoid deleting *_08*).

The file names that you see will vary depending on which core has been run, and most files will be managed correctly, but in my case, I see the following. The green files can safely be deleted but not the red ones, since my active WU is in queue slot 09. In this case, there are no left-over checkpoint files so discarding the green files is also unnecessary.

current.xyz
wudata_09.dat
core82.sta
wudata_00.eng
wudata_01.eng
wudata_02.eng
wudata_03.eng
wudata_04.eng
wudata_05.eng
wudata_06.eng

logfile_09.txt
wudata_07.eng
wudata_09CP.arc
wuinfo_09.dat
wudata_09.inc
wudata_09.top
wudata_09.inp

wudata_08.eng
wudata_09.eng
wudata_09.out
wudata_09.trj
wudata_09.nfo


Now that the holidays are over, there's a pretty good chance that a fix for this problem will be forthcoming soon so the manual correction will be unnecessary.
^w^ing
Posts: 136
Joined: Fri Mar 07, 2008 7:29 pm
Hardware configuration: C2D E6400 2.13 GHz @ 3.2 GHz
Asus EN8800GTS 640 (G80) @ 660/792/1700 running the 6.23 w/ core11 v1.19
forceware 260.89
Asus P5N-E SLi
2GB 800MHz DDRII (2xCorsair TwinX 512MB)
WinXP 32 SP3
Location: Prague

Re: Project: 5013 (Run 2, Clone 825, Gen 2)

Post by ^w^ing »

Thanks for the answer, bruce.
Yes, I actually do prune the work folder like this from time to time, but with GPU, as it has much faster WU turnaroud than other clients, the queue usually do a few rounds before I do that :)
That sounds reasonable with the checkpoint files from the previous WU in the same slot. I tried to look what happened to the previous WU in slot 04 before this one got there, but unfortunately my fahlog-prev starts with WU in slot 05. :)
I remember that I do get occasional EUE after which the client prints out "couldnt get lenght of the file", i think it prints out for the wuresults and wudata.log files, but cant say for sure. Everytime this happens, as the client didnt send back anything about that WU, I get assigned the same WU that EUEd (but in +1 slot) and it always finishes it on the second run.
Post Reply