
Project: 5756 (Run 4, Clone 247, Gen 431)

Posted: Fri Jul 24, 2009 1:11 am
by geokilla
Got the same WU twice in a row. Both attempts errored out at 7% with "NANs detected on GPU" (UNSTABLE_MACHINE).

Code:

[23:25:09] 
[23:25:09] *------------------------------*
[23:25:09] Folding@Home GPU Core - Beta
[23:25:09] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[23:25:09] 
[23:25:09] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[23:25:09] Build host: amoeba
[23:25:09] Board Type: Nvidia
[23:25:09] Core      : 
[23:25:09] Preparing to commence simulation
[23:25:09] - Looking at optimizations...
[23:25:09] - Created dyn
[23:25:09] - Files status OK
[23:25:09] - Expanded 98733 -> 492276 (decompressed 498.5 percent)
[23:25:09] Called DecompressByteArray: compressed_data_size=98733 data_size=492276, decompressed_data_size=492276 diff=0
[23:25:09] - Digital signature verified
[23:25:09] 
[23:25:09] Project: 5756 (Run 4, Clone 247, Gen 431)
[23:25:09] 
[23:25:09] Assembly optimizations on if available.
[23:25:09] Entering M.D.
[23:25:16] Working on Protein
[23:25:19] Client config found, loading data.
[23:25:19] Starting GUI Server
[23:28:36] Completed 1%
[23:32:03] Completed 2%
[23:35:37] Completed 3%
[23:39:13] Completed 4%
[23:42:47] Completed 5%
[23:46:24] Completed 6%
[23:49:27] Completed 7%
[23:49:29] mdrun_gpu returned 
[23:49:29] NANs detected on GPU
[23:49:29] 
[23:49:29] Folding@home Core Shutdown: UNSTABLE_MACHINE
[23:49:32] CoreStatus = 7A (122)
[23:49:32] Sending work to server
[23:49:32] Project: 5756 (Run 4, Clone 247, Gen 431)
[23:49:32] - Error: Could not get length of results file work/wuresults_09.dat
[23:49:32] - Error: Could not read unit 09 file. Removing from queue.
[23:49:32] Trying to send all finished work units
[23:49:32] + No unsent completed units remaining.
[23:49:32] - Preparing to get new work unit...
[23:49:32] + Attempting to get work packet
[23:49:32] - Will indicate memory of 2046 MB
[23:49:32] - Connecting to assignment server
[23:49:32] Connecting to http://assign-GPU.stanford.edu:8080/
[23:49:32] Posted data.
[23:49:32] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[23:49:32] + News From Folding@Home: Welcome to Folding@Home
[23:49:33] Loaded queue successfully.
[23:49:33] Connecting to http://171.67.108.11:8080/
[23:49:34] Posted data.
[23:49:34] Initial: 0000; - Receiving payload (expected size: 99245)
[23:49:34] Conversation time very short, giving reduced weight in bandwidth avg
[23:49:34] - Downloaded at ~193 kB/s
[23:49:34] - Averaged speed for that direction ~137 kB/s
[23:49:34] + Received work.
[23:49:34] Trying to send all finished work units
[23:49:34] + No unsent completed units remaining.
[23:49:34] + Closed connections
[23:49:39] 
[23:49:39] + Processing work unit
[23:49:39] Core required: FahCore_11.exe
[23:49:39] Core found.
[23:49:39] Working on queue slot 00 [July 23 23:49:39 UTC]
[23:49:39] + Working ...
[23:49:39] - Calling '.\FahCore_11.exe -dir work/ -suffix 00 -checkpoint 15 -verbose -lifeline 2592 -version 623'

[23:49:39] 
[23:49:39] *------------------------------*
[23:49:39] Folding@Home GPU Core - Beta
[23:49:39] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[23:49:39] 
[23:49:39] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[23:49:39] Build host: amoeba
[23:49:39] Board Type: Nvidia
[23:49:39] Core      : 
[23:49:39] Preparing to commence simulation
[23:49:39] - Looking at optimizations...
[23:49:39] - Created dyn
[23:49:39] - Files status OK
[23:49:39] - Expanded 98733 -> 492276 (decompressed 498.5 percent)
[23:49:39] Called DecompressByteArray: compressed_data_size=98733 data_size=492276, decompressed_data_size=492276 diff=0
[23:49:39] - Digital signature verified
[23:49:39] 
[23:49:39] Project: 5756 (Run 4, Clone 247, Gen 431)
[23:49:39] 
[23:49:39] Assembly optimizations on if available.
[23:49:39] Entering M.D.
[23:49:46] Working on Protein
[23:49:49] Client config found, loading data.
[23:49:49] Starting GUI Server
[23:53:24] Completed 1%
[23:57:00] Completed 2%
[00:00:36] Completed 3%
[00:04:05] Completed 4%
[00:07:41] Completed 5%
[00:11:18] Completed 6%
[00:14:51] Completed 7%
[00:14:53] mdrun_gpu returned 
[00:14:53] NANs detected on GPU
[00:14:53] 
[00:14:53] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:14:56] CoreStatus = 7A (122)
[00:14:56] Sending work to server
[00:14:56] Project: 5756 (Run 4, Clone 247, Gen 431)
[00:14:56] - Error: Could not get length of results file work/wuresults_00.dat
[00:14:56] - Error: Could not read unit 00 file. Removing from queue.
[00:14:56] Trying to send all finished work units
[00:14:56] + No unsent completed units remaining.
[00:14:56] - Preparing to get new work unit...
[00:14:56] + Attempting to get work packet
[00:14:56] - Will indicate memory of 2046 MB
[00:14:56] - Connecting to assignment server
[00:14:56] Connecting to http://assign-GPU.stanford.edu:8080/
[00:14:56] Posted data.
[00:14:56] Initial: 40AB; - Successful: assigned to (171.64.65.20).
[00:14:56] + News From Folding@Home: Welcome to Folding@Home
[00:14:57] Loaded queue successfully.
[00:14:57] Connecting to http://171.64.65.20:8080/
[00:14:58] Posted data.
[00:14:58] Initial: 0000; - Receiving payload (expected size: 69153)
[00:14:58] Conversation time very short, giving reduced weight in bandwidth avg
[00:14:58] - Downloaded at ~135 kB/s
[00:14:58] - Averaged speed for that direction ~137 kB/s
[00:14:58] + Received work.
[00:14:58] Trying to send all finished work units
[00:14:58] + No unsent completed units remaining.
[00:14:58] + Closed connections
[00:15:03] 
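
For anyone who wants to check a long log for this pattern (the same Project/Run/Clone/Gen failing more than once), here is a minimal sketch in Python. It is not part of the F@H client. The names `find_shutdowns` and `repeated_unstable` are hypothetical, and it assumes only the line formats visible in the log above: a `[hh:mm:ss]` prefix, `Project: ...`, `Completed N%`, and `Folding@home Core Shutdown: ...`.

```python
import re
from collections import Counter, namedtuple

# One record per "Core Shutdown" line found in the log.
Shutdown = namedtuple("Shutdown", "time wu percent status")

LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s?(.*)$")
PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
PROGRESS = re.compile(r"Completed (\d+)%")
SHUTDOWN = re.compile(r"Folding@home Core Shutdown: (\w+)")


def find_shutdowns(log_text):
    """Return a Shutdown record for every core shutdown line, in log order."""
    shutdowns = []
    wu = None        # (project, run, clone, gen) seen most recently
    percent = None   # last "Completed N%" seen for the current unit
    for raw in log_text.splitlines():
        line = LINE.match(raw.strip())
        if line is None:
            continue  # not a timestamped log line
        time, message = line.groups()
        project = PROJECT.search(message)
        progress = PROGRESS.search(message)
        shutdown = SHUTDOWN.search(message)
        if project:
            wu = tuple(int(g) for g in project.groups())
        elif progress:
            percent = int(progress.group(1))
        elif shutdown:
            shutdowns.append(Shutdown(time, wu, percent, shutdown.group(1)))
            percent = None  # the next unit starts from scratch
    return shutdowns


def repeated_unstable(shutdowns):
    """Map each WU to its UNSTABLE_MACHINE count, keeping only repeats."""
    counts = Counter(s.wu for s in shutdowns if s.status == "UNSTABLE_MACHINE")
    return {wu: n for wu, n in counts.items() if n >= 2}


# Short excerpt of the log above, used as sample input.
SAMPLE = """\
[23:25:09] Project: 5756 (Run 4, Clone 247, Gen 431)
[23:49:27] Completed 7%
[23:49:29] NANs detected on GPU
[23:49:29] Folding@home Core Shutdown: UNSTABLE_MACHINE
[23:49:32] Project: 5756 (Run 4, Clone 247, Gen 431)
[23:49:39] Project: 5756 (Run 4, Clone 247, Gen 431)
[00:14:51] Completed 7%
[00:14:53] Folding@home Core Shutdown: UNSTABLE_MACHINE
"""

if __name__ == "__main__":
    for wu, n in repeated_unstable(find_shutdowns(SAMPLE)).items():
        print("P%d (Run %d, Clone %d, Gen %d) failed %d times" % (wu + (n,)))
```

To run it on a real log, read the file contents into a string and pass that to `find_shutdowns` instead of `SAMPLE`. A shutdown that happens before any "Completed" line is recorded with `percent` set to `None`.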

Re: Project: 5756 (Run 4, Clone 247, Gen 431)

Posted: Sun Aug 02, 2009 6:49 am
by noprob

Code:

[18:10:41] Project: 5756 (Run 6, Clone 297, Gen 420)
[18:10:41] 
[18:10:41] Assembly optimizations on if available.
[18:10:41] Entering M.D.
[18:10:47] Tpr hash work/wudata_00.tpr:  81026764 978733818 153880425 2905441614 2920082832
[18:10:47] 
[18:10:47] Calling fah_main args: 14 usage=100
[18:10:47] 
[18:10:47] Working on Protein
[18:10:53] Client config found, loading data.
[18:10:53] Starting GUI Server
[18:10:53] mdrun_gpu returned 
[18:10:53] SHAKE violations on GPU
[18:10:53] 
[18:10:53] Folding@home Core Shutdown: UNSTABLE_MACHINE
Are there bad WUs?

This also happened on a few other WUs in this class using the same experimental core, which led to "EUE limit exceeded. Pausing 24 hours." (The error messages were different.)

Anyway, I had forgotten to adjust the settings for this experimental core as suggested. I rebooted with the suggested settings and have had no more issues with this type or class of WU (crossing fingers).

Specs are located in this post, near the bottom.