I appreciate your reply and have implemented the variable you suggested. However, both GPU clients have since picked up a Project 5786 WU and both EUE'd. Luckily for me, they both then picked up new WUs they could fold.
I really doubt it is a power draw issue, as the PSU is new and more than capable of supporting both GPUs. For reference, it is a Corsair HX650. I have watched my voltages and they do not budge; it is a rock-solid PSU. Also, since the WUs crash before they even start crunching, overheating is not an issue.
If my WUs crashed midway through, I would accept a power or heat issue. Since the WUs crash at the start, the problem is elsewhere. Dozens of WUs completed successfully on the same system suggest the problem isn't my system, unless you know of an issue involving certain WUs, the GPU client, and SBS2008 or certain versions of the NVIDIA drivers.
Log from first GPU:
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.23
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Users\pspain\FaH\GPU0
Executable: C:\Users\pspain\FaH\GPU0\gpu0.exe
Arguments: -gpu 0 -verbosity 9
[16:05:58] - Ask before connecting: No
[16:05:58] - User name: SiriusB_[OcUK] (Team 10)
[16:05:58] - User ID: C34613D10C4C717
[16:05:58] - Machine ID: 15
[16:05:58]
[16:05:58] Loaded queue successfully.
[16:05:58]
[16:05:58] + Processing work unit
[16:05:58] - Autosending finished units... [April 10 16:05:58 UTC]
[16:05:58] Core required: FahCore_11.exe
[16:05:58] Trying to send all finished work units
[16:05:58] Core found.
[16:05:58] + No unsent completed units remaining.
[16:05:58] Working on queue slot 07 [April 10 16:05:58 UTC]
[16:05:58] - Autosend completed
[16:05:58] + Working ...
[16:05:58] - Calling '.\FahCore_11.exe -dir work/ -suffix 07 -checkpoint 15 -ver
bose -lifeline 9604 -version 623'
[16:05:58]
[16:05:58] *------------------------------*
[16:05:58] Folding@Home GPU Core
[16:05:58] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[16:05:58]
[16:05:58] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14
.00.50727.762 for 80x86
[16:05:58] Build host: amoeba
[16:05:58] Board Type: Nvidia
[16:05:58] Core :
[16:05:58] Preparing to commence simulation
[16:05:58] - Looking at optimizations...
[16:05:58] - Files status OK
[16:05:58] - Expanded 64681 -> 341507 (decompressed 527.9 percent)
[16:05:58] Called DecompressByteArray: compressed_data_size=64681 data_size=3415
07, decompressed_data_size=341507 diff=0
[16:05:58] - Digital signature verified
[16:05:58]
[16:05:58] Project: 5786 (Run 5, Clone 10, Gen 80)
[16:05:58]
[16:05:58] Assembly optimizations on if available.
[16:05:58] Entering M.D.
[16:06:04] Tpr hash work/wudata_07.tpr: 1454933175 1927711105 1281420434 278658
3095 1540034925
[16:06:04]
[16:06:04] Calling fah_main args: 14 usage=90
[16:06:04]
[16:06:05] Working on GRoups of Organic Molecules in ACtion for Science
[16:06:06] Client config found, loading data.
[16:06:07] mdrun_gpu returned
[16:06:07] NANs detected on GPU
[16:06:07]
[16:06:07] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:06:10] CoreStatus = 7A (122)
[16:06:10] Sending work to server
[16:06:10] Project: 5786 (Run 5, Clone 10, Gen 80)
[16:06:10] - Read packet limit of 540015616... Set to 524286976.
[16:06:10] - Error: Could not get length of results file work/wuresults_07.dat
[16:06:10] - Error: Could not read unit 07 file. Removing from queue.
[16:06:10] Trying to send all finished work units
[16:06:10] + No unsent completed units remaining.
[16:06:10] - Preparing to get new work unit...
[16:06:10] + Attempting to get work packet
[16:06:10] - Will indicate memory of 4094 MB
[16:06:10] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 1
1
[16:06:10] - Connecting to assignment server
[16:06:10] Connecting to http://assign-GPU.stanford.edu:8080/
[16:06:11] Posted data.
[16:06:11] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[16:06:11] + News From Folding@Home: Welcome to Folding@Home
[16:06:11] Loaded queue successfully.
[16:06:11] Connecting to http://171.67.108.11:8080/
[16:06:12] Posted data.
[16:06:12] Initial: 0000; - Receiving payload (expected size: 47249)
[16:06:13] - Downloaded at ~46 kB/s
[16:06:13] - Averaged speed for that direction ~60 kB/s
[16:06:13] + Received work.
[16:06:13] Trying to send all finished work units
[16:06:13] + No unsent completed units remaining.
[16:06:13] + Closed connections
[16:06:18]
[16:06:18] + Processing work unit
[16:06:18] Core required: FahCore_11.exe
[16:06:18] Core found.
[16:06:18] Working on queue slot 08 [April 10 16:06:18 UTC]
[16:06:18] + Working ...
[16:06:18] - Calling '.\FahCore_11.exe -dir work/ -suffix 08 -checkpoint 15 -ver
bose -lifeline 9604 -version 623'
[16:06:18]
[16:06:18] *------------------------------*
[16:06:18] Folding@Home GPU Core
[16:06:18] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[16:06:18]
[16:06:18] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14
.00.50727.762 for 80x86
[16:06:18] Build host: amoeba
[16:06:18] Board Type: Nvidia
[16:06:18] Core :
[16:06:18] Preparing to commence simulation
[16:06:18] - Looking at optimizations...
[16:06:18] DeleteFrameFiles: successfully deleted file=work/wudata_08.ckp
[16:06:18] - Created dyn
[16:06:18] - Files status OK
[16:06:18] - Expanded 46737 -> 252912 (decompressed 541.1 percent)
[16:06:18] Called DecompressByteArray: compressed_data_size=46737 data_size=2529
12, decompressed_data_size=252912 diff=0
[16:06:18] - Digital signature verified
[16:06:18]
[16:06:18] Project: 5765 (Run 12, Clone 244, Gen 105)
[16:06:18]
[16:06:18] Assembly optimizations on if available.
[16:06:18] Entering M.D.
[16:06:24] Tpr hash work/wudata_08.tpr: 4190163122 3436051368 3726458250 223932
2703 3824773853
[16:06:24]
[16:06:24] Calling fah_main args: 14 usage=90
[16:06:24]
[16:06:25] Working on Protein
[16:06:25] Client config found, loading data.
[16:06:25] Starting GUI Server
[16:07:03] Completed 1%
[16:07:41] Completed 2%
[16:08:19] Completed 3%
[16:08:56] Completed 4%
[16:09:34] Completed 5%
[16:10:12] Completed 6%
[16:10:50] Completed 7%
[16:11:28] Completed 8%
[16:12:06] Completed 9%
[16:12:44] Completed 10%
[16:13:27] Completed 11%
[16:14:05] Completed 12%
[16:14:43] Completed 13%
[16:15:28] Completed 14%
Log from second GPU:
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.23
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Users\pspain\FaH\GPU1
Executable: C:\Users\pspain\FaH\GPU1\gpu1.exe
Arguments: -gpu 1 -verbosity 9
[16:06:29] - Ask before connecting: No
[16:06:29] - User name: SiriusB_[OcUK] (Team 10)
[16:06:29] - User ID: C34613D10C4C717
[16:06:29] - Machine ID: 11
[16:06:29]
[16:06:29] Loaded queue successfully.
[16:06:29]
[16:06:29] + Processing work unit
[16:06:29] Core required: FahCore_11.exe
[16:06:29] - Autosending finished units... [April 10 16:06:29 UTC]
[16:06:29] Core found.
[16:06:29] Trying to send all finished work units
[16:06:29] Working on queue slot 01 [April 10 16:06:29 UTC]
[16:06:29] + Working ...
[16:06:29] + No unsent completed units remaining.
[16:06:29] - Calling '.\FahCore_11.exe -dir work/ -suffix 01 -checkpoint 15 -ver
bose -lifeline 8564 -version 623'
[16:06:29] - Autosend completed
[16:06:29]
[16:06:29] *------------------------------*
[16:06:29] Folding@Home GPU Core
[16:06:29] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[16:06:29]
[16:06:29] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14
.00.50727.762 for 80x86
[16:06:29] Build host: amoeba
[16:06:29] Board Type: Nvidia
[16:06:29] Core :
[16:06:29] Preparing to commence simulation
[16:06:29] - Looking at optimizations...
[16:06:29] - Files status OK
[16:06:29] - Expanded 64622 -> 341507 (decompressed 528.4 percent)
[16:06:29] Called DecompressByteArray: compressed_data_size=64622 data_size=3415
07, decompressed_data_size=341507 diff=0
[16:06:29] - Digital signature verified
[16:06:29]
[16:06:29] Project: 5786 (Run 6, Clone 0, Gen 39)
[16:06:29]
[16:06:29] Assembly optimizations on if available.
[16:06:29] Entering M.D.
[16:06:35] Tpr hash work/wudata_01.tpr: 2390296719 949216375 630254473 32763587
84 135935992
[16:06:35]
[16:06:36] Calling fah_main args: 14 usage=90
[16:06:36]
[16:06:36] Working on GRoups of Organic Molecules in ACtion for Science
[16:06:38] Client config found, loading data.
[16:06:38] Starting GUI Server
[16:06:38] mdrun_gpu returned
[16:06:38] NANs detected on GPU
[16:06:38]
[16:06:38] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:06:42] CoreStatus = 7A (122)
[16:06:42] Sending work to server
[16:06:42] Project: 5786 (Run 6, Clone 0, Gen 39)
[16:06:42] - Error: Could not get length of results file work/wuresults_01.dat
[16:06:42] - Error: Could not read unit 01 file. Removing from queue.
[16:06:42] Trying to send all finished work units
[16:06:42] + No unsent completed units remaining.
[16:06:42] - Preparing to get new work unit...
[16:06:42] + Attempting to get work packet
[16:06:42] - Will indicate memory of 4094 MB
[16:06:42] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 1
1
[16:06:42] - Connecting to assignment server
[16:06:42] Connecting to http://assign-GPU.stanford.edu:8080/
[16:06:43] Posted data.
[16:06:43] Initial: 40AB; - Successful: assigned to (171.64.65.61).
[16:06:43] + News From Folding@Home: Welcome to Folding@Home
[16:06:43] Loaded queue successfully.
[16:06:43] Connecting to http://171.64.65.61:8080/
[16:06:44] Posted data.
[16:06:44] Initial: 0000; - Receiving payload (expected size: 74336)
[16:06:45] - Downloaded at ~72 kB/s
[16:06:45] - Averaged speed for that direction ~68 kB/s
[16:06:45] + Received work.
[16:06:45] Trying to send all finished work units
[16:06:45] + No unsent completed units remaining.
[16:06:45] + Closed connections
[16:06:50]
[16:06:50] + Processing work unit
[16:06:50] Core required: FahCore_11.exe
[16:06:50] Core found.
[16:06:50] Working on queue slot 02 [April 10 16:06:50 UTC]
[16:06:50] + Working ...
[16:06:50] - Calling '.\FahCore_11.exe -dir work/ -suffix 02 -checkpoint 15 -ver
bose -lifeline 8564 -version 623'
[16:06:52]
[16:06:52] *------------------------------*
[16:06:52] Folding@Home GPU Core
[16:06:52] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[16:06:52]
[16:06:52] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14
.00.50727.762 for 80x86
[16:06:52] Build host: amoeba
[16:06:52] Board Type: Nvidia
[16:06:52] Core :
[16:06:52] Preparing to commence simulation
[16:06:52] - Looking at optimizations...
[16:06:52] DeleteFrameFiles: successfully deleted file=work/wudata_02.ckp
[16:06:52] - Created dyn
[16:06:52] - Files status OK
[16:06:52] - Expanded 73824 -> 383588 (decompressed 519.5 percent)
[16:06:52] Called DecompressByteArray: compressed_data_size=73824 data_size=3835
88, decompressed_data_size=383588 diff=0
[16:06:52] - Digital signature verified
[16:06:52]
[16:06:52] Project: 6601 (Run 8, Clone 838, Gen 6)
[16:06:52]
[16:06:52] Assembly optimizations on if available.
[16:06:52] Entering M.D.
[16:06:58] Tpr hash work/wudata_02.tpr: 1573423998 3212315442 3651801022 318698
4192 669694147
[16:06:58]
[16:06:58] Calling fah_main args: 14 usage=90
[16:06:59]
[16:06:59] Working on Protein
[16:07:01] Client config found, loading data.
[16:07:01] Starting GUI Server
[16:07:58] Completed 1%
[16:08:54] Completed 2%
[16:09:52] Completed 3%
[16:10:50] Completed 4%
[16:11:48] Completed 5%
[16:12:45] Completed 6%
[16:13:43] Completed 7%
[16:14:40] Completed 8%
[16:15:38] Completed 9%
[16:16:35] Completed 10%
[16:17:33] Completed 11%
Both clients were restarted several hours after they had stopped due to EUEs, so there is zero possibility they were already hot. Also, you can see from the first log that my GPU can crunch 5785 just fine. Are 5784 and 5786 highly optimised while 5785 isn't?
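In case anyone wants to check their own logs for the same pattern, here is a rough Python sketch that lists each WU in a client log and whether it ended in UNSTABLE_MACHINE or went on to report progress. It only assumes the line formats visible in the logs above ("Project: ... (Run, Clone, Gen)", "Core Shutdown: UNSTABLE_MACHINE", "Completed N%"). The names summarize and SAMPLE are my own, and it is not an official tool, just a quick way to see which projects fail at start.

```python
import re

# Patterns taken from the log lines shown in this post.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
PROGRESS_RE = re.compile(r"Completed (\d+)%")

# Trimmed excerpt of the first GPU log above, used as sample input.
SAMPLE = """\
[16:05:58] Project: 5786 (Run 5, Clone 10, Gen 80)
[16:06:07] NANs detected on GPU
[16:06:07] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:06:10] Project: 5786 (Run 5, Clone 10, Gen 80)
[16:06:18] Project: 5765 (Run 12, Clone 244, Gen 105)
[16:07:03] Completed 1%
[16:07:41] Completed 2%
"""


def summarize(log_text):
    """Return one dict per work unit, in the order they appear in the log."""
    units = []
    current = None
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            prcg = tuple(int(g) for g in match.groups())
            # The client repeats the Project line when it reports the result,
            # so an identical consecutive PRCG is treated as the same WU.
            if current is None or current["prcg"] != prcg:
                current = {"prcg": prcg, "status": "started", "percent": 0}
                units.append(current)
            continue
        if current is None:
            continue  # header lines before the first WU
        if "UNSTABLE_MACHINE" in line:
            current["status"] = "EUE"
        else:
            progress = PROGRESS_RE.search(line)
            if progress:
                current["status"] = "running"
                current["percent"] = int(progress.group(1))
    return units


if __name__ == "__main__":
    for unit in summarize(SAMPLE):
        project, run, clone, gen = unit["prcg"]
        print(f"P{project} (R{run}, C{clone}, G{gen}): "
              f"{unit['status']} at {unit['percent']}%")
```

To run it on a real log, read the file into a string and pass that to summarize in place of SAMPLE. A WU that shows EUE at 0% is one that died before the first progress line, which is what both of my 5786 units did.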