
Project: 4746 (Run 0, Clone 55, Gen 49) multiple NaNs at 43%

Posted: Mon Nov 24, 2008 3:37 am
by P5-133XL
4746 (0,55,49) is repeatedly NaN'ing at 43% and not sending to the server ...

Code:

[11:48:20] Thank you for your contribution to Folding@Home.
[11:48:20] + Number of Units Completed: 166

[11:48:24] Trying to send all finished work units
[11:48:24] + No unsent completed units remaining.
[11:48:24] - Preparing to get new work unit...
[11:48:24] + Attempting to get work packet
[11:48:24] - Will indicate memory of 2046 MB
[11:48:24] - Connecting to assignment server
[11:48:24] Connecting to http://assign-GPU.stanford.edu:8080/
[11:48:25] Posted data.
[11:48:25] Initial: 40AB; - Successful: assigned to (171.64.65.103).
[11:48:25] + News From Folding@Home: GPU folding beta
[11:48:25] Loaded queue successfully.
[11:48:25] Connecting to http://171.64.65.103:8080/
[11:48:26] Posted data.
[11:48:26] Initial: 0000; - Receiving payload (expected size: 88784)
[11:48:26] Conversation time very short, giving reduced weight in bandwidth avg
[11:48:26] - Downloaded at ~173 kB/s
[11:48:26] - Averaged speed for that direction ~131 kB/s
[11:48:26] + Received work.
[11:48:26] Trying to send all finished work units
[11:48:26] + No unsent completed units remaining.
[11:48:26] + Closed connections
[11:48:26] 
[11:48:26] + Processing work unit
[11:48:26] Core required: FahCore_11.exe
[11:48:26] Core found.
[11:48:26] Working on queue slot 02 [November 23 11:48:26 UTC]
[11:48:26] + Working ...
[11:48:26] - Calling '.\FahCore_11.exe -dir work/ -suffix 02 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 3776 -version 620'

[11:48:26] 
[11:48:26] *------------------------------*
[11:48:26] Folding@Home GPU Core - Beta
[11:48:26] Version 1.18 (Mon Oct 13 11:11:30 PDT 2008)
[11:48:26] 
[11:48:26] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[11:48:26] Build host: amoeba
[11:48:26] Board Type: AMD
[11:48:26] Core      : 
[11:48:26] Preparing to commence simulation
[11:48:26] - Looking at optimizations...
[11:48:26] - Created dyn
[11:48:26] - Files status OK
[11:48:26] - Expanded 88272 -> 447304 (decompressed 506.7 percent)
[11:48:26] Called DecompressByteArray: compressed_data_size=88272 data_size=447304, decompressed_data_size=447304 diff=0
[11:48:26] - Digital signature verified
[11:48:26] 
[11:48:26] Project: 4746 (Run 0, Clone 55, Gen 49)
[11:48:26] 
[11:48:26] Assembly optimizations on if available.
[11:48:26] Entering M.D.
[11:48:33] Working on p4746_lam5w_300K
[11:48:33] Client config found, loading data.
[11:48:33] Starting GUI Server
[11:52:25] Completed 1%
[11:56:14] Completed 2%
[12:00:04] Completed 3%

...

[14:25:12] Completed 41%
[14:29:01] Completed 42%
[14:32:49] Completed 43%
[14:32:50] mdrun_gpu returned 
[14:32:50] NANs detected on GPU
[14:32:50] 
[14:32:50] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:32:52] CoreStatus = 7A (122)
[14:32:52] Sending work to server
[14:32:52] Project: 4746 (Run 0, Clone 55, Gen 49)
[14:32:52] - Read packet limit of 540015616... Set to 524286976.
[14:32:52] - Error: Could not get length of results file work/wuresults_02.dat
[14:32:52] - Error: Could not read unit 02 file. Removing from queue.
[14:32:52] Trying to send all finished work units
[14:32:52] + No unsent completed units remaining.
[14:32:52] - Preparing to get new work unit...
[14:32:52] + Attempting to get work packet
[14:32:52] - Will indicate memory of 2046 MB
[14:32:52] - Connecting to assignment server
[14:32:52] Connecting to http://assign-GPU.stanford.edu:8080/
[14:32:54] Posted data.
[14:32:54] Initial: 40AB; - Successful: assigned to (171.64.65.103).
[14:32:54] + News From Folding@Home: GPU folding beta
[14:32:54] Loaded queue successfully.
[14:32:54] Connecting to http://171.64.65.103:8080/
[14:32:54] Posted data.
[14:32:54] Initial: 0000; - Receiving payload (expected size: 88784)
[14:32:54] Conversation time very short, giving reduced weight in bandwidth avg
[14:32:54] - Downloaded at ~173 kB/s
[14:32:54] - Averaged speed for that direction ~135 kB/s
[14:32:54] + Received work.
[14:32:54] Trying to send all finished work units
[14:32:54] + No unsent completed units remaining.
[14:32:54] + Closed connections
[14:32:59] 
[14:32:59] + Processing work unit
[14:32:59] Core required: FahCore_11.exe
[14:32:59] Core found.
[14:32:59] Working on queue slot 03 [November 23 14:32:59 UTC]
[14:32:59] + Working ...
[14:32:59] - Calling '.\FahCore_11.exe -dir work/ -suffix 03 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 3776 -version 620'

[14:33:00] 
[14:33:00] *------------------------------*
[14:33:00] Folding@Home GPU Core - Beta
[14:33:00] Version 1.18 (Mon Oct 13 11:11:30 PDT 2008)
[14:33:00] 
[14:33:00] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[14:33:00] Build host: amoeba
[14:33:00] Board Type: AMD
[14:33:00] Core      : 
[14:33:00] Preparing to commence simulation
[14:33:00] - Looking at optimizations...
[14:33:00] - Created dyn
[14:33:00] - Files status OK
[14:33:00] - Expanded 88272 -> 447304 (decompressed 506.7 percent)
[14:33:00] Called DecompressByteArray: compressed_data_size=88272 data_size=447304, decompressed_data_size=447304 diff=0
[14:33:00] - Digital signature verified
[14:33:00] 
[14:33:00] Project: 4746 (Run 0, Clone 55, Gen 49)
[14:33:00] 
[14:33:00] Assembly optimizations on if available.
[14:33:00] Entering M.D.
[14:33:06] Working on p4746_lam5w_300K
[14:33:06] Client config found, loading data.
[14:33:06] Starting GUI Server
[14:36:59] Completed 1%
[14:40:48] Completed 2%
[14:44:37] Completed 3%

...

[17:02:50] Completed 39%
[17:06:50] Completed 40%
[17:10:53] Completed 41%
[17:13:02] - Autosending finished units... [November 23 17:13:02 UTC]
[17:13:02] Trying to send all finished work units
[17:13:02] + No unsent completed units remaining.
[17:13:02] - Autosend completed
[17:13:02] + Working...
[17:14:56] Completed 42%
[17:18:58] Completed 43%
[17:18:58] mdrun_gpu returned 
[17:18:58] NANs detected on GPU
[17:18:58] 
[17:18:58] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:19:02] CoreStatus = 7A (122)
[17:19:02] Sending work to server
[17:19:02] Project: 4746 (Run 0, Clone 55, Gen 49)
[17:19:02] - Read packet limit of 540015616... Set to 524286976.
[17:19:02] - Error: Could not get length of results file work/wuresults_03.dat
[17:19:02] - Error: Could not read unit 03 file. Removing from queue.
[17:19:02] Trying to send all finished work units
[17:19:02] + No unsent completed units remaining.
[17:19:02] - Preparing to get new work unit...
[17:19:02] + Attempting to get work packet
[17:19:02] - Will indicate memory of 2046 MB
[17:19:02] - Connecting to assignment server
[17:19:02] Connecting to http://assign-GPU.stanford.edu:8080/
[17:19:04] Posted data.
[17:19:04] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[17:19:04] + News From Folding@Home: GPU folding beta
[17:19:04] Loaded queue successfully.
[17:19:04] Connecting to http://171.64.65.102:8080/
[17:19:04] Posted data.
[17:19:04] Initial: 0000; - Receiving payload (expected size: 93199)
[17:19:05] - Downloaded at ~91 kB/s
[17:19:05] - Averaged speed for that direction ~126 kB/s
[17:19:05] + Received work.
[17:19:05] Trying to send all finished work units
[17:19:05] + No unsent completed units remaining.
[17:19:05] + Closed connections
[17:19:10] 
[17:19:10] + Processing work unit
[17:19:10] Core required: FahCore_11.exe
[17:19:10] Core found.
[17:19:10] Working on queue slot 04 [November 23 17:19:10 UTC]
[17:19:10] + Working ...
[17:19:10] - Calling '.\FahCore_11.exe -dir work/ -suffix 04 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 3776 -version 620'

[17:19:10] 
[17:19:10] *------------------------------*
[17:19:10] Folding@Home GPU Core - Beta
[17:19:10] Version 1.18 (Mon Oct 13 11:11:30 PDT 2008)
[17:19:10] 
[17:19:10] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[17:19:10] Build host: amoeba
[17:19:10] Board Type: AMD
[17:19:10] Core      : 
[17:19:10] Preparing to commence simulation
[17:19:10] - Looking at optimizations...
[17:19:10] - Created dyn
[17:19:10] - Files status OK
[17:19:10] - Expanded 92687 -> 492188 (decompressed 531.0 percent)
[17:19:10] Called DecompressByteArray: compressed_data_size=92687 data_size=492188, decompressed_data_size=492188 diff=0
[17:19:10] - Digital signature verified
[17:19:10] 
[17:19:10] Project: 5733 (Run 1, Clone 39, Gen 0)
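
For anyone who wants to check their own FAHlog for the same pattern, here is a minimal Python sketch that scans log text for "NANs detected on GPU" and reports which Project (Run, Clone, Gen) was running and the last "Completed N%" seen before each failure. It is not part of the client; the function name `find_nan_failures` and the regular expressions are my own assumptions based only on the log layout shown above, and other client versions may format these lines differently.

```python
import re

# Patterns are assumptions based on the log lines shown in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
PERCENT_RE = re.compile(r"Completed (\d+)%")
NAN_MARKER = "NANs detected on GPU"


def find_nan_failures(log_text):
    """Return one (project, run, clone, gen, last_percent) tuple per NaN shutdown.

    last_percent is None if no "Completed N%" line was seen before the failure.
    """
    failures = []
    current = None       # most recent Project (Run, Clone, Gen) seen
    last_percent = None  # last progress value seen since the previous failure
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
            continue
        match = PERCENT_RE.search(line)
        if match:
            last_percent = int(match.group(1))
            continue
        if NAN_MARKER in line and current is not None:
            failures.append(current + (last_percent,))
            last_percent = None  # the next attempt starts from scratch
    return failures


if __name__ == "__main__":
    # Hypothetical usage: pass the path to your own log file.
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for failure in find_nan_failures(handle.read()):
            print(failure)
```

If the same tuple shows up more than once with the same percentage, as it does in the log above, that at least suggests the failure is reproducible at a fixed point in the work unit rather than random.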

Re: 4746 (0,55,49) is repeatedly NaN'ing at 43%

Posted: Mon Nov 24, 2008 4:35 am
by shdbcamping
P5-133XL wrote:4746 (0,55,49) is repeatedly NaN'ing at 43% and not sending to the server ...

I have an HD 3870 that I recently put back into my XPS 420. It will not get to the GUI at all with any 47XX-series WU. It used to get as far as the "EUE, wait 24 hours" state. I had an ATI driver problem of some kind crop up, so I installed the Catalyst 8.11 driver. It still fails everything but the 5XXX-series WUs before starting the GUI. The good part is that it retries and has been picking up a 5XXX WU before EUEing out.

Re: Project: 4746 (Run 0, Clone 55, Gen 49) multiple NaNs at 43%

Posted: Mon Nov 24, 2008 1:49 pm
by toTOW
There is no data for this WU in the DB :(

Re: 4746 (0,55,49) is repeatedly NaN'ing at 43%

Posted: Mon Nov 24, 2008 4:33 pm
by P5-133XL
shdbcamping wrote:
P5-133XL wrote:4746 (0,55,49) is repeatedly NaN'ing at 43% and not sending to the server ...

I have an HD 3870 that I recently put back into my XPS 420. It will not get to the GUI at all with any 47XX-series WU. It used to get as far as the "EUE, wait 24 hours" state. I had an ATI driver problem of some kind crop up, so I installed the Catalyst 8.11 driver. It still fails everything but the 5XXX-series WUs before starting the GUI. The good part is that it retries and has been picking up a 5XXX WU before EUEing out.

I'm running dual Sapphire 3870s and am generally not having any problems with EUEs. This WU is an exception to the norm. If you have a consistent problem with EUEs, may I suggest that you try underclocking the card: that works for some.