Project: 5768 (Run 7, Clone 18, Gen 529)

Moderators: Site Moderators, FAHC Science Team

paulb39
Posts: 5
Joined: Sat Jun 13, 2009 6:17 pm

Project: 5768 (Run 7, Clone 18, Gen 529)

Post by paulb39 »

I'm running the latest GPU client. Usually when I get an "unstable machine" error, I quit Folding@Home and open it again, and it works. This time it won't get past the error. How can I fix this?

Code:

# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\Paul\AppData\Roaming\Folding@home-gpu


[21:57:50] - Ask before connecting: No
[21:57:50] - User name: louieb39 (Team 9999)
[21:57:50] - User ID: 73AAB0F25E19FDA9
[21:57:50] - Machine ID: 2
[21:57:50] 
[21:57:50] Loaded queue successfully.
[21:57:50] Initialization complete
[21:57:50] - Preparing to get new work unit...
[21:57:50] + Attempting to get work packet
[21:57:50] - Connecting to assignment server
[21:57:51] - Successful: assigned to (171.67.108.11).
[21:57:51] + News From Folding@Home: Welcome to Folding@Home
[21:57:51] Loaded queue successfully.
[21:57:52] + Closed connections
[21:57:52] 
[21:57:52] + Processing work unit
[21:57:52] Core required: FahCore_11.exe
[21:57:52] Core found.
[21:57:52] Working on queue slot 03 [November 2 21:57:52 UTC]
[21:57:52] + Working ...
[21:57:52] 
[21:57:52] *------------------------------*
[21:57:52] Folding@Home GPU Core - Beta
[21:57:52] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:57:52] 
[21:57:52] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:57:52] Build host: amoeba
[21:57:52] Board Type: Nvidia
[21:57:52] Core      : 
[21:57:52] Preparing to commence simulation
[21:57:52] - Looking at optimizations...
[21:57:52] - Created dyn
[21:57:52] - Files status OK
[21:57:52] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:57:52] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:57:52] - Digital signature verified
[21:57:52] 
[21:57:52] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:57:52] 
[21:57:53] Assembly optimizations on if available.
[21:57:53] Entering M.D.
[21:58:00] Working on Protein
[21:58:04] Client config found, loading data.
[21:58:04] Starting GUI Server
[21:58:05] mdrun_gpu returned 
[21:58:05] NANs detected on GPU
[21:58:05] 
[21:58:05] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:58:08] CoreStatus = 7A (122)
[21:58:08] Sending work to server
[21:58:08] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:08] - Error: Could not get length of results file work/wuresults_03.dat
[21:58:08] - Error: Could not read unit 03 file. Removing from queue.
[21:58:08] - Preparing to get new work unit...
[21:58:08] + Attempting to get work packet
[21:58:08] - Connecting to assignment server
[21:58:09] - Successful: assigned to (171.67.108.11).
[21:58:09] + News From Folding@Home: Welcome to Folding@Home
[21:58:09] Loaded queue successfully.
[21:58:10] + Closed connections
[21:58:15] 
[21:58:15] + Processing work unit
[21:58:15] Core required: FahCore_11.exe
[21:58:15] Core found.
[21:58:15] Working on queue slot 04 [November 2 21:58:15 UTC]
[21:58:15] + Working ...
[21:58:15] 
[21:58:15] *------------------------------*
[21:58:15] Folding@Home GPU Core - Beta
[21:58:15] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:58:15] 
[21:58:15] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:58:15] Build host: amoeba
[21:58:15] Board Type: Nvidia
[21:58:15] Core      : 
[21:58:15] Preparing to commence simulation
[21:58:15] - Looking at optimizations...
[21:58:15] - Created dyn
[21:58:15] - Files status OK
[21:58:15] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:58:15] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:58:15] - Digital signature verified
[21:58:15] 
[21:58:15] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:15] 
[21:58:16] Assembly optimizations on if available.
[21:58:16] Entering M.D.
[21:58:23] Working on Protein
[21:58:27] Client config found, loading data.
[21:58:27] Starting GUI Server
[21:58:28] mdrun_gpu returned 
[21:58:28] NANs detected on GPU
[21:58:28] 
[21:58:28] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:58:31] CoreStatus = 7A (122)
[21:58:31] Sending work to server
[21:58:31] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:31] - Error: Could not get length of results file work/wuresults_04.dat
[21:58:31] - Error: Could not read unit 04 file. Removing from queue.
[21:58:31] - Preparing to get new work unit...
[21:58:31] + Attempting to get work packet
[21:58:31] - Connecting to assignment server
[21:58:32] - Successful: assigned to (171.67.108.11).
[21:58:32] + News From Folding@Home: Welcome to Folding@Home
[21:58:32] Loaded queue successfully.
[21:58:33] + Closed connections
[21:58:38] 
[21:58:38] + Processing work unit
[21:58:38] Core required: FahCore_11.exe
[21:58:38] Core found.
[21:58:38] Working on queue slot 05 [November 2 21:58:38 UTC]
[21:58:38] + Working ...
[21:58:38] 
[21:58:38] *------------------------------*
[21:58:38] Folding@Home GPU Core - Beta
[21:58:38] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:58:38] 
[21:58:38] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:58:38] Build host: amoeba
[21:58:38] Board Type: Nvidia
[21:58:38] Core      : 
[21:58:38] Preparing to commence simulation
[21:58:38] - Looking at optimizations...
[21:58:38] - Created dyn
[21:58:38] - Files status OK
[21:58:38] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:58:38] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:58:38] - Digital signature verified
[21:58:38] 
[21:58:38] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:38] 
[21:58:39] Assembly optimizations on if available.
[21:58:39] Entering M.D.
[21:58:46] Working on Protein
[21:58:50] Client config found, loading data.
[21:58:50] Starting GUI Server
[21:58:51] mdrun_gpu returned 
[21:58:51] NANs detected on GPU
[21:58:51] 
[21:58:51] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:58:54] CoreStatus = 7A (122)
[21:58:54] Sending work to server
[21:58:54] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:58:54] - Error: Could not get length of results file work/wuresults_05.dat
[21:58:54] - Error: Could not read unit 05 file. Removing from queue.
[21:58:54] - Preparing to get new work unit...
[21:58:54] + Attempting to get work packet
[21:58:54] - Connecting to assignment server
[21:58:54] - Successful: assigned to (171.67.108.11).
[21:58:54] + News From Folding@Home: Welcome to Folding@Home
[21:58:55] Loaded queue successfully.
[21:58:56] + Closed connections
[21:59:01] 
[21:59:01] + Processing work unit
[21:59:01] Core required: FahCore_11.exe
[21:59:01] Core found.
[21:59:01] Working on queue slot 06 [November 2 21:59:01 UTC]
[21:59:01] + Working ...
[21:59:01] 
[21:59:01] *------------------------------*
[21:59:01] Folding@Home GPU Core - Beta
[21:59:01] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:59:01] 
[21:59:01] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:59:01] Build host: amoeba
[21:59:01] Board Type: Nvidia
[21:59:01] Core      : 
[21:59:01] Preparing to commence simulation
[21:59:01] - Looking at optimizations...
[21:59:01] - Created dyn
[21:59:01] - Files status OK
[21:59:01] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:59:01] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:59:01] - Digital signature verified
[21:59:01] 
[21:59:01] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:59:01] 
[21:59:02] Assembly optimizations on if available.
[21:59:02] Entering M.D.
[21:59:09] Working on Protein
[21:59:14] Client config found, loading data.
[21:59:14] mdrun_gpu returned 
[21:59:14] NANs detected on GPU
[21:59:14] 
[21:59:14] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:59:17] CoreStatus = 7A (122)
[21:59:17] Sending work to server
[21:59:17] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:59:17] - Error: Could not get length of results file work/wuresults_06.dat
[21:59:17] - Error: Could not read unit 06 file. Removing from queue.
[21:59:17] - Preparing to get new work unit...
[21:59:17] + Attempting to get work packet
[21:59:17] - Connecting to assignment server
[21:59:17] - Successful: assigned to (171.67.108.11).
[21:59:17] + News From Folding@Home: Welcome to Folding@Home
[21:59:17] Loaded queue successfully.
[21:59:18] + Closed connections
[21:59:23] 
[21:59:23] + Processing work unit
[21:59:23] Core required: FahCore_11.exe
[21:59:23] Core found.
[21:59:23] Working on queue slot 07 [November 2 21:59:23 UTC]
[21:59:23] + Working ...
[21:59:23] 
[21:59:23] *------------------------------*
[21:59:23] Folding@Home GPU Core - Beta
[21:59:23] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[21:59:23] 
[21:59:23] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[21:59:23] Build host: amoeba
[21:59:23] Board Type: Nvidia
[21:59:23] Core      : 
[21:59:23] Preparing to commence simulation
[21:59:23] - Looking at optimizations...
[21:59:23] - Created dyn
[21:59:23] - Files status OK
[21:59:23] - Expanded 46682 -> 252912 (decompressed 541.7 percent)
[21:59:23] Called DecompressByteArray: compressed_data_size=46682 data_size=252912, decompressed_data_size=252912 diff=0
[21:59:23] - Digital signature verified
[21:59:23] 
[21:59:23] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:59:23] 
[21:59:24] Assembly optimizations on if available.
[21:59:24] Entering M.D.
[21:59:31] Working on Protein
[21:59:35] Client config found, loading data.
[21:59:35] Starting GUI Server
[21:59:36] mdrun_gpu returned 
[21:59:36] NANs detected on GPU
[21:59:36] 
[21:59:36] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:59:40] CoreStatus = 7A (122)
[21:59:40] Sending work to server
[21:59:40] Project: 5768 (Run 7, Clone 18, Gen 529)
[21:59:40] - Error: Could not get length of results file work/wuresults_07.dat
[21:59:40] - Error: Could not read unit 07 file. Removing from queue.
[21:59:40] EUE limit exceeded. Pausing 24 hours.
OlivierZ
Posts: 2
Joined: Sun Sep 27, 2009 3:30 pm

Re: Unstable Machine Loop

Post by OlivierZ »

Stop the GPU client.
In the directory where your GPU client is installed:

Delete:
queue.dat
unitinfo.txt
the work directory

Edit client.cfg (if possible, with an editor like UltraEdit):
change the value of machineid
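
For anyone who would rather script these steps, here is a minimal Python sketch. It assumes client.cfg stores the setting as a plain `machineid=<number>` line, and the function name `reset_client_state` is made up for illustration. Stop the client before running anything like this.

```python
import re
import shutil
from pathlib import Path


def reset_client_state(client_dir, new_machine_id):
    """Delete queue.dat, unitinfo.txt and work/, then rewrite machineid.

    Returns the list of paths that were actually removed.
    """
    client_dir = Path(client_dir)
    removed = []

    # Remove the queue and unit info files if they exist.
    for name in ("queue.dat", "unitinfo.txt"):
        path = client_dir / name
        if path.is_file():
            path.unlink()
            removed.append(path)

    # Remove the whole work directory, including any partial results.
    work_dir = client_dir / "work"
    if work_dir.is_dir():
        shutil.rmtree(work_dir)
        removed.append(work_dir)

    # Assumption: the setting is stored as a "machineid=<number>" line.
    cfg = client_dir / "client.cfg"
    if cfg.is_file():
        text = cfg.read_text()
        new_text = re.sub(
            r"(?m)^(machineid\s*=\s*)\d+",
            rf"\g<1>{new_machine_id}",
            text,
        )
        cfg.write_text(new_text)

    return removed
```

With the launch directory from the log above, the call would look like `reset_client_state(r"C:\Users\Paul\AppData\Roaming\Folding@home-gpu", 3)`. Every other line in client.cfg is left untouched.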
paulb39
Posts: 5
Joined: Sat Jun 13, 2009 6:17 pm

Re: Unstable Machine Loop

Post by paulb39 »

Thank you, it's working now. Can anyone explain why I needed to do that? What made it stop working? And does this mean all the work I did is invalid now?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Unstable Machine Loop

Post by bruce »

The "unstable machine" error is designed to recognize hardware that is either defective or sufficiently overclocked or overheated that it can no longer make accurate calculations. Unfortunately I have seen situations where the WU itself is defective and it triggers the same error message. I have no way of knowing whether it's the WU or it's your hardware.

This error SHOULD stop your machine for 24 hours, and it's not doing that. The Work Server assumes (incorrectly) that the WU was corrupted during the download, so it just re-sends the same WU. Hopefully this problem will be fixed in either the new client code or the new server code.
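
One rough way to see this re-send pattern in a log is to count core shutdowns per work unit. The Python sketch below is only an illustration. It relies on the "Project: ..." and "Folding@home Core Shutdown: ..." line formats visible in the log posted above, and the function name `count_shutdowns_per_wu` is made up.

```python
import re
from collections import Counter

# Line formats taken from the posted log.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def count_shutdowns_per_wu(log_text):
    """Count core shutdown statuses per (project, run, clone, gen)."""
    counts = Counter()
    current_wu = None
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            # Remember which WU the following lines belong to.
            current_wu = tuple(int(g) for g in match.groups())
            continue
        match = SHUTDOWN_RE.search(line)
        if match and current_wu is not None:
            counts[(current_wu, match.group(1))] += 1
    return counts
```

If the same (project, run, clone, gen) shows several UNSTABLE_MACHINE shutdowns in a row, the same WU was handed out again each time. That alone does not say whether the WU or the hardware is at fault.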

In the portion of the log that you posted, no work was done, so the question about whether it was invalid or not doesn't really have an answer. If you want to dig up a log where some work was done, we can comment on what that log says.

You did successfully complete p5770 r13 c319 g1091 at 2009-11-01 18:18:43 (Stanford time).
paulb39
Posts: 5
Joined: Sat Jun 13, 2009 6:17 pm

Re: Unstable Machine Loop

Post by paulb39 »

Thanks for the info. Sadly, the log (status/log) doesn't go back far enough to show where work was successfully done.
OlivierZ
Posts: 2
Joined: Sun Sep 27, 2009 3:30 pm

Re: Project: 5768 (Run 7, Clone 18, Gen 529)

Post by OlivierZ »

Try reading the following files directly in your GPU client's directory:
FAHlog.txt
FAHlog-Prev.txt
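
To check quickly how far back those two files reach, a small Python sketch like the one below could help. It assumes each work unit start is logged as "Working on queue slot NN [timestamp]", as in the log earlier in this thread. The helper names `slot_start_times` and `log_coverage` are made up for illustration.

```python
import re
from pathlib import Path

# Matches lines such as:
# "Working on queue slot 03 [November 2 21:57:52 UTC]"
SLOT_RE = re.compile(r"Working on queue slot (\d+) \[(.+?)\]")


def slot_start_times(log_path):
    """Return (slot, timestamp) pairs for every work unit started in a log."""
    text = Path(log_path).read_text(errors="replace")
    return [(int(slot), stamp) for slot, stamp in SLOT_RE.findall(text)]


def log_coverage(client_dir, names=("FAHlog-Prev.txt", "FAHlog.txt")):
    """Map each existing log file to its first and last slot start time."""
    coverage = {}
    for name in names:
        path = Path(client_dir) / name
        if not path.is_file():
            continue
        starts = slot_start_times(path)
        if starts:
            coverage[name] = (starts[0][1], starts[-1][1])
    return coverage
```

Calling `log_coverage` on the client directory returns the first and last start time found in each file. A file that is missing, or that has no slot lines, is simply left out of the result.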
paulb39
Posts: 5
Joined: Sat Jun 13, 2009 6:17 pm

Re: Project: 5768 (Run 7, Clone 18, Gen 529)

Post by paulb39 »

Both only go back to Nov 2, so neither shows work that was successfully done.
paulb39
Posts: 5
Joined: Sat Jun 13, 2009 6:17 pm

Re: Project: 5768 (Run 7, Clone 18, Gen 529)

Post by paulb39 »

Bumping because after a couple of days it's doing the same thing.

Can someone tell me why this is happening?

I posted my log in a Google Doc since the forum says it's too many characters:

http://docs.google.com/View?id=dfp79536_27hgf7qwm4