Project: 4743 (Run 9, Clone 902, Gen 11)

Moderators: Site Moderators, FAHC Science Team

blisk
Posts: 2
Joined: Tue Jan 27, 2009 5:00 am

Post by blisk »

I believe it's a problem with the work unit I'm being assigned. I had just successfully completed and sent a work unit, then I received this work unit: p4743_lam5w_300K

Ever since then, my log has been filled with the client fetching the p4743_lam5w_300K work unit and then hitting the unstable machine error. Here's my log:

[01:38:39] Project: 4743 (Run 9, Clone 902, Gen 11)
[01:38:39] 
[01:38:39] Assembly optimizations on if available.
[01:38:39] Entering M.D.
[01:38:45] Working on p4743_lam5w_300K
[01:38:46] Client config found, loading data.
[01:38:46] Starting GUI Server
[01:38:50] mdrun_gpu returned 
[01:38:50] NANs detected on GPU
[01:38:50] 
[01:38:50] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:38:53] CoreStatus = 7A (122)
[01:38:53] Sending work to server
[01:38:53] Project: 4743 (Run 9, Clone 902, Gen 11)
[01:38:53] - Read packet limit of 540015616... Set to 524286976.
[01:38:53] - Error: Could not get length of results file work/wuresults_04.dat
[01:38:53] - Error: Could not read unit 04 file. Removing from queue.
[01:38:53] - Preparing to get new work unit...
[01:38:53] + Attempting to get work packet
[01:38:53] - Connecting to assignment server
[01:38:53] - Successful: assigned to (171.64.65.103).
[01:38:53] + News From Folding@Home: GPU folding beta
[01:38:54] Loaded queue successfully.
[01:38:55] + Closed connections
[01:39:00] 
[01:39:00] + Processing work unit
[01:39:00] Core required: FahCore_11.exe
[01:39:00] Core found.
[01:39:00] Working on queue slot 05 [January 30 01:39:00 UTC]
[01:39:00] + Working ...
[01:39:00] 
[01:39:00] *------------------------------*
[01:39:00] Folding@Home GPU Core - Beta
[01:39:00] Version 1.22 (Mon Dec 8 12:57:56 PST 2008)
[01:39:00] 
[01:39:00] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[01:39:00] Build host: amoeba
[01:39:00] Board Type: AMD
[01:39:00] Core      : 
[01:39:00] Preparing to commence simulation
[01:39:00] - Looking at optimizations...
[01:39:00] - Created dyn
[01:39:00] - Files status OK
[01:39:00] - Expanded 88298 -> 447304 (decompressed 506.5 percent)
[01:39:00] Called DecompressByteArray: compressed_data_size=88298 data_size=447304, decompressed_data_size=447304 diff=0
[01:39:00] - Digital signature verified
[01:39:00] 
[01:39:00] Project: 4743 (Run 9, Clone 902, Gen 11)
[01:39:00] 
[01:39:00] Assembly optimizations on if available.
[01:39:00] Entering M.D.
[01:39:06] Working on p4743_lam5w_300K
[01:39:06] Client config found, loading data.
[01:39:07] Starting GUI Server
[01:39:11] mdrun_gpu returned 
[01:39:11] NANs detected on GPU
[01:39:11] 
[01:39:11] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:39:16] CoreStatus = 7A (122)
[01:39:16] Sending work to server
[01:39:16] Project: 4743 (Run 9, Clone 902, Gen 11)
[01:39:16] - Read packet limit of 540015616... Set to 524286976.
[01:39:16] - Error: Could not get length of results file work/wuresults_05.dat
[01:39:16] - Error: Could not read unit 05 file. Removing from queue.
[01:39:16] - Preparing to get new work unit...
[01:39:16] + Attempting to get work packet
[01:39:16] - Connecting to assignment server
[01:39:16] - Successful: assigned to (171.64.65.103).
[01:39:16] + News From Folding@Home: GPU folding beta
[01:39:17] Loaded queue successfully.
[01:39:18] + Closed connections

EDIT: Just noticed this on another post:
"If your GPU has never given you problems running FAH, it could simply be a bad WU. Delete the work files, queue.dat, and unitinfo.txt until you get a different WU. If you continue to get problems, please post your log file."

Unfortunately it doesn't work.
Last edited by blisk on Fri Jan 30, 2009 4:17 am, edited 2 times in total.
DanGe
Posts: 118
Joined: Sat Nov 08, 2008 2:46 am
Hardware configuration: 2018 Mac Mini / MacOS Catalina
MSI Radeon RX Vega 56 (eGPU via Sonnet Breakaway Box 550)
3.2 GHz 6-Core Intel Core i7
Location: California, United States

Re: Issue with "UNSTABLE_MACHINE"

Post by DanGe »

I noticed the server is assigning you the same WU (same project, run, clone, and gen numbers). The assignment servers normally reassign the same WU 3-4 times after it fails. Since this *might* be a bad WU, you pretty much have to delete the work folder, queue.dat, and unitinfo.txt a few times before you get a different WU.

If your GPU continues to fail on different WUs, we will need more specifics, such as your hardware specs, your OS, and whether you overclocked your GPU.
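If you end up doing that cleanup several times, it could be scripted. The following is only a rough sketch, assuming the work folder, queue.dat, and unitinfo.txt all sit directly inside the client's directory (the log's `work/wuresults_04.dat` path suggests that layout). The function name `reset_client_state` is made up for illustration, and the client should presumably be stopped before running it.

```python
import shutil
from pathlib import Path


def reset_client_state(client_dir):
    """Delete work/, queue.dat and unitinfo.txt under client_dir.

    Returns the names of the items that were actually removed.
    Assumption: all three live directly in the client's directory.
    """
    base = Path(client_dir)
    removed = []

    # The work folder holds the downloaded WU and any partial results.
    work = base / "work"
    if work.is_dir():
        shutil.rmtree(work)
        removed.append("work")

    # queue.dat and unitinfo.txt describe the currently assigned WU.
    for name in ("queue.dat", "unitinfo.txt"):
        target = base / name
        if target.is_file():
            target.unlink()
            removed.append(name)

    return removed
```

Calling `reset_client_state(".")` from the client's directory and then restarting the client would repeat the manual steps. Missing files are skipped rather than treated as errors.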
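To tell whether the failures are all on one WU or spread across different ones, the log can be tallied by project, run, clone, and gen. This is a minimal sketch that assumes the log lines look like the ones posted above. The function name `count_unstable_shutdowns` is hypothetical and not part of any FAH tool.

```python
import re
from collections import Counter

# Matches lines such as "Project: 4743 (Run 9, Clone 902, Gen 11)".
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def count_unstable_shutdowns(log_text):
    """Count UNSTABLE_MACHINE shutdowns per (project, run, clone, gen)."""
    failures = Counter()
    current = None  # the WU named by the most recent "Project:" line
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
        elif "UNSTABLE_MACHINE" in line and current is not None:
            failures[current] += 1
    return failures
```

If every failure lands on the same key, as it would for the log in this thread, a bad WU is at least plausible. Failures spread over several different keys point more toward the hardware or its settings.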
blisk
Posts: 2
Joined: Tue Jan 27, 2009 5:00 am

Re: Issue with "UNSTABLE_MACHINE"

Post by blisk »

It finally ended up giving me a good one. I had given up, and when I checked just now after a couple of hours it had a new one that's working fine.

How often do these bad WUs come up? How is it finally determined that it's a bad work unit?
DanGe
Posts: 118
Joined: Sat Nov 08, 2008 2:46 am

Re: "unstable_machine" in log file

Post by DanGe »

Glad to hear it's working again :)

Bad WUs do not come up too often. With WUs for beta clients like GPU, though, they come up *slightly* more often, I suppose. We usually conclude that a WU is bad if we find that it stops at nearly the same place for many people.
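Just to illustrate that rule of thumb, here is a toy sketch. It is not how the project actually tracks this. The report format (user, WU id, percent done at failure) and the `min_users` and `tolerance` thresholds are all assumptions made up for the example.

```python
from collections import defaultdict
from statistics import median


def suspect_bad_wus(reports, min_users=3, tolerance=1):
    """Flag WUs that fail at nearly the same place for several users.

    reports: iterable of (user, wu_id, percent_done_at_failure).
    min_users and tolerance are hypothetical thresholds.
    """
    by_wu = defaultdict(dict)
    for user, wu, pct in reports:
        by_wu[wu][user] = pct  # keep one (the latest) report per user

    suspects = []
    for wu, per_user in by_wu.items():
        middle = median(per_user.values())
        # Users whose failure point is close to the typical one.
        close = [u for u, pct in per_user.items() if abs(pct - middle) <= tolerance]
        if len(close) >= min_users:
            suspects.append(wu)
    return suspects
```

Repeated failures from a single user count only once here, since one machine failing over and over on the same WU says more about that machine than about the WU.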

If you suspect your WU is a bad one, post in the Issues with a Specific WU section of the forum (viewforum.php?f=19).

To the mods: I think this thread should be moved to the aforementioned forum since this might be a bad WU.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 4743 (Run 9, Clone 902, Gen 11)

Post by bruce »

Thread moved and title changed.

Project: 4743 (Run 9, Clone 902, Gen 11) has been successfully completed by someone, so it's not really a bad WU.