Project: 2686 (Run 8, Clone 18, Gen 63)

texinga
Posts: 52
Joined: Sat Feb 05, 2011 4:42 pm

Project: 2686 (Run 8, Clone 18, Gen 63)

Post by texinga »

Had another strange issue with a WU today (2686). In the log below, I was repeatedly getting the "CoreStatus = C0000005" error right after the WU begins at 0%. I went over to the CoreStatus Codes FAHWiki and read about it. The Wiki suggested that I check my memory, especially if overclocked. I've been running a stable, conservative OC (3.8GHz) on the 980X in this rig for quite a while, with the memory at 1366 (though it's capable of 2000). I've run several Bigadv WUs (6900s mostly) under that conservative OC template and all have completed fine. I ran Intel Burn Test and Memtest86 after applying the OC. But I'm not one to blindly trust everything, and I realize that Bigadv WUs are not all the same.

So, I went back into my BIOS and put the 980X back to my stock (non-overclocked) template, which has settings of 3.3GHz and "auto" everything. I went back into Windows and restarted the same 2686 Bigadv that was failing, and there was no change: the same "C0000005" error. All I could figure is that maybe I had a suspect WU, so I deleted the FAH Log folder and Queue file, reset my Machine ID to another number, and requested a new Bigadv. I also went back to my conservative OC template (3.8GHz) in the BIOS, received a 6900 WU, and it is progressing OK (currently 4% done). I don't know exactly what to think, so a question for the experts:

1. When you have a Bigadv fail like I did and you then try to re-run that same WU (after having put everything back to stock), does it continue to fail anyway, as I experienced, or should it have taken off and started running OK?

My Log file info from today with the 2686 issue:

Code: Select all

--- Opening Log file [February 10 16:20:53 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\xxxxxxx\FAH
Executable: C:\Users\xxxxxxx\FAH\FAH-630.exe
Arguments: -bigadv -smp 

[16:20:53] - Ask before connecting: No
[16:20:53] - User name: texinga (Team 111065)
[16:20:53] - User ID: 
[16:20:53] - Machine ID: 1
[16:20:53] 
[16:20:53] Loaded queue successfully.
[16:20:53] 
[16:20:53] + Processing work unit
[16:20:53] Core required: FahCore_a3.exe
[16:20:53] Core found.
[16:20:53] Working on queue slot 02 [February 10 16:20:53 UTC]
[16:20:53] + Working ...
[16:20:53] 
[16:20:53] *------------------------------*
[16:20:53] Folding@Home Gromacs SMP Core
[16:20:53] Version 2.22 (Mar 12, 2010)
[16:20:53] 
[16:20:53] Preparing to commence simulation
[16:20:53] - Looking at optimizations...
[16:20:53] - Files status OK
[16:20:59] - Expanded 25473348 -> 31941441 (decompressed 125.3 percent)
[16:20:59] Called DecompressByteArray: compressed_data_size=25473348 data_size=31941441, decompressed_data_size=31941441 diff=0
[16:20:59] - Digital signature verified
[16:20:59] 
[16:20:59] Project: 2686 (Run 8, Clone 18, Gen 63)
[16:20:59] 
[16:20:59] Assembly optimizations on if available.
[16:20:59] Entering M.D.
[16:21:08] Completed 0 out of 250000 steps  (0%)
[16:21:19] CoreStatus = C0000005 (-1073741819)
[16:21:19] Client-core communications error: ERROR 0xc0000005
[16:21:19] Deleting current work unit & continuing...

Folding@Home Client Shutdown at user request.

Folding@Home Client Shutdown.
Here's the log file of the 6900 WU that started and ran OK afterwards (after I had cleared the FAH Log folder, queue, etc.). This is with the conservative OC template (3.8GHz) applied that has been running OK with other 6900 WUs:

Code: Select all

--- Opening Log file [February 10 16:34:18 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\xxxxxx\FAH
Executable: C:\Users\xxxxxx\FAH\FAH-630.exe
Arguments: -oneunit -bigadv -smp 

[16:34:18] - Ask before connecting: No
[16:34:18] - User name: texinga (Team 111065)
[16:34:18] - User ID: 
[16:34:18] - Machine ID: 2
[16:34:18] 
[16:34:19] Loaded queue successfully.
[16:34:19] - Preparing to get new work unit...
[16:34:19] Cleaning up work directory
[16:34:19] + Attempting to get work packet
[16:34:19] Passkey found
[16:34:19] - Connecting to assignment server
[16:34:19] - Successful: assigned to (130.237.232.141).
[16:34:19] + News From Folding@Home: Welcome to Folding@Home
[16:34:19] Loaded queue successfully.
[16:34:45] + Closed connections
[16:34:45] 
[16:34:45] + Processing work unit
[16:34:45] Core required: FahCore_a3.exe
[16:34:45] Core found.
[16:34:45] Working on queue slot 01 [February 10 16:34:45 UTC]
[16:34:45] + Working ...
[16:34:45] 
[16:34:45] *------------------------------*
[16:34:45] Folding@Home Gromacs SMP Core
[16:34:45] Version 2.22 (Mar 12, 2010)
[16:34:45] 
[16:34:45] Preparing to commence simulation
[16:34:45] - Looking at optimizations...
[16:34:45] - Created dyn
[16:34:45] - Files status OK
[16:34:50] - Expanded 24860225 -> 30796293 (decompressed 123.8 percent)
[16:34:50] Called DecompressByteArray: compressed_data_size=24860225 data_size=30796293, decompressed_data_size=30796293 diff=0
[16:34:51] - Digital signature verified
[16:34:51] 
[16:34:51] Project: 6900 (Run 35, Clone 18, Gen 23)
[16:34:51] 
[16:34:51] Assembly optimizations on if available.
[16:34:51] Entering M.D.
[16:34:59] Completed 0 out of 250000 steps  (0%)
[16:59:09] Completed 2500 out of 250000 steps  (1%)
[17:23:53] Completed 5000 out of 250000 steps  (2%)
[17:47:58] Completed 7500 out of 250000 steps  (3%)
[18:12:07] Completed 10000 out of 250000 steps  (4%)
Rick
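As a side note on reading these logs: the "decompressed X percent" figure is just the expanded size as a percentage of the compressed size. A minimal Python sketch (my own reconstruction, not the client's actual code) reproduces both logged values if we assume the client truncates, rather than rounds, to one decimal place:

```python
# Reproduce the "decompressed X percent" figures from the two logs above.
# Assumption: the client truncates (not rounds) to one decimal place --
# that is what matches both logged values.
def decompressed_percent(compressed: int, expanded: int) -> float:
    ratio = expanded / compressed * 100.0
    return int(ratio * 10) / 10  # truncate to one decimal place

print(decompressed_percent(25473348, 31941441))  # 2686 log: 125.3
print(decompressed_percent(24860225, 30796293))  # 6900 log: 123.8
```

The exact ratios are 125.39% and 123.87%, so plain rounding would have printed 125.4 for the first log; truncation matches both lines.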
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am

Re: Project: 2686 (Run 8, Clone 18, Gen 63)

Post by PantherX »

In the WU database there were already 2 failures; once I included yours, that makes 3, so I have reported it as a bad WU.

If you initially got a bigadv WU and it gave you an error, don't try to re-run it from a backup (by changing your OC settings); it won't do you any good, since in most cases the server has already been informed that you failed and will try to reassign the WU (hopefully to somebody else).

If you are assigned the same WU again, you could reduce your OC settings and try, but there isn't any guarantee that you will be assigned the same WU.

If you have folded plenty of bigadv or normal WUs successfully, you shouldn't worry about an occasional error, since it can simply be a bad WU and there's not much you can do about it other than report it here.
texinga
Posts: 52
Joined: Sat Feb 05, 2011 4:42 pm

Re: Project: 2686 (Run 8, Clone 18, Gen 63)

Post by texinga »

Whew... thank you, PantherX, for that very informative reply. My head was spinning over what I needed to do, because I couldn't figure out why the WU wouldn't run even after I reset my rig to stock timings. Each time I adjusted something on my PC in an attempt to get past the failure, running the FAH630 application presented me with what appeared to be the same WU, 2686 (Run 8, Clone 18, Gen 63), since the project and Run/Clone/Gen numbers didn't change. I got the same WU 5 times over roughly an hour while trying to get it to run; it was like I was in a cycle of repeatedly receiving the same WU until I deleted the FAH Log, queue, etc. to force a new one. I was OK with that, because I wanted to see if any of my changes would make the WU happy and let it continue. I really hate to have to kill a WU, but as you say, there will be times when I have to do it.

Thanks so much for the help and quick reply today,

Rick
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2686 (Run 8, Clone 18, Gen 63)

Post by bruce »

You bring up a good question about why the Wiki suggests removing the overclock and testing memory. Consider the following:

The SMP cores do stress memory pretty hard, and even an occasional error can create an EUE (Early Unit End).

The C0000005 error can be caused either by the WU/software (which you can do very little about) or by errors in your memory or memory controller (which you can do something about).

In general, suggesting that you reduce your overclock settings and/or run memtest overnight is useful, even if it only eliminates one possibility. We could also suggest that you stop folding for a few days to see if someone else completes the reassigned WU, but that's a less effective way to proceed, since initially nobody can tell the real cause of the EUE.
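For anyone curious about the two forms of the error code in the log: CoreStatus C0000005 is the Windows NTSTATUS code STATUS_ACCESS_VIOLATION, and -1073741819 is the same 32-bit value read as a signed integer. A quick Python sketch (illustration only, not the client's code) shows the correspondence:

```python
# CoreStatus = C0000005 (-1073741819): both numbers are the same
# 32-bit NTSTATUS value (STATUS_ACCESS_VIOLATION on Windows).
def signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed int32."""
    return value - 0x100000000 if value >= 0x80000000 else value

print(f"{0xC0000005:X} -> {signed32(0xC0000005)}")  # C0000005 -> -1073741819
```

So the two numbers in the log line are one and the same status code, which is why the Wiki's memory-related advice applies to both.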
texinga
Posts: 52
Joined: Sat Feb 05, 2011 4:42 pm

Re: Project: 2686 (Run 8, Clone 18, Gen 63)

Post by texinga »

Thanks, Bruce, and I'd like to add how super-helpful you guys are in answering questions and following up! :D

Rick
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2686 (Run 8, Clone 18, Gen 63)

Post by bruce »

A larger number of failures have now been reported.
The WU (P2686, R8, C18, G63) has been flagged as a bad WU.