Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 5:52 am
by bruce
I edited my earlier post so it contains new information.

Your GPU is clearly having trouble with that WU. I would not leave it running. Let's start by simply pausing your GPU. Go to FAHControl and, in the middle of the initial screen, you'll see a small chart called "Folding Slots" and another called "Work Queue". In the upper chart, you'll see two green "Running" statuses, one for the cpu slot and one for the gpu slot. Right-click on the green Status flag of the gpu slot and select Pause.

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 5:55 am
by crimson1077
bruce wrote:
... So far, what's weird is, i'm still noticing that PRCG is changing from 13422 (2975, 69, 2) to PRCG 16918 (4, 50, 39) then right back to PRCG 13422 (2975, 69, 2) again and repeat. Is that a sign of anything?
Yes, and it's a good sign. FAH is running two WUs independently of each other, one on your CPU and one on your GPU. The output logs are intermixed, so you have to learn to read the combined output or use the filtering function that's built into FAHControl.

Code:

20:18:34:WU00:FS00:0xa7:       SIMD: avx_256
20:18:34:WU00:FS00:0xa7:Project: 14824 (Run 1225, Clone 3, Gen 52)
...
20:18:37:WU01:FS01:0x22:Project: 13422 (Run 3163, Clone 20, Gen 0)
FAHCore 0xa7 is running WU00 on slot FS00 using the avx_256 hardware feature. Independently, FAHCore 0x22 is running (or trying to run) WU01 on slot FS01.
Thanks bruce.
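
For anyone reading the intermixed log outside FAHControl, the WU/FS prefixes described above can be used to separate the lines by slot. This is only a minimal sketch: it assumes every relevant line starts with the HH:MM:SS:WUnn:FSnn: prefix seen in the excerpt, and split_by_slot is a made-up helper name, not part of FAH.

```python
import re
from collections import defaultdict

# Prefix as seen in the log excerpt above: HH:MM:SS:WUnn:FSnn:<rest>
# (assumed format; lines without this prefix are skipped)
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):(.*)$")


def split_by_slot(lines):
    """Group log lines by folding slot (FS00, FS01, ...)."""
    by_slot = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            time, wu, slot, rest = m.groups()
            by_slot[slot].append((time, wu, rest))
    return dict(by_slot)


# The three log lines quoted in the post above
sample = [
    "20:18:34:WU00:FS00:0xa7:       SIMD: avx_256",
    "20:18:34:WU00:FS00:0xa7:Project: 14824 (Run 1225, Clone 3, Gen 52)",
    "20:18:37:WU01:FS01:0x22:Project: 13422 (Run 3163, Clone 20, Gen 0)",
]

slots = split_by_slot(sample)
for slot, entries in sorted(slots.items()):
    print(slot, len(entries), "line(s)")
```

With the lines grouped this way, the CPU slot (FS00) and the GPU slot (FS01) can be read separately, which is roughly what the FAHControl filter does for you.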

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 5:56 am
by crimson1077
Paused!

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 6:03 am
by bruce
I don't have a good explanation as to why the Hawaii [Radeon R9 200/300 Series] is having trouble with Project 13422 (2975, 69, 2) but it is ... and it looks like it's looping. That's not good so I suggested the Pause. The log should continue to show (only) what's happening to the CPU assignment and it will be a lot easier to follow.

Then we'll figure out what to do with the GPU.

Your GPU is running Core: Core22 Version: 0.0.11. It has been going through a process of bug fixing and I think you've found a new one. The developer is on the east coast, so I don't think we can contact him at 01:00 EST. :(

That leaves us 2 choices. 1) tell you how to dump the WU and hope that it's replaced with something your GPU can process or 2) wait until morning in NYC and let him recommend a way to figure out what's going on.

If you choose 2, pausing is still better than leaving it running (wasting GPU power and gathering useless repeated messages).

https://apps.foldingathome.org/cpu shows that your AMD GPU has completed several other WUs from Project 13422 but this one is somehow different.

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 6:20 am
by crimson1077
bruce, just can't shake this one! I hope I found something! Because before this, I was downing 13,000-point WUs in an hour with this bad puppy. If it were you, how would you go about it?

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 6:25 am
by crimson1077
bruce wrote:I don't have a good explanation as to why the Hawaii [Radeon R9 200/300 Series] is having trouble with Project 13422 (2975, 69, 2) but it is ... and it looks like it's looping. That's not good so I suggested the Pause. The log should continue to show (only) what's happening to the CPU assignment and it will be a lot easier to follow.

Then we'll figure out what to do with the GPU.

Your GPU is running Core: Core22 Version: 0.0.11. It has been going through a process of bug fixing and I think you've found a new one. The developer is on the east coast, so I don't think we can contact him at 01:00 EST. :(

That leaves us 2 choices. 1) tell you how to dump the WU and hope that it's replaced with something your GPU can process or 2) wait until morning in NYC and let him recommend a way to figure out what's going on.

If you choose 2, pausing is still better than leaving it running (wasting GPU power and gathering useless repeated messages).

https://apps.foldingathome.org/cpu shows that your AMD GPU has completed several other WUs from Project 13422 but this one is somehow different.
It's up to you, I don't mind taking a 1am dump! If that fails we can unleash the devs.

(Funny, I was actually reading up about dumping a WU, but I definitely need a walkthrough there.)

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 6:27 am
by bruce
You have to choose. Personally, I'd rather help fix a bug than complete more WUs, but both are important. Others don't always have the same preferences that I do.

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 6:34 am
by crimson1077
bruce wrote:You have to choose. Personally, I'd rather help fix a bug than complete more WUs, but both are important. Others don't always have the same preferences that I do.
I'd rather wait for the dev, my friend. Thank you for your help.

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 6:39 am
by bruce
I'll send him an email. If you change your mind, or if the dev takes Sunday off, this should allow you to dump it.

FAH's data files are at C:\Users\Crimson\AppData\Roaming\FAHClient
The work files are in \work\0n, where n is the queue position (01, in your case).

I'd make a backup of 01 somewhere. With the WU paused, I think you can delete enough of the contents of 01 to force it to abort itself, if that's your choice. There's no guarantee that the same thing won't happen to another WU.
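
The "make a backup first" step can also be scripted. The following is only a sketch under assumptions: backup_work_dir is a hypothetical helper, the demo runs on throwaway temporary directories rather than a real FAHClient folder, and placeholder.dat is an invented file name. It covers the copy only; which contents of 01 to delete afterwards is not spelled out in this thread, so the sketch does not attempt that part.

```python
import shutil
import tempfile
from pathlib import Path


def backup_work_dir(data_dir, queue_index, backup_root):
    """Copy <data_dir>/work/<NN> to <backup_root>/<NN> and return the copy's path."""
    name = f"{queue_index:02d}"       # queue position 1 -> "01"
    src = Path(data_dir) / "work" / name
    dst = Path(backup_root) / name
    shutil.copytree(src, dst)         # raises FileExistsError if dst already exists
    return dst


# Demo on temporary directories only; nothing here touches real FAH data.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "FAHClient"
    work_01 = data_dir / "work" / "01"
    work_01.mkdir(parents=True)
    (work_01 / "placeholder.dat").write_text("demo")  # hypothetical file name

    copy = backup_work_dir(data_dir, 1, Path(tmp) / "backup")
    copied_names = sorted(p.name for p in copy.iterdir())
    print(copied_names)
```

On a real system, data_dir would be the FAHClient data folder mentioned above, and the slot should be paused before copying so the files are not changing underneath you.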

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 6:42 am
by crimson1077
This is definitely Mr. Mcbuggy buggerton's bug house going on here.

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 6:57 am
by bruce
I take it from your handle that you're a dedicated Red-box (AMD) fan. There are a number of unexplained AMD bugs that Green-box (nV) fans don't encounter. All the red-box fans will thank you for your dedication if we can fix this one.

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 7:52 am
by crimson1077
bruce wrote:I take it from your handle that you're a dedicated Red-box (AMD) fan. There are a number of unexplained AMD bugs that Green-box (nV) fans don't encounter. All the red-box fans will thank you for your dedication if we can fix this one.
Not necessarily, I'm not a fan of either. At the time, the R9 390 was the better buy as opposed to the GTX 970, for me and for gaming. (I never thought I'd fold on my gaming rig.)
I don't see this ever happening on the two 660s I have folding right now. I just got my first Nvidia cards this year and I'm loving both! I have to say, up until this, it was kicking butt with the 390! Easy 200k 24-hour avg. But I'm not so sure about AMD folding now... sheesh. I do wanna at least get it going again, she's a beaut of a card... she's worth it. 8-)

hint, hint, Roll Tide.

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 8:27 am
by bruce
Oh, that Crimson 8-)

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 8:41 am
by crimson1077
bruce wrote:Oh, that Crimson 8-)
Thanks for your help bruce. I did as you said with the 01 folder: backed it up and deleted it. As I fired up FAH, I watched folder 01 reappear in Windows File Explorer as it should, but it's still on PRCG 13422.
Maybe too soon to tell as of now. We shall see!

Re: Failed GPU slot daily :(

Posted: Sun Aug 23, 2020 9:21 am
by Neil-B
Project 13422 is the P part of PRCG: R is Run, C is Clone and G is Gen (generation). A PRCG identifies a specific WU within a Project, so hopefully you have actually got a new PRCG within Project 13422. Related to the buggy PRCG: was the couple of days' pause that (iirc) you mentioned during the folding of that WU? And did you pause the slot and exit the client before shutting down? I am simply wondering if the initial failure of the WU may have been linked to some corruption caused at that point. For a safe shutdown it can be worth pausing the slots (there are threads discussing the best time to do this relative to checkpoints), quitting the client, then waiting a bit to ensure everything is saved properly before shutting down the system. This seems to minimise the chances of issues in that respect.
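
To check whether the replacement WU really is a different PRCG and not just the same Project, the "Project: ... (Run ..., Clone ..., Gen ...)" log line can be parsed and compared. A rough sketch, assuming the line format shown in the log excerpt earlier in the thread; PRCG and parse_prcg are illustrative names, and the "new" line here is simply the one from that excerpt, not crimson1077's actual current log.

```python
import re
from typing import NamedTuple, Optional


class PRCG(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int


# Format assumed from the log excerpt: Project: P (Run R, Clone C, Gen G)
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def parse_prcg(line: str) -> Optional[PRCG]:
    """Return the PRCG found in a log line, or None if the line has none."""
    m = PRCG_RE.search(line)
    return PRCG(*map(int, m.groups())) if m else None


old = PRCG(13422, 2975, 69, 2)  # the WU reported as failing in this thread
new = parse_prcg("20:18:37:WU01:FS01:0x22:Project: 13422 (Run 3163, Clone 20, Gen 0)")

same_project = new.project == old.project  # only the P part matches
same_wu = new == old                       # all four parts must match
print(same_project, same_wu)
```

Seeing "13422" again only says the Project is the same; it is the Run, Clone and Gen values that tell you whether the client received the same WU back or a fresh one.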