Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Thu May 14, 2020 8:33 pm
by HaloJones
The latest 1xxx and 2xxx Nvidia cards boost automatically depending on their BIOS settings regarding power limit and cooling. The better the cooling, the higher the card will boost. That means there is headroom in every Nvidia card so long as you're prepared to improve the cooling or accept louder fans. All my cards are water-cooled - most with custom loops - and they typically run 30C below Nvidia's max temp, so all boost far higher than their manufacturer-rated numbers.

They still need some TLC and tweaking sometimes, but saying a GPU can't be overclocked safely is nonsense.

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Thu May 14, 2020 9:43 pm
by clapanse
Yeah, cooling is definitely a huge part of it. I suspect part of the reason my 1080ti runs so well at 2050MHz is that it's water cooled and stays under 62-63C even at 300W; normally, when folding, it stays at 47-53C.

Re: GPU Projects dropping with error

Posted: Thu May 14, 2020 10:09 pm
by bruce
clapanse wrote:However, where Folding is more demanding is in tolerance of small errors - Furmark will still run with an occasional bad bit or small error. Folding will not tolerate any errors at all, so what seems to be stable in furmark may end up not being stable after all. In no way is folding as demanding of a workload for the GPU though.
Furmark is great if you plan to run games. You'll probably never notice if one light-blue pixel happens to be the wrong shade of light blue. When you're running science, that might be a critical bit, so yes, FAH is intolerant of errors -- whether you happen to mentally classify them as scientifically small or large.

Re: GPU Projects dropping with error

Posted: Fri May 15, 2020 3:32 am
by PantherX
clapanse wrote:...In addition, many people here have stated that folding is a harder workload than furmark. This is plainly and obviously false. Most folding WUs run between 65 and 100% TDP on my 1080ti, with some occasionally hitting 115% or so (I have the power limit set to 120%). If I remember right, the high load ones are 14415 and similar (the myosin projects) - I've seen those consistently hitting my GPU harder than most other projects. Even so, my GPU is able to maintain 1950MHz or above on basically any folding workload. Furmark on the other hand is able to push my GPU all the way to its power cap at 120% TDP with the card clocked all the way down at 1750MHz. No folding workload is this demanding. Not even close...
I don't accept the notion that higher power draw means a more stressful workload. Rather, artificially driving the power draw up simply tests the cooling system of the GPU/system. You can use 100 Watts and do something extremely inefficient, or use 75 Watts and do something extremely efficient. F@H is optimized for maximum efficiency, and that does push the GPU to its limits. If the GPU is unstable, you get errors, and if the GPU is stable, you don't.

Also, regarding Furmark, here's a quote that sums it up nicely:
...use FurMark for identifying cooling issues or a graphics card’s potentially unstable power supply...but the more significant errors and crashes show up earlier in games like The Witcher 3. This means that FurMark is largely useless for realistic tests, especially since some drivers recognize it and automatically limit power...
https://www.tomshardware.com/reviews/ho ... 449-5.html

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Fri May 15, 2020 10:04 pm
by vtankovich
I would still avoid overclocking, even if you don't encounter errors. Not every error can be detected. A few occasional wrong bits may leave atom locations only slightly off, but they still reduce the validity of the simulation, as they essentially add more "temperature" to the protein.

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Sat May 16, 2020 4:18 am
by bruce
A single bit change in a FP number may result in a small shift in an atom position or a very large shift. It depends on which bit.
e.g: 1.23456E-64 and 1.23456E+64
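
To make the "it depends on which bit" point concrete, here is a minimal sketch that flips one bit of a number's binary encoding. It assumes IEEE 754 single precision purely for illustration (the thread does not say which precision the folding cores use), and `flip_bit` is a hypothetical helper, not part of any F@H tool. In a binary format a single flip in the exponent scales the value by a power of two, so the decimal pair above is best read as an illustration of scale rather than a literal one-bit difference.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its float32 encoding flipped.

    Bit 0 is the lowest mantissa bit, bits 23-30 hold the exponent,
    and bit 31 is the sign.
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Toggle the requested bit and reinterpret the result as a float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped

x = 1.5  # hypothetical coordinate, exactly representable in float32
for bit in (0, 29, 31):
    print(f"bit {bit:2d}: {x!r} -> {flip_bit(x, bit)!r}")
```

Flipping bit 0 nudges the value by about one part in ten million, flipping bit 29 shrinks it by a factor of 2^64, and flipping bit 31 changes its sign.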

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Sun May 17, 2020 3:26 pm
by MeeLee
vtankovich wrote:I would still avoid overclocking, even if you don't encounter errors. Not every error can be detected. A few occasional wrong bits may leave atom locations only slightly off, but they still reduce the validity of the simulation, as they essentially add more "temperature" to the protein.
Wouldn't that actually be in line with real-world scenarios?
You hardly ever have a perfect temperature scenario.

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Tue May 19, 2020 8:04 pm
by Joxster
So, I just wanted to drop an update over here about the issue I raised, which I see has turned into a wider discussion around overclocking in the meantime. :lol:

I took the advice of PantherX and turned the memory clock down by a good 200MHz as well, and bingo, the GPU projects stopped failing. But I couldn't believe that the projects were failing because of my core or memory clock speeds: my card OCs on its own due to the lower temperatures, as already confirmed by another member in this thread, and I had not faced the issue even once in the 1.5 months of folding at full clock speeds on my GPU before reporting it on this thread.

A bit of monitoring in HWiNFO gave me the answer to my question of what was actually causing the issue - while my card was folding at its normal clock speeds, I watched the temperature on the GPU VRM slowly increase, until it touched around 106C :eo, and the GPU project failed right at that moment. I tried another GPU project, and this time, the moment the VRM crossed 95C in temperature, the GPU project failed. With the core and memory clocks reduced as far as I could take them, the VRM temperature hovered around the 90C mark, and the GPU projects were not failing anymore.
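
For anyone who wants to repeat this kind of check from a saved sensor log rather than by watching the screen, the following is a rough sketch. It assumes the log has been exported as CSV with a `Time` column and a temperature column; the column names, the helper `first_time_above`, and the sample rows are all hypothetical and made up for illustration, not real measurements from this card.

```python
import csv
import io
from typing import Optional

def first_time_above(log_text: str, column: str, limit_c: float) -> Optional[str]:
    """Return the 'Time' value of the first row where `column` reaches `limit_c`."""
    for row in csv.DictReader(io.StringIO(log_text)):
        if float(row[column]) >= limit_c:
            return row["Time"]
    return None  # the limit was never reached in this log

# Made-up sample rows; real logs will use different column names and values.
SAMPLE_LOG = """Time,GPU VRM Temperature [C]
20:00:00,88.0
20:05:00,93.5
20:10:00,96.0
20:15:00,101.0
"""

print(first_time_above(SAMPLE_LOG, "GPU VRM Temperature [C]", 95.0))
```

Comparing the returned timestamp with the time a work unit failed in the F@H log is one way to see whether the two line up.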

Things took a turn for the worse when Red Dead Redemption 2 also started crashing due to the VRMs crossing 95C while playing the game. I have contacted the seller about this issue, and need to hand my GPU over to them for the next couple of weeks so they can either repair or replace it. Hopefully they will give me a newer replacement card.

In summary - I agree with the other guys on this thread who say that overclocking should not be a problem with F@H if it's done correctly, and in my case, the card was doing it on its own, so there was definitely nothing wrong with that. I hope to get back to folding very soon.

Thanks to everyone for their responses and support.

Cheers

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Tue May 19, 2020 9:02 pm
by HaloJones
I take it the card was bought refurbished or used?

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Tue May 19, 2020 10:04 pm
by Joxster
HaloJones wrote:I take it the card was bought refurbished or used?
Not at all. It was brand spanking new when I bought it in October 2017 - this is a top of the line ASUS ROG-STRIX-GTX1080TI-O11G-GAMING card btw. By seller, I meant the store where I bought the card from originally. Apparently ASUS doesn't allow customers to directly RMA graphics cards here in Europe, so the store is going to send the card to their distributor to take a look at :)

Cheers

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Wed May 20, 2020 1:32 am
by MeeLee
Asus does allow RMAs, but they're @ssholes about it! They won't fix it under warranty, even if it falls within warranty.
More than likely, they will clean up the heat sink, and renew the thermal paste.
If I were you, I'd place an extra case fan to feed the GPU some cool air...

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Wed May 20, 2020 7:35 am
by Joxster
MeeLee wrote:Asus does allow RMAs, but they're @ssholes about it! They won't fix it under warranty, even if it falls within warranty.
More than likely, they will clean up the heat sink, and renew the thermal paste.
If I were you, I'd place an extra case fan to feed the GPU some cool air...
The VRMs are cooled by the heatsink through thermal pads, and I believe that the way ASUS installed those thermal pads is sub-optimal, so I won't be surprised if they replace the thermal pads with better ones that transfer the heat more efficiently, and hand the card back over to me. If that resolves the issue, I won't complain about it either.

The airflow in my Corsair 570X Crystal case is fine. I am running 3x Corsair 120mm RGB PRO fans in the front + 1x Corsair 120mm RGB PRO fan in the back. Plenty of air for all the components.

Cheers

Re: Overclocked GPU Projects dropping with error. -- YES.

Posted: Sun Jun 07, 2020 2:29 pm
by toTOW
Joxster wrote:...A bit of monitoring in HWiNFO gave me the answer to my question of what was actually causing the issue - while my card was folding at its normal clock speeds, I watched the temperature on the GPU VRM slowly increase, until it touched around 106C :eo, and the GPU project failed right at that moment...
It reminds me of the fate of my 980 Ti ... RIP

It started with occasional Bad States detected on the GPU ... then, I started to get GPU (and driver) resets ... after a while, the system started to turn off by itself ... and one day, while booting Windows 10, I saw a nice flash and flame in the GPU VRMs ... and the PC would never turn on again until I swapped the card.

Luckily it was still under warranty, and I got it replaced for free ... The 1070 I got as a replacement is still folding fine after 3 years ... :D