2nd GPU Spiking Up and Down

schapman1978
Posts: 35
Joined: Mon Nov 19, 2012 11:12 pm

Post by schapman1978 »

I'm noticing some unusual activity on my second GPU slot tonight. It was working on a WU from project 14564 when I noticed this. Not sure if it's WU dependent or what. I have a pair of 2080 Tis folding, and the first one's copy activity and CUDA usage are pretty flat and level, as you can see in the images. The second one loads up for a few seconds, then drops off, then repeats over and over. I can hear the light coil whine, then it stops, then restarts. It's not unusual to hear this, but when I started looking at the card activity more closely, I saw these patterns. There is no NVLink bridge between them and they are not set up in SLI. They are both holding steady clocks, and the second card does this whether it's stock, underclocked, or overclocked. It will grind on like this indefinitely, but I'm trying to understand why it keeps loading up and dropping off. It causes a spike of about 125-150W each time. It's strange to me. Any thoughts? I've reinstalled the latest drivers and restarted the client, and the OpenCL options are at default.

Sorry for the zoomed in size - Imgur did something weird or I got the code wrong to link it right.

Image
Image
Last edited by schapman1978 on Sat Apr 25, 2020 8:28 am, edited 1 time in total.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: 2nd GPU Spiking Up and Down

Post by PantherX »

It seems that your GPU is being starved of CPU cycles. I would suggest that you pause whatever is causing your CPU to hit 100% usage and see whether the issue goes away.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Post by MeeLee »

Interesting.
There's always some up and down motion, as the GPU writes to VRAM and waits for PCIe data.
Your first GPU shows this.
Are you running PCIe Gen 2.0 on your motherboard (i.e. running DDR3 RAM; GPU-Z can tell you)?
If so, you'll need to configure your system to run at x8 speeds on both GPUs. Running below this might starve the GPU of PCIe bandwidth.
If you're running PCIe 3.0, you'll preferably have x8, but x4 speed should work as well.
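
For anyone wanting rough numbers behind the x8 / x4 advice, here is a minimal sketch that works out theoretical one-way link bandwidth from the published PCIe signalling rates and line encodings. The function name and table are made up for illustration, and real throughput will be lower than these figures because of protocol overhead.

```python
# Rough per-direction PCIe bandwidth from the published signalling rates.
# Real-world throughput is lower because of protocol overhead.

# (transfer rate in GT/s per lane, encoding efficiency) for each generation
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gb_s(generation: int, lanes: int) -> float:
    """Theoretical one-way bandwidth in GB/s for a PCIe link."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    # One transfer carries one bit per lane, so GT/s * efficiency = usable Gbit/s.
    return rate_gt_s * efficiency * lanes / 8


if __name__ == "__main__":
    for gen, lanes in [(2, 8), (3, 4), (3, 8)]:
        print(f"Gen {gen} x{lanes}: {pcie_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

On paper, Gen 2 at x8 and Gen 3 at x4 come out nearly the same (about 4 GB/s each way), which fits the suggestion that x4 is workable on a Gen 3 board but not on a Gen 2 one.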
Rel25917
Posts: 303
Joined: Wed Aug 15, 2012 2:31 am

Post by Rel25917 »

Do the dips coincide with every 1% or so of the workunit? Could just be a dip while it does its checkpoints.

Post by schapman1978 »

I paused the CPU entirely in FAH, and also changed it from 28 threads to 26, 16, etc., and it still occurs. If the CPU is fully paused in FAH the valleys are much shallower. It's an AMD 3950X, fwiw. With the CPU paused from folding, it usually runs at about 1-3% handling other tasks.

I'm also running 32GB of DDR4-3600 at stock CL16 timings and have dual 2080 Tis, which run natively at x8/x8 on this X570 board. I have a PCIe 4.0 M.2 drive in slot one, but I'm wondering if having a second M.2 in the second slot might be shorting bandwidth to card 2 for some reason. It's a Gen 3 PCIe M.2 and it's run by the X570 chipset, so it shouldn't, since 4 lanes are dedicated to the chipset by the CPU - but I'm willing to pop it out and see, if you think that makes sense. I keep wondering whether the second M.2 is dropping the second GPU slot to x4 or something.

I'll check in the BIOS whether it's posting x8/x8 when I get up.

And unfortunately the dips aren't coinciding with each 1% of the work unit. I wish I could crunch stuff that fast lol... this is pretty rhythmic, every 5-7 seconds or so I'd guess from memory.

Good ideas - we’re thinking similarly. Open to anything.
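
The "every 5-7 seconds" figure above is a guess from memory; it could be measured from a logged utilization trace instead. A minimal sketch, assuming GPU load has been sampled at a fixed interval into a list of percentages. The function name, the 50% threshold, and the sample trace are all made up for illustration.

```python
from statistics import median


def dip_period_seconds(samples, interval_s=1.0, threshold=50.0):
    """Estimate the time between dips in a GPU-utilization trace.

    samples: utilization readings in percent, taken every interval_s seconds.
    A dip starts when a reading falls below threshold right after one at or
    above it. Returns the median spacing between dip starts, or None if
    fewer than two dips were seen.
    """
    dip_starts = [
        i for i in range(1, len(samples))
        if samples[i] < threshold <= samples[i - 1]
    ]
    if len(dip_starts) < 2:
        return None
    gaps = [b - a for a, b in zip(dip_starts, dip_starts[1:])]
    return median(gaps) * interval_s


if __name__ == "__main__":
    # Made-up trace: roughly 95% load with a one-sample dip every 6 samples.
    trace = [95, 96, 94, 95, 97, 20] * 5
    print(dip_period_seconds(trace))
```

Using the median rather than the mean keeps one missed or doubled dip from skewing the estimate. The sampling interval needs to be well under the dip length, or short dips will be missed entirely.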
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

The frequency of the checkpoints is defined by the Project Owner.

FAH runs what is called a "sanity check" which gives the analysis a chance to be aborted if the WU is, in fact, unstable. The actual GPU process is suspended briefly when this is processed. It is probably synchronized with the checkpoints, but I'm not sure if that's always true.

Post by PantherX »

Get GPU-Z and see what the PCIe utilization is and also the speed that it is operating at from within the OS.
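
As a command-line cross-check of what GPU-Z reports, the negotiated link generation and width can also be read from `nvidia-smi`, which ships with the NVIDIA driver. A minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names are hypothetical. It does not report bus load, so GPU-Z is still needed for the PCIe utilization figure.

```python
import subprocess

# Query fields understood by nvidia-smi's --query-gpu option.
QUERY = "index,pcie.link.gen.current,pcie.link.width.current"


def parse_link_report(csv_text):
    """Parse `--format=csv,noheader,nounits` rows into a list of dicts.

    Assumes every field is a plain integer; a card that reports a value
    such as "[N/A]" would raise ValueError here.
    """
    report = []
    for line in csv_text.strip().splitlines():
        index, gen, width = (field.strip() for field in line.split(","))
        report.append({
            "gpu": int(index),
            "pcie_gen": int(gen),
            "pcie_width": int(width),
        })
    return report


def query_link_report():
    """Run nvidia-smi (assumed to be on PATH) and return the parsed rows."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_link_report(result.stdout)
```

Calling `query_link_report()` while both cards are folding should return one entry per GPU. Check it under load: an idle card may report a lower link generation because of power saving.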

Post by schapman1978 »

Well, I'm up anyway for other reasons - it's still doing it - but in a twist, it's now doing it on the card in the physical first slot, not the second one. I also screenshotted my GPU-Z screens showing them at x8/x8 - I guess that defeats my theory about the second PCIe M.2 stealing bandwidth. Hmm...

Sorry for the delay. The pic sizes are terrible; I'm still working through the coding.
Image
Image
uyaem
Posts: 222
Joined: Sat Mar 21, 2020 7:35 pm
Location: Esslingen, Germany

Post by uyaem »

Are the GPUs working on different projects now? Could it be a project-specific "glitch" (intentional or not)?
Image
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)

Post by schapman1978 »

It also appears that both GPUs can do it at the same time. I wonder if it's just the sanity check happening every 5 or so seconds and might be normal? I've always heard the intermittent noise breaks from the cards, in both single and dual configurations, but never investigated, assuming it was normal behavior. My checkpoints are set at 5m manually but this is like clockwork every 5 seconds. Looks like I finished a unit and picked up another - they're both now folding a piece of 14564 and both are dipping like that. I wonder if it's just the project?

Image

Post by PantherX »

For GPUs, the value you configure for checkpoints is ignored. However, for the CPU, the checkpoint value applies. In the case of GPU checkpoints, the researcher sets the checkpoint interval which can vary from 2% to 5% IIRC.
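
To put the 2% to 5% checkpoint interval in time terms, here is a back-of-the-envelope sketch. The function names and the 2-hour work unit are hypothetical, chosen only to show the arithmetic.

```python
def checkpoint_interval_seconds(wu_duration_s, checkpoint_pct):
    """Seconds between checkpoints if one is written every checkpoint_pct of the WU."""
    return wu_duration_s * checkpoint_pct / 100.0


def wu_duration_implied(dip_period_s, checkpoint_pct):
    """WU run time that would be needed for checkpoints to land every dip_period_s."""
    return dip_period_s * 100.0 / checkpoint_pct


if __name__ == "__main__":
    # Hypothetical 2-hour work unit with checkpoints every 2% or 5%.
    for pct in (2, 5):
        print(pct, checkpoint_interval_seconds(2 * 3600, pct))
    # For 5-second dips to be 2% checkpoints, the WU would have to finish in:
    print(wu_duration_implied(5, 2))
```

Under those assumptions checkpoints would be minutes apart, and a dip every 5 seconds could only be a 2% checkpoint if the whole work unit finished in about 250 seconds. So a 5-second rhythm is far too fast to be explained by checkpoints alone.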

A drop every 5 seconds is weird. Can you try pausing GPU01 and observing GPU02? Then pause GPU02, unpause GPU01, and observe what happens. I would observe each attempt for 5 minutes.

Post by schapman1978 »

Good idea - I did a short test, pausing one GPU and then also pausing the CPU so that only one GPU was folding at a time (about 30 seconds for each scenario). Then I repeated the test with the other GPU, first with the CPU running and then with the CPU paused. It exhibits the same spiking behavior. The only difference, regardless of which GPU is running this unit, is that if the CPU is totally paused (not reduced core usage but fully paused), either or both cards ramp up a few percentage points in usage and the dips become significantly less severe. But they always happen on time like a metronome - I think it might be programmed to run this way. Not sure. I'll try a longer sample test but I expect this behavior to persist.

I'm also going to reboot everything and see if it replicates. I'm only paying attention because I just jumped to Win10 Pro from Win10 Home tonight before I went to bed. OS swaps always make me paranoid at first. Too bad I can't fire up a VM and run it in Ubuntu or something to see if it's the same.

Post by PantherX »

I am running Windows 10 Pro 1909 64-bit and this is what it looks like:
Image

Post by schapman1978 »

Right on - I'm going to let these two chunks finish, wait until I get a new project, and see if it's just this particular project. Both my GPUs are folding 14564 and still exhibiting this behavior after a reboot. I've had other projects where it was a nice level line (with minor variations like yours). If a different project doesn't replicate this issue, maybe I should post something in that "problems with a particular WU" thread? I don't know whether it counts as a problem or not. It appears it's going to finish them, but maybe this optimization is causing it to take a lot longer than if it weren't faceplanting both GPUs every 5 seconds. I'm not a programmer - I'm just thinking out loud.

Here's a GPU-Z sensors shot showing info for both cards, with Task Manager in the same shot - it does this with the GPUs clocked up some or at default settings. They just fold slower and a little quieter at default clocks, but the spikes are still there.
Image

Post by schapman1978 »

Yeah, I realized it's doing the exact same thing on another workstation I'm folding on here, a single 2080 setup. Identical behavior with or without the CPU running, just like this machine. I went ahead and put a thread up in the WU section for the owner to take a peek at. I just got my 6th chunk of 14564 and it's doing the same thing. So far, it's been on
(1440, 0, 1)
(1251, 0, 2)
(341, 0, 2)
(1318, 0, 1)
(745, 0, 3)
(225, 0, 4)

Link to that thread here viewtopic.php?f=19&t=34797