General Troubleshooting ideas
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 349
- Joined: Sun Feb 10, 2013 6:06 pm
- Hardware configuration: Sys 1: I7 2700K@4,4GHz with NH-C14
8GB G.Skill Sniper DDR3 1866MHz CL 9-10-9-28
MSI Z68A-GD65 (G3), various operating systems (WinXP, Ubuntu: 10.4.3 LTS, 12.04.2 LTS)
Optional: GTX560TI 448@stock/OC´d
Sys 2: I7 3930K@4,4GHz with Corsair H110
16GB G.Skill Ripjaws X DDR3 1866MHz CL 9-10-9-28
ASUS Ranpage IV Formula, Ubuntu 10.10
Sys 3 i7 875K@3,826 GHz with Scythe Mine2
8GB G.Skill Sniper DDR3 1866MHz CL 9-10-9-28
MSI P55-GD80, Win7 64Bit Pro
Sapphire Radeon HD5870@1,163V 900/1250MHz
Sapphire Radeon HD7870@1,218V 1200/1300MHz
Sys 4 i7 2600K@4,4GHz with Scythe Mine2
8GB G.Skill Sniper DDR3 1866MHz CL 9-10-9-28
MSI Z68A-GD65 (G3), various operating systems (WinXP, Ubuntu: 10.4.3 LTS, 12.04.2 LTS)
Optional: GTX560TI 448@stock/OC´d
Optional:
ASUS P5Q Pro with Q9550
ASUS P5Q Pro with Q6300 - Location: Bavaria, Germany
Re: General Troubleshooting ideas
Did you disable all power-saving options for the CPU in the BIOS?
Please check the download size of Cat. 14.4: it should be 256 MB (Win 7/8/8.1 64-bit). If it's bigger, then you got the wrong one (with defective motherboard drivers).
-
- Posts: 56
- Joined: Mon Jun 02, 2014 10:56 pm
Re: General Troubleshooting ideas
The 14.4 drivers I have are 256 MB.
I have the power setting on all the machines set to "never" but I have not looked in the BIOS for specific power saving options. I'll do that.
Last night, on the absolutely lowest clock settings possible, two machines still did not make it. The machines were up and running and I could even remote in to them, but both GPU's on each machine were out of sync with the client and Windows was unable to do a restart on its own. I had to intervene with a reset.
So I cannot underclock the cards any more than they are. One of the machines should hit a point where it finishes its WU today, if it can actually finish.
Future actions:
1) Check BIOS for CPU energy saving settings. Not sure why F@H would be impacted if mining wasn't, but I'll check anyway. The power settings are already set to "Never" in Windows itself.
2) Try to run a machine with one GPU and see if anything changes. I'll leave the second GPU in place but just not use it. At this point I don't think it is the number of GPU's on that hardware. I think it is an incompatibility between the AMD drivers, the F@H client, and my combination of hardware.
3) Run a memory test and a CPU test on the same computer just to rule it out.
I do have two of the same GPU's running non-stop while folding in a different machine. The machine is much older, different MB, memory, PSU, etc. But they will run on that machine. The same software is installed on all the machines and actually a lot more is installed on the "older" machine. I've even swapped the GPU's with it just to see if there was something special about the two GPU's in it. Needless to say it made no difference.
Re: General Troubleshooting ideas
You might try doing a visual inspection of the motherboards for bad capacitors (the little metal cans). The tops should be flat, not bulged out. There was a huge problem a while back with bad caps. You could also try pulling one of the machines out of its case and running it that way. Maybe there is some weird grounding issue. It doesn't seem likely since they worked for mining, but nothing else is working.
-
- Posts: 56
- Joined: Mon Jun 02, 2014 10:56 pm
Re: General Troubleshooting ideas
Thanks Rel25917.
I checked the capacitors and they look fine (flat tops). I suspect it is going to come down to having to disassemble the machines and buy replacement hardware piece by piece and I'm not actually willing to do that. I've already got a ridiculous amount of money tied up with these machines and explaining to my wife that I need to buy yet more hardware just to try to find a problem would be a hard sell.
I'll most likely post in AMD's forum and see if they have anything else they can do and if not I'll either sell off the hardware or move the hardware that simply won't run the F@H client off to something else. I don't want to do that! Of course I will try an auto restart mechanism first because I would much rather have the hardware run the remainder of its life on F@H so something good can come from it but if it won't run there isn't much I can do about it although it isn't for lack of trying. The only thing with the auto restarts is that I would rather do it when an out of sync condition is detected and I'm not sure Windows will be stable enough to actually complete the restart on its own.
I've got about two more weeks before my available time drops to next to nothing. At that point detailed troubleshooting just isn't going to be an option because I'll only be near the machines a couple hours a day at most so the machines will need to be able to run unattended. So far they have shown zero ability to do that while folding.
I appreciate all the suggestions!
Update 1: I don't see CPU power saving options in my BIOS and I have a memory test running right now. So far no errors have been found. I'll let it do multiple passes at 1333 and then I'll move it back up to 2133 and run them again. Ironically the machines became noticeably more unstable when I under clocked the RAM.
Update 2: I only let the memory test finish one pass at 1333. I cancelled after that and moved the memory speed back to 2133. I'll let the 2133 memory test run multiple passes to see what happens. There were no errors found during the 1333 test.
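For the auto-restart idea mentioned above, here is a minimal sketch of a watchdog, not a tested solution. It assumes that a hung GPU shows up as the client log going quiet, which may not hold in every case, and the log path and the 30-minute threshold are hypothetical placeholders. The only external command used is the standard Windows `shutdown /r /f /t`.

```python
import os
import subprocess
import time

# Hypothetical values; adjust for the actual install.
LOG_PATH = r"C:\path\to\FAHClient\log.txt"  # assumed location of the client log
STALL_SECONDS = 30 * 60                      # assumed: 30 min of silence means "hung"


def is_stalled(last_write, now, threshold=STALL_SECONDS):
    """Return True when the log has been silent for longer than the threshold."""
    return (now - last_write) > threshold


def restart_command(delay_seconds=60):
    """Build the Windows restart command; /f forces running apps to close."""
    return ["shutdown", "/r", "/f", "/t", str(delay_seconds)]


def watch(poll_seconds=60):
    """Poll the log's modification time and request a restart once it stalls."""
    while True:
        if is_stalled(os.path.getmtime(LOG_PATH), time.time()):
            subprocess.run(restart_command(), check=False)
            return
        time.sleep(poll_seconds)
```

Calling `watch()` from a script started at logon would run the check once a minute. It does not address the concern that Windows may be too unstable to complete the restart on its own; in that case only a hardware reset would help.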
-
- Posts: 349
- Joined: Sun Feb 10, 2013 6:06 pm
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400 - Location: Land Of The Long White Cloud
Re: General Troubleshooting ideas
ChasingTheDream wrote: ...I didn't leave a machine sitting on the BIOS screen for several hours. I suppose I can. The issue isn't keeping the machines up. The issue is keeping the machine up and running the F@H client. As I've mentioned previously these machines ran for weeks unattended while Scrypt mining. They only have trouble when I try to use them for F@H and I'm not sure why. I'm aware the memory usage is quite different with F@H...
Okay, so if you can have the system idle for several hours without issue, then you know that the issue surfaces when there is load on the system. Do note that not only is the memory usage different between Scrypt mining and F@H, but different GPU components can also be used, which may explain why you aren't seeing an issue with Scrypt mining but the issue occurs while folding.
ChasingTheDream wrote: ...I guess the issue is I'm not exactly sure what else can be done to troubleshoot. I've swapped GPU's, memory, fresh Windows installs, fresh AMD driver installs, under clocked the GPU's, under clocked the system memory, moved power cables, switched GPU slots, updated BIOS on all machines, switched to PCI2, and enabled crossfire. None of it has made any difference at all...
It seems that you have ruled out a significant portion of the hardware components. Below are the ones you may still consider:
1) Motherboard -> Perhaps you were unlucky and got a batch of defective ones. Try swapping them around to see whether they are indeed defective.
2) HDD/SSD -> Maybe the drive is wonky and causing issues. One of my systems refused to boot up and the cause was a faulty drive.
3) PSU -> Swap it with one from a good system to see if it makes any difference.
4) Power cables -> Are you using extenders? If so, try without any extenders and see whether that solves your issue.
5) Minimum hardware -> Physically remove all excess hardware and use only the minimum basic hardware to fold, i.e. 1 RAM stick, CPU, 1 GPU, 1 HDD.
6) CPU -> Swap the CPU with one from a good system to see whether the CPU is damaged.
ChasingTheDream wrote: ...The only thing left I can even think of is running the memory stress test and maybe a CPU stress test. I will be stunned if all the memory in all the computers is bad though, and the same applies to the CPU...
Here are some F@H specific tools:
CPU -> StressCPU (http://folding.stanford.edu/home/downlo ... ties#ntoc1)
GPU -> FAHBench (http://fahbench.com/) - The website is currently down and I have informed the appropriate personnel, so you may want to check it out once it's available.
ChasingTheDream wrote: ...I would be more inclined to think it is a hardware defect if it was one machine. It's hard to imagine I have the same defect in 6 nearly identical machines...
That would suggest that something common is the issue but, unfortunately, it hasn't been discovered yet.
ChasingTheDream wrote: ...The machines were up and running and I could even remote in to them but both GPU's on each machine were out of sync with the client and Windows was unable to do a restart on its own. I had to intervene with a reset...
How did you remote in: Microsoft Remote Desktop, TeamViewer, etc.? Was there any driver reset information in the Windows Event Log?
ChasingTheDream wrote: ...The only thing with the auto restarts is that I would rather do it when an out of sync condition is detected and I'm not sure Windows will be stable enough to actually complete the restart on its own...
Assuming that checkpoints are written correctly, restarting the system shouldn't cause any loss of WUs; only a certain amount of work will be lost. Windows should be able to restart without any issues, so if it can't, that could indicate an unstable OS.
ChasingTheDream wrote: ...I've got about two more weeks before my available time drops to next to nothing. At that point detailed troubleshooting just isn't going to be an option because I'll only be near the machines a couple hours a day at most so the machines will need to be able to run unattended. So far they have shown zero ability to do that while folding...
You currently have two options; neither is standard and both are considered experimental:
1) You can install Ubuntu 12.04 64-bit with the AMD proprietary driver. Then install V7.4.4, set it to client-type advanced, and allow it to fold FahCore_17 WUs. Help can be provided if you need it.
2) You could install Ubuntu 12.04 with the AMD proprietary driver and attempt to run ocores (viewtopic.php?f=66&t=26218). However, do note that since you aren't a Beta Team Member, help can't be provided in this Forum as part of the Forum policy, but you can get help on the IRC.
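To check for the driver reset information asked about above, the System log can be queried from a script. This is a rough sketch that assumes display driver timeouts are recorded as Event ID 4101 from the "Display" source, which is where they normally appear on Windows 7 and later; the function names are illustrative only. It relies on the built-in `wevtutil qe` command.

```python
import subprocess


def display_reset_query(max_events=20):
    """Build a wevtutil command listing the newest display driver reset events."""
    # Assumption: driver timeouts are logged as Event ID 4101, source "Display".
    xpath = "*[System[Provider[@Name='Display'] and (EventID=4101)]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",        # XPath filter on the System log
        f"/c:{max_events}",   # limit the number of events returned
        "/rd:true",           # newest events first
        "/f:text",            # human-readable output
    ]


def recent_display_resets(max_events=20):
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        display_reset_query(max_events),
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

Comparing the timestamps returned by `recent_display_resets()` with the time the client stopped making progress would show whether a driver reset preceded each hang.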
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: General Troubleshooting ideas
What you've called the out-of-sync condition is not the cause of your problems; it's a result of a GPU that has hung. As I've said elsewhere, there are many possible reasons why a GPU can hang, but the most common is associated with a gpu-reset from the OS. That doesn't really change anything except it would be more accurate to call it a hung GPU.
There are no known cases where crossfire has changed anything associated with FAH once the code was changed to address each GPU independently (which was very early in the development of V7).
There have been cases where memory settings (under-/over-clocking) produced similar symptoms. I do not remember the exact settings, but I do remember that at least one of the settings seemed backward -- a faster/tighter timing setting was more stable than a slower/more relaxed setting.
You said you had a stable machine with identical GPUs. Does that mean that the ones you're having trouble with are all unmatched? It's kind of a wild idea, but I suppose there's a chance that the drivers have more difficulties dealing with multiple GPUs with different GPU clock rates. Of course that shouldn't be a problem, but it's one more thing you can look for (even more so if they're different GPU design generations).
When this is eventually resolved, it will be interesting to know what we've learned.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 56
- Joined: Mon Jun 02, 2014 10:56 pm
Re: General Troubleshooting ideas
@Bruce Yeah, I seem to have won the hardware lottery nightmare!
It is frustrating to say the least. I will describe the issue as a hung GPU going forward.
I can confirm that underclocking the GPU's to their lower limits actually makes them far more unstable! I've seen that across all my machines.
Yeah, I have one machine that is using a different motherboard and is much older that folds with two of the same GPU's without incident. It is actually the machine I'm typing this message on right now, and it will fold at full speed while I multi-task, watch videos, have multiple browser tabs open, etc. with no issues. The GPU clock settings across all the machines are the same though. I tried downclocking them to find a point of stability but have discovered no such thing exists. So I moved their clock speeds back up to core: 940, memory: 1200.
All of the machines that are unstable folding with two 290X TRIX GPU's are using EVGA Z87 Classified motherboards. The machine that runs stable is on an older ASUS Sabertooth P67 motherboard.
It makes me wonder if it is a BIOS setting or something for the EVGA motherboards. I think that is what folding_hoomer is referring to above. I could not find the settings in the BIOS he was referring to so I called EVGA support and disabled the only power saving options in the BIOS. Unfortunately, it hasn't made a difference.
Per your recommendation I was running one of my most unstable systems with just one card folding although two cards are actually in the system. With only one card folding it has not had an issue for over 36 hours now. On that system running 36 hours is a milestone. I believe for whatever reason it can handle folding with one GPU just fine but it doesn't like folding with two. I'm not sure exactly what that means but that is what is happening.
I'm now trying to overvolt the GPU's to see if it helps.

-
- Posts: 56
- Joined: Mon Jun 02, 2014 10:56 pm
Re: General Troubleshooting ideas
@PatherX I think the common component across all the machines that are unstable is the motherboard which makes me wonder if there is a BIOS setting that needs to be changed. Unfortunately I have no idea how to figure out which setting to test. I've disabled the power saving type settings already.
The machine I've used for testing has passed stress tests just fine so I don't think it is a hardware failure. The FAHBench test is still not available but when it is I'll try it as well.
Unfortunately I know nothing about Ubuntu but if I need to go that route I'm willing to give it a shot. I want my hardware to live out its life folding. In the worst case scenario I could just use one card per machine because that seems stable (see message above to Bruce) but that sure is a waste of hardware having another GPU literally doing nothing. As mentioned above, I've overvolted GPU's on another machine just to see if they run more stable. I'll report my findings.
-
- Posts: 2948
- Joined: Sun Dec 02, 2007 4:36 am
- Hardware configuration: Machine #1:
Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).
Machine #2:
Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.
Machine 3:
Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32
I am currently folding just on the 5x GTX 460's for aprox. 70K PPD - Location: Salem. OR USA
Re: General Troubleshooting ideas
Rather than playing with voltages, I would keep everything stock (KISS principle) and just switch folding to the other GPU (keeping it stock too), so you are still running one GPU, just the other one.
The goal is to simplify and isolate variables. If everything is stock, you have two GPU's, and folding works on one, then the next obvious question is whether folding works on the other at stock. The objective is to find out why it won't work with both at the same time, and before that can be solved, you need to know whether it works on each individually.
Slow, steady, and methodical with logic wins this type of frustrating diagnosis.
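The isolate-one-variable approach described above can be written down as a small test plan. This is only an illustrative sketch: the component labels and the `passed` record are hypothetical, and the results have to come from actually running each configuration for long enough to trust it.

```python
from itertools import combinations


def isolation_plan(components):
    """List test configurations from the smallest subsets to the full set."""
    plan = []
    for size in range(1, len(components) + 1):
        plan.extend(combinations(components, size))
    return plan


def first_failing(plan, passed):
    """Return the smallest configuration recorded as failed, or None.

    `passed` maps a configuration tuple to True/False; untested ones are skipped.
    """
    for config in plan:
        if passed.get(config) is False:
            return config
    return None
```

For two cards, `isolation_plan(["GPU0", "GPU1"])` gives each card alone first and both together last. If both single-card runs pass and only the pair fails, the cards themselves are less likely to be at fault than something they share, such as power, slots, or the motherboard.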
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
Re: General Troubleshooting ideas
ChasingTheDream wrote: @PatherX I think the common component across all the machines that are unstable is the motherboard which makes me wonder if there is a BIOS setting that needs to be changed. Unfortunately I have no idea how to figure out which setting to test. I've disabled the power saving type settings already...
Okay, so now you can try resetting the BIOS to the factory stock settings and see if that improves the stability of the system when folding with 2 GPUs (you may have to remove the battery). This can be done on a different system while the one that folded for 36 hours without issues is folding on only the other GPU.
-
- Posts: 349
- Joined: Sun Feb 10, 2013 6:06 pm
Re: General Troubleshooting ideas
If one GPU is running stable, then try a different PCI-E slot for the second GPU than the one used in the former setup.
-
- Posts: 56
- Joined: Mon Jun 02, 2014 10:56 pm
Re: General Troubleshooting ideas
Here's an update:
The voltage changes made no difference with stability so I put them back to normal.
The computer that has been folding with one GPU (while having two GPU's physically in the machine) ran for three days without incident. I've now switched the processing to the second GPU and removed the first GPU from the FAH client. The GPU's are still in the same spots they have always been in. I'll report back on how this goes. I'm guessing it will run fine as long as only one GPU is in use at a time. Neither of the GPU's is running at stock clock speeds though. I couldn't keep a single GPU folding at stock clock speeds on any machine. I'll settle for slightly underclocked if I can just get things stable. I could reset the clocks on the second GPU to defaults just to see what happens, but those aren't the clocks the first GPU ran at.
The BIOS settings are completely default except for memory speed (set to 2133, and highly unstable at 1333, which is the default), and I did disable a CPU power saving setting based on a previous request. Other than those two things the BIOS is completely default in all the machines. I've got a clear CMOS button on the motherboard so getting to defaults is no problem, but I've already tested running the machines at completely default BIOS settings and it made no difference.
There is clearly something the machines don't like about trying to fold with two of my GPU's at the same time, but it is still a mystery as to what it is. There was a suggestion to move away from PSU cables that have split connectors and use single cable connectors (two individual cables) per GPU. I don't have enough individual cables to do that for all the GPU's but I could test one machine with it. I don't see how that can be an issue though since the same GPU's ran at much higher power draws before for weeks at a time but at this point I'm willing to try anything.
I could also use a supplemental power supply so one GPU is powered from a completely separate PSU just to see if that makes a difference.
-
- Posts: 2948
- Joined: Sun Dec 02, 2007 4:36 am
Re: General Troubleshooting ideas
If the two GPU's will run individually but not together, then power is by far the most likely cause of problems as long as temps stay reasonable.
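To make the power question concrete, here is a back-of-the-envelope sketch. The per-connector limits are the in-spec figures from the PCI Express specifications (75 W from the slot, 75 W per 6-pin, 150 W per 8-pin). The connector layout per card (one 6-pin plus one 8-pin) is an assumption and should be checked against the actual cards.

```python
# In-spec power limits from the PCI Express specifications, in watts.
CONNECTOR_LIMIT_W = {"slot": 75, "6pin": 75, "8pin": 150}


def in_spec_limit_w(connectors):
    """Sum the in-spec limits for a set of power connectors."""
    return sum(CONNECTOR_LIMIT_W[name] for name in connectors)


# Assumed layout: each card draws from the slot, one 6-pin, and one 8-pin.
per_card = in_spec_limit_w(["slot", "6pin", "8pin"])   # 300
# A split cable feeding both plugs of one card carries their combined load.
split_cable = in_spec_limit_w(["6pin", "8pin"])        # 225
```

Under those assumptions a single split lead may have to carry up to 225 W, where two individual cables would carry at most 75 W and 150 W each. That is one reason testing a machine with individual cables, or with a supplemental PSU for one card, is a reasonable experiment.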
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
Re: General Troubleshooting ideas
FYI, you could maybe head over to the EVGA Forum (http://forums.evga.com/EVGA-Z87-Series-f88.aspx) and ask about your motherboard since that is the only common component across all your unstable systems?