
ASUS R904 G34

Posted: Fri Oct 15, 2021 9:27 am
by Markus$Cologne
Hi,

I obtained the aforementioned server (4 processors, 64 cores in total, 128 GB RAM), which was part of a former supercomputer comprising 535 of these machines (287 TFLOPS peak).
The operating system is Ubuntu 20.04.3 LTS with all patches applied.
The F@H software installed is "fahclient_7.6.21_amd64.deb".

I can't post images here, but I can tell you that all processors are at well above 80-85% load, and this must be true judging by the noise of the fans in the machine as well as the heat it dissipates.

What surprises me is that, even with all 64 cores reported as "in use" by the FAHControl program, this machine still takes 4 hours to complete a work unit...

Are there any hints for speeding things up, other than getting a faster machine? My idea was that with 64 cores it would take an hour or so to process one work unit. Or is this a wrong assumption on my part?
My 4-core desktop PC processes a work unit in about 5 hours.

Many thanks for any useful hints or advice, from Cologne / Germany

Re: ASUS R904 G34

Posted: Fri Oct 15, 2021 3:17 pm
by JimboPalmer
Welcome to Folding@Home!

F@H sizes the Work Unit based on the number of CPUs devoted to folding, so while both machines may take about the same time per WU, the one with more CPUs should be earning more Points Per Day, as it is working on more challenging proteins.

CPUs of different ages have different capabilities, so older CPUs may be slower per CPU.

If you were using Windows, you would need some tricks to use more than 32 CPUs; Linux should be fine.

Re: ASUS R904 G34

Posted: Fri Oct 15, 2021 4:44 pm
by PaulTV
If you happened to get one of the monster CPU jobs, 4 hours isn't bad at all. I've had jobs that took about 36 hours on 14 threads (AMD 5800X), and the next one may be done within an hour. Work units from different projects vary in size, depending on the number of atoms and the number of steps.

Re: ASUS R904 G34

Posted: Fri Oct 15, 2021 9:03 pm
by jchang6
Are the CPUs Intel or AMD? 4 sockets, 64 cores: is that 32 cores and 64 threads, or 64 physical cores?
If AMD, it could be Interlagos (2011) or Abu Dhabi (2012).
If Intel, it would have to be the Haswell generation of Xeon E7 v3 (2015) or more recent.
The AMD cores of that era (prior to Zen) were weaker; the Intel Haswell should be halfway decent. What was the PPD? FaH does seem to assign big jobs to high core count systems.

Re: ASUS R904 G34

Posted: Sat Oct 16, 2021 4:13 pm
by Markus$Cologne
Hi, thanks for the quick reply and all the detailed information.
There are 4 AMD processors in the machine with 16 cores each. They are of the "Interlagos" type.

You mentioned that FaH seems to have problems with assigning jobs to high core count systems. Ubuntu's system management reports 64 processors, all with loads of 80% or higher, and this should be true, since the fans speed up considerably as soon as FaH starts automatically after system boot.

So despite the number of cores, the performance per core seems to be the issue, and my expectations were slightly off. I am testing some BIOS settings; perhaps I can squeeze some more performance out of the system. If anyone has an idea how to speed things up, comments are very welcome!

Regards

Re: ASUS R904 G34

Posted: Sat Oct 16, 2021 4:33 pm
by Neil-B
There are some projects that will really use large thread counts well - others are far less scalable and you will not get optimal throughput/ppd from them ... you need to run a fair few projects to get a feel for what your highs/lows in throughput/ppd are.

Make sure you are monitoring temps/CPU boost speeds. It is perfectly possible to have all threads/cores running at max while the thermals reduce the clock rates by a significant amount; halving the core/thread count can cool the system off and increase clock speeds, giving little if any drop in throughput/PPD.
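A rough way to check for clock throttling on Linux is to sample the per-CPU "cpu MHz" values in /proc/cpuinfo while folding and compare them with the chip's rated clock. The Python sketch below assumes that x86 /proc/cpuinfo layout; on some kernels and frequency drivers the field may not track the live clock closely, and it says nothing about temperatures, which need a separate sensor tool. The function names are made up for this example.

```python
from __future__ import annotations

import statistics
from pathlib import Path


def cpu_mhz_values(cpuinfo_text: str) -> list[float]:
    """Collect the 'cpu MHz' value reported for each logical CPU."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def summarize(mhz: list[float]) -> dict[str, float]:
    """Min/mean/max clock across CPUs; a low min or mean under load hints at throttling."""
    return {"min": min(mhz), "mean": statistics.fmean(mhz), "max": max(mhz)}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux-only; absent on other systems
    if cpuinfo.exists():
        mhz = cpu_mhz_values(cpuinfo.read_text())
        if mhz:
            print(summarize(mhz))
```

Running it a few times with all threads folding and again with half the threads in use gives a crude before/after comparison of the clocks.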

Server-grade kit tends to be loud ... and if it isn't configured properly it can be even louder. With Intel kit, checking the FRU/SDR is important, as otherwise the server may not actually know its own configuration and may not be managing itself properly (including clocks/thermals). I guess AMD kit has something similar that needs to be configured for the server to run optimally; with some servers it is not just a case of BIOS settings, as they have their own management suite as well.

Re: ASUS R904 G34

Posted: Sat Oct 16, 2021 9:22 pm
by gunnarre
I'm wondering if making one CPU slot for each of the processors might be a good idea (4x 16 threads), if inter-CPU communication has to be done through slow RAM, or if the hypervisor is moving threads between CPUs. One 64-thread slot should in theory be better, but with four 16-core CPUs instead of one 64-core Threadripper/Xeon (with shared fast cache), I'm not so sure that running one slot is the optimal configuration.

If this is indeed the problem, the high CPU load might consist mainly of actively waiting on RAM/the bus to access data from a thread on a different CPU, rather than of actual processing.

Edit: Or perhaps some other kind of NUMA-related CPU affinity can be done on the OS level.
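For the OS-level route on Linux, the kernel lists each NUMA node's CPUs in /sys/devices/system/node/nodeN/cpulist, and Python's os.sched_setaffinity can restrict a process to such a set. This is a minimal sketch assuming Linux and that sysfs layout; the helper names are hypothetical, and whether pinning survives from one work unit to the next depends on how the client launches its folding cores.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Turn a kernel cpulist such as '0-7,32-39' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_nodes(root: str = "/sys/devices/system/node") -> dict[int, set[int]]:
    """Map each NUMA node id to its logical CPUs, read from sysfs."""
    nodes: dict[int, set[int]] = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        nodes[node_id] = parse_cpulist((node_dir / "cpulist").read_text())
    return dict(sorted(nodes.items()))


def pin_to_node(pid: int, node: int) -> None:
    """Restrict process `pid` (0 = the calling process) to one node's CPUs (Linux only)."""
    os.sched_setaffinity(pid, numa_nodes()[node])


if __name__ == "__main__":
    for node, cpus in numa_nodes().items():
        print(f"node{node}: {len(cpus)} CPUs")
```

Note that CPU affinity alone does not bind memory; it only keeps the threads on one node's cores.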

Re: ASUS R904 G34

Posted: Mon Oct 18, 2021 3:35 pm
by MeeLee
gunnarre wrote: I'm wondering if making one CPU slot for each of the processors might be a good idea (4x 16 threads) [...]
Exactly what I was going to suggest.
The main problem with assigning 1 WU to all cores is inter-core traffic. Data written to the cache of a core on one CPU now has to travel over the significantly slower inter-socket interconnect (HyperTransport on these Opterons) before a thread on another CPU can read it.
This is extremely inefficient.
Hence the suggestion to configure 4 CPU slots in the client, each using its own CPU.
Also, leave about 1 thread per CPU free for background processing, unless the machine does nothing but fold. Even then, 15 threads per CPU slot are plenty, and PPD will not be much lower than with 16 threads.
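The sizing rule above (one slot per CPU, one thread held back on each) can be written down as a small planning function. This is a hypothetical helper, not part of the F@H client: it only computes thread counts, which you would then enter as CPU slots in FAHControl. It takes a map from NUMA node to logical CPUs rather than assuming four nodes, since each Opteron 6200-series package contains two dies, so Linux may report eight NUMA nodes of eight cores on a 4-socket Interlagos machine rather than four of sixteen.

```python
from __future__ import annotations


def plan_cpu_slots(node_cpus: dict[int, set[int]], reserve_per_node: int = 1) -> list[dict[str, int]]:
    """Hypothetical planner: one CPU slot per NUMA node, holding back
    `reserve_per_node` threads on each node for the OS and background work."""
    plan: list[dict[str, int]] = []
    for node, cpus in sorted(node_cpus.items()):
        threads = len(cpus) - reserve_per_node
        if threads < 1:
            continue  # node too small to host a slot after the reservation
        plan.append({"slot": len(plan), "node": node, "threads": threads})
    return plan


if __name__ == "__main__":
    # Assumed topology: 4 nodes x 16 CPUs, as if each Opteron were a single node.
    four_nodes = {n: set(range(16 * n, 16 * n + 16)) for n in range(4)}
    for slot in plan_cpu_slots(four_nodes):
        print(slot)
```

For four nodes of 16 CPUs and the default reservation of one thread, it returns four slots of 15 threads each, matching the suggestion above.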

Re: ASUS R904 G34

Posted: Sat Oct 23, 2021 11:58 pm
by WhitehawkEQ
I have 2 Opteron 6276 systems. I used to run Ubuntu, but I now run Windows 10 Pro for Workstations.

I have these pics over on Overclockers.com; if you can post your pics to a forum, you can then link them here.

[Images hosted on Overclockers.com; not reproduced here]

Re: ASUS R904 G34

Posted: Sun Oct 24, 2021 9:51 am
by gunnarre
Have you tried the suggestion to make more CPU slots?

Re: ASUS R904 G34

Posted: Sat Nov 19, 2022 12:30 am
by b_comly
gunnarre wrote: Sun Oct 24, 2021 9:51 am Have you tried the suggestion to make more CPU slots?
I have actually tried this. It isn't currently sustainable in Windows: multiple CPU slots can both end up on the same NUMA node on a Threadripper. I've also tried manually setting affinity, only to find the slot back on node0 with the next WU.

Currently this only really works when running Linux, with each CPU slot configured to threads/4-2 and only 2 slots running at a time, for the fastest result.

Re: ASUS R904 G34

Posted: Sat Nov 19, 2022 1:53 am
by JimboPalmer
https://www.amd.com/en/product/1546
https://www.cpu-world.com/CPUs/Bulldoze ... TGGGU.html

This may be your CPU.

I am guessing AVX is the fastest floating point math it knows.
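One way to see which of these extensions the kernel reports is to read the "flags" line of /proc/cpuinfo. This is a small Python sketch assuming Linux's flag names (avx, avx2, fma, fma4, xop and so on); on a Bulldozer-based Interlagos chip you would expect avx, fma4 and xop to be listed and avx2 to be absent.

```python
from __future__ import annotations

from pathlib import Path

# Linux /proc/cpuinfo names for some x86 SIMD/FMA extensions.
SIMD_FLAGS = ("sse2", "sse4_1", "sse4_2", "avx", "avx2", "fma", "fma4", "xop")


def simd_support(cpuinfo_text: str) -> dict[str, bool]:
    """Report which SIMD_FLAGS appear in the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in SIMD_FLAGS}
    return {flag: False for flag in SIMD_FLAGS}  # no flags line found


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for flag, ok in simd_support(cpuinfo.read_text()).items():
            print(f"{flag:7} {'yes' if ok else 'no'}")
```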