NUMA node affinity


CommanderLake
Posts: 15
Joined: Tue Mar 03, 2020 11:11 am

NUMA node affinity

Post by CommanderLake »

How do I assign CPU slots to specific NUMA nodes to avoid memory accesses traversing the QPI bus?
HaloJones
Posts: 920
Joined: Thu Jul 24, 2008 10:16 am

Re: NUMA node affinity

Post by HaloJones »

I don't believe you can do this within FAH. You could do it within your OS, assuming that each FAH slot is configured to use no more CPUs than one of your nodes has. Exactly how would depend on your OS.
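As a rough illustration of the OS-level approach, here is a minimal Python sketch that assumes Linux. It reads a node's CPU list from sysfs and restricts a process to those CPUs with `os.sched_setaffinity`. The helper names `parse_cpulist` and `pin_to_node` are made up for this example, and nothing here is part of FAH. Note that this only restricts CPUs and does not set a memory policy, and that it applies to the given thread ID, so threads that already exist in a running process would each need the same treatment.

```python
import os

def parse_cpulist(text):
    """Parse a sysfs cpulist string such as '0-13,28-41' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus

def pin_to_node(pid, node):
    """Restrict pid (0 means the calling process) to the CPUs of one NUMA node.

    Linux only. Affinity set this way is inherited by threads and children
    created afterwards, not applied retroactively to existing threads.
    """
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(pid, cpus)
    return cpus

# Example (not run here): pin_to_node(0, 0) pins the current process to node 0.
```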
single 1070

CommanderLake
Posts: 15
Joined: Tue Mar 03, 2020 11:11 am

Re: NUMA node affinity

Post by CommanderLake »

Forgot to mention I'm using Linux Mint with kernel 5.3. On Windows I could do it, but there's a weird thread limit even with multiple slots. I have no idea how to do it on Linux unless I could replace the command that launches the core for each slot with a numactl command.
Some work units don't like 56 threads and crash, so it's hard to permanently get full utilization.
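For reference, a sketch of what such a numactl wrapper could look like, written in Python. The `--cpunodebind` and `--membind` options are real numactl options; the core command and its argument are placeholders, since the actual command line FAH uses to launch a core is not shown in this thread, and whether the client allows substituting it is an open question.

```python
import shlex

def numactl_command(node, core_command):
    """Prefix core_command so both its CPUs and its memory stay on one NUMA node."""
    if node < 0:
        raise ValueError("node must be non-negative")
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *core_command]

# Hypothetical core invocation: the binary name and flag are placeholders.
example = numactl_command(0, ["./hypothetical_core", "--hypothetical-threads", "28"])
print(shlex.join(example))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting issues.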
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: NUMA node affinity

Post by bruce »

You can probably get better answers directly from gromacs.org. FAH has adapted their analysis package and uses it for folding on the CPUs of home computers. As far as 56 threads is concerned, FAH is customized for @home computers, and it's probably fair to assume that the typical home computer doesn't have 56 threads and NUMA. The workaround is to configure multiple CPU slots, each using a "reasonable" number of threads (for home computers).
CommanderLake
Posts: 15
Joined: Tue Mar 03, 2020 11:11 am

Re: NUMA node affinity

Post by CommanderLake »

I've been messing around with multiple slots and threads per slot, and 8 slots with 8 threads each seems to fully utilize all logical cores without crashing (so far).
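A small sketch of the slot arithmetic, assuming the 2 nodes with 28 logical CPUs each (14 cores + HT) described elsewhere in this thread. It counts how many equal-sized slots fit if no slot is allowed to straddle a node boundary. The function name `plan_slots` is made up for illustration, and this says nothing about which thread counts a given work unit will accept.

```python
def plan_slots(nodes, cpus_per_node, threads_per_slot):
    """Count node-local slots of a given size and the logical CPUs left idle."""
    if min(nodes, cpus_per_node, threads_per_slot) < 1:
        raise ValueError("all arguments must be at least 1")
    per_node, idle_per_node = divmod(cpus_per_node, threads_per_slot)
    return {"slots": nodes * per_node, "idle_cpus": nodes * idle_per_node}

print(plan_slots(2, 28, 8))   # {'slots': 6, 'idle_cpus': 8}
print(plan_slots(2, 28, 14))  # {'slots': 4, 'idle_cpus': 0}
```

Under that assumption, 8 slots of 8 threads is 64 threads on 56 logical CPUs, so at least one slot would have to share CPUs or cross nodes.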
CommanderLake
Posts: 15
Joined: Tue Mar 03, 2020 11:11 am

Re: NUMA node affinity

Post by CommanderLake »

Still much lower PPD than one slot when it's able to use 56 threads.
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: NUMA node affinity

Post by foldy »

I remember 32 threads being the maximum used by FAH. So shouldn't 2 slots with 32 threads each be enough? Are NUMA node memory accesses traversing the QPI bus really a bottleneck for FAH?
CommanderLake
Posts: 15
Joined: Tue Mar 03, 2020 11:11 am

Re: NUMA node affinity

Post by CommanderLake »

On Windows it's limited to 32 threads. I think it has something to do with MPI, or something else being only 32-bit, but there doesn't seem to be such a limit on Linux.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: NUMA node affinity

Post by bruce »

For Windows, 32 + 24 = 56 (two slots), although projects with smaller numbers of atoms in the protein will place some additional restrictions on GROMACS.
(Unfortunately my Windows machines aren't big enough to test this all out myself.)
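The split above can be sketched in Python as a greedy division of the total thread count into slots no larger than a cap. The cap of 32 comes from this thread; the function name `split_threads` is made up, and the sketch ignores the extra per-project restrictions on thread counts mentioned above.

```python
def split_threads(total, cap=32):
    """Split total threads into slot sizes, each at most cap, largest first."""
    if total < 0 or cap < 1:
        raise ValueError("total must be >= 0 and cap must be >= 1")
    slots = []
    while total > 0:
        size = min(cap, total)
        slots.append(size)
        total -= size
    return slots

print(split_threads(56))  # [32, 24]
```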
CommanderLake
Posts: 15
Joined: Tue Mar 03, 2020 11:11 am

Re: NUMA node affinity

Post by CommanderLake »

I tried multiple slots on Windows and it just didn't want to cooperate.

The GPU core uses thousands of threads; the CPU version really needs to be updated, especially with AMD's 32- and 64-core Threadrippers.
Last edited by CommanderLake on Tue Mar 10, 2020 8:40 pm, edited 1 time in total.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: NUMA node affinity

Post by Joe_H »

F@h is not using MPI.

The 32-thread limit is from Windows. You need a specific version of the Windows license to exceed that, and the executable needs to be compiled with the right flags and other options set.

The same code base is used by F@h for the versions running on Windows, Linux and OS X. Some of the I/O and other OS-specific modules are different. Under Linux, the CPU Core_A7 has been tested with over 100 threads on larger systems being worked on. But projects with that many atoms in the simulation usually go to GPU folding now.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
CommanderLake
Posts: 15
Joined: Tue Mar 03, 2020 11:11 am

Re: NUMA node affinity

Post by CommanderLake »

The 32-thread limit is not from Windows; I write parallel code and have never found such a limit.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: NUMA node affinity

Post by bruce »

You're probably not running the home version of Windows, which does have the 32-thread limit. M$ wants to sell a higher-priced license. It may also depend on Win7 vs Win10 -- I don't remember.

I'm also not sure which compiler flags were used when FAHCore_a7 was compiled, or whether that would cause problems for somebody with an older/cheaper license.
CommanderLake
Posts: 15
Joined: Tue Mar 03, 2020 11:11 am

Re: NUMA node affinity

Post by CommanderLake »

I have the 32-thread limit on Server 2016.
CommanderLake
Posts: 15
Joined: Tue Mar 03, 2020 11:11 am

Re: NUMA node affinity

Post by CommanderLake »

Started up Server 2016 again; one slot with 28 threads (14 cores per node + HT) runs fine.
Any more than 32 threads is an invalid option.
With one slot running 28 threads, a second slot will not run, even with 8, 4 or even 1 thread: it keeps downloading and then returning work units and marking the project as faulty.