Creation of checkpoints taking over my CPU!


ETA_2025
Posts: 112
Joined: Mon Jan 30, 2023 10:43 am
Hardware configuration: NVIDIA RTX 4070
20 x Raspberry Pi 5 Model B 2GB RAM
Location: VIC, Australia


Post by ETA_2025 »

Why does my GPU create a checkpoint every two percent that uses almost, if not all, of my CPU? This makes watching videos painful: they hang with a loud buzzing sound while the checkpoint is created.

None of my RPI 5's create a single checkpoint.

Is it possible to set when a checkpoint is created, say once every 5%, or even disable the creation of checkpoints completely?
Last edited by ETA_2025 on Mon Apr 21, 2025 7:22 am, edited 1 time in total.
arisu
Posts: 373
Joined: Mon Feb 24, 2025 11:11 pm

Re: Creation of checkpoints taking over my CPU!

Post by arisu »

It's not the checkpoint taking all the CPU, it's sanity checks that are being done to verify the accuracy of the work. They just happen to be done at the same time as checkpoints. They're important. You'll see things like this in the science.log file in the work folder:

Code:

Completed 1800000 out of 2500000 steps (72%)
  Performance since last checkpoint: 11.30890052 ns/day
  Running tests
  All tests passed.
  Appending to XTC file positions.xtc
  Writing binary checkpoint
  Binary checkpoint complete. Cleared numRetries file.
It is between the "Running tests" and "All tests passed." lines that all the CPU is being used. And that part shouldn't be disabled.
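
To see how often these checks happen in a given work unit, the log could be scanned with something like the following. This is only a minimal sketch in Python, assuming science.log keeps exactly the line format shown in the excerpt above; the function name `parse_progress` and the regular expressions are illustrative and not part of FAH.

```python
import re

# Patterns assume the line format shown in the log excerpt above.
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps \((\d+)%\)")
PERF_RE = re.compile(r"Performance since last checkpoint: ([\d.]+) ns/day")


def parse_progress(log_text):
    """Return a list of dicts, one per 'Completed ...' entry in the log text."""
    entries = []
    for line in log_text.splitlines():
        step = STEP_RE.search(line)
        if step:
            done, total, percent = (int(g) for g in step.groups())
            entries.append({"done": done, "total": total, "percent": percent,
                            "ns_per_day": None, "tests_passed": False})
            continue
        if not entries:
            continue  # ignore lines before the first progress entry
        perf = PERF_RE.search(line)
        if perf:
            entries[-1]["ns_per_day"] = float(perf.group(1))
        elif "All tests passed" in line:
            entries[-1]["tests_passed"] = True
    return entries


SAMPLE = """\
Completed 1800000 out of 2500000 steps (72%)
  Performance since last checkpoint: 11.30890052 ns/day
  Running tests
  All tests passed.
  Appending to XTC file positions.xtc
  Writing binary checkpoint
  Binary checkpoint complete. Cleared numRetries file.
"""

for entry in parse_progress(SAMPLE):
    print(entry)
```

The excerpt has no timestamps, so this can only count checks and confirm they passed; it cannot measure how long each one took.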
calxalot
Site Moderator
Posts: 1476
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Creation of checkpoints taking over my CPU!

Post by calxalot »

Checkpoint interval is set by researchers and cannot be changed. AFAIK
arisu
Posts: 373
Joined: Mon Feb 24, 2025 11:11 pm

Re: Creation of checkpoints taking over my CPU!

Post by arisu »

calxalot wrote: Mon Apr 21, 2025 7:01 am Checkpoint interval is set by researchers and cannot be changed. AFAIK
For the GPU cores, it can't be changed without editing core.xml. But I imagine the core will detect that the file has been tampered with and the unit would probably get dumped. For CPU cores, it can be changed just by passing a flag to the core, but the client doesn't have a way to do that (anymore).

It's only the OpenMM GPU cores that have a long and slow sanity check during the checkpoint anyway.
muziqaz
Posts: 1661
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Creation of checkpoints taking over my CPU!

Post by muziqaz »

If a GPU sanity check is causing video playback issues, I have to say that the computer is not fit for FAH. I have yet to see any issues caused by sanity checks on any of my PCs.
FAH Omega tester
arisu
Posts: 373
Joined: Mon Feb 24, 2025 11:11 pm

Re: Creation of checkpoints taking over my CPU!

Post by arisu »

OP, try to lock the GPU folding thread to just one core. On Linux you can do that with taskset. On Windows you can use something called Process Lasso. That will make it use only one core for the sanity check. It will slow down the sanity check but it will free up most of your CPU cores, and it shouldn't impact GPU folding in between checks.
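
For the Linux case, the same kind of pinning that taskset does can be sketched from Python with the standard library. This is a hedged illustration, not an FAH feature: `pin_to_single_core` is a made-up helper name, it is demonstrated on the current process rather than a real FahCore, and `os.sched_setaffinity` only exists on Linux. As with `taskset -p` without `-a`, threads that are already running in the target process may keep their old affinity.

```python
import os


def pin_to_single_core(pid, core):
    """Restrict `pid` (0 = the calling process) to one logical CPU.

    Linux only: os.sched_setaffinity is not available on Windows or macOS.
    Returns the affinity set that is in effect afterwards.
    """
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


# Demonstrate on the current process; for a FahCore you would pass its PID.
original = os.sched_getaffinity(0)
core = min(original)  # pick a core this process is already allowed to use
print("allowed before:", sorted(original))
print("allowed after: ", sorted(pin_to_single_core(0, core)))
os.sched_setaffinity(0, original)  # restore the original mask
```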
muziqaz
Posts: 1661
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Creation of checkpoints taking over my CPU!

Post by muziqaz »

arisu wrote: Tue Apr 22, 2025 3:24 am OP, try to lock the GPU folding thread to just one core. On Linux you can do that with taskset. On Windows you can use something called Process Lasso. That will make it use only one core for the sanity check. It will slow down the sanity check but it will free up most of your CPU cores, and it shouldn't impact GPU folding in between checks.
That is not a solution for a broken system.
When minor everyday tasks lag because the CPU gets a bit loaded by some app, something is broken with the hardware or drivers.
Obviously, we would be able to guess better if OP posted system specs.
P.S. Just for sh*ts and giggles: the last time I experienced something like that was back in the very early 2000s, when my ATA HDD interface would drop from UDMA to PIO and I would need to reset it in the HDD Properties :D That was on an Athlon XP 1700, I think. Ever since the SATA interface, I have never had anything like it.
FAH Omega tester
ETA_2025
Posts: 112
Joined: Mon Jan 30, 2023 10:43 am
Hardware configuration: NVIDIA RTX 4070
20 x Raspberry Pi 5 Model B 2GB RAM
Location: VIC, Australia

Re: Creation of checkpoints taking over my CPU!

Post by ETA_2025 »

Thanks for letting me know about the sanity check. It would be useful if that was logged, so one knew about it.
muziqaz wrote: Mon Apr 21, 2025 2:35 pm If a GPU sanity check is causing video playback issues, I have to say that the computer is not fit for FAH.
Ha ha! Diagnosing something without any details. Very useful muziqaz!
arisu wrote: Tue Apr 22, 2025 3:24 am On Windows you can use something called Process Lasso
I'm using that, and have all the FahCores set to Real-time priority. That caused the issue. When watching videos, I'll just set the processor priority to High instead of Real-time.
muziqaz
Posts: 1661
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Creation of checkpoints taking over my CPU!

Post by muziqaz »

Real time?
Why? 😲
FAH Omega tester
ETA_2025
Posts: 112
Joined: Mon Jan 30, 2023 10:43 am
Hardware configuration: NVIDIA RTX 4070
20 x Raspberry Pi 5 Model B 2GB RAM
Location: VIC, Australia

Re: Creation of checkpoints taking over my CPU!

Post by ETA_2025 »

muziqaz wrote: Tue Apr 22, 2025 7:59 am Real time?
Why? 😲
To ensure folding has priority. Without Process Lasso, the FahCores had idle or background processor priority, making folding unnecessarily slow.

Just remember that sanity checks are undocumented, so how could I have adjusted my system to account for their effect?
calxalot
Site Moderator
Posts: 1476
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Creation of checkpoints taking over my CPU!

Post by calxalot »

Running folding with real time priority is not something I remember anyone trying before.

Most people want it in the background.
muziqaz
Posts: 1661
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Creation of checkpoints taking over my CPU!

Post by muziqaz »

calxalot wrote: Tue Apr 22, 2025 8:23 am Running folding with real time priority is not something I remember anyone trying before.

Most people want it in the background.
People didn't try it because it is completely unnecessary, and mainly for the reason shown in this thread, where stuff goes pear-shaped.
FAH Omega tester
foxpy
Posts: 27
Joined: Mon Mar 03, 2025 5:41 pm

Re: Creation of checkpoints taking over my CPU!

Post by foxpy »

If you run a process with Realtime priority, then you should definitely expect audio hangs :)
And even if you pin fah core to a single cpu, it might still cause issues, because a Realtime process might interfere with audio-related DPC calls.

So, yeah, don't use Realtime. If you want to give fah priority, High is more than enough. And remember that raising the priority doesn't magically give fah more performance. It just increases latency for interactive tasks like web browsing. Everything interactive that runs at a lower priority will still consume the same amount of resources it needs, just more slowly. And if you run long batch jobs other than fah, I would suggest dedicating fewer CPUs to fah and leaving the rest for the other jobs, so they don't have to compete with each other.
ETA_2025
Posts: 112
Joined: Mon Jan 30, 2023 10:43 am
Hardware configuration: NVIDIA RTX 4070
20 x Raspberry Pi 5 Model B 2GB RAM
Location: VIC, Australia

Re: Creation of checkpoints taking over my CPU!

Post by ETA_2025 »

arisu wrote: Tue Apr 22, 2025 3:24 am OP, try to lock the GPU folding thread to just one core.
Fah core's average CPU use is 14.74%, which is more than one thread (12.5%), because of the sanity checks. So, locking fah core to one thread would slow down folding, by slowing down the sanity checks. Unless the sanity checks run in the background of folding, which I don't think is the case.
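
The arithmetic behind those figures can be checked quickly. The sketch below assumes 8 logical CPUs, which is what 12.5% per thread implies; that count is inferred from the numbers above and is not stated anywhere in this thread.

```python
logical_cpus = 8          # assumption: 100% / 12.5% per thread
avg_cpu_percent = 14.74   # average fah core CPU use quoted above

one_thread_percent = 100 / logical_cpus
threads_busy = avg_cpu_percent / one_thread_percent
extra_percent = avg_cpu_percent - one_thread_percent

print(f"one thread       = {one_thread_percent:.2f}% of total CPU")
print(f"average load     = {threads_busy:.2f} threads")
print(f"above one thread = {extra_percent:.2f} percentage points")
```

So on average the core keeps a little more than one thread busy, and the part above one thread is what pinning to a single core would squeeze.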

My understanding is that the process is this:
  • folding
  • stop/pause folding
  • checkpoint creation & sanity check
  • resume folding
  • repeat
Is this correct?
muziqaz
Posts: 1661
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Creation of checkpoints taking over my CPU!

Post by muziqaz »

Yes, more or less.
Sanity checks can be multithreaded. It depends on how the researcher set up the project.
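
The cycle described above can be written down as a toy loop. This is purely a conceptual sketch, not how the FahCore is actually implemented: `run_work_unit` and the phase names are hypothetical, and the step counts simply reuse the 2,500,000 steps from the log excerpt earlier in the thread with a checkpoint every 2%.

```python
def run_work_unit(total_steps, checkpoint_every):
    """Yield (phase, step) events for a simplified fold/check/checkpoint loop."""
    step = 0
    while step < total_steps:
        step = min(step + checkpoint_every, total_steps)
        yield ("fold", step)              # GPU does the simulation work
        yield ("pause", step)             # folding stops at the checkpoint boundary
        yield ("sanity_check", step)      # CPU-heavy tests, possibly multithreaded
        yield ("write_checkpoint", step)  # state is saved, then folding resumes


events = list(run_work_unit(total_steps=2_500_000, checkpoint_every=50_000))
checks = [e for e in events if e[0] == "sanity_check"]
print(len(checks), "sanity checks in this hypothetical work unit")
```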
FAH Omega tester