possible memory leak then 'resources not found'
Posted: Sun Feb 23, 2025 9:21 pm
by roland_schweiger
System:
Desktop PC with Intel I7 9700k @ 3,6GHz, 16 GB RAM, nVidia GeForce GTX 1650, Windows11Pro.
FAH Client 8.49 configured for the GPU + 6 CPU Cores.
Have the client up and running for a few days, also joined the Gridcoin team as i have a CPID anyway (although not yet showing unfortunately).
Got a few WUs and after about 1 1/2 days my RAM started to fill up almost to max, then the pagefile grew slightly, finally the WUs were dumped and i got an error message like 'resources unavailable'
Only a PC restart helped, yet when i first got new WUs the FAH Client 8.49 crashed, and only on the second restart did i get new WUs, which have been working up to now.
QUESTION : is there any known cause or is it most probably that the work units were bad?
LOG files do not help (at least not the ones i can see in the web interface) because they don't date back to the moment when the crash happened.
Are there any hidden settings or similar, any suggestions what i can do to prevent things like this happening?
greetings from Vienna
Roland Schweiger
Re: possible memory leak then 'resources not found'
Posted: Mon Feb 24, 2025 5:15 pm
by muziqaz
Please grab a screenshot of the task manager or similar with fahclient process stats
Re: possible memory leak then 'resources not found'
Posted: Mon Feb 24, 2025 5:35 pm
by Joe_H
There are three possible places for memory leaks. One would be in fah-client itself, it runs in the background handling WU fetching and returns as well as logging. Then there is the browser used to run the web control. Finally, and least likely are the folding cores. They run the WU from beginning to end and then exit after packing up the results for return by fah-client to the servers. Since they do not run continuously, not as likely to be the source.
If it is the web control running in the browser, then exiting when not being actively used would avoid the memory leak. Possibly using another browser would avoid it as well.
Re: possible memory leak then 'resources not found'
Posted: Wed Feb 26, 2025 7:46 pm
by roland_schweiger
would not have imagined a relatively simple piece of software like a web browser (compared to complex computational tasks) could be the cause of such problems. just out of trial and error i used Edge instead of Chrome and "installed" the webinterface-page "as an app" (whatever that does, i think it is not much more than a link on the windows desktop) but since i did that, no more memory leaks seemed to appear. Still i cannot really believe that this can be the culprit but thanks for the inspiration.
When it comes to work fetch, there are sometimes strange behaviours in F@H.
E.g. i have the nVidia GTX1650/Cuda enabled and i also have 6 CPU cores. Work is being done. I then also tried to enable my onboard graphics INTEL UHD-630 (yes i know it is slow but there might be some work) then the webinterface of F@H shows a red thermometer bar with repeating attempts to get work for the Intel card, no WUs found. But then when my CPU WUs are done and also when the nVidia WUs are done, the client gets no more work at all. I then must untick / disable the internal Intel GPU again and everything returns to normal.
apart from such little things the client works fine, i even now get a little bit of GRC RAC and also have 1 BOINC project enabled. Even on my elderly machine a considerable amount of work is being done, and i set the nvidia card to 70% performance and in windows control panel i set "max cpu performance" to 85%, which reduces the clock from 3,6 to 3,1 GHz, and the machine stays nicely cool. Good to see that with a little bit of fine-tuning one can even get a slightly older machine to do sensible work.
Re: possible memory leak then 'resources not found'
Posted: Wed Feb 26, 2025 10:00 pm
by Joe_H
The Intel UHD 630 I would recommend just leaving disabled. While supported, few projects are generating small enough WUs for these iGPUs. There was more work for them when first used on a trial basis during COVID, but most of the small systems are being set up to use CPU processing these days. Intel did at least get some performance information out of the work done, and that may have helped with driver and hardware development going into the recent Battlemage cards which are usable and being used for F@h.
The CPU cores can in theory be set up to offload some vector processing to a GPU. That was investigated a bit a few years ago, but setting things up so that it would actually improve processing enough turned out to be complicated.
As for browsers mattering, some do a better job of handling Javascript pages that tend to continuously run like the Web Control. Much of the page downloaded when you connect is written in Javascript making it usable across a lot of platforms.
Re: possible memory leak then 'resources not found'
Posted: Tue Mar 04, 2025 9:34 pm
by ETA_2025
roland_schweiger wrote: ↑Wed Feb 26, 2025 7:46 pm
just out of trial and error i used Edge instead of Chrome and "installed" the webinterface-page "as an app" (whatever that does, i think
I believe Chrome loads every tab when first opened. If you have a lot of tabs open, that would use a lot of RAM. Firefox only loads a tab when you go to it. I don't know what Edge does. So it is possible that your web browser (Chrome) is responsible for the 'memory leak'.
Task Manager can show how much memory every app is using, in a condensed view (one instance of Firefox, not one instance for every tab of Firefox).
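To illustrate what such a condensed view does, here is a minimal Python sketch that sums memory over all processes sharing a name. The process names and MiB figures in `sample` are made up for illustration and are not measurements from this thread; `condensed_view` is a hypothetical helper, not part of Task Manager or any browser.

```python
from collections import defaultdict

def condensed_view(processes):
    """Sum memory per application name, like a grouped process list.

    `processes` is an iterable of (name, memory_mib) pairs, one per process.
    Returns a list of (name, total_mib, process_count), largest total first.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for name, mem_mib in processes:
        entry = totals[name]
        entry[0] += mem_mib   # running total for this application
        entry[1] += 1         # number of processes seen under this name
    rows = [(name, total, count) for name, (total, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Made-up sample figures, for illustration only.
sample = [
    ("chrome.exe", 310.0),
    ("chrome.exe", 145.5),
    ("chrome.exe", 92.5),
    ("FAHClient.exe", 250.0),
    ("firefox.exe", 400.0),
]

for name, total, count in condensed_view(sample):
    print(f"{name:15s} {total:8.1f} MiB in {count} process(es)")
```

Grouped this way, a browser that looks small per tab can still turn out to be the largest consumer overall.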
Re: possible memory leak then 'resources not found'
Posted: Wed Apr 23, 2025 5:10 am
by 91Notch
I'm noticing memory usage grow over time with fah-client running in a proxmox container and using an RTX3050 gpu passed through from the host. It's been a challenge getting fah to identify the gpu as "supported", and that's still a bit of a hit & miss thing, but once it does "get supported" it runs fine, although after a few work units, the memory use starts to grow.
There is no GUI/browser running in the container - I monitor my work units from various other machines. So I think that rules out browsers as the source of the problem if the only thing running in the container is fah-client.
I've been testing some operations to see what happens:
- If I have the client pause after finishing a job, the memory usage remains high until I use systemctl to stop the client. When I start the client (but it's still paused, without a WU loaded), the memory usage is small, about 250 MiB. I press the "play" button to download and start a WU, and the memory usage remains low, and might grow slightly, from 250 MiB until the WU is finished and it reaches 325 MiB. A new WU is downloaded and starts, and I see a bump up in the memory, up to 585 MiB.
- If I do a restart of the client while it's running that WU, the memory usage comes down a bit, e.g. 470 MiB, and then continues a slow climb.
- After each WU, the memory is not released, and the next WU grows the memory usage from there. I don't know if it's a function of the particular project, or if the rate of memory growth increases with each subsequent WU. I bumped up the memory allocated in steps, up to 8GiB now, and all of that memory gets consumed if I let the client run without interruption.
This is just a test for me, with a low-end gpu, before I switch some other machines from Windows to Linux, so I just keep nursing this along for now, pausing after WU completion to stop the client and restart it to reset the memory usage, but I will try this with an RTX 4070 next week to see if the same problem presents. Hopefully this is useful information for the developers.
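The readings described above can be tabulated with a short Python sketch to show how much each step adds. The three figures are the ones quoted in this post; the helper names `growth_steps` and `never_released` are hypothetical, and the sketch only does the arithmetic, it does not diagnose where the memory goes.

```python
def growth_steps(readings):
    """Return (label, mib, delta_from_previous) for each reading after the first."""
    steps = []
    for (_, previous), (label, current) in zip(readings, readings[1:]):
        steps.append((label, current, current - previous))
    return steps

def never_released(readings):
    """True if no reading is lower than the one before it."""
    values = [mib for _, mib in readings]
    return all(later >= earlier for earlier, later in zip(values, values[1:]))

# Figures taken from the observations in this post (MiB).
readings = [
    ("client started, paused", 250),
    ("first WU finished", 325),
    ("second WU started", 585),
]

for label, mib, delta in growth_steps(readings):
    print(f"{label:25s} {mib:4d} MiB ({delta:+d} MiB)")
print("memory never released:", never_released(readings))
```

Logging one reading per finished WU over a longer run would show whether the step size itself keeps growing or depends on the project.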
Re: possible memory leak then 'resources not found'
Posted: Wed Apr 23, 2025 5:31 am
by arisu
What kind of memory is this? The address space it requests isn't the amount it is really using. It downloads the WUs entirely into memory to extract them. Even if it frees the memory in the heap, it might still leave "used" memory that isn't really used. There is an optimization that operating systems use called overcommit, which lets processes request more memory than they really need. It inflates the reported memory usage but the memory isn't actually occupied.
The "resources unavailable" error has nothing to do with memory. That actually means that there are no work units available at that time for your system configuration. Unless your system is undergoing memory pressure because of fah-client then there is nothing to worry about.
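On Linux, the difference between requested address space and memory actually occupied can be read from `/proc/<pid>/status`, where `VmSize` is the virtual size and `VmRSS` the resident part. The following Python sketch parses those two fields. The numbers in `SAMPLE` are invented for illustration, and `parse_status` and `read_status` are hypothetical helper names.

```python
import re

def parse_status(text):
    """Extract VmSize and VmRSS (in KiB) from /proc/<pid>/status content."""
    fields = {}
    for key in ("VmSize", "VmRSS"):
        match = re.search(rf"^{key}:\s+(\d+)\s+kB$", text, re.MULTILINE)
        if match:
            fields[key] = int(match.group(1))
    return fields

def read_status(pid="self"):
    """Read the two fields for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_status(handle.read())

# Invented sample content, for illustration only.
SAMPLE = "Name:\tfah-client\nVmSize:\t 2097152 kB\nVmRSS:\t  256000 kB\n"

info = parse_status(SAMPLE)
print(f"virtual:  {info['VmSize'] / 1024:.0f} MiB (address space requested)")
print(f"resident: {info['VmRSS'] / 1024:.0f} MiB (actually in RAM)")
```

If only the virtual figure grows while the resident one stays flat, the growth is address space rather than occupied RAM.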
Re: possible memory leak then 'resources not found'
Posted: Wed Apr 23, 2025 5:47 am
by muziqaz
Memory leak has been experienced in internal testing and Dev is aware of it. GitHub issue is raised regarding this.
Though issue was raised as Windows only, since my Linux boxes have no issue with it.
In Windows, if one pauses folding, then restarts the PC, and after the reboot does not resume folding but observes Task Manager,
fah-client will blow up in memory usage, up to 3-4 GB. Once you resume folding, usage drops over time.
Re: possible memory leak then 'resources not found'
Posted: Wed Apr 23, 2025 5:54 am
by arisu
muziqaz wrote: ↑Wed Apr 23, 2025 5:47 am
Memory leak has been experienced in internal testing and Dev is aware of it. GitHub issue is raised regarding this.
Though issue was raised as Windows only, since my Linux boxes have no issue with it.
In Windows, if one pauses folding, then restarts the PC, and after the reboot does not resume folding but observes Task Manager,
fah-client will blow up in memory usage, up to 3-4 GB. Once you resume folding, usage drops over time.
Is it non-reclaimable memory type? I don't know what task manager shows by default but if it is reclaimable memory then it might not be an issue.
I don't use Windows but if you know how to reproduce it on Linux I can run it with Valgrind.
Re: possible memory leak then 'resources not found'
Posted: Wed Apr 23, 2025 6:57 am
by muziqaz
Linux shows many numbers which I don't understand. Windows has default memory usage numbers which I can understand. Big numbers bad, small numbers good
Re: possible memory leak then 'resources not found'
Posted: Wed Apr 23, 2025 7:41 pm
by foxpy
muziqaz wrote: ↑Wed Apr 23, 2025 6:57 am
Linux shows many numbers which I don't understand.
If you use htop to look into memory usage, your point of interest is RES (resident memory). That's the actual physical memory used by a process at the moment of observation (yes, it is slightly more complicated than that, just as everything else in Linux, but it has always been a good rule of thumb for me).
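For a headless box without htop, a rough equivalent of the RES column can be read straight from `/proc`. This is a minimal Python sketch, assuming Linux and assuming the process is named `fah-client` as mentioned earlier in the thread. It uses the second field of `/proc/<pid>/statm`, which is the resident set size in pages; `resident_mib` is a hypothetical helper name.

```python
import os
from pathlib import Path

def resident_mib(name, proc_root="/proc", page_size=None):
    """Return {pid: resident MiB} for processes whose comm equals `name`."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")  # bytes per page on this system
    result = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            comm = (entry / "comm").read_text().strip()
            if comm != name:
                continue
            # statm fields: size resident shared text lib data dt (in pages)
            resident_pages = int((entry / "statm").read_text().split()[1])
        except (OSError, IndexError, ValueError):
            continue  # process exited meanwhile, or entry not readable
        result[int(entry.name)] = resident_pages * page_size / (1024 * 1024)
    return result

# Example call on a Linux machine (process name is an assumption):
#     print(resident_mib("fah-client"))
```

Running it from a timer once a minute and appending the result to a file would give a simple history to attach to a bug report.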