How does the WU system work?

Moderators: Site Moderators, FAHC Science Team

chumbucket843
Posts: 10
Joined: Sun Sep 06, 2009 2:24 am

How does the WU system work?

Post by chumbucket843 »

I'm confused about this. Is Folding@home embarrassingly parallel? Do you parallelize the simulation over all systems, or am I doing the same WU over and over with different results each time, like Rosetta? What kind of fault tolerance system does F@H have?

You don't have to go into great detail. I have been thinking about how this works and it's confusing me.

Thanks
Nathan_P
Posts: 1180
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: How does the WU system work?

Post by Nathan_P »

From what I understand (taken from the wiki):
Each project consists of multiple WUs, and each WU calculates a slightly different portion of a trajectory for a particular protein. These trajectory parts are identified by the Run, Clone and Generation numbers.

Once someone has folded Gen 1 of a given Run and Clone (e.g. R42 C21 G1), the next generation, G2, is sent out to be folded, and so on. Each WU is processed only once unless:

1. You miss the preferred deadline - the WU will be resent to someone else
2. If the WU goes back early because of an error (I think this is true - can someone please confirm)

If the WU is processed normally, sent back OK and you get credit, then that's it.
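The rules above can be sketched in a few lines of Python. This is only an illustration of the description in this post, not the actual FAH server logic: the `WorkUnit` and `Outcome` names, the `next_assignment` function and the project number 1234 are all made up for the example.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    MISSED_DEADLINE = "missed_deadline"
    ERROR = "error"


@dataclass(frozen=True)
class WorkUnit:
    """Hypothetical identifier for one piece of a trajectory."""
    project: int
    run: int
    clone: int
    gen: int

    def label(self) -> str:
        return f"P{self.project} R{self.run} C{self.clone} G{self.gen}"


def next_assignment(wu: WorkUnit, outcome: Outcome) -> WorkUnit:
    """Return the WU that would be handed out next for this Run/Clone.

    Assumed rule: a successful return unlocks the next Gen; a missed
    deadline or an error means the same WU is sent out again.
    """
    if outcome is Outcome.SUCCESS:
        return replace(wu, gen=wu.gen + 1)
    return wu


wu = WorkUnit(project=1234, run=42, clone=21, gen=1)
print(next_assignment(wu, Outcome.SUCCESS).label())          # P1234 R42 C21 G2
print(next_assignment(wu, Outcome.MISSED_DEADLINE).label())  # P1234 R42 C21 G1
```

The identifier is frozen so that "the next Gen" is always a new object, which mirrors the idea that a finished WU is never modified, only followed by its successor.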

PG spent several years validating the science before they started to use the results sent back to them, and they have now published in excess of 70 papers, so there must be a ton of fault tolerance built into the system at various points.

If you want a bit more info try the wiki: http://fahwiki.net
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: How does the WU system work?

Post by 7im »

Both serial and parallel at the same time.

Serial in that Generation 2 work units are not created until the results from Generation 1 are returned to Stanford.

Parallel in that Project 1234 and Project 1235 might be working on the same protein, but with slightly different environment settings. Hotter, colder, more water, etc.
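A toy scheduler makes the "serial and parallel at the same time" point concrete. It is a sketch under simplifying assumptions that are not from this thread: every WU takes exactly one round, nothing fails, and the function name `rounds_needed` is invented for the example.

```python
def rounds_needed(n_trajectories: int, n_gens: int, n_donors: int) -> int:
    """Count rounds until every trajectory has completed n_gens generations.

    Assumes each WU takes exactly one round and never fails.
    """
    if n_trajectories <= 0 or n_gens <= 0:
        return 0
    if n_donors <= 0:
        raise ValueError("need at least one donor")
    done = [0] * n_trajectories  # generations completed per trajectory
    rounds = 0
    while min(done) < n_gens:
        # Only one WU per trajectory can be out at a time (serial in Gen),
        # so a round can use at most one donor per unfinished trajectory.
        ready = sorted(
            (i for i in range(n_trajectories) if done[i] < n_gens),
            key=lambda i: done[i],
        )
        for i in ready[:n_donors]:
            done[i] += 1
        rounds += 1
    return rounds


print(rounds_needed(4, 3, 4))  # 3
print(rounds_needed(4, 3, 8))  # 3: extra donors cannot speed up a single trajectory
print(rounds_needed(4, 3, 2))  # 6
```

Doubling the donors from 4 to 8 changes nothing here, because the number of generations sets a floor on how fast any one trajectory can advance. More trajectories (or more projects) are what keep additional donors busy.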

Here is a good fah wiki article... http://fahwiki.net/index.php/Runs,_Clones_and_Gens
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: How does the WU system work?

Post by bruce »

Nathan_P wrote: 2. If the WU goes back early because of an error (I think this is true - can someone please confirm)
Sort of true.

Each individual Project/Run/Clone starts with an assumed random distribution of atomic velocities and continues to process with a degree of randomness due to thermal motions. Some of those combinations lead to rational motions that fold quickly, some to rational motions that simply stay at approximately the same shape, and some lead to motions that would never happen in nature, and so to an error. Thus some trajectories are very useful, some produce no results, and some must be discarded. FAH has been able to exploit this as a form of parallelism because, in real life, there are long periods of time when nothing useful is happening and then folding happens abruptly, due to those random thermal motions. After discarding the "bad WUs", the overall statistics lead to useful results.
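To see why many independent trajectories help when folding is a rare, abrupt event, here is a small sketch. It assumes the waiting time before a folding event is exponentially distributed with a fixed rate; that model, the function names and the numbers are my assumptions for illustration, not something stated in this thread or taken from FAH code.

```python
import math
import random


def p_any_fold(rate: float, traj_length: float, n_traj: int) -> float:
    """Probability that at least one of n_traj independent trajectories,
    each of length traj_length, contains a folding event.

    Assumes exponentially distributed waiting times with the given rate.
    """
    return 1.0 - math.exp(-rate * traj_length * n_traj)


def count_folded(rate: float, traj_length: float, n_traj: int, seed: int = 0) -> int:
    """Simulate n_traj clones; each random draw stands in for a clone's
    random starting velocities. Returns how many fold within traj_length."""
    rng = random.Random(seed)
    return sum(rng.expovariate(rate) <= traj_length for _ in range(n_traj))


rate = 0.001   # hypothetical folding events per nanosecond
short = 10.0   # hypothetical length of each short trajectory, in nanoseconds

print(p_any_fold(rate, short, 1))     # one short trajectory: event is unlikely
print(p_any_fold(rate, short, 1000))  # a thousand of them: event is very likely
print(count_folded(rate, short, 1000))
```

Under this assumption, 1000 short trajectories have the same chance of catching an event as one trajectory 1000 times longer, since only the total simulated time `traj_length * n_traj` enters the formula. The short ones can run on different machines at the same time, though.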

On the other hand, if someone's hardware is malfunctioning due to overclocking, overheating or a defect of some kind, it will generate an error. From a single failure, there is no way to know whether the error is due to hardware or to the randomness mentioned above. Those WUs are reassigned, and if they fail again, the error was a random bad WU. If the other computer does not produce an error, the original failure was due to overclocking, etc. This is an important part of the overall FAH system redundancy. The scientists use redundancy wherever it makes sense but also minimize it wherever possible so that more work can be completed.
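The reassignment rule described above boils down to a two-way decision. The sketch below is only a restatement of that paragraph in Python; `classify_failure`, the `retry` callback and the WU label are hypothetical and do not correspond to any real FAH server interface.

```python
from typing import Callable


def classify_failure(wu_id: str, retry: Callable[[str], bool]) -> str:
    """Decide why a WU failed by sending it to a second donor.

    retry(wu_id) is a hypothetical callback that returns True if the
    second donor completes the WU without error.
    """
    if retry(wu_id):
        # The WU itself is fine, so the first donor's machine is suspect
        # (overclocking, overheating, a defect of some kind).
        return "hardware"
    # It failed on two independent machines, so the trajectory itself
    # most likely went somewhere unphysical.
    return "bad_wu"


print(classify_failure("R42 C21 G1", retry=lambda wu: True))   # hardware
print(classify_failure("R42 C21 G1", retry=lambda wu: False))  # bad_wu
```

A single retry is the cheapest check that can tell the two cases apart, which fits the point about keeping redundancy to a minimum.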
whynot
Posts: 91
Joined: Wed Mar 26, 2008 9:02 pm
Location: Kyiv, Ukraine

Re: How does the WU system work?

Post by whynot »

Nathan_P wrote: 2. If the WU goes back early because of an error (I think this is true - can someone please confirm)
Here is at least one EUE that was sent to two different donors. I can't find the thread, but I was once told that failed WUs are resent (sometimes immediately; look through the server-problems forum, there are lots of threads about that). That said, I've got the impression that there are different strategies for different types of failures (I could be wrong about this).

In short, I dare to state (though I have no authority to, to be honest) that no successful WU is ever re-issued.
--
I'm counting for science.
Points just make me sick.
Nathan_P
Posts: 1180
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: How does the WU system work?

Post by Nathan_P »

whynot wrote:
Nathan_P wrote: 2. If the WU goes back early because of an error (I think this is true - can someone please confirm)
Here is at least one EUE that was sent to two different donors. I can't find the thread, but I was once told that failed WUs are resent (sometimes immediately; look through the server-problems forum, there are lots of threads about that). That said, I've got the impression that there are different strategies for different types of failures (I could be wrong about this).

In short, I dare to state (though I have no authority to, to be honest) that no successful WU is ever re-issued.
You are correct in that statement, with one exception: they will sometimes reissue completed projects on new cores to validate that the new core isn't returning garbage. You can see it with the new a3 core; most of its projects are reissues of the existing SMP a1/a2 projects.
whynot
Posts: 91
Joined: Wed Mar 26, 2008 9:02 pm
Location: Kyiv, Ukraine

Re: How does the WU system work?

Post by whynot »

Nathan_P wrote: You are correct in that statement, with one exception: they will sometimes reissue completed projects on new cores to validate that the new core isn't returning garbage. You can see it with the new a3 core; most of its projects are reissues of the existing SMP a1/a2 projects.
Please note that it's the new core that is being validated, not the WU. Thanks for the correction.
--
I'm counting for science.
Points just make me sick.