Project: 6510 (Run 1, Clone 139, Gen 6) - EUE at 54%

Moderators: Site Moderators, FAHC Science Team

Napoleon
Posts: 887
Joined: Wed May 26, 2010 2:31 pm
Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard
Location: Finland

Project: 6510 (Run 1, Clone 139, Gen 6) - EUE at 54%

Post by Napoleon »

Bad WU? Note: I was running FurMark to stress test the integrated GPU at some point during folding. The rig is passively cooled and has an external PSU. The CPU and GPU temperatures were within spec but very high, and the top of the case got warm to the touch. The ambient temperature may have gotten too high for the memory modules. The other classic client that was folding during the stress test is still going strong, though: 65% completed so far (P6515, R18, C147, G75).


--- Opening Log file [January 9 10:48:47 UTC] 


# Windows CPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory:
Executable: Folding@home-Win32-x86
Arguments: -oneunit -advmethods -forceasm -verbosity 9 

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[10:48:47] - Ask before connecting: No
[10:48:47] - User name: Napoleon (Team 191980)
[10:48:47] - User ID:
[10:48:47] - Machine ID: 3
[10:48:47] 
[10:48:47] Loaded queue successfully.
[10:48:47] - Preparing to get new work unit...
[10:48:47] Cleaning up work directory
[10:48:47] + Attempting to get work packet
[10:48:47] Passkey found
[10:48:47] - Will indicate memory of 512 MB
[10:48:47] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 12, Stepping: 2
[10:48:47] - Connecting to assignment server
[10:48:47] Connecting to http://assign.stanford.edu:8080/
[10:48:47] - Autosending finished units... [January 9 10:48:47 UTC]
[10:48:47] Trying to send all finished work units
[10:48:47] + No unsent completed units remaining.
[10:48:47] - Autosend completed
[10:48:48] Posted data.
[10:48:48] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[10:48:48] + News From Folding@Home: Welcome to Folding@Home
[10:48:48] Loaded queue successfully.
[10:48:48] Sent data
[10:48:48] Connecting to http://171.64.65.62:8080/
[10:48:57] Posted data.
[10:48:57] Initial: 0000; - Receiving payload (expected size: 662728)
[10:49:01] - Downloaded at ~161 kB/s
[10:49:01] - Averaged speed for that direction ~175 kB/s
[10:49:01] + Received work.
[10:49:01] + Closed connections
[10:49:01] 
[10:49:01] + Processing work unit
[10:49:01] Core required: FahCore_78.exe
[10:49:01] Core found.
[10:49:01] Working on queue slot 05 [January 9 10:49:01 UTC]
[10:49:01] + Working ...
[10:49:01] - Calling '.\FahCore_78.exe -dir work/ -suffix 05 -nice 19 -nocpulock -checkpoint 30 -forceasm -verbose -lifeline 4572 -version 630'

[10:49:01] 
[10:49:01] *------------------------------*
[10:49:01] Folding@Home Gromacs Core
[10:49:01] Version 1.90 (March 8, 2006)
[10:49:01] 
[10:49:01] Preparing to commence simulation
[10:49:01] - Assembly optimizations manually forced on.
[10:49:01] - Not checking prior termination.
[10:49:02] - Expanded 662216 -> 3264761 (decompressed 493.0 percent)
[10:49:02] - Starting from initial work packet
[10:49:02] 
[10:49:02] Project: 6510 (Run 1, Clone 139, Gen 6)
[10:49:02] 
[10:49:02] Assembly optimizations on if available.
[10:49:02] Entering M.D.
[10:49:08] Protein: TR538_2 in water
[10:49:08] 
[10:49:09] Writing local files
[10:49:09] Extra SSE boost OK.
[10:49:10] Writing local files
[10:49:10] Completed 0 out of 250000 steps  (0%)
[11:19:10] Writing local files
[11:19:10] Completed 2500 out of 250000 steps  (1%)
[11:49:09] Writing local files
[11:49:09] Completed 5000 out of 250000 steps  (2%)
[12:19:08] Timered checkpoint triggered.
[12:19:11] Writing local files
[12:19:11] Completed 7500 out of 250000 steps  (3%)
[12:49:11] Timered checkpoint triggered.
[12:49:15] Writing local files
[12:49:15] Completed 10000 out of 250000 steps  (4%)
[13:19:15] Timered checkpoint triggered.
[13:19:31] Writing local files
[13:19:31] Completed 12500 out of 250000 steps  (5%)
[13:49:32] Timered checkpoint triggered.
[13:49:45] Writing local files
[13:49:45] Completed 15000 out of 250000 steps  (6%)
[14:19:44] Timered checkpoint triggered.
[14:19:51] Writing local files
[14:19:51] Completed 17500 out of 250000 steps  (7%)
[14:49:50] Timered checkpoint triggered.
[14:49:51] Writing local files
[14:49:51] Completed 20000 out of 250000 steps  (8%)
[15:19:52] Timered checkpoint triggered.
[15:19:55] Writing local files
[15:19:55] Completed 22500 out of 250000 steps  (9%)
[15:49:54] Timered checkpoint triggered.
[15:50:43] Writing local files
[15:50:43] Completed 25000 out of 250000 steps  (10%)
[16:20:44] Timered checkpoint triggered.
[16:21:17] Writing local files
[16:21:17] Completed 27500 out of 250000 steps  (11%)
[16:48:47] - Autosending finished units... [January 9 16:48:47 UTC]
[16:48:47] Trying to send all finished work units
[16:48:47] + No unsent completed units remaining.
[16:48:47] - Autosend completed
[16:51:17] Timered checkpoint triggered.
[16:55:52] Writing local files
[16:55:52] Completed 30000 out of 250000 steps  (12%)
[17:25:52] Timered checkpoint triggered.
[17:30:59] Writing local files
[17:30:59] Completed 32500 out of 250000 steps  (13%)
[18:01:00] Timered checkpoint triggered.
[18:06:09] Writing local files
[18:06:09] Completed 35000 out of 250000 steps  (14%)
[18:36:08] Timered checkpoint triggered.
[18:38:54] Writing local files
[18:38:55] Completed 37500 out of 250000 steps  (15%)
[19:08:54] Timered checkpoint triggered.
[19:09:35] Writing local files
[19:09:36] Completed 40000 out of 250000 steps  (16%)
[19:39:36] Timered checkpoint triggered.
[19:41:44] Writing local files
[19:41:44] Completed 42500 out of 250000 steps  (17%)
[20:11:44] Timered checkpoint triggered.
[20:13:28] Writing local files
[20:13:29] Completed 45000 out of 250000 steps  (18%)
[20:43:28] Timered checkpoint triggered.
[20:45:15] Writing local files
[20:45:15] Completed 47500 out of 250000 steps  (19%)
[21:15:16] Timered checkpoint triggered.
[21:16:45] Writing local files
[21:16:45] Completed 50000 out of 250000 steps  (20%)
[21:46:46] Timered checkpoint triggered.
[21:48:23] Writing local files
[21:48:23] Completed 52500 out of 250000 steps  (21%)
[22:18:24] Timered checkpoint triggered.
[22:20:03] Writing local files
[22:20:03] Completed 55000 out of 250000 steps  (22%)
[22:48:47] - Autosending finished units... [January 9 22:48:47 UTC]
[22:48:47] Trying to send all finished work units
[22:48:47] + No unsent completed units remaining.
[22:48:47] - Autosend completed
[22:50:04] Timered checkpoint triggered.
[22:51:37] Writing local files
[22:51:37] Completed 57500 out of 250000 steps  (23%)
[23:21:37] Timered checkpoint triggered.
[23:23:18] Writing local files
[23:23:18] Completed 60000 out of 250000 steps  (24%)
[23:53:17] Timered checkpoint triggered.
[23:54:54] Writing local files
[23:54:54] Completed 62500 out of 250000 steps  (25%)
[00:24:53] Timered checkpoint triggered.
[00:26:28] Writing local files
[00:26:28] Completed 65000 out of 250000 steps  (26%)
[00:56:28] Timered checkpoint triggered.
[00:58:14] Writing local files
[00:58:14] Completed 67500 out of 250000 steps  (27%)
[01:28:15] Timered checkpoint triggered.
[01:30:01] Writing local files
[01:30:01] Completed 70000 out of 250000 steps  (28%)
[02:00:02] Timered checkpoint triggered.
[02:01:39] Writing local files
[02:01:39] Completed 72500 out of 250000 steps  (29%)
[02:31:39] Timered checkpoint triggered.
[02:33:16] Writing local files
[02:33:16] Completed 75000 out of 250000 steps  (30%)
[03:03:16] Timered checkpoint triggered.
[03:04:51] Writing local files
[03:04:51] Completed 77500 out of 250000 steps  (31%)
[03:34:51] Timered checkpoint triggered.
[03:36:29] Writing local files
[03:36:29] Completed 80000 out of 250000 steps  (32%)
[04:06:30] Timered checkpoint triggered.
[04:08:09] Writing local files
[04:08:09] Completed 82500 out of 250000 steps  (33%)
[04:38:10] Timered checkpoint triggered.
[04:39:48] Writing local files
[04:39:48] Completed 85000 out of 250000 steps  (34%)
[04:48:47] - Autosending finished units... [January 10 04:48:47 UTC]
[04:48:47] Trying to send all finished work units
[04:48:47] + No unsent completed units remaining.
[04:48:47] - Autosend completed
[05:09:49] Timered checkpoint triggered.
[05:10:59] Writing local files
[05:10:59] Completed 87500 out of 250000 steps  (35%)
[05:41:00] Timered checkpoint triggered.
[05:41:54] Writing local files
[05:41:54] Completed 90000 out of 250000 steps  (36%)
[06:11:55] Timered checkpoint triggered.
[06:12:47] Writing local files
[06:12:48] Completed 92500 out of 250000 steps  (37%)
[06:42:47] Timered checkpoint triggered.
[06:43:04] Writing local files
[06:43:04] Completed 95000 out of 250000 steps  (38%)
[07:13:04] Timered checkpoint triggered.
[07:13:22] Writing local files
[07:13:22] Completed 97500 out of 250000 steps  (39%)
[07:43:23] Timered checkpoint triggered.
[07:43:40] Writing local files
[07:43:40] Completed 100000 out of 250000 steps  (40%)
[08:13:41] Timered checkpoint triggered.
[08:13:57] Writing local files
[08:13:58] Completed 102500 out of 250000 steps  (41%)
[08:43:58] Timered checkpoint triggered.
[08:44:13] Writing local files
[08:44:13] Completed 105000 out of 250000 steps  (42%)
[09:14:14] Timered checkpoint triggered.
[09:14:30] Writing local files
[09:14:30] Completed 107500 out of 250000 steps  (43%)
[09:44:30] Timered checkpoint triggered.
[09:44:50] Writing local files
[09:44:50] Completed 110000 out of 250000 steps  (44%)
[10:14:51] Timered checkpoint triggered.
[10:15:07] Writing local files
[10:15:07] Completed 112500 out of 250000 steps  (45%)
[10:45:08] Timered checkpoint triggered.
[10:45:21] Writing local files
[10:45:21] Completed 115000 out of 250000 steps  (46%)
[10:48:47] - Autosending finished units... [January 10 10:48:47 UTC]
[10:48:47] Trying to send all finished work units
[10:48:47] + No unsent completed units remaining.
[10:48:47] - Autosend completed
[11:15:22] Timered checkpoint triggered.
[11:15:37] Writing local files
[11:15:38] Completed 117500 out of 250000 steps  (47%)
[11:45:38] Timered checkpoint triggered.
[11:45:54] Writing local files
[11:45:55] Completed 120000 out of 250000 steps  (48%)
[12:15:55] Timered checkpoint triggered.
[12:16:12] Writing local files
[12:16:12] Completed 122500 out of 250000 steps  (49%)
[12:46:13] Timered checkpoint triggered.
[12:47:05] Writing local files
[12:47:05] Completed 125000 out of 250000 steps  (50%)
[13:17:05] Timered checkpoint triggered.
[13:17:22] Writing local files
[13:17:22] Completed 127500 out of 250000 steps  (51%)
[13:47:22] Timered checkpoint triggered.
[13:47:42] Writing local files
[13:47:42] Completed 130000 out of 250000 steps  (52%)
[14:17:43] Timered checkpoint triggered.
[14:18:13] Writing local files
[14:18:13] Completed 132500 out of 250000 steps  (53%)
[14:48:14] Timered checkpoint triggered.
[14:48:49] Writing local files
[14:48:49] Completed 135000 out of 250000 steps  (54%)
[15:18:41] Gromacs cannot continue further.
[15:18:41] Going to send back what have done.
[15:18:41] logfile size: 15016
[15:18:41] - Writing 15552 bytes of core data to disk...
[15:18:41] Done: 15040 -> 4225 (compressed to 28.0 percent)
[15:18:41]   ... Done.
[15:18:41] 
[15:18:41] Folding@home Core Shutdown: EARLY_UNIT_END
[15:18:45] CoreStatus = 72 (114)
[15:18:45] Sending work to server
[15:18:45] Project: 6510 (Run 1, Clone 139, Gen 6)
[15:18:45] - Read packet limit of 540015616... Set to 524286976.


[15:18:45] + Attempting to send results [January 10 15:18:45 UTC]
[15:18:45] - Reading file work/wuresults_05.dat from core
[15:18:45]   (Read 4737 bytes from disk)
[15:18:45] Connecting to http://171.64.65.62:8080/
[15:18:46] Posted data.
[15:18:46] Initial: 0000; - Uploaded at ~5 kB/s
[15:18:46] - Averaged speed for that direction ~32 kB/s
[15:18:46] + Results successfully sent
[15:18:46] Thank you for your contribution to Folding@Home.
[15:18:50] Trying to send all finished work units
[15:18:50] + No unsent completed units remaining.
[15:18:50] + -oneunit flag given and have now finished a unit. Exiting.***** Got a SIGTERM signal (2)
[15:18:50] Killing all core threads

Folding@Home Client Shutdown.
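For anyone who wants to check a log like the one above for slowdowns, here is a minimal sketch that measures the wall-clock time between consecutive "Completed ... steps" lines. The function name `frame_times` and the regular expression are my own assumptions based on the line format shown in this log; they are not part of any Folding@Home tool. In this log the gap grows from about 30 minutes to about 35 minutes around 12% to 14%, which might correspond to the stress test, though that is only a guess.

```python
import re
from datetime import datetime

# Assumed line format, taken from the log above:
# "[HH:MM:SS] Completed N out of M steps  (P%)"
PROGRESS = re.compile(r"^\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps")


def frame_times(log_lines):
    """Return (steps_done, seconds_since_previous_progress_line) pairs."""
    result = []
    prev = None
    for line in log_lines:
        m = PROGRESS.match(line)
        if not m:
            continue  # skip checkpoint, autosend and other lines
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if prev is not None:
            delta = (t - prev).total_seconds()
            if delta < 0:
                # The log only prints the time of day, so a negative
                # difference means the clock wrapped past midnight.
                delta += 24 * 3600
            result.append((int(m.group(2)), delta))
        prev = t
    return result


# Three consecutive progress lines copied from the log above.
sample = [
    "[10:49:10] Completed 0 out of 250000 steps  (0%)",
    "[11:19:10] Completed 2500 out of 250000 steps  (1%)",
    "[11:49:09] Completed 5000 out of 250000 steps  (2%)",
]

for steps, seconds in frame_times(sample):
    print(f"{steps:>7} steps reached after {seconds / 60:.1f} min")
```

The midnight handling assumes no single gap is longer than 24 hours, which holds for this log but would not for a client that was paused for days.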
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
lanhua
Pande Group Member
Posts: 4
Joined: Thu Feb 18, 2010 10:54 pm

Re: Project: 6510 (Run 1, Clone 139, Gen 6) - EUE at 54%

Post by lanhua »

Thank you for the feedback.
I've got a finished WU of project 6510 (Run 1, Clone 139, Gen 6). I am not sure what happened on your machine. I will double-check the trajectory.


Lan
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Project: 6510 (Run 1, Clone 139, Gen 6) - EUE at 54%

Post by ChelseaOilman »

lanhua wrote:I've got a finished WU of project 6510 (Run 1, Clone 139, Gen 6).
Are you sure? I see 5 people in the WU database who returned this WU for the same partial credit, which seems to confirm that they all got through only about 54% of the WU. I don't see any that returned it for the full credit of 107 points.

Hi xxxxxx (team xxxxx),
Your WU (P6510 R1 C139 G6) was added to the stats database on 2011-01-05 11:01:53 for 58.83 points of credit.

Hi Anonymous (team 0),
Your WU (P6510 R1 C139 G6) was added to the stats database on 2011-01-08 09:02:03 for 58.83 points of credit.

Hi xxxxx (team xx),
Your WU (P6510 R1 C139 G6) was added to the stats database on 2011-01-09 12:02:29 for 58.83 points of credit.

Hi Napoleon (team 191980),
Your WU (P6510 R1 C139 G6) was added to the stats database on 2011-01-10 08:02:04 for 58.83 points of credit.

Hi Anonymous (team 0),
Your WU (P6510 R1 C139 G6) was added to the stats database on 2011-01-10 09:02:01 for 58.83 points of credit.
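
A quick sanity check on those numbers, as a rough sketch only. It assumes that partial credit scales linearly with the fraction of steps completed, which nobody in this thread has confirmed, and it takes the 250000 total steps from the client log in the first post. The helper name `implied_steps` is made up for this example.

```python
FULL_CREDIT = 107.0      # full credit for the WU, as quoted above
PARTIAL_CREDIT = 58.83   # credit reported for each returned copy
TOTAL_STEPS = 250_000    # total steps shown in the client log


def implied_steps(partial, full, total_steps):
    """Steps completed, ASSUMING credit is proportional to progress."""
    return partial / full * total_steps


fraction = PARTIAL_CREDIT / FULL_CREDIT
steps = implied_steps(PARTIAL_CREDIT, FULL_CREDIT, TOTAL_STEPS)
print(f"credited fraction: {fraction:.1%}")
print(f"implied steps:     {steps:.0f}")
```

Under that assumption the implied step count falls between the 54% mark (135000 steps) and the 55% mark (137500 steps), which would fit a run that reported 54% and then stopped before reaching 55%.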
Napoleon

Re: Project: 6510 (Run 1, Clone 139, Gen 6) - EUE at 54%

Post by Napoleon »

Any news on this? Is the WU showing up in the DB as being completed yet? I've noticed some other things I'm curious about, too...

First, P6500. I can understand why someone could complete a WU that EUE'd for me. After all, there are different code paths in the FAHcores, SSE2 vs FPU probably being the most significant ones. IIRC, the FPU has higher internal precision: the x87 FPU registers are always 80-bit, and the results of floating-point ops are rounded only when an FPU register is stored into memory in single- or double-precision format. Thus, I can imagine some borderline-stable WU being completed successfully on the FPU, while it would run into a dead end on SSE2 when rounding errors get compounded. Still, why the two successful completions instead of just one? Redundancy? Both got credited, though.
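
To illustrate what I mean by compounding rounding errors, here is a toy sketch. It does not model the x87 or SSE code paths of any FAHcore; it only compares rounding to single precision after every addition against keeping a wider intermediate (Python's 64-bit float standing in for the wider register) and rounding once at the end. The helper names `to_f32` and `accumulate` are invented for this example.

```python
import struct


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, n, round_each_op):
    """Add `step` to a running total n times.

    If round_each_op is True, the total is rounded to single precision
    after every addition; otherwise it stays in 64-bit precision.
    """
    total = 0.0
    for _ in range(n):
        total += step
        if round_each_op:
            total = to_f32(total)
    return total


N = 1_000_000
STEP = 0.1
EXACT = N * STEP

# Wide intermediate, rounded to single precision only on the final "store".
wide = to_f32(accumulate(STEP, N, round_each_op=False))
# Single precision throughout: input and every intermediate result rounded.
narrow = accumulate(to_f32(STEP), N, round_each_op=True)

print("error with wide intermediate:", abs(wide - EXACT))
print("error with per-op rounding:  ", abs(narrow - EXACT))
```

The per-operation rounding drifts much further from the exact sum than the wide intermediate does. Whether anything like this actually decides the fate of a real WU is a separate question that this toy cannot answer.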

Talking about credit, here's the second one, the no credit case. Two (or more?) successful completions, no credit for me. Admittedly, a rather unusual way of processing the WU on my part, plenty of restarts and other oddities, but still...

Mind you, I'm not in a points race with anyone. I'm talking classic WUs here, and I'm running them on a slow rig, so points as such don't matter that much in these cases. However, I'm trying to learn what I can about the intricate details of the art, plus I hate the idea of sending the calculations into the void. I consider my points to be PG's way of reporting back to me that I've accomplished at least something of scientific value for them, however marginal it may be. If reporting is inaccurate, what else may be going off the mark? Yes yes, there are plenty of peer reviewed papers, but it takes patience to wait for new ones, and I don't really understand what's in them anyway, it isn't my field. The points, however, are very easy to understand and immensely reassuring. :D

Could there possibly be bugs in the WU DB/stats system or somewhere else? At least for me, the "Date of last work unit" and "Active CPU count" are way off in the official stats. Oh well, I don't mind when at least the completed WU count and the points add up right, for the reasons I mentioned above.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6510 (Run 1, Clone 139, Gen 6) - EUE at 54%

Post by bruce »

That's probably not the best explanation of the differences, though it MIGHT be correct in a few cases. SSE2 is actually used very rarely; almost everything runs SSE. There are a few really old machines that do not support SSE, but they're pretty rare, so very, very few people ever use the unoptimized x86 FPU code. Yes, there are differences in precision, but that's almost never the real issue for protein simulation.

Molecular Dynamics involves both deterministic and random processes. I'm not sure how much you know about random number generators, but when you stop and resume a WU, there's a pretty good chance that you're going to produce an equally valid but mathematically different result than if you simulated it without stopping and resuming. (There are differences in how the various FAHcores manage the stochastic processes so I'm not able to make a universal statement here.)
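
Here is a minimal sketch of that stop-and-resume effect, using a toy random walk in place of a real simulation. It is not how any FAHcore handles checkpoints; the seed values, the function name `simulate`, and the choice to reseed on resume are all assumptions made for illustration. It only shows that a resumed run reproduces the uninterrupted one when the generator state is saved with the checkpoint, and otherwise yields a different but equally legitimate trajectory.

```python
import random


def simulate(steps, rng, x=0.0):
    """Toy stochastic process: add one Gaussian kick per step."""
    for _ in range(steps):
        x += rng.gauss(0.0, 1.0)
    return x


SEED = 6510  # arbitrary seed for the example

# 1) Uninterrupted run of 1000 steps.
straight = simulate(1000, random.Random(SEED))

# 2) Stop after 500 steps, saving position AND generator state.
rng = random.Random(SEED)
halfway = simulate(500, rng)
saved_state = rng.getstate()

# Resume with the saved generator state: same random stream continues.
rng_restored = random.Random()
rng_restored.setstate(saved_state)
resumed_same_stream = simulate(500, rng_restored, halfway)

# Resume with a freshly seeded generator: a different random stream.
resumed_new_stream = simulate(500, random.Random(SEED + 1), halfway)

print("uninterrupted:        ", straight)
print("resumed, state saved: ", resumed_same_stream)
print("resumed, reseeded:    ", resumed_new_stream)
```

Both resumed results are valid samples of the same process; only the one that restored the generator state matches the uninterrupted run.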

Of course it's POSSIBLE that there are bugs in the stats system or somewhere else ... but you're more likely seeing the results of something else happening. Hardware can fail or crash or disks can get full or any number of other realities and sometimes things get lost. The Pande Group are human, and they do sometimes make mistakes. Considering how many WUs are processed and how few actually get "lost" I think they're doing a really good job, but they're still somewhat short of perfect. From the perspective of an individual Donor, I can get very upset if something happens to the work I've completed even if that's a very rare event. From the scientific standpoint, missing data can be reproduced.

I've been folding for quite a while. I've completed a lot of WUs, but there are some others which were lost due to something that happened in one of my machines. I don't gripe about them, because I have a pretty good idea what caused the problem and I accept the fact that I'm not perfect, nor is my hardware. I don't have any actual numbers, but I think I can confidently say that my local failure rate is quite a bit worse than the Pande Group's failure rate.
Napoleon

Re: Project: 6510 (Run 1, Clone 139, Gen 6) - EUE at 54%

Post by Napoleon »

Thanks for the info, Bruce. I'm not familiar with RNGs, just about all I know about them is summed up in this quote: "The generation of random numbers is too important to be left to chance." :-D

Be that as it may, I don't really mind the occasional statistical failure when it comes to EUEs. I didn't insert the -advmethods switch by accident, you know. I suppose there's a little accountant in me, wanting to get out... the numbers (read: points) don't matter much, as long as the books are balanced. :-D

A PG member says the WU has been completed successfully, but a mod says it can't be found in the DB. Of course it could have been a fluke in my HW, but the history that could be found in the DB suggests otherwise. After all, several others have tried and failed, apparently at the exact same point. Then there is this successfully completed version that apparently came out of nowhere and can't be found anywhere. I'd like to take a look at the ledger, please... :twisted:

I do try to ensure my HW is stable before griping, goes double now that I've overclocked the CPU pretty heavily. However, there was no OC in place when this occurred. EDIT: and the rig didn't crash during the simultaneous stress testing / folding, and the other WU I was folding during the stress test finished just fine.

EDIT: My wireless network connection may have played some small part in "the no credit case". I'll leave it up to higher powers to diagnose these further, if they are so inclined. I very much doubt that, but if it happens, let me know if you need more info and I'll see what I can do.