Project:3064 (Run 3, Clone 216, Gen 7) long 1-4 interactions

Moderators: Site Moderators, FAHC Science Team

ElectricVehicle
Posts: 157
Joined: Fri Feb 01, 2008 6:41 pm

Post by ElectricVehicle »

I just got two "Warning: long 1-4 interactions" errors in a row on Project: 3064 (Run 3, Clone 216, Gen 7), failing at 81% each time. This is on a stable SMP client that had completed 5 other WUs in a row (projects other than 3064) with no issues since I started it. The failed results didn't upload automatically, so I ran qfix to submit them. The client is currently working on Project: 3064 (Run 3, Clone 216, Gen 7) for the third time, freshly assigned by the assignment servers. Anyone care to wager what happens at 81 percent? :o

Here's a summary of the log for the two units, followed by the entire log in the code block, and finally the qfix output in the following code block.

[17:17:05] Project: 3064 (Run 3, Clone 216, Gen 7)
...
[09:39:37] Completed 4050000 out of 5000000 steps (81 percent)
[09:46:40] Warning: long 1-4 interactions
[09:46:40] Gromacs cannot continue further.
[09:46:40] Going to send back what have done.
[09:46:40] logfile size: 9983
[09:46:40] - Writing 10519 bytes of core data to disk...
[09:46:40] ... Done.
[09:46:40] No C.P. to delete.
[09:46:40] - Failed to delete work/wudata_01.sas
[09:46:40] - Failed to delete work/wudata_01.goe
[09:46:40] Warning: check for stray files
[09:48:40]
[09:48:40] Folding@home Core Shutdown: EARLY_UNIT_END
[09:48:40]
[09:48:40] Folding@home Core Shutdown: EARLY_UNIT_END
[09:48:43] CoreStatus = 7B (123)
[09:48:43] Client-core communications error: ERROR 0x7b
[09:48:43] Deleting current work unit & continuing...


Newly assigned by the assignment servers to this client for the second time, same project and RCG:
[09:50:54] Project: 3064 (Run 3, Clone 216, Gen 7)
...
[00:18:10] Completed 4050000 out of 5000000 steps (81 percent)
[00:24:28] Warning: long 1-4 interactions
[00:40:59] - Autosending finished units... [September 18 00:40:59 UTC]
[00:40:59] Trying to send all finished work units
[00:40:59] + No unsent completed units remaining.
[00:40:59] - Autosend completed
[03:18:10] At least 3 hours since checkpoint written...
[03:20:10]
[03:20:10] Folding@home Core Shutdown: EARLY_UNIT_END
[03:20:10]
[03:20:10] Folding@home Core Shutdown: EARLY_UNIT_END
[03:20:13] CoreStatus = 7B (123)
[03:20:13] Client-core communications error: ERROR 0x7b
[03:20:13] Deleting current work unit & continuing...

[03:22:24] Project: 3064 (Run 3, Clone 216, Gen 7)

Code:

--- Opening Log file [September 12 06:41:21 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.22 SMP Beta2r3

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Fah
Executable: C:\Fah\Folding@home-Win32-x86.exe
Arguments: -smp -verbosity 9 

[06:41:21] - Ask before connecting: No
[06:41:21] - User name: [EV]Solar (Team 104636)
[06:41:21] - User ID: xxxx
[06:41:21] - Machine ID: 1
...
[17:17:01] + Attempting to get work packet
[17:17:01] - Will indicate memory of 2046 MB
[17:17:01] - Connecting to assignment server
[17:17:01] Connecting to http://assign.stanford.edu:8080/
[17:17:02] Posted data.
[17:17:02] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[17:17:02] + News From Folding@Home: Welcome to Folding@Home
[17:17:02] Loaded queue successfully.
[17:17:02] Connecting to http://171.64.65.63:8080/
[17:17:03] Posted data.
[17:17:03] Initial: 0000; - Receiving payload (expected size: 610279)
[17:17:04] - Downloaded at ~595 kB/s
[17:17:04] - Averaged speed for that direction ~593 kB/s
[17:17:04] + Received work.
[17:17:04] Trying to send all finished work units
[17:17:04] + No unsent completed units remaining.
[17:17:04] + Closed connections
[17:17:04] 
[17:17:04] + Processing work unit
[17:17:04] Work type a1 not eligible for variable processors
[17:17:04] Core required: FahCore_a1.exe
[17:17:04] Core found.
[17:17:04] Using generic mpiexec calls
[17:17:04] Working on queue slot 01 [September 16 17:17:04 UTC]
[17:17:04] + Working ...
[17:17:04] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 30 -verbose -lifeline 3508 -version 622'

[17:17:04] 
[17:17:04] *------------------------------*
[17:17:04] Folding@Home Gromacs SMP Core
[17:17:04] Version 1.74 (March 10, 2007)
[17:17:04] 
[17:17:04] Preparing to commence simulation
[17:17:04] - Ensuring status. Please wait.
[17:17:05] - Starting from initial work packet
[17:17:05] 
[17:17:05] Project: 3064 (Run 3, Clone 216, Gen 7)
[17:17:05] 
[17:17:05] Assembly optimizations on if available.
[17:17:05] Entering M.D.
[17:17:21] 2 percent)
[17:17:21] - Starting from initial work packet
[17:17:21] 
[17:17:21] Project: 3Entering M.D.
[17:17:21] one 216, Gen 7)
[17:17:21] 
[17:17:21] Entering M.D.
[17:17:28]  ID
[17:17:28] Protein: p3064_lambda5_2003
[17:17:28] ID
[17:17:28] Protein: p3064_lambda5_2003
[17:17:28] Writing local files
[17:17:28] Extra SSE boost OK.
[17:17:28] 
[17:17:28] Extra SSE boost OK.
[17:17:28] Writing local files
[17:17:28] Completed 0 out of 5000000 steps  (0 percent)
[17:29:36] Writing local files
[17:29:36] Completed 50000 out of 5000000 steps  (1 percent)
[17:41:44] Writing local files
[17:41:44] Completed 100000 out of 5000000 steps  (2 percent)
[17:53:34] Writing local files
[17:53:34] Completed 150000 out of 5000000 steps  (3 percent)
[18:05:44] Writing local files
[18:05:44] Completed 200000 out of 5000000 steps  (4 percent)
[18:17:46] Writing local files
[18:17:46] Completed 250000 out of 5000000 steps  (5 percent)
[18:29:50] Writing local files
[18:29:50] Completed 300000 out of 5000000 steps  (6 percent)
[18:40:59] - Autosending finished units... [September 16 18:40:59 UTC]
[18:40:59] Trying to send all finished work units
[18:40:59] + No unsent completed units remaining.
[18:40:59] - Autosend completed
[18:41:54] Writing local files
[18:41:54] Completed 350000 out of 5000000 steps  (7 percent)
[18:53:56] Writing local files
[18:53:56] Completed 400000 out of 5000000 steps  (8 percent)
[19:05:56] Writing local files
[19:05:56] Completed 450000 out of 5000000 steps  (9 percent)
[19:17:58] Writing local files
[19:17:58] Completed 500000 out of 5000000 steps  (10 percent)
[19:30:00] Writing local files
[19:30:00] Completed 550000 out of 5000000 steps  (11 percent)
[19:42:04] Writing local files
[19:42:04] Completed 600000 out of 5000000 steps  (12 percent)
[19:54:08] Writing local files
[19:54:08] Completed 650000 out of 5000000 steps  (13 percent)
[20:06:10] Writing local files
[20:06:10] Completed 700000 out of 5000000 steps  (14 percent)
[20:18:01] Writing local files
[20:18:01] Completed 750000 out of 5000000 steps  (15 percent)
[20:29:56] Writing local files
[20:29:56] Completed 800000 out of 5000000 steps  (16 percent)
[20:42:04] Writing local files
[20:42:04] Completed 850000 out of 5000000 steps  (17 percent)
[20:54:06] Writing local files
[20:54:06] Completed 900000 out of 5000000 steps  (18 percent)
[21:06:08] Writing local files
[21:06:08] Completed 950000 out of 5000000 steps  (19 percent)
[21:18:08] Writing local files
[21:18:08] Completed 1000000 out of 5000000 steps  (20 percent)
[21:30:08] Writing local files
[21:30:08] Completed 1050000 out of 5000000 steps  (21 percent)
[21:42:09] Writing local files
[21:42:09] Completed 1100000 out of 5000000 steps  (22 percent)
[21:54:26] Writing local files
[21:54:26] Completed 1150000 out of 5000000 steps  (23 percent)
[22:06:33] Writing local files
[22:06:33] Completed 1200000 out of 5000000 steps  (24 percent)
[22:18:36] Writing local files
[22:18:36] Completed 1250000 out of 5000000 steps  (25 percent)
[22:30:40] Writing local files
[22:30:40] Completed 1300000 out of 5000000 steps  (26 percent)
[22:41:50] Writing local files
[22:41:50] Completed 1350000 out of 5000000 steps  (27 percent)
[22:53:06] Writing local files
[22:53:06] Completed 1400000 out of 5000000 steps  (28 percent)
[23:05:07] Writing local files
[23:05:07] Completed 1450000 out of 5000000 steps  (29 percent)
[23:17:09] Writing local files
[23:17:09] Completed 1500000 out of 5000000 steps  (30 percent)
[23:29:12] Writing local files
[23:29:12] Completed 1550000 out of 5000000 steps  (31 percent)
[23:41:16] Writing local files
[23:41:16] Completed 1600000 out of 5000000 steps  (32 percent)
[23:53:21] Writing local files
[23:53:21] Completed 1650000 out of 5000000 steps  (33 percent)
[00:05:24] Writing local files
[00:05:24] Completed 1700000 out of 5000000 steps  (34 percent)
[00:17:28] Writing local files
[00:17:28] Completed 1750000 out of 5000000 steps  (35 percent)
[00:29:37] Writing local files
[00:29:38] Completed 1800000 out of 5000000 steps  (36 percent)
[00:40:59] - Autosending finished units... [September 17 00:40:59 UTC]
[00:40:59] Trying to send all finished work units
[00:40:59] + No unsent completed units remaining.
[00:40:59] - Autosend completed
[00:41:39] Writing local files
[00:41:39] Completed 1850000 out of 5000000 steps  (37 percent)
[00:53:45] Writing local files
[00:53:45] Completed 1900000 out of 5000000 steps  (38 percent)
[01:05:50] Writing local files
[01:05:50] Completed 1950000 out of 5000000 steps  (39 percent)
[01:17:56] Writing local files
[01:17:56] Completed 2000000 out of 5000000 steps  (40 percent)
[01:29:40] Writing local files
[01:29:40] Completed 2050000 out of 5000000 steps  (41 percent)
[01:41:42] Writing local files
[01:41:42] Completed 2100000 out of 5000000 steps  (42 percent)
[01:53:45] Writing local files
[01:53:45] Completed 2150000 out of 5000000 steps  (43 percent)
[02:06:06] Writing local files
[02:06:06] Completed 2200000 out of 5000000 steps  (44 percent)
[02:18:41] Writing local files
[02:18:41] Completed 2250000 out of 5000000 steps  (45 percent)
[02:31:10] Writing local files
[02:31:10] Completed 2300000 out of 5000000 steps  (46 percent)
[02:43:40] Writing local files
[02:43:40] Completed 2350000 out of 5000000 steps  (47 percent)
[02:56:07] Writing local files
[02:56:07] Completed 2400000 out of 5000000 steps  (48 percent)
[03:08:33] Writing local files
[03:08:33] Completed 2450000 out of 5000000 steps  (49 percent)
[03:20:59] Writing local files
[03:20:59] Completed 2500000 out of 5000000 steps  (50 percent)
[03:33:25] Writing local files
[03:33:25] Completed 2550000 out of 5000000 steps  (51 percent)
[03:45:52] Writing local files
[03:45:52] Completed 2600000 out of 5000000 steps  (52 percent)
[03:58:01] Writing local files
[03:58:01] Completed 2650000 out of 5000000 steps  (53 percent)
[04:09:34] Writing local files
[04:09:34] Completed 2700000 out of 5000000 steps  (54 percent)
[04:21:59] Writing local files
[04:21:59] Completed 2750000 out of 5000000 steps  (55 percent)
[04:33:32] Writing local files
[04:33:32] Completed 2800000 out of 5000000 steps  (56 percent)
[04:45:05] Writing local files
[04:45:05] Completed 2850000 out of 5000000 steps  (57 percent)
[04:56:42] Writing local files
[04:56:42] Completed 2900000 out of 5000000 steps  (58 percent)
[05:09:19] Writing local files
[05:09:20] Completed 2950000 out of 5000000 steps  (59 percent)
[05:22:54] Writing local files
[05:22:54] Completed 3000000 out of 5000000 steps  (60 percent)
[05:34:57] Writing local files
[05:34:57] Completed 3050000 out of 5000000 steps  (61 percent)
[05:47:01] Writing local files
[05:47:01] Completed 3100000 out of 5000000 steps  (62 percent)
[05:59:08] Writing local files
[05:59:08] Completed 3150000 out of 5000000 steps  (63 percent)
[06:11:14] Writing local files
[06:11:14] Completed 3200000 out of 5000000 steps  (64 percent)
[06:23:15] Writing local files
[06:23:15] Completed 3250000 out of 5000000 steps  (65 percent)
[06:35:06] Writing local files
[06:35:06] Completed 3300000 out of 5000000 steps  (66 percent)
[06:40:59] - Autosending finished units... [September 17 06:40:59 UTC]
[06:40:59] Trying to send all finished work units
[06:40:59] + No unsent completed units remaining.
[06:40:59] - Autosend completed
[06:47:06] Writing local files
[06:47:06] Completed 3350000 out of 5000000 steps  (67 percent)
[06:59:06] Writing local files
[06:59:06] Completed 3400000 out of 5000000 steps  (68 percent)
[07:11:08] Writing local files
[07:11:08] Completed 3450000 out of 5000000 steps  (69 percent)
[07:23:11] Writing local files
[07:23:11] Completed 3500000 out of 5000000 steps  (70 percent)
[07:35:13] Writing local files
[07:35:13] Completed 3550000 out of 5000000 steps  (71 percent)
[07:47:16] Writing local files
[07:47:16] Completed 3600000 out of 5000000 steps  (72 percent)
[07:59:19] Writing local files
[07:59:19] Completed 3650000 out of 5000000 steps  (73 percent)
[08:11:29] Writing local files
[08:11:29] Completed 3700000 out of 5000000 steps  (74 percent)
[08:23:33] Writing local files
[08:23:33] Completed 3750000 out of 5000000 steps  (75 percent)
[08:35:36] Writing local files
[08:35:36] Completed 3800000 out of 5000000 steps  (76 percent)
[08:47:11] Writing local files
[08:47:11] Completed 3850000 out of 5000000 steps  (77 percent)
[09:03:54] Writing local files
[09:03:54] Completed 3900000 out of 5000000 steps  (78 percent)
[09:15:58] Writing local files
[09:15:58] Completed 3950000 out of 5000000 steps  (79 percent)
[09:27:33] Writing local files
[09:27:33] Completed 4000000 out of 5000000 steps  (80 percent)
[09:39:37] Writing local files
[09:39:37] Completed 4050000 out of 5000000 steps  (81 percent)
[09:46:40] Warning:  long 1-4 interactions
[09:46:40] Gromacs cannot continue further.
[09:46:40] Going to send back what have done.
[09:46:40] logfile size: 9983
[09:46:40] - Writing 10519 bytes of core data to disk...
[09:46:40]   ... Done.
[09:46:40] No C.P. to delete.
[09:46:40] - Failed to delete work/wudata_01.sas
[09:46:40] - Failed to delete work/wudata_01.goe
[09:46:40] Warning:  check for stray files
[09:48:40] 
[09:48:40] Folding@home Core Shutdown: EARLY_UNIT_END
[09:48:40] 
[09:48:40] Folding@home Core Shutdown: EARLY_UNIT_END
[09:48:43] CoreStatus = 7B (123)
[09:48:43] Client-core communications error: ERROR 0x7b
[09:48:43] Deleting current work unit & continuing...
[09:48:43] Using generic mpiexec calls
[09:50:45] - Warning: Could not delete all work unit files (1): Core returned invalid code
[09:50:45] Trying to send all finished work units
[09:50:45] + No unsent completed units remaining.
[09:50:45] - Preparing to get new work unit...
[09:50:45] + Attempting to get work packet
[09:50:45] - Will indicate memory of 2046 MB
[09:50:45] - Connecting to assignment server
[09:50:45] Connecting to http://assign.stanford.edu:8080/
[09:50:46] Posted data.
[09:50:46] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[09:50:46] + News From Folding@Home: Welcome to Folding@Home
[09:50:46] Loaded queue successfully.
[09:50:46] Connecting to http://171.64.65.63:8080/
[09:50:47] Posted data.
[09:50:47] Initial: 0000; - Receiving payload (expected size: 610279)
[09:50:48] - Downloaded at ~595 kB/s
[09:50:48] - Averaged speed for that direction ~594 kB/s
[09:50:48] + Received work.
[09:50:48] + Closed connections
[09:50:53] 
[09:50:53] + Processing work unit
[09:50:53] Work type a1 not eligible for variable processors
[09:50:53] Core required: FahCore_a1.exe
[09:50:53] Core found.
[09:50:53] Using generic mpiexec calls
[09:50:53] Working on queue slot 02 [September 17 09:50:53 UTC]
[09:50:53] + Working ...
[09:50:53] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 02 -checkpoint 30 -verbose -lifeline 3508 -version 622'

[09:50:53] 
[09:50:53] *------------------------------*
[09:50:53] Folding@Home Gromacs SMP Core
[09:50:53] Version 1.74 (March 10, 2007)
[09:50:53] 
[09:50:53] Preparing to commence simulation
[09:50:53] - Ensuring status. Please wait.
[09:50:54] - Starting from initial work packet
[09:50:54] 
[09:50:54] Project: 3064 (Run 3, Clone 216, Gen 7)
[09:50:54] 
[09:50:54] Assembly optimizations on if available.
[09:50:54] Entering M.D.
[09:51:11] ial work pa- Starting from initial work packet
[09:51:11] 
[09:51:11] Project: 3Entering M.D.
[09:51:11] one 216, Gen 7)
[09:51:11] 
[09:51:11] Entering M.D.
[09:51:17]  ID
[09:51:17] ing:  cannot record process ID
[09:51:17] Rejecting checkpoint
[09:51:17] a SSE boost OK.
[09:51:17] ambda5_2003Extra SSE boost OK.
[09:51:17] 
[09:51:17] Extra SSE boost OK.
[09:51:17] Writing local files
[09:51:17] Completed 0 out of 5000000 steps  (0 percent)
[10:02:11] Writing local files
[10:02:11] Completed 50000 out of 5000000 steps  (1 percent)
[10:12:56] Writing local files
[10:12:56] Completed 100000 out of 5000000 steps  (2 percent)
[10:23:41] Writing local files
[10:23:42] Completed 150000 out of 5000000 steps  (3 percent)
[10:34:26] Writing local files
[10:34:26] Completed 200000 out of 5000000 steps  (4 percent)
[10:45:08] Writing local files
[10:45:08] Completed 250000 out of 5000000 steps  (5 percent)
[10:55:52] Writing local files
[10:55:52] Completed 300000 out of 5000000 steps  (6 percent)
[11:06:36] Writing local files
[11:06:36] Completed 350000 out of 5000000 steps  (7 percent)
[11:17:17] Writing local files
[11:17:18] Completed 400000 out of 5000000 steps  (8 percent)
[11:27:55] Writing local files
[11:27:55] Completed 450000 out of 5000000 steps  (9 percent)
[11:38:33] Writing local files
[11:38:33] Completed 500000 out of 5000000 steps  (10 percent)
[11:49:14] Writing local files
[11:49:14] Completed 550000 out of 5000000 steps  (11 percent)
[11:59:56] Writing local files
[11:59:56] Completed 600000 out of 5000000 steps  (12 percent)
[12:10:39] Writing local files
[12:10:39] Completed 650000 out of 5000000 steps  (13 percent)
[12:21:21] Writing local files
[12:21:21] Completed 700000 out of 5000000 steps  (14 percent)
[12:32:10] Writing local files
[12:32:10] Completed 750000 out of 5000000 steps  (15 percent)
[12:40:59] - Autosending finished units... [September 17 12:40:59 UTC]
[12:40:59] Trying to send all finished work units
[12:40:59] + No unsent completed units remaining.
[12:40:59] - Autosend completed
[12:42:51] Writing local files
[12:42:51] Completed 800000 out of 5000000 steps  (16 percent)
[12:53:31] Writing local files
[12:53:31] Completed 850000 out of 5000000 steps  (17 percent)
[13:04:13] Writing local files
[13:04:13] Completed 900000 out of 5000000 steps  (18 percent)
[13:14:54] Writing local files
[13:14:54] Completed 950000 out of 5000000 steps  (19 percent)
[13:25:34] Writing local files
[13:25:34] Completed 1000000 out of 5000000 steps  (20 percent)
[13:36:13] Writing local files
[13:36:13] Completed 1050000 out of 5000000 steps  (21 percent)
[13:46:53] Writing local files
[13:46:53] Completed 1100000 out of 5000000 steps  (22 percent)
[13:57:29] Writing local files
[13:57:29] Completed 1150000 out of 5000000 steps  (23 percent)
[14:08:21] Writing local files
[14:08:21] Completed 1200000 out of 5000000 steps  (24 percent)
[14:19:03] Writing local files
[14:19:03] Completed 1250000 out of 5000000 steps  (25 percent)
[14:29:45] Writing local files
[14:29:45] Completed 1300000 out of 5000000 steps  (26 percent)
[14:40:26] Writing local files
[14:40:26] Completed 1350000 out of 5000000 steps  (27 percent)
[14:51:06] Writing local files
[14:51:06] Completed 1400000 out of 5000000 steps  (28 percent)
[15:01:47] Writing local files
[15:01:47] Completed 1450000 out of 5000000 steps  (29 percent)
[15:12:28] Writing local files
[15:12:28] Completed 1500000 out of 5000000 steps  (30 percent)
[15:23:11] Writing local files
[15:23:11] Completed 1550000 out of 5000000 steps  (31 percent)
[15:33:54] Writing local files
[15:33:54] Completed 1600000 out of 5000000 steps  (32 percent)
[15:44:38] Writing local files
[15:44:38] Completed 1650000 out of 5000000 steps  (33 percent)
[15:55:23] Writing local files
[15:55:23] Completed 1700000 out of 5000000 steps  (34 percent)
[16:06:10] Writing local files
[16:06:10] Completed 1750000 out of 5000000 steps  (35 percent)
[16:16:54] Writing local files
[16:16:54] Completed 1800000 out of 5000000 steps  (36 percent)
[16:27:33] Writing local files
[16:27:33] Completed 1850000 out of 5000000 steps  (37 percent)
[16:38:11] Writing local files
[16:38:11] Completed 1900000 out of 5000000 steps  (38 percent)
[16:48:54] Writing local files
[16:48:54] Completed 1950000 out of 5000000 steps  (39 percent)
[16:59:37] Writing local files
[16:59:37] Completed 2000000 out of 5000000 steps  (40 percent)
[17:10:20] Writing local files
[17:10:20] Completed 2050000 out of 5000000 steps  (41 percent)
[17:21:01] Writing local files
[17:21:01] Completed 2100000 out of 5000000 steps  (42 percent)
[17:31:41] Writing local files
[17:31:41] Completed 2150000 out of 5000000 steps  (43 percent)
[17:42:23] Writing local files
[17:42:23] Completed 2200000 out of 5000000 steps  (44 percent)
[17:53:04] Writing local files
[17:53:04] Completed 2250000 out of 5000000 steps  (45 percent)
[18:03:49] Writing local files
[18:03:49] Completed 2300000 out of 5000000 steps  (46 percent)
[18:14:32] Writing local files
[18:14:32] Completed 2350000 out of 5000000 steps  (47 percent)
[18:25:14] Writing local files
[18:25:14] Completed 2400000 out of 5000000 steps  (48 percent)
[18:35:55] Writing local files
[18:35:55] Completed 2450000 out of 5000000 steps  (49 percent)
[18:40:59] - Autosending finished units... [September 17 18:40:59 UTC]
[18:40:59] Trying to send all finished work units
[18:40:59] + No unsent completed units remaining.
[18:40:59] - Autosend completed
[18:46:35] Writing local files
[18:46:35] Completed 2500000 out of 5000000 steps  (50 percent)
[18:57:15] Writing local files
[18:57:15] Completed 2550000 out of 5000000 steps  (51 percent)
[19:07:56] Writing local files
[19:07:56] Completed 2600000 out of 5000000 steps  (52 percent)
[19:18:41] Writing local files
[19:18:41] Completed 2650000 out of 5000000 steps  (53 percent)
[19:29:26] Writing local files
[19:29:26] Completed 2700000 out of 5000000 steps  (54 percent)
[19:40:09] Writing local files
[19:40:09] Completed 2750000 out of 5000000 steps  (55 percent)
[19:50:52] Writing local files
[19:50:52] Completed 2800000 out of 5000000 steps  (56 percent)
[20:01:35] Writing local files
[20:01:35] Completed 2850000 out of 5000000 steps  (57 percent)
[20:12:20] Writing local files
[20:12:20] Completed 2900000 out of 5000000 steps  (58 percent)
[20:23:04] Writing local files
[20:23:04] Completed 2950000 out of 5000000 steps  (59 percent)
[20:33:54] Writing local files
[20:33:54] Completed 3000000 out of 5000000 steps  (60 percent)
[20:44:37] Writing local files
[20:44:37] Completed 3050000 out of 5000000 steps  (61 percent)
[20:55:20] Writing local files
[20:55:20] Completed 3100000 out of 5000000 steps  (62 percent)
[21:06:05] Writing local files
[21:06:05] Completed 3150000 out of 5000000 steps  (63 percent)
[21:16:48] Writing local files
[21:16:48] Completed 3200000 out of 5000000 steps  (64 percent)
[21:27:28] Writing local files
[21:27:28] Completed 3250000 out of 5000000 steps  (65 percent)
[21:38:07] Writing local files
[21:38:07] Completed 3300000 out of 5000000 steps  (66 percent)
[21:48:46] Writing local files
[21:48:46] Completed 3350000 out of 5000000 steps  (67 percent)
[21:59:24] Writing local files
[21:59:24] Completed 3400000 out of 5000000 steps  (68 percent)
[22:10:04] Writing local files
[22:10:04] Completed 3450000 out of 5000000 steps  (69 percent)
[22:20:43] Writing local files
[22:20:43] Completed 3500000 out of 5000000 steps  (70 percent)
[22:31:23] Writing local files
[22:31:23] Completed 3550000 out of 5000000 steps  (71 percent)
[22:42:04] Writing local files
[22:42:04] Completed 3600000 out of 5000000 steps  (72 percent)
[22:52:46] Writing local files
[22:52:46] Completed 3650000 out of 5000000 steps  (73 percent)
[23:03:27] Writing local files
[23:03:27] Completed 3700000 out of 5000000 steps  (74 percent)
[23:14:07] Writing local files
[23:14:07] Completed 3750000 out of 5000000 steps  (75 percent)
[23:24:49] Writing local files
[23:24:49] Completed 3800000 out of 5000000 steps  (76 percent)
[23:35:29] Writing local files
[23:35:29] Completed 3850000 out of 5000000 steps  (77 percent)
[23:46:09] Writing local files
[23:46:10] Completed 3900000 out of 5000000 steps  (78 percent)
[23:56:49] Writing local files
[23:56:50] Completed 3950000 out of 5000000 steps  (79 percent)
[00:07:28] Writing local files
[00:07:28] Completed 4000000 out of 5000000 steps  (80 percent)
[00:18:10] Writing local files
[00:18:10] Completed 4050000 out of 5000000 steps  (81 percent)
[00:24:28] Warning:  long 1-4 interactions
[00:40:59] - Autosending finished units... [September 18 00:40:59 UTC]
[00:40:59] Trying to send all finished work units
[00:40:59] + No unsent completed units remaining.
[00:40:59] - Autosend completed
[03:18:10] At least 3 hours since checkpoint written...
[03:20:10] 
[03:20:10] Folding@home Core Shutdown: EARLY_UNIT_END
[03:20:10] 
[03:20:10] Folding@home Core Shutdown: EARLY_UNIT_END
[03:20:13] CoreStatus = 7B (123)
[03:20:13] Client-core communications error: ERROR 0x7b
[03:20:13] Deleting current work unit & continuing...
[03:20:13] Using generic mpiexec calls
[03:22:16] - Warning: Could not delete all work unit files (2): Core returned invalid code
[03:22:16] Trying to send all finished work units
[03:22:16] + No unsent completed units remaining.
[03:22:16] - Preparing to get new work unit...
[03:22:16] + Attempting to get work packet
[03:22:16] - Will indicate memory of 2046 MB
[03:22:16] - Connecting to assignment server
[03:22:16] Connecting to http://assign.stanford.edu:8080/
[03:22:16] Posted data.
[03:22:16] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[03:22:16] + News From Folding@Home: Welcome to Folding@Home
[03:22:16] Loaded queue successfully.
[03:22:16] Connecting to http://171.64.65.63:8080/
[03:22:17] Posted data.
[03:22:17] Initial: 0000; - Receiving payload (expected size: 610279)
[03:22:18] - Downloaded at ~595 kB/s
[03:22:18] - Averaged speed for that direction ~594 kB/s
[03:22:18] + Received work.
[03:22:18] + Closed connections
[03:22:23] 
[03:22:23] + Processing work unit
[03:22:23] Work type a1 not eligible for variable processors
[03:22:23] Core required: FahCore_a1.exe
[03:22:23] Core found.
[03:22:23] Using generic mpiexec calls
[03:22:23] Working on queue slot 03 [September 18 03:22:23 UTC]
[03:22:23] + Working ...
[03:22:23] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 03 -checkpoint 30 -verbose -lifeline 3508 -version 622'

[03:22:24] 
[03:22:24] *------------------------------*
[03:22:24] Folding@Home Gromacs SMP Core
[03:22:24] Version 1.74 (March 10, 2007)
[03:22:24] 
[03:22:24] Preparing to commence simulation
[03:22:24] - Ensuring status. Please wait.
[03:22:24] - Starting from initial work packet
[03:22:24] 
[03:22:24] Project: 3064 (Run 3, Clone 216, Gen 7)
[03:22:24] 
[03:22:24] Assembly optimizations on if available.
[03:22:24] Entering M.D.
[03:22:41] 2 percent)
[03:22:41] - Starting from initial work packet
[03:22:41] 
[03:22:41] Project: 3064 (Run 3, Clone 216, Gen 7)
[03:22:41] 
[03:22:41] Entering M.D.
[03:22:47]  ID
[03:22:47] ing:  cannot record process ID
[03:22:47] Rejecting checkpoint
[03:22:48] a SSE boost OK.
[03:22:48] ambda5_2003Extra SSE boost OK.
[03:22:48] 
[03:22:48] Extra SSE boost OK.
[03:22:48] Writing local files
[03:22:48] Completed 0 out of 5000000 steps  (0 percent)
[03:33:36] Writing local files
[03:33:36] Completed 50000 out of 5000000 steps  (1 percent)
[03:44:29] Writing local files
[03:44:29] Completed 100000 out of 5000000 steps  (2 percent)
...
qfix output:

Code:

C:\Windows\system32>cd C:\Fah

C:\Fah>qfix
entry 4, status 0, address 171.64.65.64:8080
entry 5, status 0, address 171.64.65.64:8080
entry 6, status 0, address 171.64.65.64:8080
entry 7, status 0, address 171.64.65.64:8080
entry 8, status 0, address 171.64.65.64:8080
entry 9, status 0, address 171.64.65.64:8080
entry 0, status 0, address 171.64.65.63:8080
entry 1, status 0, address 171.64.65.63:8080
  Found results <work\wuresults_01.dat>: proj 3064, run 3, clone 216, gen 7
   -- queue entry: proj 3064, run 3, clone 216, gen 7
   -- requeued for upload
entry 2, status 0, address 171.64.65.63:8080
  Found results <work\wuresults_02.dat>: proj 3064, run 3, clone 216, gen 7
   -- queue entry: proj 3064, run 3, clone 216, gen 7
   -- requeued for upload
entry 3, status 1, address 171.64.65.63:8080
File needed repair.  Errors fixed: 2.
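
The failure pattern in the log above (the warning arriving shortly after the 81 percent mark on both attempts) can be picked out of a log automatically. What follows is only a rough sketch in Python, assuming the line formats seen in this thread; the function name `find_failures` and the sample lines are illustrative and not part of the client or of qfix.

```python
import re

# Patterns for the client log lines shown in this thread (assumed format).
STAMP = re.compile(r"^\[(\d\d:\d\d:\d\d)\]")
PROJECT = re.compile(
    r"^\[\d\d:\d\d:\d\d\] (Project: \d+ \(Run \d+, Clone \d+, Gen \d+\))"
)
PROGRESS = re.compile(r"Completed \d+ out of \d+ steps\s+\((\d+) percent\)")
WARNING = "long 1-4 interactions"


def find_failures(lines):
    """Return (project, last_percent, warning_time) for each warning line."""
    failures = []
    project = None
    percent = None
    for line in lines:
        started = PROJECT.match(line)
        if started:
            # A new unit begins: remember it and reset the progress counter.
            project, percent = started.group(1), None
            continue
        progress = PROGRESS.search(line)
        if progress:
            percent = int(progress.group(1))
        elif WARNING in line:
            stamp = STAMP.match(line)
            failures.append((project, percent, stamp.group(1) if stamp else None))
    return failures


# Lines copied from the summary at the top of this post.
sample = """\
[17:17:05] Project: 3064 (Run 3, Clone 216, Gen 7)
[09:39:37] Completed 4050000 out of 5000000 steps (81 percent)
[09:46:40] Warning: long 1-4 interactions
[09:50:54] Project: 3064 (Run 3, Clone 216, Gen 7)
[00:18:10] Completed 4050000 out of 5000000 steps (81 percent)
[00:24:28] Warning: long 1-4 interactions
""".splitlines()

for failure in find_failures(sample):
    print(failure)
```

Run against the full log, this should report the same two failures, since the interleaved MPI lines (for example "Project: 3Entering M.D.") do not match the project pattern and are skipped.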
toTOW
Site Moderator
Posts: 6429
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Post by toTOW »

Thanks for your report.

Qfix worked: your partial results will be sent on the next client start or the next autosend.
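
For anyone who wants to check this without reading the output by eye, here is a small Python sketch that lists the units qfix reported as requeued. It only parses text in the format shown in the qfix output in this thread; it does not run qfix, other qfix versions may print something different, and the name `requeued_units` is made up for illustration.

```python
import re

# Matches the "Found results" line as it appears in this thread (assumed format).
FOUND = re.compile(
    r"Found results <(?P<path>[^>]+)>: "
    r"proj (?P<proj>\d+), run (?P<run>\d+), clone (?P<clone>\d+), gen (?P<gen>\d+)"
)


def requeued_units(output):
    """Return one dict per result file that qfix says it requeued for upload."""
    units = []
    pending = None
    for line in output.splitlines():
        found = FOUND.search(line)
        if found:
            pending = {
                "path": found.group("path"),
                "proj": int(found.group("proj")),
                "run": int(found.group("run")),
                "clone": int(found.group("clone")),
                "gen": int(found.group("gen")),
            }
        elif "requeued for upload" in line and pending is not None:
            units.append(pending)
            pending = None
        elif line.startswith("entry "):
            # A new queue entry starts; forget any unconfirmed result file.
            pending = None
    return units


# Excerpt of the qfix output posted in this thread.
sample = r"""entry 0, status 0, address 171.64.65.63:8080
entry 1, status 0, address 171.64.65.63:8080
  Found results <work\wuresults_01.dat>: proj 3064, run 3, clone 216, gen 7
   -- queue entry: proj 3064, run 3, clone 216, gen 7
   -- requeued for upload
entry 2, status 0, address 171.64.65.63:8080
  Found results <work\wuresults_02.dat>: proj 3064, run 3, clone 216, gen 7
   -- queue entry: proj 3064, run 3, clone 216, gen 7
   -- requeued for upload
entry 3, status 1, address 171.64.65.63:8080
File needed repair.  Errors fixed: 2.
"""

for unit in requeued_units(sample):
    print(unit["path"], unit["proj"], unit["run"], unit["clone"], unit["gen"])
```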

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
ElectricVehicle
Posts: 157
Joined: Fri Feb 01, 2008 6:41 pm

Post by ElectricVehicle »

Thanks for checking, toTOW. I'm mostly concerned about the science, so I posted this and ran qfix so Stanford can get an idea of the WU and client behavior and continue improving the usefulness, performance and stability of FAH.

Until I learned to run qfix, I was losing a few points here and there, but not very often. I presume that by running qfix when the SMP client fails to upload partial results, we help FAH, since Stanford will see the percent completion and completion code in the partial results? Otherwise they might not know about the error or its relative frequency, and what they don't know about, they can't fix.
Fold On! (with 100% Renewable, 0 Carbon electricity) ElectricVehicle EV1, RAV4 EV, LEAF, Bolt EV, Volt, M3, s4 Simulator
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

ElectricVehicle wrote: I presume that by running qfix when the SMP client fails to upload partial results, we help FAH, since Stanford will see the percent completion and completion code in the partial results? Otherwise they might not know about the error or its relative frequency, and what they don't know about, they can't fix.
There's a report generated but it's not clear how those results are reported to the researcher. (Perhaps someone from Stanford can comment here.)

I do know that they're aware of the general problem but (as I'm sure you know) there's no prediction when a fix will be available. Better error reports are just as important as fewer errors, and it's clear to me that lots of errors are happening to people who know nothing of qfix. That's a lot of wasted processing.
ElectricVehicle
Posts: 157
Joined: Fri Feb 01, 2008 6:41 pm

Post by ElectricVehicle »

I guess on the road to 5 PetaFLOPS, a few MegaFLOPS are going to get lost! Fold on!

Also, I noticed that after qfix I got 260 points for the two failed 3064 units, which would normally be worth 1753 points each. They progressed to 81%, so in some sense that's 2 units x 1753 points x 81% complete = about 2840 points' worth of work / time. And it's actually more than that, since one of the units stalled for three hours.
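
The same back-of-the-envelope estimate as a few lines of Python, in case anyone wants to plug in their own numbers. This is just the arithmetic, assuming the 260 points is the total for both units and that credit scales linearly with percent completed (my simplification, not necessarily how Stanford assigns partial credit). The function name is made up.

```python
FULL_CREDIT = 1753    # points for a completed Project 3064 unit (from this post)
FRACTION_DONE = 0.81  # both units got past the 81 percent mark before failing
UNITS = 2
CREDITED = 260        # points received after qfix; assumed total for both units


def points_equivalent(units, full_credit, fraction_done):
    """Points the work would be worth if credit scaled linearly with progress."""
    return units * full_credit * fraction_done


equivalent = points_equivalent(UNITS, FULL_CREDIT, FRACTION_DONE)
shortfall = equivalent - CREDITED
print(f"equivalent: {equivalent:.0f} points, shortfall: {shortfall:.0f} points")
```

That comes to roughly 2840 points of equivalent work against 260 credited, before counting the three hours one unit spent stalled.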

I'm presuming that re-running a potentially bad unit isn't of much scientific value, so next time if I happen to notice it, I'll qfix the first one and then dump the repeat units so the computer spends more time on useful work once the useful work of reporting the initial error is done. That would salvage 30 or 40 hours of folding time spent on repeating the problem unit for successful work on other units.

No worries, just trying to optimize for the most science.