Project: 6503 (Run 3, Clone 254, Gen 53)

Moderators: Site Moderators, FAHC Science Team

DrSpalding
Posts: 136
Joined: Wed May 27, 2009 4:48 pm
Hardware configuration: Dell Studio 425 MTS-Core i7-920 c0 stock
evga SLI 3x o/c Core i7-920 d0 @ 3.9GHz + nVidia GTX275
Dell 5150 + nVidia 9800GT

Project: 6503 (Run 3, Clone 254, Gen 53)

Post by DrSpalding »

On one of the remote Linux boxes where I run FAH, I finally noticed that it keeps retrying the above project. It gets to 40% and quits with a "Client-core communications error: ERROR 0x0" every time, then sometimes downloads a new core (FahCore_78) and tries again. I have seen it try 18 times now since 15 August.

Here is the first of the many failure log files:

Code:

--- Opening Log file [August 15 16:44:12]


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /var/www/html/folding
Executable: ./fah6
Arguments: -oneunit

[16:44:12] - Ask before connecting: No
[16:44:12] - User name: DrSpalding (Team 48083)
[16:44:12] - User ID: 1F23550B520F01BE
[16:44:12] - Machine ID: 1
[16:44:12]
[16:44:12] Loaded queue successfully.
[16:44:12] - Preparing to get new work unit...
[16:44:12] + Attempting to get work packet
[16:44:12] - Connecting to assignment server
[16:44:20] - Successful: assigned to (171.64.65.62).
[16:44:20] + News From Folding@Home: Welcome to Folding@Home
[16:44:20] Loaded queue successfully.
[16:47:29] - Couldn't send HTTP request to server
[16:47:29] + Could not connect to Work Server
[16:47:29] - Attempt #1  to get work failed, and no other work to do.
             Waiting before retry.
[16:47:35] + Attempting to get work packet
[16:47:35] - Connecting to assignment server
[16:47:35] - Successful: assigned to (171.64.65.62).
[16:47:35] + News From Folding@Home: Welcome to Folding@Home
[16:47:35] Loaded queue successfully.
[16:47:44] + Closed connections
[16:47:44]
[16:47:44] + Processing work unit
[16:47:44] Core required: FahCore_78.exe
[16:47:44] Core found.
[16:47:44] Working on Unit 08 [August 15 16:47:44]
[16:47:44] + Working ...
[16:47:45]
[16:47:45] *------------------------------*
[16:47:45] Folding@Home Gromacs Core
[16:47:45] Version 1.90 (March 8, 2006)
[16:47:45]
[16:47:45] Preparing to commence simulation
[16:47:45] - Looking at optimizations...
[16:47:45] - Created dyn
[16:47:45] - Files status OK
[16:47:45] - Expanded 516503 -> 2531073 (decompressed 490.0 percent)
[16:47:45] - Starting from initial work packet
[16:47:45]
[16:47:45] Project: 6503 (Run 3, Clone 254, Gen 53)
[16:47:45]
[16:47:45] Assembly optimizations on if available.
[16:47:45] Entering M.D.
[16:47:52] Protein: TR462_B_4 in water
[16:47:52]
[16:47:52] Writing local files
[16:47:52] Extra SSE boost OK.
[16:47:52] Writing local files
[16:47:53] Completed 0 out of 250000 steps  (0%)
[17:07:40] Writing local files
[17:07:40] Completed 2500 out of 250000 steps  (1%)
[17:27:27] Writing local files
[17:27:27] Completed 5000 out of 250000 steps  (2%)
[17:47:12] Writing local files
[17:47:12] Completed 7500 out of 250000 steps  (3%)
[18:06:58] Writing local files
[18:06:59] Completed 10000 out of 250000 steps  (4%)
[18:26:45] Writing local files
[18:26:45] Completed 12500 out of 250000 steps  (5%)
[18:46:32] Writing local files
[18:46:32] Completed 15000 out of 250000 steps  (6%)
[19:06:19] Writing local files
[19:06:19] Completed 17500 out of 250000 steps  (7%)
[19:26:07] Writing local files
[19:26:08] Completed 20000 out of 250000 steps  (8%)
[19:45:57] Writing local files
[19:45:57] Completed 22500 out of 250000 steps  (9%)
[20:05:46] Writing local files
[20:05:46] Completed 25000 out of 250000 steps  (10%)
[20:25:35] Writing local files
[20:25:35] Completed 27500 out of 250000 steps  (11%)
[20:45:23] Writing local files
[20:45:23] Completed 30000 out of 250000 steps  (12%)
[21:05:09] Writing local files
[21:05:09] Completed 32500 out of 250000 steps  (13%)
[21:24:55] Writing local files
[21:24:55] Completed 35000 out of 250000 steps  (14%)
[21:44:40] Writing local files
[21:44:40] Completed 37500 out of 250000 steps  (15%)
[22:04:25] Writing local files
[22:04:25] Completed 40000 out of 250000 steps  (16%)
[22:24:12] Writing local files
[22:24:12] Completed 42500 out of 250000 steps  (17%)
[22:43:59] Writing local files
[22:43:59] Completed 45000 out of 250000 steps  (18%)
[23:03:45] Writing local files
[23:03:45] Completed 47500 out of 250000 steps  (19%)
[23:23:29] Writing local files
[23:23:29] Completed 50000 out of 250000 steps  (20%)
[23:43:15] Writing local files
[23:43:15] Completed 52500 out of 250000 steps  (21%)
[00:03:02] Writing local files
[00:03:02] Completed 55000 out of 250000 steps  (22%)
[00:22:48] Writing local files
[00:22:48] Completed 57500 out of 250000 steps  (23%)
[00:42:35] Writing local files
[00:42:35] Completed 60000 out of 250000 steps  (24%)
[01:02:22] Writing local files
[01:02:22] Completed 62500 out of 250000 steps  (25%)
[01:22:10] Writing local files
[01:22:10] Completed 65000 out of 250000 steps  (26%)
[01:41:57] Writing local files
[01:41:57] Completed 67500 out of 250000 steps  (27%)
[02:01:43] Writing local files
[02:01:43] Completed 70000 out of 250000 steps  (28%)
[02:21:29] Writing local files
[02:21:29] Completed 72500 out of 250000 steps  (29%)
[02:41:14] Writing local files
[02:41:14] Completed 75000 out of 250000 steps  (30%)
[03:00:58] Writing local files
[03:00:58] Completed 77500 out of 250000 steps  (31%)
[03:20:45] Writing local files
[03:20:46] Completed 80000 out of 250000 steps  (32%)
[03:40:31] Writing local files
[03:40:31] Completed 82500 out of 250000 steps  (33%)
[04:00:18] Writing local files
[04:00:18] Completed 85000 out of 250000 steps  (34%)
[04:20:05] Writing local files
[04:20:05] Completed 87500 out of 250000 steps  (35%)
[04:39:53] Writing local files
[04:39:53] Completed 90000 out of 250000 steps  (36%)
[04:59:41] Writing local files
[04:59:41] Completed 92500 out of 250000 steps  (37%)
[05:19:27] Writing local files
[05:19:27] Completed 95000 out of 250000 steps  (38%)
[05:39:14] Writing local files
[05:39:14] Completed 97500 out of 250000 steps  (39%)
[05:59:00] Writing local files
[05:59:00] Completed 100000 out of 250000 steps  (40%)
[06:10:17] CoreStatus = 0 (0)
[06:10:17] Client-core communications error: ERROR 0x0
[06:10:17] Deleting current work unit & continuing...
[06:10:35] - Preparing to get new work unit...
[06:10:35] + Attempting to get work packet
[06:10:35] - Connecting to assignment server
[06:10:35] - Successful: assigned to (171.64.65.62).
[06:10:35] + News From Folding@Home: Welcome to Folding@Home
[06:10:35] Loaded queue successfully.
[06:10:44] + Closed connections
[06:10:49]
[06:10:49] + Processing work unit
[06:10:49] Core required: FahCore_78.exe
[06:10:49] Core found.
[06:10:49] Working on Unit 09 [August 16 06:10:49]
[06:10:49] + Working ...
[06:10:49]
[06:10:49] *------------------------------*
[06:10:49] Folding@Home Gromacs Core
[06:10:49] Version 1.90 (March 8, 2006)
[06:10:49]
[06:10:49] Preparing to commence simulation
[06:10:49] - Looking at optimizations...
[06:10:49] - Created dyn
[06:10:49] - Files status OK
[06:10:50] - Expanded 516503 -> 2531073 (decompressed 490.0 percent)
[06:10:50] - Starting from initial work packet
[06:10:50]
[06:10:50] Project: 6503 (Run 3, Clone 254, Gen 53)
[06:10:50]
[06:10:50] Assembly optimizations on if available.
[06:10:50] Entering M.D.
[06:10:56] Protein: TR462_B_4 in water
[06:10:56]
[06:10:56] Writing local files
[06:10:57] Extra SSE boost OK.
[06:10:57] Writing local files
[06:10:58] Completed 0 out of 250000 steps  (0%)
[06:30:27] Writing local files
[06:30:27] Completed 2500 out of 250000 steps  (1%)
[06:49:55] Writing local files

etc., etc., etc.
I have deleted the work files and queue.dat to try to get it restarted on a different WU. Is it a bad WU or a bad configuration on my end?
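For anyone wanting to confirm that every retry dies at the same point, here is a minimal Python sketch. It assumes the log line formats shown above; the function name `failure_points` and the sample lines are illustrative only and are not part of the FAH client.

```python
import re

# Progress lines look like:
# "[05:59:00] Completed 100000 out of 250000 steps  (40%)"
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps\s+\((\d+)%\)")
ERROR_MARKER = "Client-core communications error"

def failure_points(log_lines):
    """Return the last reported percentage before each core error."""
    last_pct = None
    failures = []
    for line in log_lines:
        m = STEP_RE.search(line)
        if m:
            last_pct = int(m.group(3))
        elif ERROR_MARKER in line:
            failures.append(last_pct)
            last_pct = None  # the next attempt starts over
    return failures

# A few lines copied from the log above.
sample = [
    "[05:39:14] Completed 97500 out of 250000 steps  (39%)",
    "[05:59:00] Completed 100000 out of 250000 steps  (40%)",
    "[06:10:17] CoreStatus = 0 (0)",
    "[06:10:17] Client-core communications error: ERROR 0x0",
    "[06:10:58] Completed 0 out of 250000 steps  (0%)",
]
print(failure_points(sample))
```

If the same percentage shows up for every attempt, that points at the WU rather than at a random hardware hiccup.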
Not a real doctor, I just play one on the 'net!
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Post by bruce »

Thank you for your report.

The WU (P6503,R3,C254,G53) has been reported as a bad WU.
DrSpalding

Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Post by DrSpalding »

Thanks Bruce. Unfortunately, it looks like it is still out there in the wild: after clearing out queue.dat and work/*, the client still picked it up at 18:33 PDT, after 10 attempts to contact the AS to get a WU. I'll wait it out again to confirm it gets to 40% and dies before deleting queue.dat and work/* and restarting the client.
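A note for matching that local time against the client log: the sketch below assumes the client writes its timestamps in UTC and that PDT is a fixed UTC-7 offset. The log does not record a year, so the `year` default is only a placeholder, and `log_time_to_pdt` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Daylight Time (assumes daylight time is in effect).
PDT = timezone(timedelta(hours=-7), "PDT")

def log_time_to_pdt(month, day, hhmmss, year=2009):
    """Convert a log timestamp (assumed UTC) to PDT. The year is a placeholder."""
    utc = datetime.strptime(f"{year} {month} {day} {hhmmss}", "%Y %B %d %H:%M:%S")
    return utc.replace(tzinfo=timezone.utc).astimezone(PDT)

# An assignment logged at 01:33:45 on August 27.
local = log_time_to_pdt("August", 27, "01:33:45")
print(local.strftime("%B %d %H:%M:%S %Z"))
```

Under those assumptions, a log entry at 01:33 on August 27 corresponds to 18:33 PDT on August 26.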
sortofageek
Site Admin
Posts: 3110
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Post by sortofageek »

I looked at it and see no recent feedback about it. FWIW, I reported it again.

Could you post a log, please? Are you sure it is exactly this WU? Project: 6503 (Run 3, Clone 254, Gen 53)

Could it possibly be a different WU with the same project number, but different run, clone, gen numbers?
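One way to check is to pull every "Project:" line out of the client log and compare the run, clone, and gen numbers. This is a rough Python sketch that assumes the timestamped line format the core prints in the logs in this thread; `count_work_units` and the sample text are illustrative and not part of any FAH tool.

```python
import re
from collections import Counter

# Matches the line the core prints, e.g.
# "[16:47:45] Project: 6503 (Run 3, Clone 254, Gen 53)"
PRCG_RE = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\] Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)

def count_work_units(log_text):
    """Count how often each (project, run, clone, gen) appears in a log."""
    counts = Counter()
    for line in log_text.splitlines():
        m = PRCG_RE.match(line.strip())
        if m:
            counts[tuple(int(g) for g in m.groups())] += 1
    return counts

# Sample lines in the same format as the client log.
sample = """\
[16:47:45] Project: 6503 (Run 3, Clone 254, Gen 53)
[06:10:17] Client-core communications error: ERROR 0x0
[06:10:50] Project: 6503 (Run 3, Clone 254, Gen 53)
"""
for prcg, n in count_work_units(sample).items():
    print(prcg, n)
```

A single tuple with a count above one means the client really was handed the identical WU again; several different tuples under project 6503 would mean different WUs from the same project.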
DrSpalding

Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Post by DrSpalding »

It was the same WU. Here is the relevant log, FWIW.

Code:

--- Opening Log file [August 26 22:23:58]


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /var/www/html/folding
Executable: ./fah6
Arguments: -oneunit

[22:23:58] - Ask before connecting: No
[22:23:58] - User name: DrSpalding (Team 48083)
[22:23:58] - User ID: 1F23550B520F01BE
[22:23:58] - Machine ID: 1
[22:23:58]
[22:23:58] Could not open work queue, generating new queue...
[22:23:58] - Preparing to get new work unit...
[22:23:58] + Attempting to get work packet
[22:23:58] - Connecting to assignment server
[22:23:58] + No appropriate work server was available; will try again in a bit.
[22:23:58] + Couldn't get work instructions.
[22:23:58] - Attempt #1  to get work failed, and no other work to do.
             Waiting before retry.
[22:24:09] + Attempting to get work packet
[22:24:09] - Connecting to assignment server
[22:27:18] - Couldn't send HTTP request to server
[22:27:18] + Could not connect to Assignment Server
[22:27:18] + No appropriate work server was available; will try again in a bit.
[22:27:18] + Couldn't get work instructions.
[22:27:18] - Attempt #2  to get work failed, and no other work to do.
             Waiting before retry.
[22:27:32] + Attempting to get work packet
[22:27:32] - Connecting to assignment server
[22:27:32] + No appropriate work server was available; will try again in a bit.
[22:27:32] + Couldn't get work instructions.
[22:27:32] - Attempt #3  to get work failed, and no other work to do.
             Waiting before retry.
[22:28:03] + Attempting to get work packet
[22:28:03] - Connecting to assignment server
[22:29:36] + No appropriate work server was available; will try again in a bit.
[22:29:36] + Couldn't get work instructions.
[22:29:36] - Attempt #4  to get work failed, and no other work to do.
             Waiting before retry.
[22:30:19] + Attempting to get work packet
[22:30:19] - Connecting to assignment server
[22:31:52] + No appropriate work server was available; will try again in a bit.
[22:31:52] + Couldn't get work instructions.
[22:31:52] - Attempt #5  to get work failed, and no other work to do.
             Waiting before retry.
[22:33:21] + Attempting to get work packet
[22:33:21] - Connecting to assignment server
[22:34:06] + No appropriate work server was available; will try again in a bit.
[22:34:06] + Couldn't get work instructions.
[22:34:06] - Attempt #6  to get work failed, and no other work to do.
             Waiting before retry.
[22:36:58] + Attempting to get work packet
[22:36:58] - Connecting to assignment server
[22:36:58] + No appropriate work server was available; will try again in a bit.
[22:36:58] + Couldn't get work instructions.
[22:36:58] - Attempt #7  to get work failed, and no other work to do.
             Waiting before retry.
[22:42:31] + Attempting to get work packet
[22:42:31] - Connecting to assignment server
[22:42:31] + No appropriate work server was available; will try again in a bit.
[22:42:31] + Couldn't get work instructions.
[22:42:31] - Attempt #8  to get work failed, and no other work to do.
             Waiting before retry.
[22:53:17] + Attempting to get work packet
[22:53:17] - Connecting to assignment server
[22:53:18] + No appropriate work server was available; will try again in a bit.
[22:53:18] + Couldn't get work instructions.
[22:53:18] - Attempt #9  to get work failed, and no other work to do.
             Waiting before retry.
[23:14:50] + Attempting to get work packet
[23:14:50] - Connecting to assignment server
[23:14:51] + No appropriate work server was available; will try again in a bit.
[23:14:51] + Couldn't get work instructions.
[23:14:51] - Attempt #10  to get work failed, and no other work to do.
             Waiting before retry.
[23:57:34] + Attempting to get work packet
[23:57:34] - Connecting to assignment server
[23:57:35] + No appropriate work server was available; will try again in a bit.
[23:57:35] + Couldn't get work instructions.
[23:57:35] - Attempt #11  to get work failed, and no other work to do.
             Waiting before retry.
[00:45:40] + Attempting to get work packet
[00:45:40] - Connecting to assignment server
[00:45:40] + No appropriate work server was available; will try again in a bit.
[00:45:40] + Couldn't get work instructions.
[00:45:40] - Attempt #12  to get work failed, and no other work to do.
             Waiting before retry.
[01:33:45] + Attempting to get work packet
[01:33:45] - Connecting to assignment server
[01:33:45] - Successful: assigned to (171.64.65.62).
[01:33:45] + News From Folding@Home: Welcome to Folding@Home
[01:33:45] Loaded queue successfully.
[01:33:55] + Closed connections
[01:33:55]
[01:33:55] + Processing work unit
[01:33:55] Core required: FahCore_78.exe
[01:33:55] Core found.
[01:33:55] Working on Unit 01 [August 27 01:33:55]
[01:33:55] + Working ...
[01:33:55]
[01:33:55] *------------------------------*
[01:33:55] Folding@Home Gromacs Core
[01:33:55] Version 1.90 (March 8, 2006)
[01:33:55]
[01:33:55] Preparing to commence simulation
[01:33:55] - Looking at optimizations...
[01:33:55] - Created dyn
[01:33:55] - Files status OK
[01:33:55] - Expanded 516503 -> 2531073 (decompressed 490.0 percent)
[01:33:55] - Starting from initial work packet
[01:33:55]
[01:33:55] Project: 6503 (Run 3, Clone 254, Gen 53)
[01:33:55]
[01:33:55] Assembly optimizations on if available.
[01:33:55] Entering M.D.
[01:34:02] Protein: TR462_B_4 in water
[01:34:02]
[01:34:02] Writing local files
[01:34:02] Extra SSE boost OK.
[01:34:02] Writing local files
[01:34:03] Completed 0 out of 250000 steps  (0%)
[01:54:35] Writing local files
[01:54:35] Completed 2500 out of 250000 steps  (1%)
[02:15:06] Writing local files
[02:15:06] Completed 5000 out of 250000 steps  (2%)
[02:35:37] Writing local files
[02:35:37] Completed 7500 out of 250000 steps  (3%)

7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Project: 6503 (Run 3, Clone 254, Gen 53)

Post by 7im »

It looks like after a WU is reported as bad, it may take a little while for that info to propagate out to the work servers so the WU isn't sent out again.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.