Project: 6316 (Run 183, Clone 3, Gen 26)

Moderators: Site Moderators, FAHC Science Team

AgrFan
Posts: 63
Joined: Sat Mar 15, 2008 8:07 pm

Project: 6316 (Run 183, Clone 3, Gen 26)

Post by AgrFan »

This one failed at 100% with 'Server reports problem with unit', so 1.5 days of work were lost. I constantly have problems getting new work on this box; it can go hours before a new unit is downloaded. This box runs the Linux client on Ubuntu 8.04. I don't have any problems downloading 6316 WUs on my Windows boxes. I restarted the client and got the same unit again, so we'll see what happens. Could there be a shortage of non-SMP Linux WUs right now?

Code:

[03:02:37] Completed 495000 out of 500000 steps  (99%)
[03:26:30] Writing local files
[03:26:30] Completed 500000 out of 500000 steps  (100%)
[03:26:30] Writing final coordinates.
[03:26:30] Past main M.D. loop
[03:27:30] 
[03:27:30] Finished Work Unit:
[03:27:30] - Reading up to 362448 from "work/wudata_03.arc": Read 362448
[03:27:30] - Reading up to 3515260 from "work/wudata_03.xtc": Read 3515260
[03:27:30] goefile size: 0
[03:27:30] logfile size: 905340
[03:27:30] Leaving Run
[03:27:31] - Writing 6547340 bytes of core data to disk...
[03:27:35] Done: 6546828 -> 4729396 (compressed to 72.2 percent)
[03:27:35]   ... Done.
[03:27:36] - Shutting down core
[03:27:36] 
[03:27:36] Folding@home Core Shutdown: FINISHED_UNIT
[03:27:37] CoreStatus = 64 (100)
[03:27:37] Sending work to server


[03:27:37] + Attempting to send results
[03:27:43] - Server reports problem with unit.
[03:27:43] - Preparing to get new work unit...
[03:27:43] + Attempting to get work packet
[03:27:43] - Connecting to assignment server
[03:27:44] + No appropriate work server was available; will try again in a bit.
[03:27:44] + Couldn't get work instructions.
[03:27:44] - Attempt #1  to get work failed, and no other work to do.
             Waiting before retry.
[03:28:03] + Attempting to get work packet
[03:28:03] - Connecting to assignment server
[03:28:03] + No appropriate work server was available; will try again in a bit.
[03:28:03] + Couldn't get work instructions.
[03:28:03] - Attempt #2  to get work failed, and no other work to do.
             Waiting before retry.
[03:28:22] + Attempting to get work packet
[03:28:22] - Connecting to assignment server
[03:28:23] + No appropriate work server was available; will try again in a bit.
[03:28:23] + Couldn't get work instructions.
[03:28:23] - Attempt #3  to get work failed, and no other work to do.
             Waiting before retry.
[03:28:50] + Attempting to get work packet
[03:28:50] - Connecting to assignment server
[03:28:56] + No appropriate work server was available; will try again in a bit.
[03:28:56] + Couldn't get work instructions.
[03:28:56] - Attempt #4  to get work failed, and no other work to do.
             Waiting before retry.
[03:29:48] + Attempting to get work packet
[03:29:48] - Connecting to assignment server
[03:29:49] + No appropriate work server was available; will try again in a bit.
[03:29:49] + Couldn't get work instructions.
[03:29:49] - Attempt #5  to get work failed, and no other work to do.
             Waiting before retry.
[03:31:11] + Attempting to get work packet
[03:31:11] - Connecting to assignment server
[03:31:11] + No appropriate work server was available; will try again in a bit.
[03:31:11] + Couldn't get work instructions.
[03:31:11] - Attempt #6  to get work failed, and no other work to do.
             Waiting before retry.
[03:33:52] + Attempting to get work packet
[03:33:52] - Connecting to assignment server
[03:33:58] + No appropriate work server was available; will try again in a bit.
[03:33:58] + Couldn't get work instructions.
[03:33:58] - Attempt #7  to get work failed, and no other work to do.
             Waiting before retry.
[03:39:23] + Attempting to get work packet
[03:39:23] - Connecting to assignment server
[03:39:29] + No appropriate work server was available; will try again in a bit.
[03:39:29] + Couldn't get work instructions.
[03:39:29] - Attempt #8  to get work failed, and no other work to do.
             Waiting before retry.
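For anyone wondering how long the client backs off between attempts, the waits can be read straight off the timestamps in the log above. Below is a minimal Python sketch, assuming the log format shown in this post; the excerpt keeps only the failure and retry lines, and `retry_waits` is a hypothetical helper, not part of the client. It does not handle a log that wraps past midnight.

```python
import re
from datetime import datetime

# Excerpt of the log above: each failed attempt and the next retry.
LOG = """\
[03:27:44] - Attempt #1  to get work failed, and no other work to do.
[03:28:03] + Attempting to get work packet
[03:28:03] - Attempt #2  to get work failed, and no other work to do.
[03:28:22] + Attempting to get work packet
[03:28:23] - Attempt #3  to get work failed, and no other work to do.
[03:28:50] + Attempting to get work packet
[03:28:56] - Attempt #4  to get work failed, and no other work to do.
[03:29:48] + Attempting to get work packet
[03:29:49] - Attempt #5  to get work failed, and no other work to do.
[03:31:11] + Attempting to get work packet
[03:31:11] - Attempt #6  to get work failed, and no other work to do.
[03:33:52] + Attempting to get work packet
[03:33:58] - Attempt #7  to get work failed, and no other work to do.
[03:39:23] + Attempting to get work packet
"""

# "[HH:MM:SS] message"; lines without a timestamp are ignored.
STAMP = re.compile(r"^\[(\d\d:\d\d:\d\d)\] (.*)$")


def retry_waits(log: str) -> list[int]:
    """Seconds between each failed attempt and the next retry (hypothetical helper)."""
    waits = []
    failed_at = None
    for line in log.splitlines():
        m = STAMP.match(line)
        if not m:
            continue  # e.g. the indented "Waiting before retry." lines
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        msg = m.group(2)
        if "to get work failed" in msg:
            failed_at = t
        elif "Attempting to get work packet" in msg and failed_at is not None:
            waits.append(int((t - failed_at).total_seconds()))
            failed_at = None
    return waits


print(retry_waits(LOG))
```

On this excerpt it prints `[19, 19, 27, 52, 82, 161, 325]`: after the first few retries the wait roughly doubles each time. That pattern is only what this one log shows, not documented client behaviour.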
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 6316 (Run 183, Clone 3, Gen 26)

Post by bruce »

AgrFan wrote: I restarted the client and got the same unit again. Will see what happens. Could there be a shortage of non-SMP Linux WUs right now?
The servers often reassign the same WU on the assumption that it was lost during transmission. If that is why you're asking about a shortage of WUs, then no, getting the same unit again doesn't indicate that the servers are short of WUs.

In very rare instances, I've seen servers run short of WUs, and it's reasonably easy to tell. Go to the Server Status page (linked in the header of this page, or at Stanford.edu) and look for servers matching these columns: client = classic, STATUS = full, WUs AVAIL more than about 1000, OperatingSystem containing "L", and Min_packet and memory commensurate with your hardware. I'm sure you'll always find a number of servers that can assign work to you.

Another method: if you're behind a proxy that blocks port 8080, start with the servers that show green in the "% Ass 80" column; if that port isn't blocked, start with the "% Ass" column instead. Either way, then look for an "L" in the OS column.
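The server checks described above can also be written out as a filter. This is a minimal Python sketch, not a scraper for the real Server Status page: the `Server` fields, the sample rows (documentation-range IPs), and the 1000-WU threshold are assumptions modelled on the column names mentioned in the post. Min_packet and memory are left out because they depend on your own hardware.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical row from the Server Status page; field names are
    # assumptions based on the columns named in the post.
    ip: str
    client: str         # "client" column, e.g. "classic"
    status: str         # "STATUS" column, e.g. "full"
    wus_avail: int      # "WUs AVAIL" column
    os: str             # "OperatingSystem" / "OS" column, e.g. "WL"
    assign_80_ok: bool  # True if "% Ass 80" is green
    assign_ok: bool     # True if "% Ass" is green


def candidates(servers, port_8080_blocked: bool, min_wus: int = 1000):
    """Return IPs of servers that look able to assign classic Linux work."""
    picked = []
    for s in servers:
        # Behind a proxy that blocks 8080, only port-80 assignment matters.
        reachable = s.assign_80_ok if port_8080_blocked else s.assign_ok
        if (s.client == "classic" and s.status == "full"
                and s.wus_avail > min_wus and "L" in s.os and reachable):
            picked.append(s.ip)
    return picked


# Made-up sample rows, not real server data.
SERVERS = [
    Server("192.0.2.10", "classic", "full", 5200, "WL", True, True),
    Server("192.0.2.11", "classic", "full", 300, "WL", True, True),   # too few WUs
    Server("192.0.2.12", "classic", "full", 8000, "W", True, True),   # no Linux
    Server("192.0.2.13", "classic", "full", 4000, "L", False, True),  # "% Ass 80" not green
    Server("192.0.2.14", "smp", "full", 9000, "L", True, True),       # not classic
]

print(candidates(SERVERS, port_8080_blocked=True))
print(candidates(SERVERS, port_8080_blocked=False))
```

With port 8080 blocked only 192.0.2.10 qualifies; with it open, 192.0.2.13 is added because its "% Ass" column is green even though "% Ass 80" is not.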
AgrFan
Posts: 63
Joined: Sat Mar 15, 2008 8:07 pm

Re: Project: 6316 (Run 183, Clone 3, Gen 26)

Post by AgrFan »

171.64.65.111 has Netload > 200 and WU Avail = 14650. Could the high netload explain why the unit was lost in transmission? Restarting the client downloaded the same unit immediately, though; I would have expected the download to stall given the high netload on that server. I still don't understand why my Windows boxes don't have this problem. I guess I'll find out in a day or so when the unit finishes.
AgrFan
Posts: 63
Joined: Sat Mar 15, 2008 8:07 pm

Re: Project: 6316 (Run 183, Clone 3, Gen 26)

Post by AgrFan »

Redo of this unit completed successfully today :D

Code:

[19:57:33] Completed 500000 out of 500000 steps  (100%)
[19:57:34] Writing final coordinates.
[19:57:34] Past main M.D. loop
[19:58:34] 
[19:58:34] Finished Work Unit:
[19:58:34] - Reading up to 362448 from "work/wudata_04.arc": Read 362448
[19:58:34] - Reading up to 3515260 from "work/wudata_04.xtc": Read 3515260
[19:58:34] goefile size: 0
[19:58:34] logfile size: 905340
[19:58:34] Leaving Run
[19:58:38] - Writing 6547340 bytes of core data to disk...
[19:58:42] Done: 6546828 -> 4729130 (compressed to 72.2 percent)
[19:58:42]   ... Done.
[19:58:43] - Shutting down core
[19:58:43] 
[19:58:43] Folding@home Core Shutdown: FINISHED_UNIT
[19:58:43] CoreStatus = 64 (100)
[19:58:43] Sending work to server


[19:58:43] + Attempting to send results
[19:58:50] + Results successfully sent
[19:58:50] Thank you for your contribution to Folding@Home.
[19:58:50] + Number of Units Completed: 3
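In the two logs in this thread, the upload outcome shows up in the line right after "Attempting to send results". A small Python sketch, assuming only the message strings that appear in these logs (`upload_outcome` is a hypothetical helper, not part of the client), that classifies an excerpt:

```python
def upload_outcome(log: str) -> str:
    """Classify a result upload from a client log excerpt (hypothetical helper)."""
    for line in log.splitlines():
        if "Results successfully sent" in line:
            return "sent"
        if "Server reports problem with unit" in line:
            return "rejected"
    return "unknown"  # neither message seen, e.g. the upload is still pending


# Excerpts copied from the two logs in this thread.
FIRST_TRY = """\
[03:27:37] + Attempting to send results
[03:27:43] - Server reports problem with unit.
"""
REDO = """\
[19:58:43] + Attempting to send results
[19:58:50] + Results successfully sent
"""

print(upload_outcome(FIRST_TRY), upload_outcome(REDO))
```

This prints `rejected sent`, matching the failed first attempt and the successful redo.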