unable to return work units

Moderators: Site Moderators, FAHC Science Team

Runaway1956
Posts: 27
Joined: Mon Feb 15, 2016 12:19 pm

unable to return work units

Post by Runaway1956 »

I have 5 clients, 4 of which are stacking up work units. At present, there are 19 work units failing to return.

Code: Select all

01:53:33:WARNING:WU06:FS02:Exception: Failed to send results to work server: Transfer failed
01:53:33:WU06:FS02:Trying to send results to collection server
01:53:33:WU06:FS02:Uploading 27.50MiB to 128.252.203.13
01:53:33:WU06:FS02:Connecting to 128.252.203.13:8080
01:53:41:WU06:FS02:Upload 0.45%
01:53:49:WU06:FS02:Upload 0.91%
01:53:56:WU06:FS02:Upload 1.36%
01:54:04:WU06:FS02:Upload 1.82%
01:54:09:WU08:FS01:0x22:Completed 912500 out of 1250000 steps (73%)
01:54:32:WU06:FS02:Upload 2.05%
01:54:40:WU06:FS02:Upload 2.50%
01:54:46:WU06:FS02:Upload 2.95%
01:55:02:WU06:FS02:Upload 3.18%
01:55:39:WU06:FS02:Upload 3.86%
01:55:47:WU06:FS02:Upload 4.32%
01:55:54:WU06:FS02:Upload 5.00%
01:56:00:WU06:FS02:Upload 5.23%
01:56:08:WU06:FS02:Upload 5.68%
01:56:18:WU06:FS02:Upload 6.14%
01:56:26:WU06:FS02:Upload 6.59%
01:56:32:WU08:FS01:0x22:Completed 925000 out of 1250000 steps (74%)
01:56:34:WU06:FS02:Upload 7.27%
01:56:35:WU08:FS01:0x22:Checkpoint completed at step 925000
01:56:52:WU06:FS02:Upload 7.73%
01:56:58:WU06:FS02:Upload 8.18%
01:57:06:WU06:FS02:Upload 8.64%
01:57:12:WU06:FS02:Upload 9.09%
01:57:22:WU06:FS02:Upload 9.77%
01:57:32:WU06:FS02:Upload 10.23%
01:57:40:WU06:FS02:Upload 10.68%
01:57:46:WU06:FS02:Upload 10.91%
01:57:54:WU06:FS02:Upload 11.59%
01:58:27:WU07:FS02:0x22:Completed 100000 out of 1250000 steps (8%)
01:58:27:WU06:FS02:Upload 12.04%
01:58:30:WU07:FS02:0x22:Checkpoint completed at step 100000
01:58:35:WU06:FS02:Upload 12.73%
01:58:44:WU06:FS02:Upload 13.18%
01:58:50:WU06:FS02:Upload 13.64%
01:58:56:WU08:FS01:0x22:Completed 937500 out of 1250000 steps (75%)
01:58:58:WU06:FS02:Upload 14.32%
01:59:04:WU06:FS02:Upload 14.77%
01:59:13:WU06:FS02:Upload 15.23%
01:59:19:WU06:FS02:Upload 15.68%
01:59:27:WU06:FS02:Upload 16.14%
01:59:36:WU06:FS02:Upload 16.59%
01:59:42:WU06:FS02:Upload 16.82%
01:59:54:WU06:FS02:Upload 17.04%
02:00:08:WU06:FS02:Upload 17.27%
02:00:16:WU06:FS02:Upload 17.95%
02:00:26:WU06:FS02:Upload 18.41%
02:00:33:WU06:FS02:Upload 18.64%
Are the servers at fault? I can't think of any good reason for continued and repeated failures at my end. I have a crappy ISP, but it usually gets the job done.

Projects affected include 18039, 18601, 17257, 18601, 18201, 18213, and 18210. Collection servers include 128.252.203.14, 128.252.203.1, 128.252.203.10, 128.252.203.12, and 128.252.203.13.

Any help would be appreciated. Each work unit is losing points at a steady pace!
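A rough way to tell a slow link from a failing one is to work out the throughput the log implies. The sketch below is only an illustration: the function name `upload_estimate` and the regular expression are my own, and the sample reuses the first and last `Upload` lines from the log above together with the 27.50MiB size it reports.

```python
import re
from datetime import datetime

# Matches client log lines such as "01:53:41:WU06:FS02:Upload 0.45%".
UPLOAD = re.compile(r"^(\d{2}:\d{2}:\d{2}):WU(\d+):FS\d+:Upload (\d+(?:\.\d+)?)%")

def upload_estimate(log_text, size_mib, wu="06"):
    """Return (rate in KiB/s, estimated seconds for the whole upload) for one WU."""
    samples = []
    for line in log_text.splitlines():
        m = UPLOAD.match(line.strip())
        if m and m.group(2) == wu:
            when = datetime.strptime(m.group(1), "%H:%M:%S")
            samples.append((when, float(m.group(3))))
    if len(samples) < 2:
        raise ValueError("need at least two Upload lines for this WU")
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds() % 86400  # tolerate a midnight rollover
    gained = p1 - p0
    if elapsed <= 0 or gained <= 0:
        raise ValueError("no measurable progress between samples")
    kib_sent = size_mib * 1024 * gained / 100
    # Total time assumes the rate seen so far holds for the rest of the upload.
    return kib_sent / elapsed, elapsed * 100 / gained

# First and last Upload lines for WU06 from the log above.
SAMPLE = """\
01:53:41:WU06:FS02:Upload 0.45%
02:00:33:WU06:FS02:Upload 18.64%
"""

rate, total = upload_estimate(SAMPLE, 27.50)
print(f"{rate:.1f} KiB/s, about {total / 60:.0f} min for the full upload")
```

For these two lines that comes to roughly 12 KiB/s, or close to 38 minutes for this one work unit if the transfer never drops.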
bollix47
Posts: 2941
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: unable to return work units

Post by bollix47 »

Here are the steps that usually work:

1. Pause ALL clients.
2. Reboot the router/modem (turning it off for 30 seconds is usually long enough to reset the hardware).
3. Reboot one computer:
(shut down the computer for 30 seconds & restart)
(check the log for uploading & unpause/Fold if necessary to start the upload)
(don't reboot all of them at once, as that will just mess up your comms)
4. Observe that one computer and wait until all of its uploads finish.
5. Repeat 3 & 4 for each computer (one at a time).

PS: the client always tries to return its work unit to the work server (WS) it came from; if that doesn't work, it tries a collection server (CS).
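The WS-then-CS order described above is an ordinary fallback pattern. Here is a minimal sketch of that pattern, not the client's actual code: `send_with_fallback`, `demo_send`, and the `ws.example` host are all hypothetical names made up for the illustration, and only the collection server address comes from the log in the first post.

```python
def send_with_fallback(payload, servers, send):
    """Try each server in order and return the first one that accepts the payload."""
    failures = []
    for server in servers:
        try:
            send(server, payload)
            return server
        except OSError as exc:  # ConnectionError and TimeoutError are OSError subclasses
            failures.append(f"{server}: {exc}")
    raise ConnectionError("all servers failed: " + "; ".join(failures))

def demo_send(server, payload):
    """Stand-in transport for the sketch: the hypothetical work server always fails."""
    if server == "ws.example":
        raise ConnectionError("Transfer failed")

# Work server first, collection server second, as described above.
used = send_with_fallback(b"results", ["ws.example", "128.252.203.13"], demo_send)
print("delivered via", used)
```

If every server in the list fails, the sketch raises an error that lists each failure, which is roughly the situation where the results stay queued on the machine until a later retry.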
Runaway1956
Posts: 27
Joined: Mon Feb 15, 2016 12:19 pm

Re: unable to return work units

Post by Runaway1956 »

OK, just read your post, and it makes sense.

1. I simply cut power to the router at its kill switch for 3 minutes
2. turned off wifi on the laptops
3. rebooted router
4. returned to my own Linux machine
5. watching the main Windows machine in FAHControl
6. 8 Work Units are vying for bandwidth
7. one fails around 6%
8. another fails around 9%
9. WUs 6, 8, and 4 have all forged ahead, each has passed 33% upload
10. time passes, WUs 6, 8, and 4 pretty much keep pace with each other, passing 45% while WU 2 has come from way behind and passed them all up, passing 50% ahead of the others
11. WU 2 fails, but the other 3 continue on past 50%
12. quick snapshot of log:

Code: Select all

11:50:57:WU08:FS01:Upload 69.99%
11:51:00:WU04:FS02:Upload 68.42%
11:51:02:WU06:FS02:Upload 70.45%
11:51:03:WU05:FS01:Upload 58.42%
11:51:04:WU00:FS01:Upload 31.04%
11:51:04:WU02:FS01:Upload 38.58%
11:51:06:WU04:FS02:Upload 68.87%
11:51:08:WU08:FS01:Upload 70.45%
11:51:09:WU06:FS02:Upload 70.91%
11:51:10:WU05:FS01:Upload 58.87%
11:51:11:WU00:FS01:Upload 31.15%
11:51:14:WU02:FS01:Upload 38.82%
11:51:15:WU08:FS01:Upload 70.90%
11:51:15:WU04:FS02:Upload 69.10%
11:51:17:WU00:FS01:Upload 31.36%
11:51:19:WU05:FS01:Upload 59.33%
11:51:20:WU02:FS01:Upload 39.07%
11:51:22:WU04:FS02:Upload 69.56%
11:51:23:WU00:FS01:Upload 31.57%
11:51:24:WU08:FS01:Upload 71.36%
11:51:27:WU06:FS02:Upload 71.36%
11:51:28:WU02:FS01:Upload 39.31%
11:51:28:WU05:FS01:Upload 59.78%
11:51:29:WU04:FS02:Upload 70.01%
13. WU 8 fails, but the other two in the lead go past 80%
14. WUs 6 and 4 pass 95%
15. WUs 6 and 4 finished and were acknowledged by the server; WU 5 is in the lead now at about 80%, now 95%, and done
16. Now I have only five WUs vying for bandwidth - none of them moving very quickly
17. WU 2 has finally passed 90%, and I realize that I've not seen another failure - all five WUs on this machine are proceeding normally, if slowly.
18. WU 2 has finished uploading, and while I wasn't paying a lot of attention, WU 8 has come from behind, has passed 90% - now it has finished.
19. WU 3 finished, WU 1 finished soon after, WU 0 at 61%

Code: Select all

12:27:41:WU00:FS01:Upload 66.39%
12:27:47:WU00:FS01:Upload 67.02%
12:27:53:WU00:FS01:Upload 67.54%
12:28:00:WU00:FS01:Upload 68.28%
12:28:06:WU00:FS01:Upload 68.91%
12:28:12:WU00:FS01:Upload 69.54%
12:28:18:WU00:FS01:Upload 70.06%
12:28:24:WU00:FS01:Upload 70.69%
12:28:30:WU00:FS01:Upload 71.32%
12:28:36:WU00:FS01:Upload 71.84%
12:28:43:WU00:FS01:Upload 72.58%
12:28:49:WU00:FS01:Upload 73.21%
12:28:56:WU00:FS01:Upload 73.84%
12:29:02:WU00:FS01:Upload 74.57%
12:29:08:WU00:FS01:Upload 75.09%
12:29:14:WU00:FS01:Upload 75.72%
I'll allow that last one to finish, then switch on WIFI on one of the laptops and watch it.

Thanks for the suggestion, it works!
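With several uploads interleaved in one log, it is easier to compare them after grouping the lines by work unit. The sketch below is one way to do that, under my own assumptions: the function name `progress_by_wu` and the regular expression are not part of the client, and the sample is four lines taken from the snapshot above.

```python
import re
from datetime import datetime

# Matches client log lines such as "11:50:57:WU08:FS01:Upload 69.99%".
UPLOAD = re.compile(r"^(\d{2}:\d{2}:\d{2}):WU(\d+):FS\d+:Upload (\d+(?:\.\d+)?)%")

def progress_by_wu(log_text):
    """Map each WU id to (first percent, last percent, seconds between them)."""
    seen = {}
    for line in log_text.splitlines():
        m = UPLOAD.match(line.strip())
        if not m:
            continue  # skip folding progress, checkpoints, and other lines
        when = datetime.strptime(m.group(1), "%H:%M:%S")
        wu, pct = m.group(2), float(m.group(3))
        if wu not in seen:
            seen[wu] = [when, pct, when, pct]
        else:
            seen[wu][2], seen[wu][3] = when, pct
    return {
        wu: (p0, p1, (t1 - t0).total_seconds())
        for wu, (t0, p0, t1, p1) in seen.items()
    }

# Four lines from the snapshot above: two for WU08, two for WU00.
SAMPLE = """\
11:50:57:WU08:FS01:Upload 69.99%
11:51:04:WU00:FS01:Upload 31.04%
11:51:23:WU00:FS01:Upload 31.57%
11:51:24:WU08:FS01:Upload 71.36%
"""

summary = progress_by_wu(SAMPLE)
for wu, (start, end, seconds) in sorted(summary.items()):
    print(f"WU{wu}: {start}% -> {end}% in {seconds:.0f}s")
```

The percentages are relative to each work unit's own size, so a WU that gains fewer percent per minute is not necessarily getting less bandwidth; it may simply be a bigger upload.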
Runaway1956
Posts: 27
Joined: Mon Feb 15, 2016 12:19 pm

Re: unable to return work units

Post by Runaway1956 »

At last, the backlog has disappeared, with the last WU almost finished uploading. I've lost the bonus entirely on a couple of them, and taken reduced bonuses on the rest, but that's life I guess.

Code: Select all

15:15:09:WU05:FS00:Upload 78.19%
15:15:09:WU07:FS00:Server responded WORK_ACK (400)
15:15:09:WU07:FS00:Final credit estimate, 6053.00 points
15:15:09:WU07:FS00:Cleaning up
15:15:15:WU05:FS00:Upload 79.31%
15:15:21:WU05:FS00:Upload 80.89%
15:15:27:WU05:FS00:Upload 82.24%
15:15:35:WU05:FS00:Upload 83.81%