128.252.203.10 and 128.252.203.12 struggling

Moderators: Site Moderators, FAHC Science Team

mgetz
Posts: 57
Joined: Tue Aug 11, 2020 6:23 pm

128.252.203.10 and 128.252.203.12 struggling

Post by mgetz »

Mostly an FYI that these two servers are having issues with uploads. The upload did eventually make it to 128.252.203.12, but it took multiple tries, which is unusual.

Code: Select all

15:40:57:WU01:FS00:Trying to send results to collection server
15:40:57:WU01:FS00:Uploading 27.82MiB to 128.252.203.10
15:40:57:WU01:FS00:Connecting to 128.252.203.10:8080
15:41:28:WU01:FS00:Upload 1.35%
15:41:43:ERROR:WU01:FS00:Exception: Transfer failed
15:41:43:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:18212 run:7332 clone:0 gen:6 core:0xa8 unit:0x00000000000000060000472400001ca4
15:41:44:WU01:FS00:Uploading 27.82MiB to 128.252.203.12
15:41:44:WU01:FS00:Connecting to 128.252.203.12:8080
15:41:51:WU01:FS00:Upload 0.22%
15:42:22:WU01:FS00:Upload 0.67%
15:42:22:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
15:42:22:WU01:FS00:Trying to send results to collection server
15:42:22:WU01:FS00:Uploading 27.82MiB to 128.252.203.10
15:42:22:WU01:FS00:Connecting to 128.252.203.10:8080
15:42:53:WU01:FS00:Upload 1.35%
15:42:53:ERROR:WU01:FS00:Exception: Transfer failed
15:42:54:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:18212 run:7332 clone:0 gen:6 core:0xa8 unit:0x00000000000000060000472400001ca4
15:42:54:WU01:FS00:Uploading 27.82MiB to 128.252.203.12
15:42:54:WU01:FS00:Connecting to 128.252.203.12:8080
15:43:09:WU01:FS00:Upload 0.22%
15:43:25:WU00:FS00:0xa8:Completed 2500 out of 125000 steps (2%)
15:43:33:WU01:FS00:Upload 1.35%
15:43:39:WU01:FS00:Upload 59.08%
15:43:44:WU01:FS00:Upload complete
15:43:44:WU01:FS00:Server responded WORK_ACK (400)
15:43:44:WU01:FS00:Final credit estimate, 15065.00 points
jjmiller
Scientist
Posts: 81
Joined: Fri Apr 09, 2021 4:43 pm

Re: 128.252.203.10 and 128.252.203.12 struggling

Post by jjmiller »

Thank you for the notice. I went in and dialed back the assignment rates for the jobs uploading to these two servers.
NGruia
Posts: 13
Joined: Sun Apr 12, 2020 2:46 pm

Re: 128.252.203.10 and 128.252.203.12 struggling

Post by NGruia »

Code: Select all

******************************* Date: 2021-06-30 *******************************
19:52:30:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
22:11:33:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
23:08:30:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
******************************* Date: 2021-07-01 *******************************
******************************* Date: 2021-07-01 *******************************
09:38:30:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
******************************* Date: 2021-07-01 *******************************
14:15:07:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
******************************* Date: 2021-07-01 *******************************
******************************* Date: 2021-07-02 *******************************
******************************* Date: 2021-07-02 *******************************
11:20:20:WU00:FS00:0x22:WARNING:Console control signal 1 on PID 4580
11:20:20:WARNING:WU00:Slot ID 0 no longer exists and there are no other matching slots, dumping
******************************* Date: 2021-07-02 *******************************
******************************* Date: 2021-07-02 *******************************
16:31:02:ERROR:WU00:FS00:Exception: Server did not assign work unit
******************************* Date: 2021-07-02 *******************************
******************************* Date: 2021-07-02 *******************************
23:56:07:ERROR:WU02:FS01:Exception: Server did not assign work unit
23:56:29:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
******************************* Date: 2021-07-03 *******************************
03:00:32:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
03:00:53:ERROR:WU02:FS01:Exception: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
******************************* Date: 2021-07-03 *******************************
08:26:45:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
08:27:06:ERROR:WU00:FS00:Exception: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
08:26:45:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
08:27:06:ERROR:WU00:FS00:Exception: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
******************************* Date: 2021-07-03 *******************************
******************************* Date: 2021-07-03 *******************************
******************************* Date: 2021-07-04 *******************************
04:50:43:ERROR:WU01:FS00:Exception: Server did not assign work unit
******************************* Date: 2021-07-04 *******************************
10:01:19:ERROR:WU01:FS00:Exception: Server did not assign work unit
******************************* Date: 2021-07-04 *******************************
15:11:12:ERROR:WU00:FS00:Exception: Server did not assign work unit
******************************* Date: 2021-07-04 *******************************
******************************* Date: 2021-07-05 *******************************
05:23:00:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
******************************* Date: 2021-07-05 *******************************
09:24:05:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
09:24:26:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
******************************* Date: 2021-07-05 *******************************
******************************* Date: 2021-07-05 *******************************
21:22:44:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
22:47:47:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
******************************* Date: 2021-07-06 *******************************
******************************* Date: 2021-07-06 *******************************
11:17:51:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
******************************* Date: 2021-07-06 *******************************
14:39:08:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
14:39:29:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
15:09:59:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
15:10:21:WARNING:WU02:FS00:Exception: Failed to send results to work server: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
16:04:34:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
16:04:55:ERROR:WU02:FS01:Exception: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
******************************* Date: 2021-07-06 *******************************
******************************* Date: 2021-07-07 *******************************
******************************* Date: 2021-07-07 *******************************
******************************* Date: 2021-07-07 *******************************
******************************* Date: 2021-07-07 *******************************
******************************* Date: 2021-07-08 *******************************
******************************* Date: 2021-07-08 *******************************
******************************* Date: 2021-07-08 *******************************
******************************* Date: 2021-07-08 *******************************
******************************* Date: 2021-07-09 *******************************
******************************* Date: 2021-07-09 *******************************
06:50:09:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
08:53:51:ERROR:WU02:FS00:Exception: Server did not assign work unit
******************************* Date: 2021-07-09 *******************************
******************************* Date: 2021-07-09 *******************************
00:21:43:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
******************************* Date: 2021-07-10 *******************************
******************************* Date: 2021-07-10 *******************************
******************************* Date: 2021-07-10 *******************************
******************************* Date: 2021-07-10 *******************************
******************************* Date: 2021-07-11 *******************************
******************************* Date: 2021-07-11 *******************************
******************************* Date: 2021-07-11 *******************************
******************************* Date: 2021-07-11 *******************************
******************************* Date: 2021-07-12 *******************************
03:08:10:WARNING:WU01:FS01:Failed to send results, will try again later
03:08:31:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
03:09:06:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
03:10:17:WARNING:WU01:FS01:Failed to send results, will try again later
03:10:38:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
03:12:37:WARNING:WU01:FS01:Failed to send results, will try again later
03:12:58:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
03:13:20:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:14:32:WARNING:WU01:FS01:Failed to send results, will try again later
03:15:35:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
03:15:57:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
03:17:06:WARNING:WU01:FS01:Failed to send results, will try again later
03:19:50:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
03:21:58:WARNING:WU01:FS01:Failed to send results, will try again later
03:28:04:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
03:29:13:WARNING:WU01:FS01:Failed to send results, will try again later
03:39:15:WARNING:WU01:FS01:Failed to send results, will try again later
03:55:44:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
03:57:39:WARNING:WU01:FS01:Failed to send results, will try again later
04:22:26:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
04:24:13:WARNING:WU00:FS00:Failed to send results, will try again later
04:26:04:WARNING:WU00:FS00:Failed to send results, will try again later
04:26:12:WARNING:WU01:FS01:Failed to send results, will try again later
04:27:54:WARNING:WU00:FS00:Failed to send results, will try again later
04:28:38:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
04:29:47:WARNING:WU00:FS00:Failed to send results, will try again later
04:30:52:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
04:31:14:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
04:32:23:WARNING:WU00:FS00:Failed to send results, will try again later
04:35:25:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
04:36:34:WARNING:WU00:FS00:Failed to send results, will try again later
04:41:58:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
04:44:19:WARNING:WU00:FS00:Failed to send results, will try again later
04:54:22:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
04:55:31:WARNING:WU00:FS00:Failed to send results, will try again later
05:11:45:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
05:13:34:WARNING:WU00:FS00:Failed to send results, will try again later
05:13:44:WARNING:WU01:FS01:Failed to send results, will try again later
05:26:47:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
05:41:25:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
05:42:34:WARNING:WU00:FS00:Failed to send results, will try again later
06:14:13:WARNING:WU01:FS01:Failed to send results, will try again later
06:28:51:WARNING:WU00:FS00:Failed to send results, will try again later
******************************* Date: 2021-07-12 *******************************
07:13:16:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
07:14:39:WARNING:WU01:FS01:Failed to send results, will try again later
07:28:44:WARNING:WU00:FS00:Failed to send results, will try again later
07:44:34:WARNING:WU04:FS01:WorkServer connection failed on port 8080 trying 80
08:13:23:WARNING:WU01:FS01:Failed to send results, will try again later
08:29:00:WARNING:WU00:FS00:Failed to send results, will try again later
09:10:01:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
09:10:22:ERROR:WU03:FS01:Exception: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
09:10:44:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
09:11:05:ERROR:WU03:FS01:Exception: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
09:13:30:WARNING:WU01:FS01:Failed to send results, will try again later
09:27:02:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
09:27:23:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
09:28:33:WARNING:WU00:FS00:Failed to send results, will try again later
10:13:10:WARNING:WU01:FS01:Failed to send results, will try again later
10:27:02:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
10:27:23:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
10:29:04:WARNING:WU00:FS00:Failed to send results, will try again later
10:37:53:ERROR:WU02:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
10:38:15:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
10:38:36:ERROR:WU02:FS01:Exception: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
10:39:15:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
11:11:46:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
11:12:55:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
11:14:05:WARNING:WU01:FS01:Failed to send results, will try again later
11:29:08:WARNING:WU00:FS00:Failed to send results, will try again later
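A long filtered log like the one above is easier to read once the messages are tallied. The following is a minimal sketch, not part of the FAH client: the function name `summarize` and the regular expression are assumptions based only on the line shapes visible in this log (`HH:MM:SS:LEVEL:WUnn:FSnn:message`). Date banners and lines in any other shape are skipped.

```python
import re
from collections import Counter

# Matches lines such as
# "03:08:10:WARNING:WU01:FS01:Failed to send results, will try again later".
# This pattern is an assumption drawn from the log excerpt, not a client spec.
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WARNING|ERROR):(WU\d+):(FS\d+):(.*)$"
)


def summarize(lines):
    """Count WARNING/ERROR messages by (level, shortened message)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # date banners and differently shaped lines are skipped
        level, message = m.group(2), m.group(5)
        # Drop the long OS error description so identical failures group together.
        message = message.split(": A connection attempt failed")[0]
        counts[(level, message)] += 1
    return counts
```

Calling `summarize(open("log.txt"))` (the file name is hypothetical) and printing `counts.most_common()` would show which failure dominates, for example whether most entries are port 8080 fallbacks or failed transfers.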
NGruia
Posts: 13
Joined: Sun Apr 12, 2020 2:46 pm

Re: 128.252.203.10 and 128.252.203.12 struggling

Post by NGruia »

2x 3070
07:14:15:WARNING:WU01:FS01:Failed to send results, will try again later
07:23:20:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
07:24:29:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
07:25:42:WARNING:WU00:FS00:Failed to send results, will try again later
08:12:27:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
08:12:48:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.11:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
08:14:16:WARNING:WU01:FS01:Failed to send results, will try again later
08:25:17:WARNING:WU00:FS00:Failed to send results, will try again later
09:12:27:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
09:14:27:WARNING:WU01:FS01:Failed to send results, will try again later
09:23:14:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
09:25:00:WARNING:WU00:FS00:Failed to send results, will try again later
09:25:14:WARNING:WU04:FS01:WorkServer connection failed on port 8080 trying 80
******************************* Date: 2021-07-12 *******************************
10:14:08:WARNING:WU01:FS01:Failed to send results, will try again later
10:24:28:WARNING:WU03:FS00:WorkServer connection failed on port 8080 trying 80
10:25:12:WARNING:WU00:FS00:Failed to send results, will try again later
11:14:20:WARNING:WU01:FS01:Failed to send results, will try again later
11:24:57:WARNING:WU00:FS00:Failed to send results, will try again later
jjmiller
Scientist
Posts: 81
Joined: Fri Apr 09, 2021 4:43 pm

Re: 128.252.203.10 and 128.252.203.12 struggling

Post by jjmiller »

Thanks, all. I'm just getting back into the office after a week out and am checking on these now.
gordonbb
Posts: 510
Joined: Mon May 21, 2018 4:12 pm
Hardware configuration: Ubuntu 22.04.2 LTS; NVidia 525.60.11; 2 x 4070ti; 4070; 4060ti; 3x 3080; 3070ti; 3070
Location: Great White North

Re: 128.252.203.10 and 128.252.203.12 struggling

Post by gordonbb »

I have one WU currently stuck trying to upload to WS 128.252.203.11 and CS 128.252.203.10 since 11:17GMT

11:17:09:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18003 run:0 clone:168 gen:67

These are from an Ubuntu Linux 18.04.5 system with no antivirus software, and the 7 other GPUs on this and other systems here are not having this issue.

The uploads to both servers get 50-65% done, then fail.

Code: Select all

16:58:10:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:18202 run:4825 clone:4 gen:12 core:0x22 unit:0x000000040000000c0000471a000012d9
16:58:10:WU01:FS00:Uploading 27.50MiB to 128.252.203.11
16:58:10:WU01:FS00:Connecting to 128.252.203.11:8080
16:58:16:WU01:FS00:Upload 26.59%
16:58:22:WU01:FS00:Upload 53.41%
16:58:54:WU01:FS00:Upload 64.77%
16:58:54:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
16:58:54:WU01:FS00:Trying to send results to collection server
16:58:54:WU01:FS00:Uploading 27.50MiB to 128.252.203.10
16:58:54:WU01:FS00:Connecting to 128.252.203.10:8080
16:59:00:WU01:FS00:Upload 25.00%
16:59:06:WU01:FS00:Upload 52.27%
16:59:39:WU01:FS00:Upload 64.77%
16:59:39:ERROR:WU01:FS00:Exception: Transfer failed
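The progress lines in the log above suggest a stall rather than an immediate refusal: the first 12 seconds move at roughly 1.2 MiB/s, and the next 32 seconds at only about 0.1 MiB/s before the transfer fails. A minimal sketch of that arithmetic follows. The function name `interval_rates` and the regular expression are assumptions, the upload size has to be supplied by hand from the "Uploading ... MiB" line, and the lines passed in should come from a single upload attempt that does not cross midnight.

```python
import re
from datetime import datetime

# Matches progress lines such as "16:58:16:WU01:FS00:Upload 26.59%".
PROGRESS_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:Upload (\d+(?:\.\d+)?)%"
)


def interval_rates(lines, size_mib):
    """Return [(timestamp, MiB/s)] for each pair of consecutive progress lines."""
    rates = []
    prev = None  # (time, percent) of the previous progress line
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if not m:
            continue
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        pct = float(m.group(2))
        if prev is not None:
            dt = (t - prev[0]).total_seconds()
            if dt > 0:
                mib = (pct - prev[1]) / 100.0 * size_mib
                rates.append((m.group(1), mib / dt))
        prev = (t, pct)
    return rates
```

A sharp drop in the rate between two intervals, as seen here, points at something throttling or dropping the stream mid-transfer rather than at basic reachability.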
Traceroute shows nothing too untoward:

Code: Select all

traceroute to 128.252.203.10 (128.252.203.10), 30 hops max, 60 byte packets
 1  grumpy.lan.dwarf.ca (198.51.100.1)  0.204 ms  0.421 ms  0.491 ms
 2  lo0-0-lns03-tor.teksavvy.com (206.248.155.139)  10.860 ms  10.990 ms  11.220 ms
 3  ae0-2150-bdr01-tor.teksavvy.com (69.196.136.172)  12.599 ms  12.637 ms  12.701 ms
 4  206.223.119.163 (206.223.119.163)  22.587 ms  22.717 ms  22.799 ms
 5  * * *
...
11  035-130-036-042.biz.spectrum.com (35.130.36.42)  26.893 ms  27.006 ms  28.159 ms
12  xe-7-2-0-bih-1017-wu-rt-0.net.wustl.edu (128.252.182.131)  28.281 ms  28.323 ms  26.947 ms
13  po11-epsl29-mll02-wu-vml-10.net.wustl.edu (128.252.161.23)  27.191 ms po10-epsl29-mll02-wu-vml-10.net.wustl.edu (128.252.161.21)  27.074 ms po11-epsl29-mll02-wu-vml-10.net.wustl.edu (128.252.161.23)  27.383 ms
14  vl308-epsl29-mll02-wu-vml-11.net.wustl.edu (128.252.161.146)  27.793 ms  28.103 ms  28.144 ms
15  po5-engineering-core.net.wustl.edu (128.252.161.133)  28.236 ms *  28.687 ms
...
root@dcn05# traceroute 128.252.203.11
traceroute to 128.252.203.11 (128.252.203.11), 30 hops max, 60 byte packets
 1  grumpy.lan.dwarf.ca (198.51.100.1)  0.199 ms  0.411 ms  0.483 ms
 2  lo0-0-lns03-tor.teksavvy.com (206.248.155.139)  13.859 ms  14.059 ms  14.160 ms
 3  ae0-2150-bdr01-tor.teksavvy.com (69.196.136.172)  11.127 ms  18.091 ms  18.154 ms
 4  206.223.119.163 (206.223.119.163)  22.351 ms  23.049 ms  23.113 ms
...
11  035-130-036-042.biz.spectrum.com (35.130.36.42)  26.930 ms  27.053 ms  26.728 ms
12  xe-7-2-0-eps-l29-wu-rt-0.net.wustl.edu (128.252.182.129)  26.930 ms xe-7-2-0-bih-1017-wu-rt-0.net.wustl.edu (128.252.182.131)  27.261 ms  26.550 ms
13  po10-epsl29-mll02-wu-vml-10.net.wustl.edu (128.252.161.21)  26.640 ms  27.149 ms  27.210 ms
14  vl308-epsl29-mll02-wu-vml-11.net.wustl.edu (128.252.161.146)  34.294 ms  34.202 ms  30.174 ms
15  po5-engineering-core.net.wustl.edu (128.252.161.133)  28.128 ms  28.163 ms *
...
and pings seem normal:

Code: Select all

root@dcn05# ping 128.252.203.10
PING 128.252.203.10 (128.252.203.10) 56(84) bytes of data.
...
143 packets transmitted, 143 received, 0% packet loss, time 142192ms
rtt min/avg/max/mdev = 27.317/27.518/28.379/0.179 ms

root@dcn05# ping 128.252.203.11
PING 128.252.203.11 (128.252.203.11) 56(84) bytes of data.
...
--- 128.252.203.11 ping statistics ---
68 packets transmitted, 68 received, 0% packet loss, time 67091ms
rtt min/avg/max/mdev = 27.371/27.551/28.026/0.184 ms
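One caveat on the pings: they use ICMP, while the client log shows TCP connections to port 8080 with a fallback to port 80, so a TCP-level check may be more telling. Below is a minimal sketch using only the Python standard library; `check_port` is a hypothetical helper name, and a successful connect only shows that the port accepts connections, not that a 27 MiB upload will complete.

```python
import socket


def check_port(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `check_port("128.252.203.11", 8080)` and `check_port("128.252.203.11", 80)` would test the two ports the client tries in the logs in this thread.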
jjmiller
Scientist
Posts: 81
Joined: Fri Apr 09, 2021 4:43 pm

Re: 128.252.203.10 and 128.252.203.12 struggling

Post by jjmiller »

Thanks for the report. I've searched through the logs and am only seeing a record of assignment for these WUs, with no notes about failed upload attempts or refusals on our end. Neither 128.252.203.10 nor 128.252.203.11 is under a load significantly different from that of the past few days, and as far as I'm aware we haven't made any changes on our end. I have to run out the door, but I will check with our system admin and see if there's anything on our end that can be done.

Thanks,
fz4z
Posts: 36
Joined: Fri Jul 16, 2021 10:22 am

Re: 128.252.203.10 and 128.252.203.12 struggling

Post by fz4z »

gordonbb wrote:I have one WU currently stuck trying to upload to WS 128.252.203.11 and CS 128.252.203.10 since 11:17GMT

There might also be issues with your router or even your ISP, so try tethering the computer via the cell phone network or using a VPN, and then see if the problem persists.
jjmiller
Scientist
Posts: 81
Joined: Fri Apr 09, 2021 4:43 pm

Re: 128.252.203.10 and 128.252.203.12 struggling

Post by jjmiller »

Hi gordonbb, I noticed that at least P18202:R4825:C4:G12 uploaded successfully over the past 12-16h- https://apps.foldingathome.org/wu#proje ... e=4&gen=12. Do you still have any WUs stuck?

Thanks
gordonbb
Posts: 510
Joined: Mon May 21, 2018 4:12 pm
Hardware configuration: Ubuntu 22.04.2 LTS; NVidia 525.60.11; 2 x 4070ti; 4070; 4060ti; 3x 3080; 3070ti; 3070
Location: Great White North

Re: 128.252.203.10 and 128.252.203.12 struggling

Post by gordonbb »

jjmiller wrote:Hi gordonbb, I noticed that at least P18202:R4825:C4:G12 uploaded successfully over the past 12-16h- https://apps.foldingathome.org/wu#proje ... e=4&gen=12. Do you still have any WUs stuck?

Thanks
It finally uploaded and I've had no issues since. Very strange. There was end-to-end connectivity, but just this one WU was stuck. I likely would not have noticed, except that I "finish" folding during the prime time-of-use electricity periods in the air-conditioning season here, and HfM showed that slot still in an unpaused state a few hours after the scheduled finish time.