54 failed attempts to upload to 66.170.111.50

Moderators: Site Moderators, FAHC Science Team

Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 54 failed attempts to upload to 66.170.111.50

Post by Neil-B »

Collection Servers are simply a backup mechanism - originally aimed at covering for a server if it was taken down for some reason, but they have also been used to cope with surges in comms .. WUs still need to make their way back to the original Work Server for processing - so the best case is that they are never used (even if one has been set) .. The researchers choose whether to set one.

The existence or not of a CS has no relevance to importance of WUs or their priority.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 54 failed attempts to upload to 66.170.111.50

Post by bruce »

When a server is initialized, it is given a list of potential Collection Servers with which it can make a connection. If connections to that/those server(s) fail, it can stop accepting WUs that might be uploaded to that CS. I didn't check the status of aws3foldingathome.org but if it went off-line or disk storage got full, it can cease being available as a CS until everything is working again and the primary WS is restarted.

There have been some recent changes to the lists of active servers but I can't point to a specific issue there. New servers are replacing old ones and unreliable cloud servers are being disabled but the primary list of work servers hasn't changed much.

As Neil said, a connection to a CS improves reliability but it isn't a necessity.
Colonel_Klink
Posts: 49
Joined: Sat Aug 15, 2020 5:43 pm
Location: Pacific Northwest, USA

Re: 54 failed attempts to upload to 66.170.111.50

Post by Colonel_Klink »

@bruce

Again, thanks for the explanation. It looks as though aws3foldingathome.org has had recent problems that may have recurred. viewtopic.php?f=24&t=36103

If a server is both assigning new work and receiving completed work, does assigning new work have a higher priority than receiving completed work when the server has a heavy load?
Colonel_Klink
RTX 2080 Super
AMD Ryzen 9 3900X
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: 54 failed attempts to upload to 66.170.111.50

Post by PantherX »

Colonel_Klink wrote:...point me to a tutorial on how the collection and assignment servers work?...
Neil-B has provided a decent summary. However, if you want additional details and how everything fits, have a read here: viewtopic.php?f=18&t=17794
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Colonel_Klink
Posts: 49
Joined: Sat Aug 15, 2020 5:43 pm
Location: Pacific Northwest, USA

Re: 54 failed attempts to upload to 66.170.111.50

Post by Colonel_Klink »

@bruce

Thanks for the response; however, I have read that post many times and do not see the answer to my question. If a collection server is not assigned to an assignment server, does the assignment server, when acting as its own collection server, place priority on the assignment task or the collection task?

Also, several of the WUs for 140.163.4.231 that had not uploaded last night finally uploaded about 5 hours ago. When I looked to see if I had received credit for WU 11752 (0,6632,52), I saw that someone else also received credit for it. That folder appears to have been issued the WU after I had completed it and after an upload attempt had connected to the server and failed for some reason, but before my result was finally uploaded: https://apps.foldingathome.org/wu#proje ... 632&gen=52 I believe this is the problem that you may be commenting on in a GitHub issue.

I still think that aws3foldingathome.org being shown as a failed collection server for both 140.163.4.231 and 66.170.111.50 is possibly causing the delay in uploading the completed WUs that I am concerned about. Now I wonder if aws3foldingathome.org is a contributing cause of the GitHub issue about the same WUs being assigned to multiple folders. Does the connection to a collection server set a flag that a WU has been completed and returned, or does the flag get set after the WU has fully uploaded?
Joe_H
Site Admin
Posts: 7871
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 54 failed attempts to upload to 66.170.111.50

Post by Joe_H »

If your WU never attempts to connect to aws3, then it was never set to use that CS for uploading returns. So that would not have any effect on your upload. The aws3 server may have been removed, or never set, for the project the WUs came from as it is low on space and will probably be decommissioned as a WS in the near future as projects on it finish or the server arrangement with Amazon ends.

Priority on a WS is given to creation of the next generation for a WU that has been returned, then to handling incoming and outgoing connections. Given that most of the WU uploads you have problems with do eventually go through overnight, I suspect some part of the network connection between you and 66.170.111.50 is getting saturated during the daytime and is dropping packet ACKs connected with your upload.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 54 failed attempts to upload to 66.170.111.50

Post by Neil-B »

The 2nd folder was assigned the WU you had just after the 1 day timeout ... your completed WU was received by the WS a day after the other folder had returned it ... it is not when you complete the WU that is important rather when the WS receives it back .. So the sequence is:

Assigned to you .. 2020-10-03 09:17:29
Assigned to other folder ... 2020-10-04 09:22:34 ... just over 24hr after it was assigned to you
Returned by other folder ... 2020-10-04 13:12:07 ... please note that the column headings for returned and credited are switched
Returned by you ...2020-10-05 11:50:53

If a WU is sent to a CS then until it gets forwarded to the WS in question the WS does not know it has been returned and will reassign and reissue if the timeout passes ... but this isn't a CS/WS issue from what I can see.
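As a rough check of the sequence above, here is a minimal Python sketch that does the date arithmetic on the four timestamps quoted in this post. The variable names are only illustrative, and the 1 day timeout is the value quoted in this thread for this WU, not something read from the server.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"    # YYYY-MM-DD HH:MM:SS, as shown on the WU status page
TIMEOUT = timedelta(days=1)  # assumption: the 1 day timeout quoted in this thread

# Timestamps copied from the sequence listed in the post.
assigned_to_you = datetime.strptime("2020-10-03 09:17:29", FMT)
assigned_to_other = datetime.strptime("2020-10-04 09:22:34", FMT)
returned_by_other = datetime.strptime("2020-10-04 13:12:07", FMT)
returned_by_you = datetime.strptime("2020-10-05 11:50:53", FMT)

# Gap between the first assignment and the reassignment.
gap_before_reissue = assigned_to_other - assigned_to_you

# How long the WS waited before it actually received your result.
your_turnaround = returned_by_you - assigned_to_you

# How far behind the other folder's return your result arrived.
lag_behind_other = returned_by_you - returned_by_other

print(gap_before_reissue)            # 1 day, 0:05:05
print(your_turnaround)               # 2 days, 2:33:24
print(lag_behind_other)              # 22:38:46
print(gap_before_reissue > TIMEOUT)  # True: the reissue came after the timeout
```

The first value is the one that matters for the reissue question: the second assignment happened 5 minutes and 5 seconds after the 24hr timeout had passed.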
Colonel_Klink
Posts: 49
Joined: Sat Aug 15, 2020 5:43 pm
Location: Pacific Northwest, USA

Re: 54 failed attempts to upload to 66.170.111.50

Post by Colonel_Klink »

@Neil-B

I get your point and recognized it before. The issue I have is that my system was trying to upload multiple times before the WU was issued to the other folder. The problem is that the same WU was issued to two folders before the timeout. Issuing the same WU to two folders does not make sense unless the timeout period has passed.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 54 failed attempts to upload to 66.170.111.50

Post by Neil-B »

The timeout was 24hrs for that WU ... the period between it being assigned to you and to the next folder was seconds over 24hr and 5mins ... it wasn't reissued before timeout ... Date format is YYYY-MM-DD HH:MM:SS

Until the WS receives your completed WU it treats the WU as still outstanding and if Timeout passes will put the WU in the queue for reassignment ... Now I understand it is frustrating for you to have WUs that complete folding well before timeout and don't upload until after timeout - but the WS is doing precisely what it is meant to.

The resolution to this is not to change any behaviour on the WS (which is simply following due process) but to find and sort out whatever is causing your connections to the server to hang/drop/fail ... once your connections are uploading 1st time (as the vast majority of uploads should be) then your WUs will be returned before timeout and so the WS will not reissue.

If, as I think you might be suggesting, the WS should recognise that you are trying to upload and therefore not reissue, then the problem with that approach would be that your client might continue trying to upload past timeout (1 day) all the way to expiration (8.2 days) until finally the client dumps the WU ... this would put intolerable delays into the progress of science ... the rule implemented is simply that if the WS has itself not received back the completed WU (either directly or via a CS) by the timeout, it is queued for reissue to minimise the potential for further delays of unknown length (either a totally lost WU or just very slow processing).
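The rule described above could be sketched roughly as follows. This is only an illustration, not the actual Work Server code: `ws_view` and its status strings are made-up names, and the only values taken from this thread are the 1 day timeout and the 8.2 day expiration quoted for this WU. Failed upload attempts are deliberately not an input, since the WS only acts on results it has fully received.

```python
from datetime import datetime, timedelta
from typing import Optional

TIMEOUT = timedelta(days=1)       # assumption: value quoted in this thread
EXPIRATION = timedelta(days=8.2)  # assumption: value quoted in this thread


def ws_view(assigned: datetime, now: datetime,
            received: Optional[datetime]) -> str:
    """Hypothetical helper: how a WU looks from the WS side at time `now`.

    `received` is when the WS itself got the complete result (directly or
    forwarded from a CS), or None if it never has.
    """
    if received is not None and received <= now:
        return "returned"
    elapsed = now - assigned
    if elapsed >= EXPIRATION:
        return "expired"             # the client dumps the WU at this point
    if elapsed >= TIMEOUT:
        return "queued for reissue"  # WS stops waiting, whatever the client is doing
    return "outstanding"


assigned = datetime(2020, 10, 3, 9, 17, 29)
received = datetime(2020, 10, 5, 11, 50, 53)

print(ws_view(assigned, datetime(2020, 10, 4, 9, 0, 0), received))    # outstanding
print(ws_view(assigned, datetime(2020, 10, 4, 9, 22, 34), received))  # queued for reissue
print(ws_view(assigned, datetime(2020, 10, 5, 12, 0, 0), received))   # returned
```

With the timestamps from this WU, the result was still outstanding from the WS point of view when the timeout passed, so queuing it for reissue is the expected outcome.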
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 54 failed attempts to upload to 66.170.111.50

Post by bruce »

As far as the server recognizing when a WU has been returned is concerned, the server may have incoming WU results that have started to upload but have not yet been "received." It's not officially received until the upload finishes and the server notes the PRCG, the name/team/etc associated with the upload, and it gives it a timestamp indicating when it was received. Incomplete/partial uploads are dumped/ignored until the complete results file is received.

The same process is used on either the WS or a CS, so the recorded time that the upload was completed can be reported by the clock on a CS before the WU and the credit record are actually transferred from the CS to the WS. If that transfer of information has not been completed promptly, the WS won't know about it so a duplicate WU might be assigned, but the duration of that interval is generally quite short.

Since you're asking about a server configuration without a CS, none of this matters. All WUs have to be returned to the WS from whence they came.
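To illustrate the point about when a result counts as received, here is a small hypothetical sketch. The `Receipt` class, its field names, and the timestamps are all invented for the example and do not reflect the real server's data structures. It only models two ideas from this post: a partial upload is ignored, and a result stamped by a CS is unknown to the WS until it has been forwarded.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Receipt:
    """Hypothetical record of one upload, for illustration only."""
    upload_complete: bool                           # partial uploads are ignored
    stamped_at: Optional[datetime] = None           # set by the receiving server's clock
    forwarded_to_ws_at: Optional[datetime] = None   # equals stamped_at for a direct upload


def ws_knows_result(receipt: Receipt, now: datetime) -> bool:
    """True only if the complete result has reached the WS by `now`."""
    if not receipt.upload_complete:
        return False
    if receipt.forwarded_to_ws_at is None:
        return False
    return receipt.forwarded_to_ws_at <= now


# Made-up times: the CS finishes receiving at 08:00 and forwards at 08:10.
partial = Receipt(upload_complete=False)
via_cs = Receipt(True,
                 stamped_at=datetime(2020, 10, 4, 8, 0, 0),
                 forwarded_to_ws_at=datetime(2020, 10, 4, 8, 10, 0))

print(ws_knows_result(partial, datetime(2020, 10, 4, 8, 5, 0)))  # False
print(ws_knows_result(via_cs, datetime(2020, 10, 4, 8, 5, 0)))   # False: not yet forwarded
print(ws_knows_result(via_cs, datetime(2020, 10, 4, 8, 15, 0)))  # True
```

The window between `stamped_at` and `forwarded_to_ws_at` is the short interval in which a duplicate could be assigned; without a CS the two times are the same and the window disappears.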