Issues with mskcc1

Posted: Fri Jan 14, 2022 6:10 pm
by gordonbb
I noticed all my slots across 5 systems transitioning to "Failed", and reboots did not help.

The logs showed:

17:54:18:WU00:FS00:Requesting new work unit for slot 00: gpu:10:0 TU104 [GeForce RTX 2070 SUPER] 8218 from 54.157.202.86
17:54:18:WU00:FS00:Connecting to 54.157.202.86:8080
17:54:48:ERROR:WU00:FS00:Exception: Not connected
Alternatively, the client connected but received only short responses.
Changing the Client Preference from "COVID" to "Any" resulted in no longer being assigned to mskcc1, and the slots got work again.

So it looks like there might be an issue with mskcc1: it has jobs to give out but can't assign them, so the client spins away retrying and eventually drops to a failed state.
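
For anyone wanting to check whether their own failures trace back to the same server, here is a rough sketch that tallies failed connection attempts per server from a client log. The line patterns are inferred only from the three log lines quoted above, not from any documented log format, and the names `failed_connections`, `CONNECT_RE`, `ERROR_RE` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern, inferred from the "Connecting to <ip>:<port>" line quoted above.
CONNECT_RE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):Connecting to (\S+):(\d+)$")
# Assumed pattern, inferred from the "ERROR ... Exception: <message>" line quoted above.
ERROR_RE = re.compile(r"^\d\d:\d\d:\d\d:ERROR:(WU\d+):(FS\d+):Exception: (.*)$")


def failed_connections(lines):
    """Count, per server, connection attempts followed by an exception on the same WU/slot."""
    pending = {}          # (work unit, slot) -> server of the most recent connection attempt
    failures = Counter()  # server -> number of failed attempts
    for raw in lines:
        line = raw.strip()
        m = CONNECT_RE.match(line)
        if m:
            pending[(m.group(1), m.group(2))] = m.group(3)
            continue
        m = ERROR_RE.match(line)
        if m:
            server = pending.pop((m.group(1), m.group(2)), None)
            if server is not None:
                failures[server] += 1
    return failures


# The three log lines from the post above, used as sample input.
SAMPLE = """\
17:54:18:WU00:FS00:Requesting new work unit for slot 00: gpu:10:0 TU104 [GeForce RTX 2070 SUPER] 8218 from 54.157.202.86
17:54:18:WU00:FS00:Connecting to 54.157.202.86:8080
17:54:48:ERROR:WU00:FS00:Exception: Not connected
"""

if __name__ == "__main__":
    for server, count in failed_connections(SAMPLE.splitlines()).most_common():
        print(server, count)
```

If one address dominates the tally while other servers hand out work normally, that points at the server rather than the local network or client.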

OS: Ubuntu 18.04.3 LTS; NVIDIA Driver: 460.91.03; Client: 7.6.21

Re: Issues with mskcc1

Posted: Fri Jan 14, 2022 10:38 pm
by toTOW
At least you don't get the bad WUs it hosts : viewtopic.php?f=19&t=37675 ;)

Re: Issues with mskcc1

Posted: Sat Jan 15, 2022 4:10 am
by gordonbb
toTOW wrote: At least you don't get the bad WUs it hosts : viewtopic.php?f=19&t=37675 ;)
I was getting those but just ignoring them for science’s sake. Figured it was some hastily prepared moonshot thing. :D

I noticed I am now getting some WUs from that server with the preferences set to “Any”, so either it worked out whatever grief it was going through or someone pushed the big red button.