Here's an odd one I haven't seen before on my system. These generally run quite well here. In fact, this is the first abnormal result I've seen since ProtoMol started.
[08:07:53] Completed 0 out of 200000 steps (0%)
[08:10:07] WARNING: UnexpectedExitHandler triggered
[08:10:07] WARNING: Unexpected exit from science code
Unexpected tells me it didn't do something it was supposed to do.
From what I understand, that is expected. When it happens with the new version, V21, the core is supposed to do what it did and send the WU back to the server, so you get a different WU rather than getting that same one over and over again. It looks like they got that bug fixed in V21.
Grandpa_01 wrote:From what I understand that is expected
Ending early is expected with ProtoMol-based units, but ending early with errors the way this one did is not. I have plenty that end early, but I've never seen one end early with these errors. Finishing with a CoreStatus of 7B is not normal.
To understand it fully, we need to identify several different components that make up the FAH system. Most of the time people break things up into two pieces -- the servers and the software on your PC, or three pieces -- the servers, the client, and a FahCore. To understand what's going on here we need to look one level deeper and split the FahCore into two separate logical pieces that are integrally combined before you ever see it.
Any FahCore is made up of code written mostly by Stanford and code written mostly by someone else. The Stanford developers can find and fix bugs in the code they wrote rather quickly, but if the error is in the code that somebody else wrote, it will probably take longer to get it fixed. In this case, the message "Unexpected exit from science code" says that there was some kind of error in that other code. The Stanford code responds by reporting a CoreStatus = 7B (123 in decimal) to the client. The client responds by sending an error report to the server, as it should, and the server gives you a new assignment.
Some of the other FahCores respond differently to an error in the science code, and this is the first example I've seen of one doing it right. Other FahCores make a different report to the client, and the undesirable result is that you may have the same WU reassigned, producing the same error repeatedly.
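For anyone who finds it easier to follow as code, here is a rough sketch of that chain of events. It is only an illustration of the description above, not the actual FahCore or client source: the function names, the "WU-A"/"WU-B" labels, the queue, and the 0 used for a normal finish are all made up for the example. The only value taken from the log is CoreStatus 7B.

```python
# Illustrative sketch only. Every name here is hypothetical; none of it is
# taken from the real FAH client or FahCore code.

UNEXPECTED_EXIT = 0x7B  # CoreStatus 7B from the log; 0x7B is 123 in decimal


def run_core(science_code):
    """Wrapper half of the core: run the science code and turn an
    unexpected failure into a status value for the client."""
    try:
        science_code()
    except Exception:
        return UNEXPECTED_EXIT
    return 0  # assumed placeholder for a normal finish, not a real FAH code


def client_step(status, current_wu, queue):
    """Client half: on an unexpected exit, report the failed WU and take
    the next assignment instead of rerunning the same one."""
    if status == UNEXPECTED_EXIT:
        reports = [(current_wu, hex(status))]
        return reports, queue.pop(0)
    return [], current_wu


def broken_science():
    # Stands in for an error inside the third-party science code.
    raise RuntimeError("simulated failure in science code")


status = run_core(broken_science)
reports, next_wu = client_step(status, "WU-A", ["WU-B"])
print(status, reports, next_wu)
```

The point of the sketch is the branch in client_step: because the wrapper reports a distinct status for the unexpected exit, the failed WU gets reported and a different one is picked up, rather than the same WU coming back.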
Version 19 and 20 of ProtoMol were important developmental steps toward this solution, and I commend jcoffland for promptly moving to what appears to be an excellent solution for those unexpected problems that come up in the non-Stanford code.
Thanks Bruce, that helps me to better understand what Grandpa_01 was trying to say. I would agree that error handling is greatly improved with v21. However, we should still report these unexpected errors as a possible bad WU, correct? This clearly was more than a simple ending early because no more computation was possible.