
Protomol Project: 10045 (Run 194, Clone 0, Gen 93)

Posted: Sun Jan 02, 2011 8:24 pm
by John_Weatherman
Another Protomol WU bites the dust with the closedown error ... f@*k! :x

Code:

[19:36:13] Completed 289700 out of 499375 steps (58%)
[19:44:28] Completed 294600 out of 499375 steps (58%)
[19:50:20] WARNING: Console control signal 5 on PID 652
[19:50:20] Service ignoring hangup/logoff signal.
[19:50:24] WARNING: Console control signal 5 on PID 652
[19:50:24] Service ignoring hangup/logoff signal.
[19:50:26] WARNING: Console control signal 6 on PID 652
[19:50:26] Exiting, please wait. . .
[19:50:32] GUI Server closing
[19:50:32] GUI Server exiting
[19:50:32] Folding@home Core Shutdown: INTERRUPTED
[19:50:36] CoreStatus = 66 (102)
[19:50:36] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (2)
[19:50:36] Killing all core threads

Folding@Home Client Shutdown.


--- Opening Log file [January 2 19:55:08 UTC] 


# Windows CPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Program Files\Folding1\Folding@home-Win32-x86-623
Service: C:\Program Files\Folding1\Folding@home-Win32-x86-623\Folding@home-Win32-x86.exe
Arguments: -svcstart -d C:\Program Files\Folding1\Folding@home-Win32-x86-623 -verbosity 9 -forceasm 

Launched as a service.
Entered C:\Program Files\Folding1\Folding@home-Win32-x86-623 to do work.

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[19:55:08] - Ask before connecting: No
[19:55:08] - User name: John_Weatherman (Team 48913)
[19:55:08] - User ID: 56FD91C778A0AD89
[19:55:08] - Machine ID: 1
[19:55:08] 
[19:55:08] Loaded queue successfully.
[19:55:08] 
[19:55:08] + Processing work unit
[19:55:08] Core required: FahCore_b4.exe
[19:55:08] Core found.
[19:55:08] - Autosending finished units... [January 2 19:55:08 UTC]
[19:55:08] Trying to send all finished work units
[19:55:08] + No unsent completed units remaining.
[19:55:08] - Autosend completed
[19:55:08] Working on queue slot 02 [January 2 19:55:08 UTC]
[19:55:08] + Working ...
[19:55:08] - Calling '.\FahCore_b4.exe -dir work/ -suffix 02 -checkpoint 10 -service -forceasm -verbose -lifeline 572 -version 623'

[19:55:43] *********************** Log Started 02/Jan/2011 19:55:42 ***********************
[19:55:43] ************************** ProtoMol Folding@Home Core **************************
[19:55:43]   Version: 25
[19:55:43]      Type: 180
[19:55:43]      Core: ProtoMol
[19:55:43]   Website: http://folding.stanford.edu/
[19:55:43] Copyright: (c) 2009 Stanford University
[19:55:43]    Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[19:55:43]      Args: -dir work/ -suffix 02 -checkpoint 10 -service -forceasm -verbose
[19:55:43]            -lifeline 572 -version 623
[19:55:43] ************************************ Build *************************************
[19:55:43]      Date: May 18 2010
[19:55:43]      Time: 23:43:52
[19:55:43]  Revision: 1819
[19:55:43]  Compiler: Intel(R) C++ MSVC 1500 mode 1110
[19:55:43]   Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[19:55:43]            /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[19:55:43]   Defines: _CRT_SECURE_NO_WARNINGS NDEBUG HAVE_GEEKINFO BOOST_ALL_NO_LIB
[19:55:43]            XML_STATIC HAVE_EXPAT HAVE_OPENSSL HAVE_LIBFAH HAVE_SIMTK_LAPACK
[19:55:43]  Platform: Windows XP
[19:55:43]      Bits: 32
[19:55:43]      Mode: Release
[19:55:43] ************************************ System ************************************
[19:55:43]        OS: Microsoft Windows XP Professional
[19:55:43]       CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz
[19:55:43]    CPU ID: GenuineIntel Family 15 Model 4 Stepping 1
[19:55:43]      CPUs: 2 Logical, 1 Physical
[19:55:43]    Memory: 1023 MB
[19:55:43]   Threads: Windows
[19:55:43] ********************************************************************************
[19:55:43] Project: 10045 (Run 194, Clone 0, Gen 93)
[19:55:43] Unit: 0x000000720001329c4c6190c000000c76
[19:55:43] User: 0x00000000000000000000000000000000
[19:55:43] Machine: 1
[19:55:43] Digital signatures verified
[19:55:51] GUI Server started
[19:55:51] Completed 298400 out of 499375 steps (59%)
[19:57:04] ERROR: ProtoMol ERROR: Corrupt DCD file. Size is 2935572, should be >= 2942124.
[19:57:04] Saving result file logfile_02.txt
[19:57:04] Saving result file checkpt
[19:57:04] Saving result file checkpt.crc
[19:57:04] Saving result file log.txt
[19:57:04] Saving result file protomol.conf
[19:57:04] Saving result file ww.6457.pos
[19:57:04] Saving result file ww.6457.vel
[19:57:04] Saving result file ww.dcd
[19:57:06] WARNING: While cleaning up: 0: Failed to remove directory '02': boost::filesystem::remove: The process cannot access the file because it is being used by another process: "02\ww.dcd"
[19:57:06] Folding@home Core Shutdown: BAD_WORK_UNIT
[19:57:08] CoreStatus = 72 (114)
[19:57:08] Sending work to server
[19:57:08] Project: 10045 (Run 194, Clone 0, Gen 93)
[19:57:08] - Read packet limit of 540015616... Set to 524286976.


[19:57:08] + Attempting to send results [January 2 19:57:08 UTC]
[19:57:08] - Reading file work/wuresults_02.dat from core
[19:57:08]   (Read 2506427 bytes from disk)
[19:57:08] Connecting to http://129.74.85.15:8080/
[19:57:36] Posted data.
[19:57:37] Initial: 0000; - Uploaded at ~84 kB/s
[19:57:37] - Averaged speed for that direction ~36 kB/s
[19:57:37] + Results successfully sent

Re: Protomol Project: 10045 (Run 194, Clone 0, Gen 93)

Posted: Sun Jan 02, 2011 10:58 pm
by bruce
What kind of closedown was that? I see two signal 5s followed by a signal 6 over the course of 6 seconds. As I'm sure you already know, there's a known problem with ProtoMol taking a "long" time to shut down, and this is probably the same problem.
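The 6-second window comes straight from the log timestamps (signal 5 at 19:50:20 and 19:50:24, signal 6 at 19:50:26). A small helper that does the same subtraction is sketched below. It is hypothetical, not part of the FAH client, and assumes both timestamps are valid and fall on the same day.

```c
#include <stdio.h>

/* Convert a log timestamp "HH:MM:SS" to seconds since midnight.
   Returns -1 if the string does not have that shape. */
int log_seconds(const char *hms)
{
    int h, m, s;
    if (sscanf(hms, "%d:%d:%d", &h, &m, &s) != 3)
        return -1;
    return h * 3600 + m * 60 + s;
}

/* Seconds elapsed between two valid timestamps on the same day. */
int elapsed(const char *from, const char *to)
{
    return log_seconds(to) - log_seconds(from);
}
```

For this log, elapsed("19:50:20", "19:50:26") gives 6 seconds from the first signal 5 to signal 6, and only 2 seconds separate the second signal 5 from signal 6.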

Apparently ProtoMol tries to write a checkpoint when you shut it down, and that takes some time. I don't know the details, so this is mostly educated guessing, but let's assume that ProtoMol takes 10 seconds to finish writing its files and shut down. Whatever was shutting FAH down (was it the OS?) asked nicely twice, but FAH didn't shut down within the 4 or 5 seconds allowed, so it was assumed to be hung and was unceremoniously killed after 6 seconds. The file it was writing at the time was incomplete (the core later reported ww.dcd as 2935572 bytes when it should be at least 2942124), so on restart the WU could not resume and was ended as BAD_WORK_UNIT.
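The "Corrupt DCD file" error is a size check: the core knows how big the trajectory should be and finds the file on disk shorter. The sketch below shows that kind of check under stated assumptions. ProtoMol's real code is not shown in this thread. The frame layout is assumed to be the common CHARMM/NAMD-style DCD layout with no unit-cell record, and the names `dcd_frame_bytes`, `dcd_min_size`, `dcd_size_ok`, `natoms` and `header_bytes` are all hypothetical.

```c
#include <stdio.h>

/* Bytes in one DCD coordinate frame, ASSUMING no unit-cell record:
   X, Y and Z are each natoms 4-byte floats wrapped in a 4-byte
   Fortran record marker on either side. */
long dcd_frame_bytes(long natoms)
{
    return 3 * (4 * natoms + 8);
}

/* Minimum size of a DCD file holding `frames` complete frames
   after a header of `header_bytes` (both hypothetical inputs). */
long dcd_min_size(long header_bytes, long natoms, long frames)
{
    return header_bytes + frames * dcd_frame_bytes(natoms);
}

/* Returns 1 if the file is at least `expected` bytes, 0 if it is
   short (truncated), -1 if it cannot be opened or measured. */
int dcd_size_ok(const char *path, long expected)
{
    FILE *f = fopen(path, "rb");
    long size;

    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    size = ftell(f);
    fclose(f);
    if (size < 0)
        return -1;
    if (size < expected) {
        fprintf(stderr, "Corrupt DCD file. Size is %ld, should be >= %ld.\n",
                size, expected);
        return 0;
    }
    return 1;
}
```

As a worked example, and only as an assumption: a typical 276-byte header (two 80-character title lines) with 544 atoms gives 6552-byte frames, and 449 frames come to exactly the 2942124 bytes the core expected, while 448 frames come to exactly the 2935572 bytes it found. That fit suggests the file was one frame short, but it does not confirm the real atom count or header size.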

Next time, try shutting down ProtoMol and giving it some extra time before it gets the signal 6. (This looks suspiciously like a Windows shutdown, so until the bug is fixed, you may need to shut down the client manually first.)

Code:

#define CTRL_LOGOFF_EVENT   5  /* a user is logging off */
#define CTRL_SHUTDOWN_EVENT 6  /* the system is shutting down */
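The log shows the client treating the two codes differently: it ignores signal 5 ("Service ignoring hangup/logoff signal.") and only starts exiting on signal 6 ("Exiting, please wait. . ."). A platform-neutral sketch of that dispatch follows. It is not the FAH client's code. The `service_ctrl_action` function and the `ACTION_*` values are hypothetical, and other events are simply ignored here for brevity. On Windows a real console program would register its handler with `SetConsoleCtrlHandler`. That registration is left out so the sketch builds anywhere.

```c
#include <stdio.h>

/* Values as defined in the Windows SDK header wincon.h; defined here
   only when that header is absent so the sketch compiles anywhere. */
#ifndef CTRL_LOGOFF_EVENT
#define CTRL_LOGOFF_EVENT   5
#endif
#ifndef CTRL_SHUTDOWN_EVENT
#define CTRL_SHUTDOWN_EVENT 6
#endif

/* Hypothetical actions a service might take for each event. */
enum ctrl_action { ACTION_IGNORE, ACTION_EXIT };

enum ctrl_action service_ctrl_action(unsigned long event)
{
    switch (event) {
    case CTRL_LOGOFF_EVENT:
        /* A user logging off should not stop a service. */
        puts("Service ignoring hangup/logoff signal.");
        return ACTION_IGNORE;
    case CTRL_SHUTDOWN_EVENT:
        /* The system is going down: begin a clean exit now. */
        puts("Exiting, please wait. . .");
        return ACTION_EXIT;
    default:
        /* Other events (e.g. Ctrl+C) are ignored in this sketch. */
        return ACTION_IGNORE;
    }
}
```

Because only the shutdown event starts the exit, ProtoMol's whole checkpoint write has to fit between signal 6 and the moment the process is killed, which is the window suspected of being too short here.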