Have been working on this WU for about a day and a half and it has just finished the 5% frame. Increase calculates about 22 more days to finish. That doesn't quite cut it for a 3-day SMP WU. Most WUs that I have received take about 20 minutes per %. This one is taking about 5.9 hours per %. I have restarted Terminal, Terminal and the Client, and all of the above with a machine reboot thrown in. Nothing changes the folding rate, but it is folding. It is not hung. Running on an Intel Mac Mini (1.83 GHz).
It will never make the due date. Should I trash it?
There sure seem to be a lot of slow-folding WUs being reported in the SMP area. What's going on?
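For anyone checking the arithmetic, here is a rough sketch of how the remaining time follows from the per-% rate. It assumes every remaining 1% frame takes the same time as the recent ones; the function name is made up for illustration, and this is not necessarily how Increase computes its own figure.

```python
def remaining_days(percent_done: float, hours_per_percent: float) -> float:
    """Days left, assuming each remaining 1% frame takes hours_per_percent hours."""
    return (100.0 - percent_done) * hours_per_percent / 24.0

# Figures quoted in the post above: 5% done, ~5.9 hours per % on this WU,
# versus ~20 minutes per % on a typical WU.
slow = remaining_days(5, 5.9)
typical = remaining_days(5, 20 / 60)

print(f"this WU:    {slow:.1f} days left")
print(f"typical WU: {typical:.1f} days left")
```

At 5.9 hours per % the remainder comes out to a bit over three weeks, the same ballpark as the roughly 22 days reported, while a typical WU at 20 minutes per % fits comfortably inside a 3-day deadline.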
Project 2675: (Run 3, Clone 125, Gen 37)
What is past is prologue!
Re: Project 2675: (Run 3, Clone 125, Gen 37)
Please post the relevant sections of the log. If there is a problem with the WU, the mods/PG can remove it from circulation.
Re: Project 2675: (Run 3, Clone 125, Gen 37)
Yes, please post the log. We have a checkpoint error that's been causing WU problems. A fix is in the early stages of testing right now.
Re: Project 2675: (Run 3, Clone 125, Gen 37)
I will attempt to post the log. The problem WU is on a different machine than the one I am currently using. I am having serious problems between ComCast and ISP in maintaining a connection long enough to do anything of consequence. Due date for the WU is tomorrow, 1/24/2009. I assume I will not be able to make that and therefore the WU is lost.
What is past is prologue!
Re: Project 2675: (Run 3, Clone 125, Gen 37)
Here is the Log for the subject WU (P2675 R3 C125 G37)
I notice that I will pass the due date (1/24/2009 16:43:07 UTC) very soon and the WU has just passed 12% completion.
I would appreciate some wise counsel ASAP as to the proper disposition of the carcass. No point folding on dead meat and I'm sure PG would prefer to resolve the issue if this is other than a problem that is local to me only.
TIA
Edit Addition: I notice that the log file is not "verbose". As you can see in the log file, I have the verbosity flag set at 9. The Client is writing all the 15-minute checkpoints to the Terminal Window, but they are not getting recorded in the Log File. I do not believe that I have ever seen that anomaly before. Has something changed in the coding?
2nd Edit Addition: I did a little checking on the verbosity question. Other Log Files that I can find on the Intel Macs that are running v6.23 have not been writing the 15-minute checkpoints to the Log File. However, I also have a G4 PPC Mac folding using v5.02, and the Log Files on that machine do have the 15-minute checkpoints recorded. It seems logical that if you want to be in a "Verbose" mode you would want to record everything that the Client writes to the Terminal Window. FAH v6.23 is not performing that way for me when the verbosity flag is set to 9. Why? Is other information of interest also not being recorded?
Code:
--- Opening Log file [January 21 17:27:03 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.23 Beta R1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/tedkreuserIII/Library/Folding@home
Executable: ./fah6
Arguments: -smp -local -verbosity 9
[17:27:03] - Ask before connecting: Yes
[17:27:03] - User name: Aardvark (Team 48057)
[17:27:03] - User ID: 7B68D95E256C0686
[17:27:03] - Machine ID: 1
[17:27:03]
[17:27:03] Loaded queue successfully.
[17:27:03] - Preparing to get new work unit...
[17:27:03] - Autosending finished units... [January 21 17:27:03 UTC]
[17:27:03] Trying to send all finished work units
[17:27:03] + No unsent completed units remaining.
[17:27:03] - Autosend completed
[17:27:04] > Press "c" to connect to the server to download unit
[17:27:07] - Establishing connection
[17:27:10] + Attempting to get work packet
[17:27:10] - Will indicate memory of 1024 MB
[17:27:10] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 2
[17:27:10] - Connecting to assignment server
[17:27:10] Connecting to http://assign.stanford.edu:8080/
[17:27:13] Posted data.
[17:27:13] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[17:27:13] + News From Folding@Home: Welcome to Folding@Home
[17:27:13] Loaded queue successfully.
[17:27:13] Connecting to http://171.64.65.56:8080/
[17:27:20] Posted data.
[17:27:20] Initial: 0000; - Receiving payload (expected size: 4841712)
[17:45:07] - Downloaded at ~4 kB/s
[17:45:07] - Averaged speed for that direction ~4 kB/s
[17:45:07] + Received work.
[17:45:07] + Connections closed: You may now disconnect
[17:45:07]
[17:45:07] + Processing work unit
[17:45:07] At least 4 processors must be requested.Core required: FahCore_a2.exe
[17:45:07] Core found.
[17:45:07] - Using generic ./mpiexec
[17:45:07] Working on queue slot 01 [January 21 17:45:07 UTC]
[17:45:07] + Working ...
[17:45:07] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 4811 -version 623'
[17:45:08]
[17:45:08] *------------------------------*
[17:45:08] Folding@Home Gromacs SMP Core
[17:45:08] Version 2.01 (Wed Jul 16 08:26:53 PDT 2008)
[17:45:08]
[17:45:08] Preparing to commence simulation
[17:45:08] - Ensuring status. Please wait.
[17:45:08] Working with standard loops on this execution.
[17:45:08] - Files status OK
[17:45:09] - Expanded 4841200 -> 2399ssByteArray: compressed_data_size=4841200 data_size=23994061, decompressed_data_size=23994061 diff=0
[17:45:10] 23994061, decompressed_data_size=23994061 diff=0
[17:45:10] - Digital signature verified
[17:45:10]
[17:45:10] Project: 2675 (Run 3, Clone 125, Gen 37)
[17:45:10]
[17:45:10] Entering M.D.
[17:45:19] one 125, Gen 37)
[17:45:19]
[17:45:20] Entering M.D.
[17:45:26] Node 3 initialized
[17:52:11] ***** Got an Activate signal (2)
[17:52:11] Killing all core threads
Folding@Home Client Shutdown.
--- Opening Log file [January 21 17:54:11 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.23 Beta R1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/tedkreuserIII/Library/Folding@home
Executable: ./fah6
Arguments: -smp -local -verbosity 9
[17:54:11] - Ask before connecting: Yes
[17:54:11] - User name: Aardvark (Team 48057)
[17:54:11] - User ID: 7B68D95E256C0686
[17:54:11] - Machine ID: 1
[17:54:11]
[17:54:11] Loaded queue successfully.
[17:54:11]
[17:54:11] + Processing work unit
[17:54:11] - Autosending finished units... [January 21 17:54:11 UTC]
[17:54:11] At least 4 processors must be requested.[17:54:11] Trying to send all finished work units
Core required: FahCore_a2.exe
[17:54:11] + No unsent completed units remaining.
[17:54:11] - Autosend completed
[17:54:11] Core found.
[17:54:11] - Using generic ./mpiexec
[17:54:11] Working on queue slot 01 [January 21 17:54:11 UTC]
[17:54:11] + Working ...
[17:54:11] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 4832 -version 623'
[17:54:12]
[17:54:12] *------------------------------*
[17:54:12] Folding@Home Gromacs SMP Core
[17:54:12] Version 2.01 (Wed Jul 16 08:26:53 PDT 2008)
[17:54:12]
[17:54:12] Preparing to commence simulation
[17:54:12] - Ensuring status. Please wait.
[17:54:13] Called DecompressByteArray: compressed_data_size=4841200 data_size=23994061, decompressed_data_size=23994061 diff=0
[17:54:14] - Digital signature verified
[17:54:14]
[17:54:14] Project: 2675 (Run 3, Clone 125, Gen 37)
[17:54:14]
[17:54:14] Assembly optimizations on if available.
[17:54:14] Entering M.D.
[17:54:24] Run 3, Clone 125, Gen 37)
[17:54:24]
[17:54:24] Entering M.D.
[17:54:31] Node 3 initialized
[23:07:20] )
[04:20:07] Completed 80009 out of 4000000 steps (2%)
[06:56:39] ***** Got an Activate signal (2)
[06:56:39] Killing all core threads
Folding@Home Client Shutdown.
--- Opening Log file [January 22 07:00:37 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.23 Beta R1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/tedkreuserIII/Library/Folding@home
Executable: ./fah6
Arguments: -smp -local -verbosity 9
[07:00:37] - Ask before connecting: Yes
[07:00:37] - User name: Aardvark (Team 48057)
[07:00:37] - User ID: 7B68D95E256C0686
[07:00:37] - Machine ID: 1
[07:00:37]
[07:00:37] Loaded queue successfully.
[07:00:37]
[07:00:37] + Processing work unit
[07:00:37] - Autosending finished units... [January 22 07:00:37 UTC]
[07:00:37] At least 4 processors must be requested.[07:00:37] Trying to send all finished work units
Core required: FahCore_a2.exe
[07:00:37] + No unsent completed units remaining.
[07:00:37] - Autosend completed
[07:00:37] Core found.
[07:00:37] - Using generic ./mpiexec
[07:00:37] Working on queue slot 01 [January 22 07:00:37 UTC]
[07:00:37] + Working ...
[07:00:37] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 5005 -version 623'
[07:00:37]
[07:00:37] *------------------------------*
[07:00:37] Folding@Home Gromacs SMP Core
[07:00:37] Version 2.01 (Wed Jul 16 08:26:53 PDT 2008)
[07:00:37]
[07:00:37] Preparing to commence simulation
[07:00:37] - Ensuring status. Please wait.
[07:00:37] Files status OK
[07:00:39] - Expanded 4841200 -> 23994061 (decompressed 495.6 percent)
[07:00:39] Called DecompressByteArray: compressed_data_size=4841200 data_size=23994061, decompressed_data_size=23994061 diff=0
[07:00:39] - Digital signature verified
[07:00:39]
[07:00:39] Project: 2675 (Run 3, Clone 125, Gen 37)
[07:00:39]
[07:00:40] Assembly optimizations on if available.
[07:00:40] Entering M.D.
[07:00:46] Will resume from checkpoint file
[07:00:50] ng M.D.
[07:00:56] Will resume from checkpoint file
[07:00:56] Node 3 initialized
[07:00:59] data_01.log
[07:00:59] Verified work/wudata_01.trr
[07:00:59] Verified work/wudata_01.xtc
[07:00:59] Verified work/wudata_01.edr
[07:00:59] Completed 80019 out of 4000000 steps (2%)
[12:13:07] Completed 120009 out of 4000000 steps (3%)
[13:54:54] ***** Got an Activate signal (2)
[13:54:54] Killing all core threads
Folding@Home Client Shutdown.
--- Opening Log file [January 22 14:22:56 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.23 Beta R1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/tedkreuserIII/Library/Folding@home
Executable: ./fah6
Arguments: -smp -local -verbosity 9
[14:22:56] - Ask before connecting: Yes
[14:22:56] - User name: Aardvark (Team 48057)
[14:22:56] - User ID: 7B68D95E256C0686
[14:22:56] - Machine ID: 1
[14:22:56]
[14:22:57] Loaded queue successfully.
[14:22:57]
[14:22:57] + Processing work unit
[14:22:57] At least 4 processors must be requested.Core required: FahCore_a2.exe
- Autosending finished units... [14:22:57]
[14:22:57] Core found.
[14:22:57] - Using generic ./mpiexec
[14:22:57] Trying to send all finished work units
[14:22:57] + No unsent completed units remaining.
[14:22:57] - Autosend completed
[14:22:57] Working on queue slot 01 [January 22 14:22:57 UTC]
[14:22:57] + Working ...
[14:22:57] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 222 -version 623'
[14:22:57]
[14:22:57] *------------------------------*
[14:22:57] Folding@Home Gromacs SMP Core
[14:22:57] Version 2.01 (Wed Jul 16 08:26:53 PDT 2008)
[14:22:57]
[14:22:57] Preparing to commence simulation
[14:22:57] - Looking at optimizations...
[14:22:57] - Working with standard loops on this execution.
[14:22:57] - Files status OK
[14:23:00] - Expanded Called DecompressByteArray: compressed_data_size=Called DecompressByteArray: compressed_data_size=4841200 data_size=23994061, decompressed_data_size=23994061 diff=0
[14:23:01] - Digital signature verified
[14:23:01]
[14:23:01] Project: 2675 (Run 3, Clone 125, Gen 37)
[14:23:01]
[14:23:01] M.D.
[14:23:01] ing M.D.
[14:23:07] me from checkpoint file
[14:23:07] int file
[14:23:08] itialized
[14:23:08] tialized
[14:23:11] Resuming from checkpoint
[14:23:11] Verified work/wudata_01.log
[14:23:11] Verified work/wudata_01.trr
[14:23:12] Verified work/wudata_01.xtc
[14:23:12] Verified work/wudata_01.edr
[14:23:12] Completed 120019 out of 4000000 steps (3%)
[19:37:07] Completed 160009 out of 4000000 steps (4%)
[00:49:04] Completed 200009 out of 4000000 steps (5%)
[06:00:55] Completed 240009 out of 4000000 steps (6%)
[11:12:55] Completed 280009 out of 4000000 steps (7%)
[16:24:50] Completed 320009 out of 4000000 steps (8%)
[19:44:47] ***** Got an Activate signal (2)
[19:44:47] Killing all core threads
Folding@Home Client Shutdown.
--- Opening Log file [January 23 19:52:22 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.23 Beta R1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/tedkreuserIII/Library/Folding@home
Executable: ./fah6
Arguments: -smp -local -verbosity 9
[19:52:22] - Ask before connecting: Yes
[19:52:22] - User name: Aardvark (Team 48057)
[19:52:22] - User ID: 7B68D95E256C0686
[19:52:22] - Machine ID: 1
[19:52:22]
[19:52:22] Loaded queue successfully.
[19:52:22]
[19:52:22] + Processing work unit
[19:52:22] At least 4 processors must be requested.[19:52:22] - Autosending finished units... [19:52:22]
Core required: FahCore_a2.exe
[19:52:22] Trying to send all finished work units
[19:52:22] + No unsent completed units remaining.
[19:52:22] - Autosend completed
[19:52:22] Core found.
[19:52:22] - Using generic ./mpiexec
[19:52:22] Working on queue slot 01 [January 23 19:52:22 UTC]
[19:52:22] + Working ...
[19:52:22] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 673 -version 623'
[19:52:22]
[19:52:22] *------------------------------*
[19:52:22] Folding@Home Gromacs SMP Core
[19:52:22] Version 2.01 (Wed Jul 16 08:26:53 PDT 2008)
[19:52:22]
[19:52:22] Preparing to commence simulation
[19:52:22] - Ensuring status. Please wait.
[19:52:22] Files status OK
[19:52:24] - Expanded 4841200 -> 23994061 (decompressed 495.6 percent)
[19:52:24] Called DecompressByteArray: compressed_data_size=4841200 data_size=23994061, decompressed_data_size=23994061 diff=0
[19:52:24] - Digital signature verified
[19:52:24]
[19:52:24] Project: 2675 (Run 3, Clone 125, Gen 37)
[19:52:24]
[19:52:25] Assembly optimizations on if available.
[19:52:25] Entering M.D.
[19:52:31] Will resume from checkpoint file
[19:52:34] ng M.D.
[19:52:40] Will resume from checkpoint file
[19:52:41] Node 1 initialized
[19:52:44] data_01.log
[19:52:45] Verified work/wudata_01.trr
[19:52:45] Verified work/wudata_01.xtc
[19:52:45] Verified work/wudata_01.edr
[19:52:45] Completed 320019 out of 4000000 steps (8%)
[01:05:08] Completed 360009 out of 4000000 steps (9%)
[06:17:29] Completed 400009 out of 4000000 steps (10%)
[11:30:07] Completed 440009 out of 4000000 steps (11%)
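The frame rate can also be read straight off the "Completed ... steps" lines in a log like the one above. The following is a minimal sketch rather than an official tool: the regular expression is based only on the line format seen in this thread, the function name is hypothetical, and it assumes no single frame takes longer than 24 hours, since the log timestamps carry no date.

```python
import re

# Matches lines such as:
# [19:37:07] Completed 160009 out of 4000000 steps (4%)
COMPLETED = re.compile(
    r"^\[(\d\d):(\d\d):(\d\d)\] Completed (\d+) out of (\d+) steps"
)

def seconds_per_step(lines):
    """Average wall-clock seconds per step, measured only between
    consecutive 'Completed' lines within the same client run."""
    total_secs = 0
    total_steps = 0
    prev = None  # (seconds since midnight, steps) from the previous match
    for line in lines:
        if line.startswith("--- Opening Log file"):
            prev = None  # client restarted; do not measure across the gap
            continue
        m = COMPLETED.match(line)
        if not m:
            continue
        h, mi, s, steps, _total = map(int, m.groups())
        now = h * 3600 + mi * 60 + s
        if prev is not None:
            elapsed = now - prev[0]
            if elapsed < 0:
                elapsed += 24 * 3600  # timestamps wrapped past midnight UTC
            total_secs += elapsed
            total_steps += steps - prev[1]
        prev = (now, steps)
    return total_secs / total_steps if total_steps else None

# Three lines copied from the log above.
sample = [
    "--- Opening Log file [January 22 14:22:56 UTC]",
    "[14:23:12] Completed 120019 out of 4000000 steps (3%)",
    "[19:37:07] Completed 160009 out of 4000000 steps (4%)",
    "[00:49:04] Completed 200009 out of 4000000 steps (5%)",
]
rate = seconds_per_step(sample)
frame_steps = 4000000 // 100  # one 1% frame of this WU
hours_per_frame = rate * frame_steps / 3600
print(f"{hours_per_frame:.2f} hours per 1% frame")
```

Resetting at each "Opening Log file" line keeps the downtime between client restarts from being counted as folding time.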
What is past is prologue!
Re: Project 2675: (Run 3, Clone 125, Gen 37)
The Pande Group has been working on a solution to this sort of problem, but it is still in the early phases of testing. Your only choice is to discard the WU and move on to a new assignment.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Project 2675: (Run 3, Clone 125, Gen 37)
Thanks for the quick advice, Bruce. I was proceeding to remove the WU and found that the System had resolved its own problem. I found the following in the Terminal Window:
I hope a solution is soon available for this problem. In hindsight, I guess it is just not very wise to play these slow WUs for 3 days hoping it will heal itself.
Code:
Writing checkpoint, step 6016480 at Sat Jan 24 15:37:46 2009
Writing checkpoint, step 6018380 at Sat Jan 24 15:52:45 2009
[22:05:34] Completed 520009 out of 4000000 steps (13%)
[22:05:34] Unit 1's deadline (January 24 17:45) has passed.
[22:05:34] Going to interrupt core and move on to next unit...
-------------------------------------------------------
What is past is prologue!
Re: Project 2675: (Run 3, Clone 125, Gen 37)
We stopped this WU on the server. Thanks for the report and sorry for the hassle.