I have a Core 2 machine that has had no problems completing Project 2662 WUs (see the end of this post for the percentage of deadline time remaining on prior units). However, it is now stuck on a WU that won't complete, and I keep getting re-assigned the same WU: Project 2662 (Run 1, Clone 173, Gen 8).
model name : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
cpu MHz : 2664.000
cache size : 4096 KB
Memory: 976.11 MB physical, 1.94 GB virtual
On August 9th it crashed at 67% with a 0xff error, then downloaded the same WU again and started it over.
[08:04:28] Completed 812500 out of 1250000 steps (65%)
[09:03:08] Completed 825000 out of 1250000 steps (66%)
[10:01:49] Completed 837500 out of 1250000 steps (67%)
[10:59:15]
[10:59:15] Folding@home Core Shutdown: INTERRUPTED
[10:59:19] CoreStatus = FF (255)
[10:59:19] Client-core communications error: ERROR 0xff
[10:59:19] Deleting current work unit & continuing...
[10:59:27] - Warning: Could not delete all work unit files (1): Core file absent
[10:59:27] Trying to send all finished work units
[10:59:27] + No unsent completed units remaining.
[10:59:27] - Preparing to get new work unit...
[10:59:27] + Attempting to get work packet
[10:59:27] - Will indicate memory of 976 MB
[10:59:27] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 11
[10:59:27] - Connecting to assignment server
[10:59:27] Connecting to http://assign.stanford.edu:8080/
[10:59:28] Posted data.
[10:59:28] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[10:59:28] + News From Folding@Home: Welcome to Folding@Home
[10:59:28] Loaded queue successfully.
[10:59:28] Connecting to http://171.64.65.56:8080/
[10:59:33] Posted data.
[10:59:33] Initial: 0000; - Receiving payload (expected size: 5001321)
[11:00:01] - Downloaded at ~174 kB/s
[11:00:01] - Averaged speed for that direction ~172 kB/s
[11:00:01] + Received work.
[11:00:01] + Closed connections
[11:00:06]
[11:00:06] + Processing work unit
[11:00:06] Core required: FahCore_a2.exe
[11:00:06] Core found.
[11:00:06] Working on Unit 02 [August 9 11:00:06]
[11:00:06] + Working ...
[11:00:06] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 15 -verbose -lifeline 2756 -version 602'
[11:00:06]
[11:00:06] *------------------------------*
[11:00:06] Folding@Home Gromacs SMP Core
[11:00:06] Version 2.00 (Wed Jul 9 13:11:25 PDT 2008)
[11:00:06]
[11:00:06] Preparing to commence simulation
[11:00:06] - Ensuring status. Please wait.
[11:00:07] Called DecompressByteArray: compressed_data_size=5000809 data_size=24742709, decompressed_data_size=24742709 diff=0
[11:00:07] - Digital signature verified
[11:00:07]
[11:00:07] Project: 2662 (Run 1, Clone 173, Gen 8)
[09:11:42] Completed 900000 out of 1250000 steps (72%)
[10:10:16] Completed 912500 out of 1250000 steps (73%)
[11:08:48] Completed 925000 out of 1250000 steps (74%)
[11:08:48] Unit 2's deadline (August 12 11:00) has passed.
[11:08:48] Going to interrupt core and move on to next unit...
[11:08:52] CoreStatus = FF (255)
[11:08:52] Client-core communications error: ERROR 0xff
[11:08:52] Deleting current work unit & continuing...
[11:23:52] - Autosending finished units...
[11:23:52] Trying to send all finished work units
[11:23:52] + No unsent completed units remaining.
[11:23:52] - Autosend completed
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
2 cores detected
--- Opening Log file [August 12 12:01:31]
# SMP Client ##################################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /root/fah6
Executable: ./fah6
Arguments: -advmethods -verbosity 9 -smp
[12:01:31] - Ask before connecting: No
[12:01:31] - User name: parkut (Team 4)
[12:01:31] - User ID: 7B76FF2E050086E6
[12:01:31] - Machine ID: 1
[12:01:31]
A potential conflict was detected:
Process 2756 is currently running and may also be a client with Mach. ID 1.
Program will now exit. Upon restart, this check will not be done --
you may wish to check that no client is currently running in
/root/fah6 before restarting.
Please press any key to exit.
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
2 cores detected
--- Opening Log file [August 12 12:08:53]
# SMP Client ##################################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /root/fah6
Executable: ./fah6
Arguments: -advmethods -verbosity 9 -smp
[12:08:53] - Ask before connecting: No
[12:08:53] - User name: parkut (Team 4)
[12:08:53] - User ID: 7B76FF2E050086E6
[12:08:53] - Machine ID: 1
[12:08:53]
[12:08:53] Could not open work queue, generating new queue...
[12:08:53] - Preparing to get new work unit...
[12:08:53] + Attempting to get work packet
[12:08:53] - Will indicate memory of 976 MB
[12:08:53] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 11
[12:08:53] - Connecting to assignment server
[12:08:53] Connecting to http://assign.stanford.edu:8080/
[12:08:53] - Autosending finished units...
[12:08:53] Trying to send all finished work units
[12:08:53] + No unsent completed units remaining.
[12:08:53] - Autosend completed
[12:08:53] Posted data.
[12:08:53] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[12:08:53] + News From Folding@Home: Welcome to Folding@Home
[12:08:53] Loaded queue successfully.
[12:08:53] Connecting to http://171.64.65.56:8080/
[12:08:58] Posted data.
[12:08:58] Initial: 0000; - Receiving payload (expected size: 5001321)
[12:09:26] - Downloaded at ~174 kB/s
[12:09:26] - Averaged speed for that direction ~174 kB/s
[12:09:26] + Received work.
[12:09:26] + Closed connections
[12:09:26]
[12:09:26] + Processing work unit
[12:09:26] Core required: FahCore_a2.exe
[12:09:26] Core found.
[12:09:26] Working on Unit 01 [August 12 12:09:26]
[12:09:26] + Working ...
[12:09:26] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 17679 -version 602'
[12:09:26]
[12:09:26] *------------------------------*
[12:09:26] Folding@Home Gromacs SMP Core
[12:09:26] Version 2.00 (Wed Jul 9 13:11:25 PDT 2008)
[12:09:26]
[12:09:26] Preparing to commence simulation
[12:09:26] - Ensuring status. Please wait.
[12:09:27] Called DecompressByteArray: compressed_data_size=5000809 data_size=24742709, decompressed_data_size=24742709 diff=0
[12:09:27] - Digital signature verified
[12:09:27]
[12:09:27] Project: 2662 (Run 1, Clone 173, Gen 8)
Today, August 15th, it crashed again with the same 0xff error and the same "deadline has passed" notice. The client processes failed to shut down, so I had to kill the cores manually and delete the work folder contents and the queue.dat file.
[10:34:19] Completed 900000 out of 1250000 steps (72%)
[11:33:03] Completed 912500 out of 1250000 steps (73%)
[12:08:53] - Autosending finished units...
[12:08:53] Trying to send all finished work units
[12:08:53] + No unsent completed units remaining.
[12:08:53] - Autosend completed
[12:31:47] Completed 925000 out of 1250000 steps (74%)
[12:31:47] Unit 1's deadline (August 15 12:09) has passed.
[12:31:47] Going to interrupt core and move on to next unit...
[12:31:51] CoreStatus = FF (255)
[12:31:51] Client-core communications error: ERROR 0xff
[12:31:51] Deleting current work unit & continuing...
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
2 cores detected
--- Opening Log file [August 15 13:01:31]
# SMP Client ##################################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /root/fah6
Executable: ./fah6
Arguments: -advmethods -verbosity 9 -smp
[13:01:31] - Ask before connecting: No
[13:01:31] - User name: parkut (Team 4)
[13:01:31] - User ID: 7B76FF2E050086E6
[13:01:31] - Machine ID: 1
[13:01:31]
A potential conflict was detected:
Process 17679 is currently running and may also be a client with Mach. ID 1.
Program will now exit. Upon restart, this check will not be done --
you may wish to check that no client is currently running in
/root/fah6 before restarting.
Please press any key to exit.
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
2 cores detected
--- Opening Log file [August 15 13:31:31]
# SMP Client ##################################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /root/fah6
Executable: ./fah6
Arguments: -advmethods -verbosity 9 -smp
[13:31:31] - Ask before connecting: No
[13:31:31] - User name: parkut (Team 4)
[13:31:31] - User ID: 7B76FF2E050086E6
[13:31:31] - Machine ID: 1
[13:31:31]
[13:31:31] Loaded queue successfully.
[13:31:31] Unit 1's deadline (August 15 12:09) has passed.
[13:56:47] ***** Got a SIGTERM signal (15)
[13:56:47] Killing all core threads
Folding@Home Client Shutdown.
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
2 cores detected
--- Opening Log file [August 15 13:58:05]
# SMP Client ##################################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /root/fah6
Executable: ./fah6
Arguments: -advmethods -verbosity 9 -smp
[13:58:05] - Ask before connecting: No
[13:58:05] - User name: parkut (Team 4)
[13:58:05] - User ID: 7B76FF2E050086E6
[13:58:05] - Machine ID: 1
[13:58:05]
[13:58:06] Loaded queue successfully.
[13:58:06] Unit 1's deadline (August 15 12:09) has passed.
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
2 cores detected
--- Opening Log file [August 15 14:01:04]
# SMP Client ##################################################################
###############################################################################
Folding@Home Client Version 6.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /root/fah6
Executable: ./fah6
Arguments: -advmethods -verbosity 9 -smp
[14:01:04] - Ask before connecting: No
[14:01:04] - User name: parkut (Team 4)
[14:01:04] - User ID: 7B76FF2E050086E6
[14:01:04] - Machine ID: 1
[14:01:04]
[14:01:04] Could not open work queue, generating new queue...
[14:01:04] - Preparing to get new work unit...
[14:01:04] + Attempting to get work packet
[14:01:04] - Will indicate memory of 976 MB
[14:01:04] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 11
[14:01:04] - Connecting to assignment server
[14:01:04] Connecting to http://assign.stanford.edu:8080/
[14:01:04] - Autosending finished units...
[14:01:04] Trying to send all finished work units
[14:01:04] + No unsent completed units remaining.
[14:01:04] - Autosend completed
[14:01:04] Posted data.
[14:01:04] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[14:01:04] + News From Folding@Home: Welcome to Folding@Home
[14:01:05] Loaded queue successfully.
[14:01:05] Connecting to http://171.64.65.56:8080/
[14:01:10] Posted data.
[14:01:10] Initial: 0000; - Receiving payload (expected size: 5001321)
[14:01:40] - Downloaded at ~162 kB/s
[14:01:40] - Averaged speed for that direction ~162 kB/s
[14:01:40] + Received work.
[14:01:40] + Closed connections
[14:01:40]
[14:01:40] + Processing work unit
[14:01:40] Core required: FahCore_a2.exe
[14:01:40] Core found.
[14:01:40] Working on Unit 01 [August 15 14:01:40]
[14:01:40] + Working ...
[14:01:40] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 16974 -version 602'
[14:01:40]
[14:01:40] *------------------------------*
[14:01:40] Folding@Home Gromacs SMP Core
[14:01:40] Version 2.00 (Wed Jul 9 13:11:25 PDT 2008)
[14:01:40]
[14:01:40] Preparing to commence simulation
[14:01:40] - Ensuring status. Please wait.
[14:01:41] Called DecompressByteArray: compressed_data_size=5000809 data_size=24742709, decompressed_data_size=24742709 diff=0
[14:01:41] - Digital signature verified
[14:01:41]
[14:01:41] Project: 2662 (Run 1, Clone 173, Gen 8)
[14:01:41]
[14:01:41] Assembly optimizations on if available.
[/code]
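For anyone repeating the manual cleanup described above (kill the leftover cores, empty the work folder, remove queue.dat), here is a minimal Python sketch. It is not part of the client. The function names `kill_cores` and `clean_client_dir` are made up for this illustration, the directory layout (a `work/` folder and a `queue.dat` file under the launch directory) is taken from the log above, and `kill_cores` assumes a Linux system where `pkill` is installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def kill_cores(pattern: str = "FahCore_a2.exe") -> None:
    """Ask leftover core processes to exit (hypothetical helper, needs pkill)."""
    # pkill -f matches the pattern against the full command line and
    # sends SIGTERM by default. check=False: no match is not an error here.
    subprocess.run(["pkill", "-f", pattern], check=False)


def clean_client_dir(client_dir: str, dry_run: bool = True) -> List[str]:
    """Remove everything inside work/ plus queue.dat.

    Returns the paths that were removed, or that would be removed
    when dry_run is True. Nothing else in client_dir is touched.
    """
    root = Path(client_dir)
    targets = []

    work = root / "work"
    if work.is_dir():
        # Delete the contents of work/, but keep the folder itself.
        targets.extend(sorted(work.iterdir()))

    queue = root / "queue.dat"
    if queue.exists():
        targets.append(queue)

    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()

    return [str(p) for p in targets]
```

Calling `clean_client_dir("/root/fah6")` with the default `dry_run=True` only lists what would be deleted, which is a reasonable first step before passing `dry_run=False`. Stop the client and run `kill_cores()` first, otherwise the conflict check shown in the log can still find the old process.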
[16:19:36] Project: 2662 (Run 1, Clone 173, Gen 8)
[16:16:22] Unit 0 finished with 73 percent of time to deadline remaining.
[20:46:13] Project: 2662 (Run 1, Clone 395, Gen 3)
[20:42:49] Unit 9 finished with 73 percent of time to deadline remaining.
[01:09:23] Project: 2662 (Run 1, Clone 428, Gen 1)
[01:06:09] Unit 8 finished with 73 percent of time to deadline remaining.
[05:39:29] Project: 2662 (Run 1, Clone 315, Gen 1)
[05:36:15] Unit 7 finished with 73 percent of time to deadline remaining.
[10:03:25] Project: 2662 (Run 1, Clone 163, Gen 5)
[10:00:15] Unit 6 finished with 74 percent of time to deadline remaining.
[15:38:20] Project: 2662 (Run 0, Clone 328, Gen 0)
[15:35:12] Unit 5 finished with 73 percent of time to deadline remaining.
[20:05:21] Project: 2662 (Run 1, Clone 191, Gen 3)
[20:01:59] Unit 4 finished with 73 percent of time to deadline remaining.
[00:33:14] Project: 2662 (Run 1, Clone 305, Gen 0)
[00:30:08] Unit 3 finished with 74 percent of time to deadline remaining.
[06:01:28] Project: 2662 (Run 0, Clone 235, Gen 1)
[05:58:23] Unit 2 finished with 74 percent of time to deadline remaining.
[11:32:58] Project: 2662 (Run 0, Clone 141, Gen 0)
[11:29:51] Unit 1 finished with 74 percent of time to deadline remaining.
[/code]
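For comparison with the deadline figures above, the "Completed ... steps" lines from the stuck WU can be turned into a rough projection of the total run time. The sketch below is only an estimate under stated assumptions: it assumes the rate between progress lines stays constant, it ignores restarts, and the 72-hour window is inferred from the log (unit started August 9 11:00, deadline August 12 11:00). The function names are hypothetical.

```python
import re

# Matches lines such as:
# [08:04:28] Completed 812500 out of 1250000 steps (65%)
PROGRESS = re.compile(
    r"\[(\d\d):(\d\d):(\d\d)\] Completed \d+ out of \d+ steps\s+\((\d+)%\)"
)


def seconds_per_percent(log_text):
    """Average wall-clock seconds per 1% between consecutive progress lines."""
    points = []
    for h, m, s, pct in PROGRESS.findall(log_text):
        points.append((int(h) * 3600 + int(m) * 60 + int(s), int(pct)))

    rates = []
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        # Timestamps carry no date, so allow for a wrap past midnight.
        dt = (t1 - t0) % 86400
        if p1 > p0:
            rates.append(dt / (p1 - p0))

    if not rates:
        raise ValueError("need at least two increasing progress lines")
    return sum(rates) / len(rates)


def projected_hours(log_text):
    """Hours needed for 100% at the observed rate."""
    return seconds_per_percent(log_text) * 100 / 3600


sample = """\
[08:04:28] Completed 812500 out of 1250000 steps (65%)
[09:03:08] Completed 825000 out of 1250000 steps (66%)
[10:01:49] Completed 837500 out of 1250000 steps (67%)
"""

DEADLINE_HOURS = 72  # assumption: inferred from the start and deadline timestamps

hours = projected_hours(sample)
print(f"{hours:.1f} h projected for 100%, deadline window {DEADLINE_HOURS} h")
# prints: 97.8 h projected for 100%, deadline window 72 h
```

At roughly 58 to 59 minutes per percent, the projection is well over the three-day window, which fits with both attempts being cut off by the deadline at 74%. This says nothing about why this WU runs at that rate, only that at this rate it cannot finish in time.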