Project: 2669 - 2 units, any credit?
Posted: Wed Apr 29, 2009 6:19 am
At the end of last week I ran two units from Project 2669, and no credit showed up for either. The first ran to 49% and aborted with a fatal error. The console message was "Fatal error in MPI_Sendrecv: Error message texts are not available", which did not make it into the log file. The second ran to 100%, then hit an error while transmitting the completed unit. I ran qfix and then did a -send all to see whether any partial results would go out. Are there any records on these? In any case, qfix has left a clean queue, and I have a new unit running on that machine.
Thanks for any info
1st: Project: 2669 (Run 8, Clone 150, Gen 122)
2nd: Project: 2669 (Run 9, Clone 109, Gen 75)
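For anyone who wants to look these up, the identifiers above use the same `Project: P (Run R, Clone C, Gen G)` text that the client writes into its log. A minimal Python sketch for pulling those four numbers out of a line is below. The names `WorkUnit` and `parse_prcg` are made up for illustration and are not part of any Folding@home tool.

```python
import re
from typing import NamedTuple, Optional


class WorkUnit(NamedTuple):
    """Hypothetical container for the four numbers that identify a unit."""
    project: int
    run: int
    clone: int
    gen: int


# Matches the identifier text as it appears in this post and in the logs.
PRCG = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def parse_prcg(text: str) -> Optional[WorkUnit]:
    """Return the first identifier found in text, or None if there is none."""
    match = PRCG.search(text)
    if match is None:
        return None
    return WorkUnit(*(int(group) for group in match.groups()))


first = parse_prcg("1st: Project: 2669 (Run 8, Clone 150, Gen 122)")
second = parse_prcg("2nd: Project: 2669 (Run 9, Clone 109, Gen 75)")
print(first)
print(second)
```

The same function works on a timestamped log line, since it searches anywhere in the string.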
1st: Project: 2669 (Run 8, Clone 150, Gen 122)
Code:
--- Opening Log file [April 24 05:05:37 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.24R1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/jheimann/Library/Folding@home
Executable: ./fah6
Arguments: -oneunit
[05:05:37] - Ask before connecting: No
[05:05:37] - User name: Joe_H (Team 38910)
[05:05:37] - User ID: 74A191A36AA6AED3
[05:05:37] - Machine ID: 1
[05:05:37]
[05:05:37] Loaded queue successfully.
[05:05:37] - Preparing to get new work unit...
[05:05:37] Cleaning up work directory
[05:05:37] + Attempting to get work packet
[05:05:37] - Connecting to assignment server
[05:05:37] - Successful: assigned to (171.64.65.56).
[05:05:37] + News From Folding@Home: Welcome to Folding@Home
[05:05:37] Loaded queue successfully.
[05:06:03] + Closed connections
[05:06:03]
[05:06:03] + Processing work unit
[05:06:03] At least 4 processors must be requested; read 1.
[05:06:03] Core required: FahCore_a2.exe
[05:06:03] Core found.
[05:06:03] Working on queue slot 06 [April 24 05:06:03 UTC]
[05:06:03] + Working ...
[05:06:03]
[05:06:03] *------------------------------*
[05:06:03] Folding@Home Gromacs SMP Core
[05:06:03] Version 2.06 (Mon Mar 30 18:46:18 PDT 2009)
[05:06:03]
[05:06:03] Preparing to commence simulation
[05:06:03] - Ensuring status. Please wait.
[05:06:03] Files status OK
[05:06:03] Need version 207
[05:06:03] Error: Work unit read from disk is invalid
[05:06:03]
[05:06:03] Folding@home Core Shutdown: CORE_OUTDATED
[05:06:08] CoreStatus = 6E (110)
[05:06:08] + Core out of date. Auto updating...
[05:06:08] - Attempting to download new core...
[05:06:08] + Downloading new core: FahCore_a2.exe
[05:06:09] + 10240 bytes downloaded
[05:06:09] + 20480 bytes downloaded
...
[05:06:14] + 1495040 bytes downloaded
[05:06:14] + 1505280 bytes downloaded
[05:06:14] + 1512001 bytes downloaded
[05:06:14] Verifying core Core_a2.fah...
[05:06:14] Signature is VALID
[05:06:14]
[05:06:14] Trying to unzip core FahCore_a2.exe
[05:06:15] Decompressed FahCore_a2.exe (4631828 bytes) successfully
[05:06:15] + Core successfully engaged
[05:06:20]
[05:06:20] + Processing work unit
[05:06:20] At least 4 processors must be requested; read 1.
[05:06:20] Core required: FahCore_a2.exe
[05:06:20] Core found.
[05:06:20] Working on queue slot 06 [April 24 05:06:20 UTC]
[05:06:20] + Working ...
[05:06:20]
[05:06:20] *------------------------------*
[05:06:20] Folding@Home Gromacs SMP Core
[05:06:20] Version 2.07 (Sun Apr 19 14:29:51 PDT 2009)
[05:06:20]
[05:06:20] Preparing to commence simulation
[05:06:20] - Ensuring status. Please wait.
[05:06:29] - Looking at optimizations...
[05:06:29] - Working with standard loops on this execution.
[05:06:29] - Files status OK
[05:06:32] - Expanded 4829574 -> 23976217 (decompressed 496.4 percent)
[05:06:32] Called DecompressByteArray: compressed_data_size=4829574 data_size=23976217, decompressed_data_size=23976217 diff=0
[05:06:32] - Digital signature verified
[05:06:32]
[05:06:32] Project: 2669 (Run 8, Clone 150, Gen 122)
[05:06:32]
[05:06:32] Entering M.D.
[05:06:43] Completed 0 out of 250000 steps (0%)
[05:21:16] Completed 2500 out of 250000 steps (1%)
[05:35:47] Completed 5000 out of 250000 steps (2%)
[05:50:16] Completed 7500 out of 250000 steps (3%)
...
[16:43:17] Completed 120000 out of 250000 steps (48%)
[16:57:49] Completed 122500 out of 250000 steps (49%)
[17:08:58] CoreStatus = 1 (1)
[17:08:58] Sending work to server
[17:08:58] Project: 2669 (Run 8, Clone 150, Gen 122)
[17:08:58] - Error: Could not get length of results file work/wuresults_06.dat
[17:08:58] - Error: Could not read unit 06 file. Removing from queue.
[17:08:58] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[17:08:58] Cleaning up work directory
Folding@Home Client Shutdown.
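The progress lines in the log above can be used to estimate how long each 1% took. Here is a rough Python sketch, assuming the `[HH:MM:SS] Completed N out of M steps (P%)` format shown in the log. The function name `seconds_per_percent` is my own, and the sample lines are copied from the log. Stretches elided with "..." are skipped rather than guessed at.

```python
import re

# One progress line: timestamp, steps done, total steps, percent.
PROGRESS = re.compile(
    r"\[(\d\d):(\d\d):(\d\d)\] Completed (\d+) out of (\d+) steps\s+\((\d+)%\)"
)


def seconds_per_percent(lines):
    """Return elapsed seconds for each pair of consecutive 1% progress lines."""
    samples = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            hours, minutes, seconds, _done, _total, percent = map(int, match.groups())
            samples.append((hours * 3600 + minutes * 60 + seconds, percent))

    durations = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if p1 - p0 != 1:
            continue  # lines were elided between these two, so skip the gap
        # Timestamps carry no date, so wrap around midnight with a modulo.
        durations.append((t1 - t0) % 86400)
    return durations


sample = [
    "[05:06:43] Completed 0 out of 250000 steps (0%)",
    "[05:21:16] Completed 2500 out of 250000 steps (1%)",
    "[05:35:47] Completed 5000 out of 250000 steps (2%)",
    "[05:50:16] Completed 7500 out of 250000 steps (3%)",
    "[16:43:17] Completed 120000 out of 250000 steps (48%)",
    "[16:57:49] Completed 122500 out of 250000 steps (49%)",
]
durations = seconds_per_percent(sample)
print(durations)
print(sum(durations) / len(durations))
```

On these lines each 1% comes out at roughly fourteen and a half minutes, so the unit was running at a steady pace right up to the point where it stopped.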
2nd: Project: 2669 (Run 9, Clone 109, Gen 75)
Code:
--- Opening Log file [April 24 17:45:07 UTC]
# Mac OS X SMP Console Edition ################################################
###############################################################################
Folding@Home Client Version 6.24R1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /Users/jheimann/Library/Folding@home
Executable: ./fah6
Arguments: -queue
[17:45:07] - Ask before connecting: No
[17:45:07] - User name: Joe_H (Team 38910)
[17:45:07] - User ID: 74A191A36AA6AED3
[17:45:07] - Machine ID: 1
[17:45:07]
[17:45:07] Loaded queue successfully.
[17:45:07]
[17:45:07] + Processing work unit
[17:45:07] At least 4 processors must be requested; read 1.
[17:45:07] Core required: FahCore_a2.exe
[17:45:07] Core found.
[17:45:07] Working on queue slot 07 [April 24 17:45:07 UTC]
[17:45:07] + Working ...
[17:45:07]
[17:45:07] *------------------------------*
[17:45:07] Folding@Home Gromacs SMP Core
[17:45:07] Version 2.07 (Sun Apr 19 14:29:51 PDT 2009)
[17:45:07]
[17:45:07] Preparing to commence simulation
[17:45:07] - Ensuring status. Please wait.
[17:45:17] - Looking at optimizations...
[17:45:17] - Working with standard loops on this execution.
[17:45:17] - Files status OK
[17:45:20] - Expanded 4839370 -> 23982265 (decompressed 495.5 percent)
[17:45:20] Called DecompressByteArray: compressed_data_size=4839370 data_size=23982265, decompressed_data_size=23982265 diff=0
[17:45:21] - Digital signature verified
[17:45:21]
[17:45:21] Project: 2669 (Run 9, Clone 109, Gen 75)
[17:45:21]
[17:45:22] Entering M.D.
[17:45:28] Using Gromacs checkpoints
[17:45:35] Resuming from checkpoint
[17:45:36] Verified work/wudata_07.log
[17:45:36] Verified work/wudata_07.trr
[17:45:36] Verified work/wudata_07.xtc
[17:45:36] Verified work/wudata_07.edr
[17:46:10] Completed 2500 out of 250000 steps (1%)
[18:00:39] Completed 5000 out of 250000 steps (2%)
[18:15:11] Completed 7500 out of 250000 steps (3%)
...
[17:12:59] Completed 245000 out of 250000 steps (98%)
[17:27:28] Completed 247500 out of 250000 steps (99%)
[17:41:58] Completed 250000 out of 250000 steps (100%)
[17:42:00] DynamicWrapper: Finished Work Unit: sleep=10000
[17:42:10]
[17:42:10] Finished Work Unit:
[17:42:10] - Reading up to 21122496 from "work/wudata_07.trr": Read 21122496
[17:42:10] trr file hash check passed.
[17:42:10] - Reading up to 4392612 from "work/wudata_07.xtc": Read 4392612
[17:42:10] xtc file hash check passed.
[17:42:10] - Checksum of file (work/wudata_07.edr) read from disk doesn't match
[17:42:10]
[17:42:10] Folding@home Core Shutdown: FILE_IO_ERROR
[17:45:35] CoreStatus = 64 (100)
[17:45:35] Sending work to server
[17:45:35] Project: 2669 (Run 9, Clone 109, Gen 75)
[17:45:35] - Error: Could not get length of results file work/wuresults_07.dat
[17:45:35] - Error: Could not read unit 07 file. Removing from queue.
[17:45:35] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[17:45:35] Cleaning up work directory
Folding@Home Client Shutdown.
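To compare how the core exited in the two logs, it helps to pair each `CoreStatus` line with the `Core Shutdown` reason that precedes it, if there is one. The Python sketch below does that. It assumes only the two line formats visible in these logs, and `core_exits` is a name I made up. It does not try to interpret the codes beyond what the log itself prints.

```python
import re

SHUTDOWN = re.compile(r"Folding@home Core Shutdown: (\w+)")
CORE_STATUS = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def core_exits(lines):
    """Return (hex_code, decimal_code, reason) for every CoreStatus line.

    reason is the most recent Core Shutdown text seen since the previous
    CoreStatus line, or None when the core logged no shutdown reason.
    """
    exits = []
    reason = None
    for line in lines:
        shutdown = SHUTDOWN.search(line)
        if shutdown:
            reason = shutdown.group(1)
            continue
        status = CORE_STATUS.search(line)
        if status:
            exits.append((status.group(1), int(status.group(2)), reason))
            reason = None  # a reason applies to one CoreStatus line only
    return exits


# Lines copied from the two logs in this post.
sample = [
    "[05:06:03] Folding@home Core Shutdown: CORE_OUTDATED",
    "[05:06:08] CoreStatus = 6E (110)",
    "[17:08:58] CoreStatus = 1 (1)",
    "[17:42:10] Folding@home Core Shutdown: FILE_IO_ERROR",
    "[17:45:35] CoreStatus = 64 (100)",
]
for hex_code, dec_code, reason in core_exits(sample):
    print(hex_code, dec_code, reason or "no shutdown reason logged")
```

The middle entry is the 49% failure: the client recorded `CoreStatus = 1` with no shutdown reason ahead of it, which fits with the MPI_Sendrecv message only appearing on the console.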