Project: 5734 (Run 3, Clone 574, Gen 122)

Moderators: Site Moderators, FAHC Science Team

rhavern
Posts: 425
Joined: Mon Dec 03, 2007 8:45 am
Location: UK


Post by rhavern »

I'm having problems with this WU on a dedicated folder that has been stable up until now (XP-32 SP3, 3850, client running as a service). Oddly, note the incredibly long times between percent increments (roughly four hours per percent at one stage) and, near the end, this error:
[18:05:25] Initial: 0000; - Error: Bad packet type from server, expected work assignment


[12:17:30] Completed 63%
[12:22:14] Completed 64%
[14:05:44] mdrun_gpu returned 
[14:05:44] NANs detected on GPU
[14:05:44] 
[14:05:44] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:05:48] CoreStatus = 7A (122)
[14:05:48] Sending work to server
[14:05:48] Project: 5734 (Run 3, Clone 574, Gen 122)
[14:05:48] - Read packet limit of 540015616... Set to 524286976.
[14:05:48] - Error: Could not get length of results file work/wuresults_01.dat
[14:05:48] - Error: Could not read unit 01 file. Removing from queue.
<snip>
[14:30:09] Project: 5734 (Run 3, Clone 574, Gen 122)
[14:30:09] 
[14:30:09] Assembly optimizations on if available.
[14:30:09] Entering M.D.
[14:30:15] Will resume from checkpoint file
[14:30:15] Tpr hash work/wudata_02.tpr:  2150310785 104078462 3589952212 1106799434 3064008667
[14:30:15] Working on Protein
[14:30:16] Client config found, loading data.
[14:30:16] Starting GUI Server
[14:30:25] Resuming from checkpoint
[14:30:25] fcCheckPointResume: retreived and current tpr file hash:
[14:30:25]    0   2150310785   2150310785
[14:30:25]    1    104078462    104078462
[14:30:25]    2   3589952212   3589952212
[14:30:25]    3   1106799434   1106799434
[14:30:25]    4   3064008667   3064008667
[14:30:25] Verified work/wudata_02.log
[14:30:25] Verified work/wudata_02.edr
[14:30:25] Verified work/wudata_02.xtc
[14:35:00] Completed 1%
[16:05:19] mdrun_gpu returned 
[16:05:19] NANs detected on GPU
[16:05:19] 
[16:05:19] Folding@home Core Shutdown: UNSTABLE_MACHINE
[16:05:22] CoreStatus = 7A (122)
[16:05:22] Sending work to server
[16:05:22] Project: 5734 (Run 3, Clone 574, Gen 122)
[16:05:22] - Read packet limit of 540015616... Set to 524286976.
[16:05:22] - Error: Could not get length of results file work/wuresults_02.dat
[16:05:22] - Error: Could not read unit 02 file. Removing from queue.
<snip>
[16:05:31] Project: 5734 (Run 3, Clone 574, Gen 122)
[16:05:31] 
[16:05:31] Assembly optimizations on if available.
[16:05:31] Entering M.D.
[16:05:37] Tpr hash work/wudata_03.tpr:  2150310785 104078462 3589952212 1106799434 3064008667
[16:05:37] Working on Protein
[16:05:38] Client config found, loading data.
[16:05:38] Starting GUI Server
[19:05:44] mdrun_gpu returned 
[19:05:44] NANs detected on GPU
[19:05:44] 
[19:05:44] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:05:49] CoreStatus = 7A (122)
[19:05:49] Sending work to server
[19:05:49] Project: 5734 (Run 3, Clone 574, Gen 122)
[19:05:49] - Read packet limit of 540015616... Set to 524286976.
[19:05:49] - Error: Could not get length of results file work/wuresults_03.dat
[19:05:49] - Error: Could not read unit 03 file. Removing from queue.
<snip>
[19:05:53] Project: 5734 (Run 3, Clone 574, Gen 122)
[19:05:53] 
[19:05:53] Assembly optimizations on if available.
[19:05:53] Entering M.D.
[19:05:59] Tpr hash work/wudata_04.tpr:  2150310785 104078462 3589952212 1106799434 3064008667
[19:05:59] Working on Protein
[19:06:00] Client config found, loading data.
[19:06:00] Starting GUI Server
[20:28:08] - Autosending finished units... [January 13 20:28:08 UTC]
[20:28:08] Trying to send all finished work units
[20:28:08] + No unsent completed units remaining.
[20:28:08] - Autosend completed
[00:05:27] Completed 1%
[02:26:09] - Autosending finished units... [January 14 02:26:09 UTC]
[02:26:09] Trying to send all finished work units
[02:26:09] + No unsent completed units remaining.
[02:26:09] - Autosend completed
[04:06:04] Completed 2%
[08:06:27] Completed 3%
[08:24:04] - Autosending finished units... [January 14 08:24:04 UTC]
[08:24:04] Trying to send all finished work units
[08:24:04] + No unsent completed units remaining.
[08:24:04] - Autosend completed
[12:06:27] Completed 4%
[14:22:05] - Autosending finished units... [January 14 14:22:05 UTC]
[14:22:05] Trying to send all finished work units
[14:22:05] + No unsent completed units remaining.
[14:22:05] - Autosend completed
[15:05:34] mdrun_gpu returned 
[15:05:34] NANs detected on GPU
[15:05:34] 
[15:05:34] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:05:38] CoreStatus = 7A (122)
[15:05:38] Sending work to server
[15:05:38] Project: 5734 (Run 3, Clone 574, Gen 122)
[15:05:38] - Read packet limit of 540015616... Set to 524286976.
[15:05:38] - Error: Could not get length of results file work/wuresults_04.dat
[15:05:38] - Error: Could not read unit 04 file. Removing from queue.
<snip>
[15:59:20] Completed 4%
[16:03:48] Completed 5%
[18:05:20] mdrun_gpu returned 
[18:05:20] NANs detected on GPU
[18:05:20] 
[18:05:20] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:05:23] CoreStatus = 7A (122)
[18:05:23] Sending work to server
[18:05:23] Project: 5734 (Run 3, Clone 574, Gen 122)
[18:05:23] - Read packet limit of 540015616... Set to 524286976.
[18:05:23] - Error: Could not get length of results file work/wuresults_01.dat
[18:05:23] - Error: Could not read unit 01 file. Removing from queue.
[18:05:23] Trying to send all finished work units
[18:05:23] + No unsent completed units remaining.
[18:05:23] - Preparing to get new work unit...
[18:05:23] Cleaning up work directory
[18:05:23] + Attempting to get work packet
[18:05:23] Passkey found
[18:05:23] - Will indicate memory of 1023 MB
[18:05:23] Gpu type=1 species=2.
[18:05:23] - Detect CPU. Vendor: AuthenticAMD, Family: 6, Model: 10, Stepping: 0
[18:05:23] - Connecting to assignment server
[18:05:23] Connecting to http://assign-GPU.stanford.edu:8080/
[18:05:24] Posted data.
[18:05:24] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[18:05:24] + News From Folding@Home: Welcome to Folding@Home
[18:05:24] Loaded queue successfully.
[18:05:24] Gpu type=1 species=2.
[18:05:24] Sent data
[18:05:24] Connecting to http://171.64.65.102:8080/
[18:05:25] Posted data.
[18:05:25] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[18:05:25] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[18:05:31] + Attempting to get work packet
[18:05:31] Passkey found
[18:05:31] - Will indicate memory of 1023 MB
[18:05:31] Gpu type=1 species=2.
[18:05:31] - Connecting to assignment server
[18:05:31] Connecting to http://assign-GPU.stanford.edu:8080/
[18:05:35] Posted data.
[18:05:35] Initial: 40AB; - Successful: assigned to (171.64.65.102).
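
For anyone wanting to put numbers on the gaps between percent increments in a log like the one above, here is a minimal Python sketch. It is not part of the Folding@home client; the function name `progress_gaps` and the regular expression are illustrative assumptions based only on the `[HH:MM:SS] Completed N%` lines shown in this post. Because the log timestamps carry no date, the sketch assumes at most one midnight rollover between two consecutive progress lines, and it should be fed one uninterrupted run at a time, since gaps measured across a core restart are not meaningful.

```python
import re
from datetime import timedelta

# Matches client log lines such as "[12:22:14] Completed 64%".
# (Pattern inferred from the log in this post; an assumption, not a spec.)
PROGRESS = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+)%")


def progress_gaps(lines):
    """Return (percent, gap) pairs, one per consecutive pair of progress lines.

    `gap` is the time elapsed since the previous "Completed" line.
    Timestamps have no date, so a negative difference is treated as
    a single midnight rollover (assumption).
    """
    gaps = []
    prev = None
    for line in lines:
        m = PROGRESS.match(line)
        if not m:
            continue  # ignore everything that is not a progress line
        h, mi, s, pct = map(int, m.groups())
        now = timedelta(hours=h, minutes=mi, seconds=s)
        if prev is not None:
            gap = now - prev
            if gap < timedelta(0):
                gap += timedelta(days=1)
            gaps.append((pct, gap))
        prev = now
    return gaps


# Lines copied from the log above: one normal step and one slow run.
normal = ["[12:17:30] Completed 63%", "[12:22:14] Completed 64%"]
slow = [
    "[00:05:27] Completed 1%",
    "[04:06:04] Completed 2%",
    "[08:06:27] Completed 3%",
    "[12:06:27] Completed 4%",
]

for name, sample in (("normal", normal), ("slow", slow)):
    for pct, gap in progress_gaps(sample):
        print(f"{name}: reached {pct}% after {gap}")
```

On these samples the normal step takes under five minutes, while each step in the slow run takes about four hours, which matches the slowdown described in the post.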

Folding since 1 WU=1 point
toTOW
Site Moderator
Posts: 6435
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 5734 (Run 3, Clone 574, Gen 122)

Post by toTOW »

There's one EUE in the DB and two successful runs.

(None were from rhavern.)

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.