
Project: 6882 (Run 146, Clone 0, Gen 11) - segfault

Posted: Wed Jul 13, 2011 3:32 pm
by ejs
This unit is segfaulting:
FahCore_78.exe[1373]: segfault at 17716cb0 ip 0000000008087a6c sp 00000000ff5fe3dc error 4 in FahCore_78.exe[8048000+322000]
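For anyone reading the kernel message above, here is a minimal Python sketch that splits the line into its fields and decodes the `error` value. It assumes the standard x86 page-fault error-code bits: bit 0 set means a protection violation and clear means the page was not present, bit 1 means write vs. read, and bit 2 means user vs. kernel mode. The regex is written against this one line and is an assumption, since the message format can vary between kernel versions. `decode_segfault` is a hypothetical helper and not part of the FAH client.

```python
import re

# Matches the kernel "segfault at ..." line as it appears in this report.
# Assumption: other kernel versions may print a slightly different format.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# x86 page-fault error-code bits: (meaning if set, meaning if clear).
ERROR_BITS = {
    0x1: ("protection violation", "page not present"),
    0x2: ("write", "read"),
    0x4: ("user mode", "kernel mode"),
}


def decode_segfault(line):
    """Hypothetical helper: parse one segfault line into a dict of fields."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m["err"], 16)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "binary": m["obj"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction from the start of the mapping.
        "ip_offset": ip - base,
        "ip_in_binary": base <= ip < base + size,
        "error_flags": [on if err & bit else off
                        for bit, (on, off) in ERROR_BITS.items()],
    }


if __name__ == "__main__":
    LINE = ("FahCore_78.exe[1373]: segfault at 17716cb0 ip 0000000008087a6c "
            "sp 00000000ff5fe3dc error 4 in FahCore_78.exe[8048000+322000]")
    info = decode_segfault(LINE)
    print(info["error_flags"], hex(info["ip_offset"]))
```

For this report, error 4 decodes to a user-mode read from an address with no page mapped. The instruction pointer is 0x3fa6c bytes into the FahCore_78.exe mapping, which starts at 0x8048000, the usual load address for 32-bit x86 executables.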

I'm running FAH client 6.34 on Linux x86_64 (Fedora 15).

Log:


[23:35:25] + Processing work unit
[23:35:25] Core required: FahCore_78.exe
[23:35:25] Core found.
[23:35:25] Working on queue slot 06 [July 12 23:35:25 UTC]
[23:35:25] + Working ...
[23:35:25] - Calling './FahCore_78.exe -dir work/ -nice 19 -suffix 06 -checkpoint 15 -verbose -lifeline 1365 -version 634'

[23:35:25] 
[23:35:25] *------------------------------*
[23:35:25] Folding@Home Gromacs Core
[23:35:25] Version 1.90 (March 8, 2006)
[23:35:25] 
[23:35:25] Preparing to commence simulation
[23:35:25] - Looking at optimizations...
[23:35:25] - Created dyn
[23:35:25] - Files status OK
[23:35:25] - Expanded 375038 -> 1804408 (decompressed 481.1 percent)
[23:35:25] - Starting from initial work packet
[23:35:25] 
[23:35:25] Project: 6882 (Run 146, Clone 0, Gen 11)
[23:35:25] 
[23:35:25] Assembly optimizations on if available.
[23:35:25] Entering M.D.
[23:35:31] Protein: ALZHEIMERS DISEASE AMYLOID
[23:35:31] 
[23:35:31] Writing local files
[23:35:40] Extra SSE boost OK.
[23:35:40] Writing local files
[23:35:40] Completed 0 out of 250000 steps  (0%)
[23:40:37] Writing local files
[23:40:37] Completed 2500 out of 250000 steps  (1%)
[23:45:32] Writing local files
[23:45:32] Completed 5000 out of 250000 steps  (2%)
[23:50:29] Writing local files
[23:50:29] Completed 7500 out of 250000 steps  (3%)
[23:55:25] Writing local files
[23:55:26] Completed 10000 out of 250000 steps  (4%)
[00:00:22] Writing local files
[00:00:22] Completed 12500 out of 250000 steps  (5%)
[00:05:19] Writing local files
[00:05:20] Completed 15000 out of 250000 steps  (6%)
[00:10:17] Writing local files
[00:10:17] Completed 17500 out of 250000 steps  (7%)
[00:14:19] CoreStatus = 0 (0)
[00:14:19] Sending work to server
[00:14:19] Project: 6882 (Run 146, Clone 0, Gen 11)
[00:14:19] - Error: Could not get length of results file work/wuresults_06.dat
[00:14:19] - Error: Could not read unit 06 file. Removing from queue.
[00:14:19] Trying to send all finished work units
[00:14:19] + No unsent completed units remaining.
[00:14:19] - Preparing to get new work unit...
[00:14:19] Cleaning up work directory
[00:14:19] + Attempting to get work packet
[00:14:19] Passkey found
[00:14:19] - Will indicate memory of 2883 MB
[00:14:19] - Connecting to assignment server
[00:14:19] Connecting to http://assign.stanford.edu:8080/
[00:14:23] Posted data.
[00:14:23] Initial: 43AB; - Successful: assigned to (171.67.108.33).
[00:14:23] + News From Folding@Home: Welcome to Folding@Home
[00:14:23] Loaded queue successfully.
[00:14:23] Sent data
[00:14:23] Connecting to http://171.67.108.33:8080/
[00:14:26] Posted data.
[00:14:26] Initial: 0000; - Receiving payload (expected size: 375550)
[00:14:33] - Downloaded at ~52 kB/s
[00:14:33] - Averaged speed for that direction ~159 kB/s
[00:14:33] + Received work.
[00:14:33] Trying to send all finished work units
[00:14:33] + No unsent completed units remaining.
[00:14:33] + Closed connections
[00:14:38] 
[00:14:38] + Processing work unit
[00:14:38] Core required: FahCore_78.exe
[00:14:38] Core found.
[00:14:38] Working on queue slot 07 [July 13 00:14:38 UTC]
[00:14:38] + Working ...
[00:14:38] - Calling './FahCore_78.exe -dir work/ -nice 19 -suffix 07 -checkpoint 15 -verbose -lifeline 1365 -version 634'

[00:14:38] 
[00:14:38] *------------------------------*
[00:14:38] Folding@Home Gromacs Core
[00:14:38] Version 1.90 (March 8, 2006)
[00:14:38] 
[00:14:38] Preparing to commence simulation
[00:14:38] - Ensuring status. Please wait.
[00:14:55] - Looking at optimizations...
[00:14:55] - Working with standard loops on this execution.
[00:14:55] - Created dyn
[00:14:55] - Files status OK
[00:14:55] - Expanded 375038 -> 1804408 (decompressed 481.1 percent)
[00:14:55] - Starting from initial work packet
[00:14:55] 
[00:14:55] Project: 6882 (Run 146, Clone 0, Gen 11)
[00:14:55] 
[00:14:55] Entering M.D.
[00:15:01] Protein: ALZHEIMERS DISEASE AMYLOID
[00:15:01] 
[00:15:01] Writing local files
[00:15:10] Writing local files
[00:15:10] Completed 0 out of 250000 steps  (0%)
[00:25:26] Writing local files
[00:25:26] Completed 2500 out of 250000 steps  (1%)
[00:27:26] CoreStatus = 0 (0)
[00:27:26] Sending work to server
[00:27:26] Project: 6882 (Run 146, Clone 0, Gen 11)
[00:27:26] - Error: Could not get length of results file work/wuresults_07.dat
[00:27:26] - Error: Could not read unit 07 file. Removing from queue.
[00:27:26] Trying to send all finished work units
[00:27:26] + No unsent completed units remaining.
[00:27:26] - Preparing to get new work unit...
[00:27:26] Cleaning up work directory
[00:27:26] + Attempting to get work packet
[00:27:26] Passkey found
[00:27:26] - Will indicate memory of 2883 MB
[00:27:26] - Connecting to assignment server
[00:27:26] Connecting to http://assign.stanford.edu:8080/
[00:27:27] Posted data.
[00:27:27] Initial: 43AB; - Successful: assigned to (171.67.108.33).
[00:27:27] + News From Folding@Home: Welcome to Folding@Home
[00:27:27] Loaded queue successfully.
[00:27:27] Sent data
[00:27:27] Connecting to http://171.67.108.33:8080/
[00:27:28] Posted data.
[00:27:28] Initial: 0000; - Receiving payload (expected size: 375550)
[00:27:31] - Downloaded at ~122 kB/s
[00:27:31] - Averaged speed for that direction ~151 kB/s
[00:27:31] + Received work.
[00:27:31] Trying to send all finished work units
[00:27:31] + No unsent completed units remaining.
[00:27:31] + Closed connections
[00:27:36] 
[00:27:36] + Processing work unit
[00:27:36] Core required: FahCore_78.exe
[00:27:36] Core found.
[00:27:36] Working on queue slot 08 [July 13 00:27:36 UTC]
[00:27:36] + Working ...
[00:27:36] - Calling './FahCore_78.exe -dir work/ -nice 19 -suffix 08 -checkpoint 15 -verbose -lifeline 1365 -version 634'

[00:27:36] 
[00:27:36] *------------------------------*
[00:27:36] Folding@Home Gromacs Core
[00:27:36] Version 1.90 (March 8, 2006)
[00:27:36] 
[00:27:36] Preparing to commence simulation
[00:27:36] - Ensuring status. Please wait.
[00:27:53] - Looking at optimizations...
[00:27:53] - Working with standard loops on this execution.
[00:27:53] - Created dyn
[00:27:53] - Files status OK
[00:27:53] - Expanded 375038 -> 1804408 (decompressed 481.1 percent)
[00:27:53] - Starting from initial work packet
[00:27:53] 
[00:27:53] Project: 6882 (Run 146, Clone 0, Gen 11)
[00:27:53] 
[00:27:53] Entering M.D.
[00:27:59] Protein: ALZHEIMERS DISEASE AMYLOID
[00:27:59] 
[00:27:59] Writing local files
[00:28:08] Writing local files
[00:28:08] Completed 0 out of 250000 steps  (0%)
[00:38:23] Writing local files
[00:38:23] Completed 2500 out of 250000 steps  (1%)
[00:40:24] CoreStatus = 0 (0)
[00:40:24] Sending work to server
[00:40:24] Project: 6882 (Run 146, Clone 0, Gen 11)
[00:40:24] - Error: Could not get length of results file work/wuresults_08.dat
[00:40:24] - Error: Could not read unit 08 file. Removing from queue.
It repeatedly segfaulted just past 1% after the core disabled SSE. I tried it one last time with -forceasm, and it segfaulted after 7%, the same as on the initial attempt.

This computer is just about capable of running SMP units, but I don't usually leave it on 24/7, so instead I get through many normal units each week.
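The failure pattern in this report (each attempt dies partway through, and the client then cannot find `wuresults_NN.dat`) can be pulled out of a v6 client log mechanically. Below is a minimal Python sketch that assumes the exact log wording shown above ("Working on queue slot", "Completed N out of M steps", "Could not get length of results file"). `summarize_attempts` is a hypothetical helper, not a FAH tool, and requires Python 3.8+ for the `:=` operator.

```python
import re

# Assumption: these phrases match the v6 client log wording shown above.
SLOT_RE = re.compile(r"Working on queue slot (\d+)")
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
FAIL_RE = re.compile(r"Could not get length of results file")


def summarize_attempts(log_lines):
    """Return one dict per queue slot: last completed step and whether results were missing."""
    attempts = []
    current = None
    for line in log_lines:
        if m := SLOT_RE.search(line):
            # A new work unit attempt starts in this queue slot.
            current = {"slot": m.group(1), "completed": 0,
                       "total": None, "results_missing": False}
            attempts.append(current)
        elif current is None:
            continue  # ignore lines before the first attempt
        elif m := STEP_RE.search(line):
            current["completed"] = int(m.group(1))
            current["total"] = int(m.group(2))
        elif FAIL_RE.search(line):
            current["results_missing"] = True
    return attempts


if __name__ == "__main__":
    excerpt = [
        "[23:35:25] Working on queue slot 06 [July 12 23:35:25 UTC]",
        "[00:10:17] Completed 17500 out of 250000 steps  (7%)",
        "[00:14:19] - Error: Could not get length of results file work/wuresults_06.dat",
        "[00:14:38] Working on queue slot 07 [July 13 00:14:38 UTC]",
        "[00:25:26] Completed 2500 out of 250000 steps  (1%)",
        "[00:27:26] - Error: Could not get length of results file work/wuresults_07.dat",
    ]
    for a in summarize_attempts(excerpt):
        print(a)
```

The helper reports the last progress line logged before each failure. The crash itself happens some time after that checkpoint, so the value is a lower bound on how far the core actually got.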

Re: Project: 6882 (Run 146, Clone 0, Gen 11) - segfault

Posted: Wed Jul 13, 2011 10:17 pm
by PantherX
There were 4 errors in the WU database, so I have marked it as a bad WU:
The WU (P6882,R146,C0,G11) has been reported as a bad WU.
Thanks for your report.