
Project: 3064 (Run 3, Clone 88, Gen 31) segfault@41% x2[fix]

Posted: Tue May 20, 2008 4:23 am
by tear
Hello and Welcome!

Per subject line. Failed twice at the same point [got new assignment
afterwards], terminal excerpts follow.
Didn't have a chance to re-start it in the middle of simulation this time
around; sorry.
[16:34:07] Writing local files
[16:34:07] Completed 2000000 out of 5000000 steps (40 percent)
[16:46:23] Writing local files
[16:46:23] Completed 2050000 out of 5000000 steps (41 percent)
[16:57:33] Warning: long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[16:57:37] CoreStatus = 0 (0)
[16:57:37] Client-core communications error: ERROR 0x0
[16:57:37] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 18
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[17:02:03] - Warning: Could not delete all work unit files (6): Core returned invalid code
[17:02:03] Trying to send all finished work units
[17:02:03] + No unsent completed units remaining.
[17:02:03] - Preparing to get new work unit...
[17:02:03] + Attempting to get work packet

[01:12:31] Writing local files
[01:12:31] Completed 2000000 out of 5000000 steps (40 percent)
[01:24:45] Writing local files
[01:24:45] Completed 2050000 out of 5000000 steps (41 percent)
[01:35:55] Warning: long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[01:35:59] CoreStatus = 0 (0)
[01:35:59] Client-core communications error: ERROR 0x0
[01:35:59] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 18
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[01:40:25] - Warning: Could not delete all work unit files (7): Core returned invalid code
[01:40:25] Trying to send all finished work units
[01:40:25] + No unsent completed units remaining.
[01:40:25] - Preparing to get new work unit...
[01:40:25] + Attempting to get work packet

Thank you and good night!
tear

Re: Project: 3064 (Run 3, Clone 88, Gen 31), segfault@41% x2

Posted: Wed May 21, 2008 3:16 pm
by tear
Hmmm, interesting. The client received P2605 after second failure, completed it
and got P3064 (3,88,31) again [sic!]. Will interrupt it @10%.

Re: Project: 3064 (Run 3, Clone 88, Gen 31), segfault@41% x2

Posted: Thu May 22, 2008 5:46 am
by tear
FYI.

Restarted @9%; unit has surpassed 41% mark, currently @71% and folding.

Re: Project: 3064 (Run 3, Clone 88, Gen 31), segfault@41% x2

Posted: Thu May 22, 2008 9:15 pm
by ppetrone
Cool!
Thanks for the report & let's see what happens.

paula

Re: Project: 3064 (Run 3, Clone 88, Gen 31), segfault@41% x2

Posted: Thu May 22, 2008 9:28 pm
by tear
Hey Paula,


FWIW, unit was completed successfully. Let me know if you need any additional information.


Cheers,
tear

Re: Project: 3064 (R3, C88, G31), segfault@41% x2[Resolved]

Posted: Fri May 23, 2008 1:14 am
by tear
Resolved... well, hmm... it depends on what you call a resolution :wink:

Personally I do not mind restarting clients every now and then; got one question
though -- is there anyone working on this problem?

To me, it seems it's a long-standing (6 months+) FahCore_a1 bug.
Or maybe all dev efforts are focused to get _a2 going and deprecate _a1?

And no, I'm not complaining; just trying to say that if I was the dev,
this thing wouldn't let me sleep at night, but then again I am not so...


Cheers,
tear

Re: Project: 3064 (R3, C88, G31), segfault@41% x2[Resolved]

Posted: Fri May 23, 2008 2:53 am
by anandhanju
tear wrote:Or maybe all dev efforts are focused to get _a2 going and deprecate _a1?
Yes, I believe that's the idea. A2 is going to be A1's successor and A1 will be phased out. The researcher who eats and breathes SMP is working on the A2 core and fixing this in the A1 core may not feature in his plans.

I'd also posted a question (an observation, if you will) here, and bruce's answer, though only his opinion, may be the closest thing to the grand scheme of things that we'll get.

Re: Project: 3064 (R3, C88, G31), segfault@41% x2[Resolved]

Posted: Fri May 23, 2008 5:30 pm
by DanEnsign
anandhanju wrote:
tear wrote:Or maybe all dev efforts are focused to get _a2 going and deprecate _a1?
Yes, I believe that's the idea. A2 is going to be A1's successor and A1 will be phased out. The researcher who eats and breathes SMP is working on the A2 core and fixing this in the A1 core may not feature in his plans.
Even if that's not the official plan (you'd have to talk to the developer for any official plans), I'll be starting some projects soon that will replace 3062, 3064, and 3065, and use a2.

Thanks to everyone for being patient -- even if a1's have been a pain, they have been extremely valuable scientifically.* No, seriously. These simulations are so cool that I can pick up girls in bars with them. Not that my wife approves, but that's not the point.

*Because SMP can get us really long trajectories, we're starting to understand that many of the measurements other researchers have made on the systems simulated by the 30xx series projects are looking at internal motions rather than protein folding. Paradigm shift, anyone? (http://en.wikipedia.org/wiki/Paradigm_shift)

Dan

Re: Project: 3064 (R3, C88, G31), segfault@41% x2[Resolved]

Posted: Thu May 29, 2008 4:02 pm
by hrsetrdr
I just got a 3064 (R3, C132, G6) on a quad with a fresh dist-upgrade; I'm glad to be back in native Linux and am anxious to see how it does.
DanEnsign wrote: These simulations are so cool that I can pick up girls in bars with them. Not that my wife approves, but that's not the point.
Dan
Wow, Palo Alto chicks do rock! ;)