Project: 3064 (Run 3, Clone 88, Gen 31) segfault@41% x2[fix]

Moderators: Site Moderators, FAHC Science Team

tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Post by tear »

Hello and Welcome!

As per the subject line: the unit failed twice at the same point [the client got a new
assignment afterwards]; terminal excerpts follow.
I didn't get a chance to restart it in the middle of the simulation this time
around; sorry.
[16:34:07] Writing local files
[16:34:07] Completed 2000000 out of 5000000 steps (40 percent)
[16:46:23] Writing local files
[16:46:23] Completed 2050000 out of 5000000 steps (41 percent)
[16:57:33] Warning: long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[16:57:37] CoreStatus = 0 (0)
[16:57:37] Client-core communications error: ERROR 0x0
[16:57:37] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 18
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[17:02:03] - Warning: Could not delete all work unit files (6): Core returned invalid code
[17:02:03] Trying to send all finished work units
[17:02:03] + No unsent completed units remaining.
[17:02:03] - Preparing to get new work unit...
[17:02:03] + Attempting to get work packet

[01:12:31] Writing local files
[01:12:31] Completed 2000000 out of 5000000 steps (40 percent)
[01:24:45] Writing local files
[01:24:45] Completed 2050000 out of 5000000 steps (41 percent)
[01:35:55] Warning: long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[01:35:59] CoreStatus = 0 (0)
[01:35:59] Client-core communications error: ERROR 0x0
[01:35:59] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 18
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[01:40:25] - Warning: Could not delete all work unit files (7): Core returned invalid code
[01:40:25] Trying to send all finished work units
[01:40:25] + No unsent completed units remaining.
[01:40:25] - Preparing to get new work unit...
[01:40:25] + Attempting to get work packet
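
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that reports the last completed step count before each segfault. It assumes the log lines look exactly like the excerpts above; the function name `find_crash_points` and the regular expressions are illustrative only and are not part of the client.

```python
import re

# Patterns are based only on the log lines quoted above (an assumption);
# other client or core versions may format these lines differently.
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")
SEGFAULT = re.compile(r"signaled with Segmentation fault")


def find_crash_points(lines):
    """Return (steps_done, total_steps) for the last progress line before each crash."""
    crashes = []
    last_progress = None
    in_crash = False
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            last_progress = (int(match.group(1)), int(match.group(2)))
            in_crash = False
        elif SEGFAULT.search(line):
            # Several ranks report the same crash; record it only once.
            if not in_crash and last_progress is not None:
                crashes.append(last_progress)
            in_crash = True
    return crashes


sample = """\
[16:46:23] Completed 2050000 out of 5000000 steps (41 percent)
[16:57:33] Warning: long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[01:24:45] Completed 2050000 out of 5000000 steps (41 percent)
[01:35:55] Warning: long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
""".splitlines()

for done, total in find_crash_points(sample):
    print(f"crash after step {done} of {total} ({100 * done // total}%)")
```

Two identical crash points, as in the excerpts above, suggest the failure is tied to the work unit's state rather than to a random hardware glitch, though that is only a heuristic.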

Thank you and good night!
tear
One man's ceiling is another man's floor.

Re: Project: 3064 (Run 3, Clone 88, Gen 31), segfault@41% x2

Post by tear »

Hmmm, interesting. The client received P2605 after the second failure, completed it,
and got P3064 (3,88,31) again [sic!]. I will interrupt it at 10%.

Re: Project: 3064 (Run 3, Clone 88, Gen 31), segfault@41% x2

Post by tear »

FYI.

Restarted at 9%; the unit has passed the 41% mark, is currently at 71%, and is still folding.
ppetrone
Pande Group Member
Posts: 115
Joined: Wed Dec 12, 2007 6:20 pm
Location: Stanford

Re: Project: 3064 (Run 3, Clone 88, Gen 31), segfault@41% x2

Post by ppetrone »

Cool!
Thanks for the report & let's see what happens.

paula

Re: Project: 3064 (Run 3, Clone 88, Gen 31), segfault@41% x2

Post by tear »

Hey Paula,


FWIW, the unit was completed successfully. Let me know if you need any additional information.


Cheers,
tear

Re: Project: 3064 (R3, C88, G31), segfault@41% x2[Resolved]

Post by tear »

Resolved... well, hmm... it depends on what you call a resolution :wink:

Personally, I do not mind restarting clients every now and then; I've got one question
though -- is there anyone working on this problem?

To me, it seems to be a long-standing (6 months+) FahCore_a1 bug.
Or maybe all dev efforts are focused on getting _a2 going and deprecating _a1?

And no, I'm not complaining; just trying to say that if I were the dev,
this thing wouldn't let me sleep at night, but then again, I am not, so...


Cheers,
tear
anandhanju
Posts: 522
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: Project: 3064 (R3, C88, G31), segfault@41% x2[Resolved]

Post by anandhanju »

tear wrote: Or maybe all dev efforts are focused on getting _a2 going and deprecating _a1?
Yes, I believe that's the idea. A2 is going to be A1's successor, and A1 will be phased out. The researcher who eats and breathes SMP is working on the A2 core, and fixing this in the A1 core may not feature in his plans.

I'd also posted a question (an observation, if you will) here, and bruce's answer, though only his opinion, may be the closest thing to the grand scheme of things that we'll get.
DanEnsign
Pande Group Member
Posts: 56
Joined: Fri Nov 30, 2007 9:41 pm
Location: Austin, TX

Re: Project: 3064 (R3, C88, G31), segfault@41% x2[Resolved]

Post by DanEnsign »

anandhanju wrote:
tear wrote: Or maybe all dev efforts are focused on getting _a2 going and deprecating _a1?
Yes, I believe that's the idea. A2 is going to be A1's successor, and A1 will be phased out. The researcher who eats and breathes SMP is working on the A2 core, and fixing this in the A1 core may not feature in his plans.
Even if that's not the official plan (you'd have to talk to the developer for any official plans), I'll be starting some projects soon that will replace 3062, 3064, and 3065 and will use a2.

Thanks to everyone for being patient -- even if a1's have been a pain, they have been extremely valuable scientifically.* No, seriously. These simulations are so cool that I can pick up girls in bars with them. Not that my wife approves, but that's not the point.

*Because SMP can get us really long trajectories, we're starting to understand that many of the measurements other researchers have made on the systems simulated by the 30xx series projects are looking at internal motions rather than protein folding. Paradigm shift, anyone? (http://en.wikipedia.org/wiki/Paradigm_shift)

Dan
hrsetrdr
Posts: 112
Joined: Sun Dec 02, 2007 4:29 pm
Location: In the Fold somewhere in SoCal.

Re: Project: 3064 (R3, C88, G31), segfault@41% x2[Resolved]

Post by hrsetrdr »

I just got a 3064 (R3, C132, G6) on a quad with a fresh dist-upgrade; I'm glad to be back in native Linux and am anxious to see how it does.
DanEnsign wrote: These simulations are so cool that I can pick up girls in bars with them. Not that my wife approves, but that's not the point.
Dan
Wow, Palo Alto chicks do rock! ;)
Folding rig:Supermicro X9DRD-7LN4F-JBOD | (2) Xeon E5-2670 | 128GB DDR3 ECC Registered
