Project: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault

Moderators: Site Moderators, FAHC Science Team

parkut
Posts: 365
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Project: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault

Post by parkut »

Client Version 6.23 Beta R1
Arguments: -advmethods -verbosity 9 -smp
Executable: ./fah6
Launch directory: /root/fah6

Current Work Unit
-----------------
Name: p2665_IBX in water
Tag: P2665R2C717G58
Download time: October 15 05:26:53
Due time: October 21 05:26:53
Progress: 30% [|||_______]
...
[15:19:04] Project: 2665 (Run 2, Clone 717, Gen 58)
[15:19:04]
[15:19:04] Entering M.D.
[15:19:12] Protein: HGProtein: HGG with glycosylations
[15:19:12] Writing lCompleted 75000 out of 250000 steps (30 percent)
[15:19:12] 0 percent)
[15:19:12] Extra SSE boost OK.
[15:29:31] Warning: long 1-4 interactions
[15:29:31]
[15:29:31] Folding@home Core Shutdown: INTERRUPTED
[15:29:35] CoreStatus = 66 (102)
[15:29:35] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[15:29:35] Killing all core threads

This WU has hung at least four times: CPU utilization drops to zero, then
the core processes segfault.

[0]0:Return code = 102
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
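For anyone triaging a log like the one above, here is a minimal Python sketch that pulls out the work unit tag, the CoreStatus value, and the processes that segfaulted. The regular expressions are based only on the lines quoted in this post, so they are an assumption about the log format and not an official parser. The function name `summarize_log` and the reading of the number after `[0]` as a process index are also hypothetical.

```python
import re

# Patterns inferred from the log excerpt in this thread (assumed, not official).
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
CORE_STATUS_RE = re.compile(r"CoreStatus = (\w+) \((\d+)\)")
SEGFAULT_RE = re.compile(
    r"\[\d+\](\d+):Return code = \d+, signaled with Segmentation fault"
)


def summarize_log(log_text):
    """Collect the WU tag, CoreStatus, and segfaulted process indices."""
    summary = {"tag": None, "core_status": None, "segfaulted_procs": []}
    for line in log_text.splitlines():
        match = PRCG_RE.search(line)
        if match:
            # Same layout as the "Tag:" line, e.g. P2665R2C717G58.
            summary["tag"] = "P{}R{}C{}G{}".format(*match.groups())
        match = CORE_STATUS_RE.search(line)
        if match:
            # Keep the first value as text and the parenthesized one as int.
            summary["core_status"] = (match.group(1), int(match.group(2)))
        match = SEGFAULT_RE.search(line)
        if match:
            summary["segfaulted_procs"].append(int(match.group(1)))
    return summary
```

Running it over several failed attempts and comparing the tags is a quick way to confirm that the same WU keeps coming back.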
toTOW
Site Moderator
Posts: 6429
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault

Post by toTOW »

There's no data for this WU in the stat DB :(

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault

Post by kasson »

The 66 (102) return codes are the major class of errors that we can't yet handle well with the new client (6.23).
parkut
Posts: 365
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault

Post by parkut »

Stopped the process, renamed the work dir and queue.dat, and restarted the client.

Was assigned the same work unit. It hung again at 30% complete.

Found the same situation again this morning, and was again assigned the same work unit.

I suspect it will hang again.
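The reset described in this post (renaming the work dir and queue.dat while the client is stopped) could be scripted roughly as follows. This is only a sketch under assumptions: the function name `set_aside_client_state`, the directory name `work`, and the `.bak-<suffix>` naming are illustrative choices, not something the client defines. As this thread shows, the server may still hand out the same WU afterwards.

```python
import time
from pathlib import Path


def set_aside_client_state(client_dir, names=("work", "queue.dat"), suffix=None):
    """Rename each existing entry to '<name>.bak-<suffix>' and return the new paths.

    Call this only after the client has been stopped.
    """
    client_dir = Path(client_dir)
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in names:
        src = client_dir / name
        if not src.exists():
            continue  # nothing to set aside for this entry
        dst = client_dir / f"{name}.bak-{suffix}"
        if dst.exists():
            # Refuse to clobber an earlier backup.
            raise FileExistsError(f"backup already exists: {dst}")
        src.rename(dst)
        moved.append(dst)
    return moved
```

With the launch directory from the first post, the call would be `set_aside_client_state("/root/fah6")`, followed by restarting the client by hand.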
toTOW
Site Moderator
Posts: 6429
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault

Post by toTOW »

If I remember correctly, the SMP server usually resends the same WU three times before assuming it's lost and moving on to another. But sometimes that doesn't work, and some people get the same WU indefinitely.

If that happens with this WU, someone from the Pande Group will have to remove it manually.
parkut
Posts: 365
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault

Post by parkut »

Ran a different unit, then got 2665 (Run 2, Clone 717, Gen 58) again. It hung at 30% again.
Restarted and got the same WU assigned. Stopped the client immediately, deleted the WU, and restarted.
Was assigned a different unit.
parkut
Posts: 365
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Re: Project: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault

Post by parkut »

Found the same situation again this morning. The machine was again assigned
2665 (Run 2, Clone 717, Gen 58), which again hung at 30%. Deleted it and restarted,
then got the same WU two more times before getting a different WU.

How do we get PandeGroup to delete it permanently?
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault

Post by kasson »

done.