Client Version 6.23 Beta R1
Arguments: -advmethods -verbosity 9 -smp
Executable: ./fah6
Launch directory: /root/fah6
Current Work Unit
-----------------
Name: p2665_IBX in water
Tag: P2665R2C717G58
Download time: October 15 05:26:53
Due time: October 21 05:26:53
Progress: 30% [|||_______]
...
[15:19:04] Project: 2665 (Run 2, Clone 717, Gen 58)
[15:19:04]
[15:19:04] Entering M.D.
[15:19:12] Protein: HGG with glycosylations
[15:19:12] Writing local files
[15:19:12] Completed 75000 out of 250000 steps (30 percent)
[15:19:12] Extra SSE boost OK.
[15:29:31] Warning: long 1-4 interactions
[15:29:31]
[15:29:31] Folding@home Core Shutdown: INTERRUPTED
[15:29:35] CoreStatus = 66 (102)
[15:29:35] + Shutdown requested by user. Exiting.
***** Got a SIGTERM signal (15)
[15:29:35] Killing all core threads
This WU has hung at least four times: CPU utilization drops to zero, then the
core segfaults. Return codes below, and after them a rough watchdog sketch for
spotting the hang.
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
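The hang is easy to spot from outside the client, since the core's CPU use sits at zero for minutes before the segfault. Here is a rough watchdog sketch for that, assuming the SMP core shows up as a FahCore_a1 process (the core name, the 60-second poll, and the five-check threshold are all my assumptions, not anything the client provides):

#!/bin/bash
# Rough watchdog: note in hang.log when the folding core sits at ~0% CPU
# for five checks in a row. FahCore_a1 is an assumption -- check ps/top
# on your own box for the actual core process name.
IDLE=0
while true; do
    CPU=$(ps -C FahCore_a1 -o %cpu= | head -n1 | tr -d ' ')
    if [ -z "$CPU" ] || [ "${CPU%.*}" = "0" ]; then
        IDLE=$((IDLE + 1))
    else
        IDLE=0
    fi
    if [ "$IDLE" -ge 5 ]; then
        echo "$(date): core idle for 5 checks, likely hung" >> hang.log
        IDLE=0
    fi
    sleep 60
done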
Project: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
Moderators: Site Moderators, FAHC Science Team
-
- Site Moderator
- Posts: 6429
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
There's no data for this WU in the stat DB 

Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
The 66 (102) return codes (CoreStatus 0x66, i.e. 102 decimal) are the major class of errors that we can't yet handle well with the new client (6.23).
-
- Posts: 365
- Joined: Tue Feb 12, 2008 7:33 am
- Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
- Location: SE Michigan, USA
Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
Stopped the process, renamed the work dir and queue.dat, and restarted the client.
Was assigned the same work unit; it hung again at 30% complete.
Found the same situation again this morning, and was again assigned the same work unit.
I suspect it will hang again.
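For the record, the rename-and-restart steps above look roughly like this on my blades (a sketch only: the /root/fah6 path and the client arguments come from the log header at the top of the thread, and the work.bad/queue.dat.bad names are just my convention):

#!/bin/bash
# Dump-and-restart sketch: stop the client, move the current work aside,
# and restart with the same flags shown in the log header.
cd /root/fah6 || exit 1

pkill fah6   # stop the running client
sleep 5

STAMP=$(date +%Y%m%d-%H%M%S)
mv work "work.bad.$STAMP"            # the client's work/ subdirectory
mv queue.dat "queue.dat.bad.$STAMP"  # the queue file next to the binary

./fah6 -advmethods -verbosity 9 -smp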
-
- Site Moderator
- Posts: 6429
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
If I remember correctly, the SMP server usually resends the same WU three times before assuming it's lost and moving on to another one. But sometimes that doesn't work, and some people get the same WU indefinitely.
If that happens with this WU, someone from the PandeGroup will have to remove it manually.
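In the meantime, if you want to stop babysitting it, something like this could run periodically after each download. It's only a sketch: FAHlog.txt is the v6 client's default log name, but the 100-line tail window, the wu-dump.log name, and the dump-and-restart steps are assumptions borrowed from earlier in the thread.

#!/bin/bash
# Sketch: if the known-bad WU shows up in the recent log, dump it and
# restart instead of letting it hang at 30% again.
BAD='Project: 2665 (Run 2, Clone 717, Gen 58)'
cd /root/fah6 || exit 1

if tail -n 100 FAHlog.txt | grep -qF "$BAD"; then
    echo "$(date): assigned the bad WU again, dumping it" >> wu-dump.log
    pkill fah6
    sleep 5
    mv work "work.bad.$(date +%s)"
    mv queue.dat "queue.dat.bad.$(date +%s)"
    ./fah6 -advmethods -verbosity 9 -smp &
fi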
-
- Posts: 365
- Joined: Tue Feb 12, 2008 7:33 am
- Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
- Location: SE Michigan, USA
Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
Ran a different unit, then got 2665 (Run 2, Clone 717, Gen 58) again; it hung at 30% again.
Restarted and got the same WU assigned. Stopped the client immediately, deleted the WU, and restarted.
Was assigned a different unit.
-
- Posts: 365
- Joined: Tue Feb 12, 2008 7:33 am
- Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
- Location: SE Michigan, USA
Re: Project: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
Found the same situation again this morning. The machine was again assigned
2665 (Run 2, Clone 717, Gen 58) and again hung at 30%. Deleted it and restarted;
got the same WU two more times before finally getting a different one.
How do we get PandeGroup to delete it permanently?