Client Version 6.23 Beta R1
Arguments: -advmethods -verbosity 9 -smp
Executable: ./fah6
Launch directory: /root/fah6
Current Work Unit
-----------------
Name: p2665_IBX in water
Tag: P2665R2C717G58
Download time: October 15 05:26:53
Due time: October 21 05:26:53
Progress: 30% [|||_______]
...
[15:19:04] Project: 2665 (Run 2, Clone 717, Gen 58)
[15:19:04]
[15:19:04] Entering M.D.
[15:19:12] Protein: HGG with glycosylations
[15:19:12] Writing local files
[15:19:12] Completed 75000 out of 250000 steps (30 percent)
[15:19:12] Extra SSE boost OK.
[15:29:31] Warning: long 1-4 interactions
[15:29:31]
[15:29:31] Folding@home Core Shutdown: INTERRUPTED
[15:29:35] CoreStatus = 66 (102)
[15:29:35] + Shutdown requested by user. Exiting.
***** Got a SIGTERM signal (15)
[15:29:35] Killing all core threads
This WU has hung at least four times: CPU utilization drops to zero, then the
core segfaults. Return codes below, and after them a rough watchdog sketch for
spotting the hang.
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
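The hang is easy to spot from outside the client, since the core's CPU use sits at zero for minutes before the segfault. Here is a rough watchdog sketch for that, assuming the SMP core shows up as a FahCore_a1 process (the core name, the 60-second poll, and the five-check threshold are all my assumptions, not anything the client provides):

#!/bin/bash
# Rough watchdog: note in hang.log when the folding core sits at ~0% CPU
# for five checks in a row. FahCore_a1 is an assumption -- check ps/top
# on your own box for the actual core process name.
IDLE=0
while true; do
    CPU=$(ps -C FahCore_a1 -o %cpu= | head -n1 | tr -d ' ')
    if [ -z "$CPU" ] || [ "${CPU%.*}" = "0" ]; then
        IDLE=$((IDLE + 1))
    else
        IDLE=0
    fi
    if [ "$IDLE" -ge 5 ]; then
        echo "$(date): core idle for 5 checks, likely hung" >> hang.log
        IDLE=0
    fi
    sleep 60
done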
Project: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
Moderators: Site Moderators, FAHC Science Team
-
- Site Moderator
- Posts: 6429
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
There's no data for this WU in the stat DB 

Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
The 66 (102) return codes (CoreStatus 0x66, i.e. 102 decimal) are the major class of errors that we can't yet handle well with the new client (6.23).
-
- Posts: 365
- Joined: Tue Feb 12, 2008 7:33 am
- Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
- Location: SE Michigan, USA
Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
Stopped the process, renamed the work dir and queue.dat, and restarted the client.
Was assigned the same work unit; it hung again at 30% complete.
Found the same situation again this morning, and was again assigned the same work unit.
I suspect it will hang again.
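For the record, the rename-and-restart steps above look roughly like this on my blades (a sketch only: the /root/fah6 path and the client arguments come from the log header at the top of the thread, and the work.bad/queue.dat.bad names are just my convention):

#!/bin/bash
# Dump-and-restart sketch: stop the client, move the current work aside,
# and restart with the same flags shown in the log header.
cd /root/fah6 || exit 1

pkill fah6   # stop the running client
sleep 5

STAMP=$(date +%Y%m%d-%H%M%S)
mv work "work.bad.$STAMP"            # the client's work/ subdirectory
mv queue.dat "queue.dat.bad.$STAMP"  # the queue file next to the binary

./fah6 -advmethods -verbosity 9 -smp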
-
- Site Moderator
- Posts: 6429
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
If I remember correctly, the SMP server usually resends the same WU three times before assuming it's lost and moving on to another one. But sometimes that doesn't work, and some people get the same WU indefinitely.
If that happens with this WU, someone from the PandeGroup will have to remove it manually.
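In the meantime, if you want to stop babysitting it, something like this could run periodically after each download. It's only a sketch: FAHlog.txt is the v6 client's default log name, but the 100-line tail window, the wu-dump.log name, and the dump-and-restart steps are assumptions borrowed from earlier in the thread.

#!/bin/bash
# Sketch: if the known-bad WU shows up in the recent log, dump it and
# restart instead of letting it hang at 30% again.
BAD='Project: 2665 (Run 2, Clone 717, Gen 58)'
cd /root/fah6 || exit 1

if tail -n 100 FAHlog.txt | grep -qF "$BAD"; then
    echo "$(date): assigned the bad WU again, dumping it" >> wu-dump.log
    pkill fah6
    sleep 5
    mv work "work.bad.$(date +%s)"
    mv queue.dat "queue.dat.bad.$(date +%s)"
    ./fah6 -advmethods -verbosity 9 -smp &
fi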
-
- Posts: 365
- Joined: Tue Feb 12, 2008 7:33 am
- Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
- Location: SE Michigan, USA
Re: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
Ran a different unit, then got 2665 (Run 2, Clone 717, Gen 58) again; it hung at 30% again.
Restarted and got the same WU assigned. Stopped the client immediately, deleted the WU, and restarted.
Was assigned a different unit.
-
- Posts: 365
- Joined: Tue Feb 12, 2008 7:33 am
- Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
- Location: SE Michigan, USA
Re: Project: 2665 (Run 2, Clone 717, Gen 58) Hang/Segfault
Found the same situation again this morning. The machine was again assigned
2665 (Run 2, Clone 717, Gen 58) and again hung at 30%. Deleted it and restarted;
got the same WU two more times before finally getting a different one.
How do we get PandeGroup to delete it permanently?