Project: 2665 (Run 2, Clone 968, Gen 104) - 3 hrs since ckpt

Moderators: Site Moderators, FAHC Science Team

anko1
Posts: 438
Joined: Mon Dec 03, 2007 1:31 am
Hardware configuration: Old Faithful CPU: Windows Graphical 5.03; Intel Pentium 4 Processor 540 (3.2GHz) HT; Windows XP
Big Red: Windows SMP Console 6.29; Windows GPU console 6.20r1; Intel Q9450 2.66G; ASUS P5Q 775 P45; [BFG 9800GTX+ old graphics card] NVidia GeForce 8800 GTX [as of 5/9/09]; Windows XP Pro SP3
Lenovo Think Pad: Windows 6.29 w/ SMP; Windows GPU Console 6.20r1 systray; Intel QX9300; NVIDIA Quadro FX-3700M; Windows XP Professional
Location: SF Peninsula

Project: 2665 (Run 2, Clone 968, Gen 104) - 3 hrs since ckpt

Post by anko1 »

This is on Big Red, which hasn't had an EUE since April (iirc; i.e., if my records are accurate). The core went 3 hrs without writing a checkpoint, so the client ended the WU and started a new one (which is an improvement; iirc, it used to just hang until you found it).

Code:

[20:03:05] + Closed connections
[20:03:05] 
[20:03:05] + Processing work unit
[20:03:05] Work type a1 not eligible for variable processors
[20:03:05] Core required: FahCore_a1.exe
[20:03:05] Core found.
[20:03:05] Using generic mpiexec calls
[20:03:05] Working on queue slot 09 [August 15 20:03:05 UTC]
[20:03:05] + Working ...
[20:03:05] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 09 -checkpoint 15 -verbose -lifeline 2496 -version 624'

[20:03:05] 
[20:03:05] *------------------------------*
[20:03:05] Folding@Home Gromacs SMP Core
[20:03:05] Version 1.74 (March 10, 2007)
[20:03:05] 
[20:03:05] Preparing to commence simulation
[20:03:05] - Looking at optimizations...
[20:03:06] .
[20:03:10] - Starting from initial work packet
[20:03:10] 
[20:03:10] Project: 2665 (Run 2, Clone 968, Gen 104)
[20:03:10] 
[20:03:11] Assembly optimizations on if available.
[20:03:11] Entering M.D.
[20:03:33] percent)
[20:03:33] - Starting from initial work packet
[20:03:33] 8, Gen 104)
[20:03:33] 
[20:03:33] Entering M.D.
[20:03:34] e 968, Gen 104)
[20:03:34] 
[20:03:34] Entering M.D.
[20:03:40] Rejecting checkpoint
[20:03:42] Protein: HGG with glycosylations
[20:03:42] Writing local files
[20:03:51] Extra SSE boost OK.
[20:03:51] Writing local files
[20:03:51] Completed 0 out of 250000 steps  (0 percent)
[20:18:52] Timered checkpoint triggered.
[20:19:36] Writing local files
[20:19:36] Completed 2500 out of 250000 steps  (1 percent)
                      {snip}
[10:33:35] Timered checkpoint triggered.
[10:34:27] Writing local files
[10:34:27] Completed 137500 out of 250000 steps  (55 percent)
[10:49:00] - Autosending finished units... [August 16 10:49:00 UTC]
[10:49:00] Trying to send all finished work units
[10:49:00] + No unsent completed units remaining.
[10:49:00] - Autosend completed
[10:49:28] Timered checkpoint triggered.
[10:50:18] Writing local files
[10:50:19] Completed 140000 out of 250000 steps  (56 percent)
[11:05:20] Timered checkpoint triggered.
[11:06:10] Writing local files
[11:06:11] Completed 142500 out of 250000 steps  (57 percent)
[14:06:12] At least 3 hours since checkpoint written...
[14:08:12] 
[14:08:12] Folding@home Core Shutdown: EARLY_UNIT_END
[14:08:12] 
[14:08:12] Folding@home Core Shutdown: EARLY_UNIT_END
[14:08:15] CoreStatus = 7B (123)
[14:08:15] Sending work to server
[14:08:15] Project: 2665 (Run 2, Clone 968, Gen 104)


[14:08:15] + Attempting to send results [August 16 14:08:15 UTC]
[14:08:15] - Reading file work/wuresults_09.dat from core
[14:08:15]   (Read 116435 bytes from disk)
[14:08:15] Connecting to http://171.64.65.64:80/
[14:08:16] Posted data.
[14:08:16] Initial: 0000; - Uploaded at ~114 kB/s
[14:08:16] - Averaged speed for that direction ~373 kB/s
[14:08:16] + Results successfully sent
[14:08:16] Thank you for your contribution to Folding@Home.
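
For anyone who wants to spot this kind of stall in their own log, here is a minimal sketch (not part of the client, and not how the core itself does it). It assumes the `[HH:MM:SS] message` line format shown above, treats a timestamp that goes backwards as the clock passing midnight, and counts only `Completed ...` lines as progress. The names `parse_log` and `time_since_last_progress` are made up for this example.

```python
import re
from datetime import timedelta

# Matches the "[HH:MM:SS] message" prefix used in the log excerpt above.
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] ?(.*)$")


def parse_log(lines):
    """Yield (elapsed, message) pairs for every timestamped line.

    The log only carries a time of day, so a timestamp earlier than the
    previous one is assumed to mean the clock rolled past midnight.
    Lines without a timestamp (such as "{snip}") are skipped.
    """
    day = 0
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
        time_of_day = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        if previous is not None and time_of_day < previous:
            day += 1
        previous = time_of_day
        yield timedelta(days=day) + time_of_day, match.group(4)


def time_since_last_progress(lines):
    """Return the time between the last "Completed ..." line and the last
    timestamped line, or None if no progress line was seen."""
    last_progress = None
    last_seen = None
    for elapsed, message in parse_log(lines):
        last_seen = elapsed
        if message.startswith("Completed "):
            last_progress = elapsed
    if last_progress is None:
        return None
    return last_seen - last_progress
```

Fed the excerpt above up to the "At least 3 hours since checkpoint written..." line, the last progress line is at 11:06:11 and the final line is at 14:06:12, so the reported gap is 3:00:01.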
MtM
Posts: 1579
Joined: Fri Jun 27, 2008 2:20 pm
Hardware configuration: Q6600 - 8gb - p5q deluxe - gtx275 - hd4350 ( not folding ) win7 x64 - smp:4 - gpu slot
E6600 - 4gb - p5wdh deluxe - 9600gt - 9600gso - win7 x64 - smp:2 - 2 gpu slots
E2160 - 2gb - ?? - onboard gpu - win7 x32 - 2 uniprocessor slots
T5450 - 4gb - ?? - 8600M GT 512 ( DDR2 ) - win7 x64 - smp:2 - gpu slot
Location: The Netherlands

Re: Project: 2665 (Run 2, Clone 968, Gen 104) - 3 hrs since ckpt

Post by MtM »

http://foldingforum.org/viewtopic.php?f=19&t=11048

Same occurrence (and there are 4 more threads about the same subject).
anko1

Re: Project: 2665 (Run 2, Clone 968, Gen 104) - 3 hrs since ckpt

Post by anko1 »

Thanks for the response, MtM. Maybe I edited too much out of my log. The WU was actually progressing normally until it just stopped. Probably just one of those mysterious incidents that happen now and then, but I thought I'd report it anyway. If you'd like to see more of the log, just let me know.
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for aprox. 70K PPD
Location: Salem. OR USA

Re: Project: 2665 (Run 2, Clone 968, Gen 104) - 3 hrs since ckpt

Post by P5-133XL »

Whenever I have a WU stall like that, the first thing I do is check a process manager (in Windows, that would be the Task Manager) to see whether the FAHCore_xx processes are still running. If they are not, I know that folding needs to be restarted.
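
If you would rather script that check than open the Task Manager, something along these lines might work. It is only a sketch: the function names are hypothetical, the match is a simple case-insensitive prefix test on `FahCore_`, and the Windows helper assumes the stock `tasklist` command with CSV output.

```python
import csv
import subprocess


def fahcore_processes(process_names):
    """Return the names that look like FahCore_xx executables.

    The comparison is case-insensitive, so "FahCore_a1.exe" and
    "FAHCORE_A1.EXE" both count.
    """
    return [name for name in process_names if name.lower().startswith("fahcore_")]


def needs_restart(process_names):
    """True when no FahCore process is in the list, i.e. folding has stopped."""
    return not fahcore_processes(process_names)


def list_windows_process_names():
    """Return the image names reported by `tasklist` (Windows only).

    Assumes `tasklist /FO CSV /NH` is available; the first CSV column
    is the image name.
    """
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [row[0] for row in csv.reader(result.stdout.splitlines()) if row]
```

On a Windows box, `needs_restart(list_windows_process_names())` would then tell you whether any core is still alive. Note that a core can be running and still be hung, so this only catches the case where the processes have gone away.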
anko1

Re: Project: 2665 (Run 2, Clone 968, Gen 104) - 3 hrs since ckpt

Post by anko1 »

Yes. What was nice about this is that the client caught the stall and restarted itself. Otherwise it would have waited for my return this morning.
susato
Site Moderator
Posts: 512
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: Project: 2665 (Run 2, Clone 968, Gen 104) - 3 hrs since ckpt

Post by susato »

MtM wrote: http://foldingforum.org/viewtopic.php?f=19&t=11048

Same occurrence (and there are 4 more threads about the same subject).
Um, this looks different from the "folding on 1 core" problem in the link you cited.

Interesting, though, that the client noticed the delay between checkpoints and ended the unit. Good eye, anko1.
MtM

Re: Project: 2665 (Run 2, Clone 968, Gen 104) - 3 hrs since ckpt

Post by MtM »

susato wrote:
MtM wrote: http://foldingforum.org/viewtopic.php?f=19&t=11048

Same occurrence (and there are 4 more threads about the same subject).
Um, this looks different from the "folding on 1 core" problem in the link you cited.

Interesting, though, that the client noticed the delay between checkpoints and ended the unit. Good eye, anko1.
I didn't read it carefully here. This wasn't really news to me, and since it's in the "problems with a WU" section, I guess I didn't make the connection with the client/core that got updated.

Sorry, susato and OP. Reading it back, I can't believe I missed that; I must have posted in a hurry.