Project 2671 Hangs at Random

Post by HendricksSA

My Linux SMP folding computer hangs at random. It runs a basic install of Fedora 11 with current updates, and the hangs seem to occur only while it is processing Project 2671 WUs. I have had no problems with 2669 WUs, but 2671 has been hanging for several weeks now, with increasing frequency. Yesterday I deleted the folding directory, then downloaded and reinstalled the client from scratch, to no avail.

Without Project 2671, this computer runs perfectly for as long as I let it. With Project 2671, I can expect it to hang several times a day. No errors are reported, and the screen, mouse, and keyboard are frozen. After I restart the computer, the WU resumes from its last checkpoint (without SSE, so the client knows it terminated incorrectly) and continues until it finishes or hangs again.

The log of a successful 2669 run and the subsequent 2671 run with three hangs follows. Any advice you experts could share would be greatly appreciated. Thanks in advance!
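For anyone who wants to pull the hang points out of a long log without scrolling, here is a minimal Python sketch. It is not part of the Folding@Home client. It assumes the log has the same line shapes as the one in this post: a "--- Opening Log file [...]" line at every client start, "N out of M steps" progress lines, and a "Core Shutdown:" line when a unit ends normally. The function name find_unclean_restarts is made up for illustration. Any restart that appears while a unit is still in progress is reported along with the last progress line seen before it.

```python
import re

# Patterns follow the log lines quoted in this post. Other client or core
# versions may format these lines differently (assumption).
OPEN_RE = re.compile(r"^--- Opening Log file \[(.+?)\]")
STEP_RE = re.compile(r"^\[(\d\d:\d\d:\d\d)\] .*?(\d+) out of (\d+) steps")
SHUTDOWN_RE = re.compile(r"Core Shutdown:")


def find_unclean_restarts(log_text):
    """Return one (last_time, last_step, total_steps, restart_stamp) tuple
    for every client restart that happened while a unit was in progress."""
    restarts = []
    last_progress = None  # (time, step, total) of the unit being worked on
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        opened = OPEN_RE.match(line)
        if opened:
            # A new client session started; if a unit was still running,
            # the previous session ended without a clean core shutdown.
            if last_progress is not None:
                restarts.append(last_progress + (opened.group(1),))
            last_progress = None
            continue
        step = STEP_RE.match(line)
        if step:
            last_progress = (step.group(1), int(step.group(2)), int(step.group(3)))
        elif SHUTDOWN_RE.search(line):
            # The core reported a shutdown, so no unit is in progress.
            last_progress = None
    return restarts
```

To use it, read the client log into a string (the file name depends on your setup) and pass it to find_unclean_restarts. Each tuple gives the time and step count of the last progress line before a restart, plus the date stamp of the restart itself. The sketch only tells you where the client was restarted mid-unit, not why the machine froze.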

Code:

--- Opening Log file [August 25 16:32:09] 


# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/folding/folding
Executable: ./fah6
Arguments: -smp -verbosity 9 

[16:32:09] - Ask before connecting: No
[16:32:09] - User name: HendricksSA (Team 0)
[16:32:09] - User ID: 381E417376BD9808
[16:32:09] - Machine ID: 1
[16:32:09] 
[16:32:09] Work directory not found. Creating...
[16:32:09] Could not open work queue, generating new queue...
[16:32:09] - Preparing to get new work unit...
[16:32:09] - Autosending finished units...
[16:32:09] Trying to send all finished work units
[16:32:09] + No unsent completed units remaining.
[16:32:09] - Autosend completed
[16:32:09] + Attempting to get work packet
[16:32:09] - Will indicate memory of 7200 MB
[16:32:09] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 7
[16:32:09] - Connecting to assignment server
[16:32:09] Connecting to http://assign.stanford.edu:8080/
[16:32:09] Posted data.
[16:32:09] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[16:32:09] + News From Folding@Home: Welcome to Folding@Home
[16:32:10] Loaded queue successfully.
[16:32:10] Connecting to http://171.64.65.56:8080/
[16:32:16] Posted data.
[16:32:16] Initial: 0000; - Receiving payload (expected size: 4830990)
[16:32:24] - Downloaded at ~589 kB/s
[16:32:24] - Averaged speed for that direction ~589 kB/s
[16:32:24] + Received work.
[16:32:24] + Closed connections
[16:32:24] 
[16:32:24] + Processing work unit
[16:32:24] Core required: FahCore_a2.exe
[16:32:24] Core not found.
[16:32:24] - Core is not present or corrupted.
[16:32:24] - Attempting to download new core...
[16:32:24] + Downloading new core: FahCore_a2.exe
[16:32:24] Downloading core (/~pande/Linux/x86/Core_a2.fah from www.stanford.edu)
[16:32:24] Initial: AFDE; + 10240 bytes downloaded
[16:32:25] Initial: 2DF4; + 20480 bytes downloaded
[16:32:25] Initial: BA52; + 30720 bytes downloaded
[16:32:25] Initial: B719; + 40960 bytes downloaded
[16:32:25] Initial: 88F5; + 51200 bytes downloaded
[16:32:25] Initial: D186; + 61440 bytes downloaded
[16:32:25] Initial: C494; + 71680 bytes downloaded
[16:32:25] Initial: 2436; + 81920 bytes downloaded
[16:32:25] Initial: 6AB5; + 92160 bytes downloaded
[16:32:25] Initial: 15A1; + 102400 bytes downloaded
[16:32:25] Initial: 95C3; + 112640 bytes downloaded
[16:32:25] Initial: 467E; + 122880 bytes downloaded
[16:32:25] Initial: 52FD; + 133120 bytes downloaded
[16:32:25] Initial: F7AD; + 143360 bytes downloaded
[16:32:25] Initial: FF03; + 153600 bytes downloaded
[16:32:25] Initial: DB32; + 163840 bytes downloaded
[16:32:25] Initial: 8D05; + 174080 bytes downloaded
[16:32:25] Initial: 6120; + 184320 bytes downloaded
[16:32:25] Initial: 18D5; + 194560 bytes downloaded
[16:32:25] Initial: B434; + 204800 bytes downloaded
[16:32:25] Initial: 2BC7; + 215040 bytes downloaded
[16:32:25] Initial: 5768; + 225280 bytes downloaded
[16:32:25] Initial: EBB7; + 235520 bytes downloaded
[16:32:25] Initial: EE47; + 245760 bytes downloaded
[16:32:25] Initial: 8CA6; + 256000 bytes downloaded
[16:32:25] Initial: DA87; + 266240 bytes downloaded
[16:32:25] Initial: B36F; + 276480 bytes downloaded
[16:32:25] Initial: 750C; + 286720 bytes downloaded
[16:32:25] Initial: 4387; + 296960 bytes downloaded
[16:32:25] Initial: 3DF7; + 307200 bytes downloaded
[16:32:25] Initial: 6C1C; + 317440 bytes downloaded
[16:32:25] Initial: E07D; + 327680 bytes downloaded
[16:32:25] Initial: CCFB; + 337920 bytes downloaded
[16:32:26] Initial: F9E9; + 348160 bytes downloaded
[16:32:26] Initial: DB75; + 358400 bytes downloaded
[16:32:26] Initial: F5A2; + 368640 bytes downloaded
[16:32:26] Initial: E643; + 378880 bytes downloaded
[16:32:26] Initial: ADA5; + 389120 bytes downloaded
[16:32:26] Initial: 796A; + 399360 bytes downloaded
[16:32:26] Initial: 2929; + 409600 bytes downloaded
[16:32:26] Initial: 4725; + 419840 bytes downloaded
[16:32:26] Initial: 8807; + 430080 bytes downloaded
[16:32:26] Initial: 1736; + 440320 bytes downloaded
[16:32:26] Initial: 52A1; + 450560 bytes downloaded
[16:32:26] Initial: E44D; + 460800 bytes downloaded
[16:32:26] Initial: B51A; + 471040 bytes downloaded
[16:32:26] Initial: 3D9B; + 481280 bytes downloaded
[16:32:26] Initial: 43AD; + 491520 bytes downloaded
[16:32:26] Initial: B6D6; + 501760 bytes downloaded
[16:32:26] Initial: 6070; + 512000 bytes downloaded
[16:32:26] Initial: 8E38; + 522240 bytes downloaded
[16:32:26] Initial: 7294; + 532480 bytes downloaded
[16:32:26] Initial: A895; + 542720 bytes downloaded
[16:32:26] Initial: C9E7; + 552960 bytes downloaded
[16:32:26] Initial: D6A1; + 563200 bytes downloaded
[16:32:26] Initial: 250D; + 573440 bytes downloaded
[16:32:26] Initial: 9AB0; + 583680 bytes downloaded
[16:32:26] Initial: 88A2; + 593920 bytes downloaded
[16:32:26] Initial: 6298; + 604160 bytes downloaded
[16:32:26] Initial: 5A02; + 614400 bytes downloaded
[16:32:26] Initial: 9DEC; + 624640 bytes downloaded
[16:32:26] Initial: E7EE; + 634880 bytes downloaded
[16:32:26] Initial: C229; + 645120 bytes downloaded
[16:32:26] Initial: A221; + 655360 bytes downloaded
[16:32:26] Initial: 33CA; + 665600 bytes downloaded
[16:32:26] Initial: 8F5A; + 675840 bytes downloaded
[16:32:26] Initial: 3C59; + 686080 bytes downloaded
[16:32:26] Initial: F0D6; + 696320 bytes downloaded
[16:32:26] Initial: 62D7; + 706560 bytes downloaded
[16:32:26] Initial: 960E; + 716800 bytes downloaded
[16:32:26] Initial: DF4D; + 727040 bytes downloaded
[16:32:26] Initial: 4ACF; + 737280 bytes downloaded
[16:32:26] Initial: 1422; + 747520 bytes downloaded
[16:32:26] Initial: 5F27; + 757760 bytes downloaded
[16:32:26] Initial: 00A2; + 768000 bytes downloaded
[16:32:26] Initial: E542; + 778240 bytes downloaded
[16:32:26] Initial: B745; + 788480 bytes downloaded
[16:32:26] Initial: 2A30; + 798720 bytes downloaded
[16:32:26] Initial: EE1E; + 808960 bytes downloaded
[16:32:26] Initial: 827D; + 819200 bytes downloaded
[16:32:26] Initial: ABF2; + 829440 bytes downloaded
[16:32:26] Initial: 2A95; + 839680 bytes downloaded
[16:32:26] Initial: A1C8; + 849920 bytes downloaded
[16:32:26] Initial: 2FB9; + 860160 bytes downloaded
[16:32:26] Initial: 56B3; + 870400 bytes downloaded
[16:32:26] Initial: 2589; + 880640 bytes downloaded
[16:32:26] Initial: 20D0; + 890880 bytes downloaded
[16:32:26] Initial: 1B7B; + 901120 bytes downloaded
[16:32:26] Initial: 1D1D; + 911360 bytes downloaded
[16:32:26] Initial: 5CF5; + 921600 bytes downloaded
[16:32:26] Initial: AF0D; + 931840 bytes downloaded
[16:32:26] Initial: F5C8; + 942080 bytes downloaded
[16:32:26] Initial: 4A27; + 952320 bytes downloaded
[16:32:26] Initial: A4EA; + 962560 bytes downloaded
[16:32:26] Initial: 4145; + 972800 bytes downloaded
[16:32:26] Initial: 06C7; + 983040 bytes downloaded
[16:32:26] Initial: 5656; + 993280 bytes downloaded
[16:32:26] Initial: 7BDC; + 1003520 bytes downloaded
[16:32:26] Initial: A956; + 1013760 bytes downloaded
[16:32:26] Initial: 3205; + 1024000 bytes downloaded
[16:32:26] Initial: A44B; + 1034240 bytes downloaded
[16:32:26] Initial: 4620; + 1044480 bytes downloaded
[16:32:26] Initial: 36F7; + 1054720 bytes downloaded
[16:32:26] Initial: EAF9; + 1064960 bytes downloaded
[16:32:26] Initial: 6522; + 1075200 bytes downloaded
[16:32:26] Initial: F08A; + 1085440 bytes downloaded
[16:32:27] Initial: E582; + 1095680 bytes downloaded
[16:32:27] Initial: E54C; + 1105920 bytes downloaded
[16:32:27] Initial: D7D3; + 1116160 bytes downloaded
[16:32:27] Initial: ACB0; + 1126400 bytes downloaded
[16:32:27] Initial: AD60; + 1136640 bytes downloaded
[16:32:27] Initial: 13AF; + 1146880 bytes downloaded
[16:32:27] Initial: 6301; + 1157120 bytes downloaded
[16:32:27] Initial: 274A; + 1167360 bytes downloaded
[16:32:27] Initial: 2927; + 1177600 bytes downloaded
[16:32:27] Initial: 4953; + 1187840 bytes downloaded
[16:32:27] Initial: 40D4; + 1198080 bytes downloaded
[16:32:27] Initial: 4AEC; + 1208320 bytes downloaded
[16:32:27] Initial: 8298; + 1218560 bytes downloaded
[16:32:27] Initial: E975; + 1228800 bytes downloaded
[16:32:27] Initial: 20E4; + 1239040 bytes downloaded
[16:32:27] Initial: 8B86; + 1249280 bytes downloaded
[16:32:27] Initial: 2092; + 1259520 bytes downloaded
[16:32:27] Initial: 529D; + 1269760 bytes downloaded
[16:32:27] Initial: 57F2; + 1280000 bytes downloaded
[16:32:27] Initial: C63A; + 1290240 bytes downloaded
[16:32:27] Initial: F76A; + 1300480 bytes downloaded
[16:32:27] Initial: D3B4; + 1310720 bytes downloaded
[16:32:27] Initial: A881; + 1320960 bytes downloaded
[16:32:27] Initial: D81D; + 1331200 bytes downloaded
[16:32:27] Initial: 4BF0; + 1341440 bytes downloaded
[16:32:27] Initial: 4D0F; + 1351680 bytes downloaded
[16:32:27] Initial: 8C80; + 1361920 bytes downloaded
[16:32:27] Initial: 0CBA; + 1372160 bytes downloaded
[16:32:27] Initial: F4C9; + 1382400 bytes downloaded
[16:32:27] Initial: 0ECB; + 1392640 bytes downloaded
[16:32:27] Initial: 796A; + 1402880 bytes downloaded
[16:32:27] Initial: 118A; + 1413120 bytes downloaded
[16:32:27] Initial: 0FCE; + 1423360 bytes downloaded
[16:32:27] Initial: 4045; + 1433600 bytes downloaded
[16:32:27] Initial: BB62; + 1443840 bytes downloaded
[16:32:27] Initial: CFD1; + 1454080 bytes downloaded
[16:32:27] Initial: BBA1; + 1464320 bytes downloaded
[16:32:27] Initial: 2594; + 1474560 bytes downloaded
[16:32:27] Initial: 88C6; + 1484800 bytes downloaded
[16:32:27] Initial: 1304; + 1495040 bytes downloaded
[16:32:27] Initial: F039; + 1505280 bytes downloaded
[16:32:27] Initial: C7CE; + 1515520 bytes downloaded
[16:32:27] Initial: D1A2; + 1525760 bytes downloaded
[16:32:27] Initial: F375; + 1536000 bytes downloaded
[16:32:27] Initial: F5D0; + 1546240 bytes downloaded
[16:32:27] Initial: 5419; + 1556480 bytes downloaded
[16:32:27] Initial: 11D2; + 1566720 bytes downloaded
[16:32:27] Initial: F77C; + 1576960 bytes downloaded
[16:32:27] Initial: 271F; + 1587200 bytes downloaded
[16:32:27] Initial: C5F5; + 1597440 bytes downloaded
[16:32:27] Initial: EC87; + 1607680 bytes downloaded
[16:32:27] Initial: C6C1; + 1617920 bytes downloaded
[16:32:27] Initial: BBD2; + 1628160 bytes downloaded
[16:32:27] Initial: 68FB; + 1638400 bytes downloaded
[16:32:27] Initial: 5BD4; + 1648640 bytes downloaded
[16:32:27] Initial: 7276; + 1658880 bytes downloaded
[16:32:27] Initial: 51D1; + 1669120 bytes downloaded
[16:32:27] Initial: 14A0; + 1679360 bytes downloaded
[16:32:27] Initial: A8D5; + 1689600 bytes downloaded
[16:32:27] Initial: B89B; + 1699840 bytes downloaded
[16:32:27] Initial: 4E79; + 1710080 bytes downloaded
[16:32:28] Initial: 6BB6; + 1720320 bytes downloaded
[16:32:28] Initial: 49B2; + 1730560 bytes downloaded
[16:32:28] Initial: 6B0C; + 1740800 bytes downloaded
[16:32:28] Initial: D8D6; + 1751040 bytes downloaded
[16:32:28] Initial: 8467; + 1761280 bytes downloaded
[16:32:28] Initial: AE61; + 1771520 bytes downloaded
[16:32:28] Initial: 097E; + 1781760 bytes downloaded
[16:32:28] Initial: 01C3; + 1785668 bytes downloaded
[16:32:28] Verifying core Core_a2.fah...
[16:32:28] Signature is VALID
[16:32:28] 
[16:32:28] Trying to unzip core FahCore_a2.exe
[16:32:28] Decompressed FahCore_a2.exe (4382312 bytes) successfully
[16:32:28] + Core successfully engaged
[16:32:35] 
[16:32:35] + Processing work unit
[16:32:35] Core required: FahCore_a2.exe
[16:32:35] Core found.
[16:32:35] Working on Unit 01 [August 25 16:32:35]
[16:32:35] + Working ...
[16:32:35] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 4492 -version 602'

[16:32:35] 
[16:32:35] *------------------------------*
[16:32:35] Folding@Home Gromacs SMP Core
[16:32:35] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[16:32:35] 
[16:32:35] Preparing to commence simulation
[16:32:35] - Ensuring status. Please wait.
[16:32:35] Working with standard loops on this execution.
[16:32:35] - Files status OK
[16:32:36] - Expanded Called DecompressByteArray: compressed_data_size=Called DecompressByteArray: compressed_data_size=4830478 data_size=- Digital signature verified
[16:32:36] 
[16:32:36] Project: 2669 (Run - Digital signature veriAssembly optimizations on if available.
[16:32:36] Entering Entering M.D.
[16:32:45] one 47, Gen 183)
[16:32:45] 
[16:32:45] Entering M.D.
[16:32:54]  (0%)
[16:39:45] Completed 2500 out of 250000 steps  (1%)
[16:46:27] Completed 5000 out of 250000 steps  (2%)
[16:53:03] Completed 7500 out of 250000 steps  (3%)
[16:59:40] Completed 10000 out of 250000 steps  (4%)
[17:06:22] Completed 12500 out of 250000 steps  (5%)
[17:13:05] Completed 15000 out of 250000 steps  (6%)
[17:19:45] Completed 17500 out of 250000 steps  (7%)
[17:26:24] Completed 20000 out of 250000 steps  (8%)
[17:33:02] Completed 22500 out of 250000 steps  (9%)
[17:39:45] Completed 25000 out of 250000 steps  (10%)
[17:46:22] Completed 27500 out of 250000 steps  (11%)
[17:53:04] Completed 30000 out of 250000 steps  (12%)
[17:59:44] Completed 32500 out of 250000 steps  (13%)
[18:06:22] Completed 35000 out of 250000 steps  (14%)
[18:13:05] Completed 37500 out of 250000 steps  (15%)
[18:19:45] Completed 40000 out of 250000 steps  (16%)
[18:26:21] Completed 42500 out of 250000 steps  (17%)
[18:32:58] Completed 45000 out of 250000 steps  (18%)
[18:39:36] Completed 47500 out of 250000 steps  (19%)
[18:46:13] Completed 50000 out of 250000 steps  (20%)
[18:52:53] Completed 52500 out of 250000 steps  (21%)
[18:59:35] Completed 55000 out of 250000 steps  (22%)
[19:06:11] Completed 57500 out of 250000 steps  (23%)
[19:12:46] Completed 60000 out of 250000 steps  (24%)
[19:19:27] Completed 62500 out of 250000 steps  (25%)
[19:26:09] Completed 65000 out of 250000 steps  (26%)
[19:32:53] Completed 67500 out of 250000 steps  (27%)
[19:39:33] Completed 70000 out of 250000 steps  (28%)
[19:46:11] Completed 72500 out of 250000 steps  (29%)
[19:52:48] Completed 75000 out of 250000 steps  (30%)
[19:59:31] Completed 77500 out of 250000 steps  (31%)
[20:06:11] Completed 80000 out of 250000 steps  (32%)
[20:12:50] Completed 82500 out of 250000 steps  (33%)
[20:19:35] Completed 85000 out of 250000 steps  (34%)
[20:26:14] Completed 87500 out of 250000 steps  (35%)
[20:32:52] Completed 90000 out of 250000 steps  (36%)
[20:39:31] Completed 92500 out of 250000 steps  (37%)
[20:46:09] Completed 95000 out of 250000 steps  (38%)
[20:52:49] Completed 97500 out of 250000 steps  (39%)
[20:59:32] Completed 100000 out of 250000 steps  (40%)
[21:06:13] Completed 102500 out of 250000 steps  (41%)
[21:12:49] Completed 105000 out of 250000 steps  (42%)
[21:19:29] Completed 107500 out of 250000 steps  (43%)
[21:26:16] Completed 110000 out of 250000 steps  (44%)
[21:32:58] Completed 112500 out of 250000 steps  (45%)
[21:39:36] Completed 115000 out of 250000 steps  (46%)
[21:46:16] Completed 117500 out of 250000 steps  (47%)
[21:52:55] Completed 120000 out of 250000 steps  (48%)
[21:59:37] Completed 122500 out of 250000 steps  (49%)
[22:06:16] Completed 125000 out of 250000 steps  (50%)
[22:12:51] Completed 127500 out of 250000 steps  (51%)
[22:19:34] Completed 130000 out of 250000 steps  (52%)
[22:26:12] Completed 132500 out of 250000 steps  (53%)
[22:32:09] - Autosending finished units...
[22:32:09] Trying to send all finished work units
[22:32:09] + No unsent completed units remaining.
[22:32:09] - Autosend completed
[22:32:51] Completed 135000 out of 250000 steps  (54%)
[22:39:35] Completed 137500 out of 250000 steps  (55%)
[22:46:15] Completed 140000 out of 250000 steps  (56%)
[22:52:57] Completed 142500 out of 250000 steps  (57%)
[22:59:36] Completed 145000 out of 250000 steps  (58%)
[23:06:15] Completed 147500 out of 250000 steps  (59%)
[23:13:07] Completed 150000 out of 250000 steps  (60%)
[23:20:00] Completed 152500 out of 250000 steps  (61%)
[23:26:44] Completed 155000 out of 250000 steps  (62%)
[23:33:42] Completed 157500 out of 250000 steps  (63%)
[23:40:40] Completed 160000 out of 250000 steps  (64%)
[23:47:26] Completed 162500 out of 250000 steps  (65%)
[23:54:07] Completed 165000 out of 250000 steps  (66%)
[00:00:52] Completed 167500 out of 250000 steps  (67%)
[00:07:44] Completed 170000 out of 250000 steps  (68%)
[00:14:38] Completed 172500 out of 250000 steps  (69%)
[00:21:26] Completed 175000 out of 250000 steps  (70%)
[00:28:08] Completed 177500 out of 250000 steps  (71%)
[00:35:02] Completed 180000 out of 250000 steps  (72%)
[00:41:46] Completed 182500 out of 250000 steps  (73%)
[00:48:28] Completed 185000 out of 250000 steps  (74%)
[00:55:14] Completed 187500 out of 250000 steps  (75%)
[01:02:03] Completed 190000 out of 250000 steps  (76%)
[01:08:50] Completed 192500 out of 250000 steps  (77%)
[01:15:46] Completed 195000 out of 250000 steps  (78%)
[01:22:31] Completed 197500 out of 250000 steps  (79%)
[01:29:18] Completed 200000 out of 250000 steps  (80%)
[01:36:01] Completed 202500 out of 250000 steps  (81%)
[01:43:02] Completed 205000 out of 250000 steps  (82%)
[01:50:03] Completed 207500 out of 250000 steps  (83%)
[01:56:49] Completed 210000 out of 250000 steps  (84%)
[02:03:33] Completed 212500 out of 250000 steps  (85%)
[02:10:20] Completed 215000 out of 250000 steps  (86%)
[02:17:04] Completed 217500 out of 250000 steps  (87%)
[02:23:51] Completed 220000 out of 250000 steps  (88%)
[02:30:35] Completed 222500 out of 250000 steps  (89%)
[02:37:13] Completed 225000 out of 250000 steps  (90%)
[02:43:55] Completed 227500 out of 250000 steps  (91%)
[02:50:35] Completed 230000 out of 250000 steps  (92%)
[02:57:18] Completed 232500 out of 250000 steps  (93%)
[03:03:57] Completed 235000 out of 250000 steps  (94%)
[03:10:40] Completed 237500 out of 250000 steps  (95%)
[03:17:26] Completed 240000 out of 250000 steps  (96%)
[03:24:08] Completed 242500 out of 250000 steps  (97%)
[03:30:50] Completed 245000 out of 250000 steps  (98%)
[03:37:31] Completed 247500 out of 250000 steps  (99%)
[03:44:12] Completed 250000 out of 250000 steps  (100%)
[03:44:13] DynamicWrapper: Finished Work Unit: sleep=10000
[03:44:23] 
[03:44:23] Finished Work Unit:
[03:44:23] - Reading up to 21123792 from "work/wudata_01.trr": Read 21123792
[03:44:23] trr file hash check passed.
[03:44:23] - Reading up to 4510056 from "work/wudata_01.xtc": Read 4510056
[03:44:23] xtc file hash check passed.
[03:44:23] edr file hash check passed.
[03:44:23] logfile size: 181235
[03:44:23] Leaving Run
[03:44:26] - Writing 25959835 bytes of core data to disk...
[03:44:27]   ... Done.
[03:46:14] - Shutting down core
[03:46:14] 
[03:46:14] Folding@home Core Shutdown: FINISHED_UNIT
[03:47:47] CoreStatus = 64 (100)
[03:47:47] Unit 1 finished with 84 percent of time to deadline remaining.
[03:47:47] Updated performance fraction: 0.843661
[03:47:47] Sending work to server


[03:47:47] + Attempting to send results
[03:47:47] - Reading file work/wuresults_01.dat from core
[03:47:47]   (Read 25959835 bytes from disk)
[03:47:47] Connecting to http://171.64.65.56:8080/
[03:49:36] Posted data.
[03:49:36] Initial: 0000; - Uploaded at ~216 kB/s
[03:49:44] - Averaged speed for that direction ~216 kB/s
[03:49:44] + Results successfully sent
[03:49:44] Thank you for your contribution to Folding@Home.
[03:49:44] + Starting local stats count at 1
[03:50:36] - Warning: Could not delete all work unit files (1): Core file absent
[03:50:36] Trying to send all finished work units
[03:50:36] + No unsent completed units remaining.
[03:50:36] - Preparing to get new work unit...
[03:50:36] + Attempting to get work packet
[03:50:36] - Will indicate memory of 7200 MB
[03:50:36] - Connecting to assignment server
[03:50:36] Connecting to http://assign.stanford.edu:8080/
[03:50:36] Posted data.
[03:50:36] Initial: 43AB; - Successful: assigned to (171.67.108.24).
[03:50:36] + News From Folding@Home: Welcome to Folding@Home
[03:50:36] Loaded queue successfully.
[03:50:36] Connecting to http://171.67.108.24:8080/
[03:50:42] Posted data.
[03:50:42] Initial: 0000; - Receiving payload (expected size: 4835024)
[03:50:49] - Downloaded at ~674 kB/s
[03:50:49] - Averaged speed for that direction ~632 kB/s
[03:50:49] + Received work.
[03:50:49] Trying to send all finished work units
[03:50:49] + No unsent completed units remaining.
[03:50:49] + Closed connections
[03:50:49] 
[03:50:49] + Processing work unit
[03:50:49] Core required: FahCore_a2.exe
[03:50:49] Core found.
[03:50:49] Working on Unit 02 [August 26 03:50:49]
[03:50:49] + Working ...
[03:50:49] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 15 -verbose -lifeline 4492 -version 602'

[03:50:49] 
[03:50:49] *------------------------------*
[03:50:49] Folding@Home Gromacs SMP Core
[03:50:49] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[03:50:49] 
[03:50:49] Preparing to commence simulation
[03:50:49] - Ensuring status. Please wait.
[03:50:50] Called DecompressByteArray: compressed_data_size=4834512 data_size=24039989, decompressed_data_size=24039989 diff=0
[03:50:50] - Digital signature verified
[03:50:50] 
[03:50:50] Project: 2671 (Run 48, Clone 39, Gen 89)
[03:50:50] 
[03:50:50] Assembly optimizations on if available.
[03:50:50] Entering M.D.
[03:51:00] Run 48, Clone 39, Gen 89)
[03:51:00] 
[03:51:00] Entering M.D.
[03:57:40] pleted 2500 out of 250000 steps  (1%)
[04:04:09] Completed 5000 out of 250000 steps  (2%)
[04:10:44] Completed 7500 out of 250000 steps  (3%)
[04:17:18] Completed 10000 out of 250000 steps  (4%)
[04:23:51] Completed 12500 out of 250000 steps  (5%)
[04:30:22] Completed 15000 out of 250000 steps  (6%)
[04:32:09] - Autosending finished units...
[04:32:09] Trying to send all finished work units
[04:32:09] + No unsent completed units remaining.
[04:32:09] - Autosend completed
[04:36:54] Completed 17500 out of 250000 steps  (7%)
[04:43:30] Completed 20000 out of 250000 steps  (8%)
[04:50:02] Completed 22500 out of 250000 steps  (9%)
[04:56:30] Completed 25000 out of 250000 steps  (10%)
[05:02:58] Completed 27500 out of 250000 steps  (11%)
[05:09:30] Completed 30000 out of 250000 steps  (12%)
[05:16:01] Completed 32500 out of 250000 steps  (13%)
[05:22:32] Completed 35000 out of 250000 steps  (14%)
[05:29:03] Completed 37500 out of 250000 steps  (15%)
[05:35:32] Completed 40000 out of 250000 steps  (16%)
[05:42:04] Completed 42500 out of 250000 steps  (17%)
[05:48:32] Completed 45000 out of 250000 steps  (18%)
[05:55:03] Completed 47500 out of 250000 steps  (19%)
[06:01:36] Completed 50000 out of 250000 steps  (20%)
[06:08:10] Completed 52500 out of 250000 steps  (21%)
[06:14:49] Completed 55000 out of 250000 steps  (22%)
[06:21:20] Completed 57500 out of 250000 steps  (23%)

*** HERE IS WHERE IT HUNG WITH NO ERRORS INDICATED ***

--- Opening Log file [August 26 06:36:58] 


# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/folding/folding
Executable: ./fah6
Arguments: -smp -verbosity 9 

[06:36:58] - Ask before connecting: No
[06:36:58] - User name: HendricksSA (Team 0)
[06:36:58] - User ID: 381E417376BD9808
[06:36:58] - Machine ID: 1
[06:36:58] 
[06:36:58] Loaded queue successfully.
[06:36:58] 
[06:36:58] - Autosending finished units...
[06:36:58] + Processing work unit
[06:36:58] Trying to send all finished work units
[06:36:58] Core required: FahCore_a2.exe
[06:36:58] + No unsent completed units remaining.
[06:36:58] - Autosend completed
[06:36:58] Core found.
[06:36:58] Working on Unit 02 [August 26 06:36:58]
[06:36:58] + Working ...
[06:36:58] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 15 -verbose -lifeline 2949 -version 602'

[06:36:58] 
[06:36:58] *------------------------------*
[06:36:58] Folding@Home Gromacs SMP Core
[06:36:58] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[06:36:58] 
[06:36:58] Preparing to commence simulation
[06:36:58] - Ensuring status. Please wait.
[06:37:08] - Looking at optimizations...
[06:37:08] - Working with standard loops on this execution.
[06:37:08] - Files status OK
[06:37:09] - Expanded 4834512 -> 24039989 (decompressed 497.2 percent)
[06:37:09] Called DecompressByteArray: compressed_data_size=4834512 data_size=24039989, decompressed_data_size=24039989 diff=0
[06:37:09] - Digital signature verified
[06:37:09] 
[06:37:09] Project: 2671 (Run 48, Clone 39, Gen 89)
[06:37:09] 
[06:37:09] Entering M.D.
[06:37:15] Using Gromacs checkpoints
[06:37:18] Resuming from checkpoint
[06:37:18] Verified work/wudata_02.log
[06:37:18] Verified work/wudata_02.trr
[06:37:19] Verified work/wudata_02.xtc
[06:37:19] Verified work/wudata_02.edr
[06:37:19] Completed 57510 out of 250000 steps  (23%)
[06:43:53] Completed 60000 out of 250000 steps  (24%)
[06:50:24] Completed 62500 out of 250000 steps  (25%)
[06:56:55] Completed 65000 out of 250000 steps  (26%)
[07:03:25] Completed 67500 out of 250000 steps  (27%)
[07:09:56] Completed 70000 out of 250000 steps  (28%)
[07:16:28] Completed 72500 out of 250000 steps  (29%)
[07:22:58] Completed 75000 out of 250000 steps  (30%)
[07:29:27] Completed 77500 out of 250000 steps  (31%)

*** HERE IS WHERE IT HUNG WITH NO ERRORS INDICATED ***

--- Opening Log file [August 26 11:31:29] 


# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/folding/folding
Executable: ./fah6
Arguments: -smp -verbosity 9 -pause 

[11:31:29] - Ask before connecting: No
[11:31:29] - User name: HendricksSA (Team 0)
[11:31:29] - User ID: 381E417376BD9808
[11:31:29] - Machine ID: 1
[11:31:29] 
[11:31:29] Loaded queue successfully.
[11:31:29] 
[11:31:29] - Autosending finished units...
[11:31:29] + Processing work unit
[11:31:29] Trying to send all finished work units
[11:31:29] Core required: FahCore_a2.exe
[11:31:29] + No unsent completed units remaining.
[11:31:29] Core found.
[11:31:29] - Autosend completed
[11:31:29] Working on Unit 02 [August 26 11:31:29]
[11:31:29] + Working ...
[11:31:29] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 15 -verbose -lifeline 2949 -version 602'

[11:31:29] 
[11:31:29] *------------------------------*
[11:31:29] Folding@Home Gromacs SMP Core
[11:31:29] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[11:31:29] 
[11:31:29] Preparing to commence simulation
[11:31:29] - Ensuring status. Please wait.
[11:31:39] - Looking at optimizations...
[11:31:39] - Working with standard loops on this execution.
[11:31:39] - Files status OK
[11:31:40] - Expanded 4834512 -> 24039989 (decompressed 497.2 percent)
[11:31:40] Called DecompressByteArray: compressed_data_size=4834512 data_size=24039989, decompressed_data_size=24039989 diff=0
[11:31:40] - Digital signature verified
[11:31:40] 
[11:31:40] Project: 2671 (Run 48, Clone 39, Gen 89)
[11:31:40] 
[11:31:40] Entering M.D.
[11:31:46] Using Gromacs checkpoints
[11:31:49] Resuming from checkpoint
[11:31:50] Verified work/wudata_02.log
[11:31:50] Verified work/wudata_02.trr
[11:31:50] Verified work/wudata_02.xtc
[11:31:50] Verified work/wudata_02.edr
[11:31:50] Completed 77510 out of 250000 steps  (31%)
[11:38:24] Completed 80000 out of 250000 steps  (32%)
[11:45:04] Completed 82500 out of 250000 steps  (33%)
[11:51:37] Completed 85000 out of 250000 steps  (34%)
[11:58:12] Completed 87500 out of 250000 steps  (35%)
[12:04:49] Completed 90000 out of 250000 steps  (36%)
[12:11:22] Completed 92500 out of 250000 steps  (37%)
[12:18:18] Completed 95000 out of 250000 steps  (38%)
[12:24:53] Completed 97500 out of 250000 steps  (39%)
[12:31:26] Completed 100000 out of 250000 steps  (40%)
[12:38:02] Completed 102500 out of 250000 steps  (41%)
[12:44:34] Completed 105000 out of 250000 steps  (42%)
[12:51:07] Completed 107500 out of 250000 steps  (43%)
[12:57:38] Completed 110000 out of 250000 steps  (44%)
[13:04:12] Completed 112500 out of 250000 steps  (45%)
[13:10:46] Completed 115000 out of 250000 steps  (46%)
[13:17:16] Completed 117500 out of 250000 steps  (47%)
[13:23:47] Completed 120000 out of 250000 steps  (48%)
[13:30:17] Completed 122500 out of 250000 steps  (49%)
[13:36:50] Completed 125000 out of 250000 steps  (50%)
[13:43:22] Completed 127500 out of 250000 steps  (51%)
[13:49:53] Completed 130000 out of 250000 steps  (52%)
[13:56:24] Completed 132500 out of 250000 steps  (53%)
[14:02:59] Completed 135000 out of 250000 steps  (54%)
[14:09:33] Completed 137500 out of 250000 steps  (55%)
[14:16:05] Completed 140000 out of 250000 steps  (56%)
[14:22:38] Completed 142500 out of 250000 steps  (57%)
[14:29:11] Completed 145000 out of 250000 steps  (58%)
[14:35:44] Completed 147500 out of 250000 steps  (59%)
[14:42:18] Completed 150000 out of 250000 steps  (60%)
[14:48:55] Completed 152500 out of 250000 steps  (61%)
[14:55:26] Completed 155000 out of 250000 steps  (62%)
[15:02:02] Completed 157500 out of 250000 steps  (63%)
[15:08:36] Completed 160000 out of 250000 steps  (64%)
[15:15:10] Completed 162500 out of 250000 steps  (65%)
[15:21:45] Completed 165000 out of 250000 steps  (66%)
[15:28:19] Completed 167500 out of 250000 steps  (67%)
[15:34:51] Completed 170000 out of 250000 steps  (68%)
[15:41:23] Completed 172500 out of 250000 steps  (69%)
[15:47:58] Completed 175000 out of 250000 steps  (70%)
[15:54:32] Completed 177500 out of 250000 steps  (71%)
[16:01:06] Completed 180000 out of 250000 steps  (72%)
[16:07:39] Completed 182500 out of 250000 steps  (73%)
[16:14:14] Completed 185000 out of 250000 steps  (74%)
[16:20:51] Completed 187500 out of 250000 steps  (75%)
[16:27:31] Completed 190000 out of 250000 steps  (76%)
[16:34:07] Completed 192500 out of 250000 steps  (77%)
[16:40:43] Completed 195000 out of 250000 steps  (78%)
[16:47:20] Completed 197500 out of 250000 steps  (79%)
[16:53:56] Completed 200000 out of 250000 steps  (80%)
[17:00:29] Completed 202500 out of 250000 steps  (81%)
[17:07:01] Completed 205000 out of 250000 steps  (82%)
[17:13:34] Completed 207500 out of 250000 steps  (83%)
[17:20:07] Completed 210000 out of 250000 steps  (84%)

*** HERE IS WHERE IT HUNG WITH NO ERRORS INDICATED ***

--- Opening Log file [August 26 17:51:29] 


# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 6.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/folding/folding
Executable: ./fah6
Arguments: -smp -verbosity 9 -pause 

[17:51:29] - Ask before connecting: No
[17:51:29] - User name: HendricksSA (Team 0)
[17:51:29] - User ID: 381E417376BD9808
[17:51:29] - Machine ID: 1
[17:51:29] 
[17:51:29] Loaded queue successfully.
[17:51:29] 
[17:51:29] - Autosending finished units...
[17:51:29] + Processing work unit
[17:51:29] Trying to send all finished work units
[17:51:29] Core required: FahCore_a2.exe
[17:51:29] + No unsent completed units remaining.
[17:51:29] Core found.
[17:51:29] - Autosend completed
[17:51:29] Working on Unit 02 [August 26 17:51:29]
[17:51:29] + Working ...
[17:51:29] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 02 -checkpoint 15 -verbose -lifeline 2980 -version 602'

[17:51:29] 
[17:51:29] *------------------------------*
[17:51:29] Folding@Home Gromacs SMP Core
[17:51:29] Version 2.08 (Mon May 18 14:47:42 PDT 2009)
[17:51:29] 
[17:51:29] Preparing to commence simulation
[17:51:29] - Ensuring status. Please wait.
[17:51:39] - Looking at optimizations...
[17:51:39] - Working with standard loops on this execution.
[17:51:39] - Files status OK
[17:51:40] - Expanded 4834512 -> 24039989 (decompressed 497.2 percent)
[17:51:40] Called DecompressByteArray: compressed_data_size=4834512 data_size=24039989, decompressed_data_size=24039989 diff=0
[17:51:40] - Digital signature verified
[17:51:40] 
[17:51:40] Project: 2671 (Run 48, Clone 39, Gen 89)
[17:51:40] 
[17:51:40] Entering M.D.
[17:51:46] Using Gromacs checkpoints
[17:51:49] Resuming from checkpoint
[17:51:49] Verified work/wudata_02.log
[17:51:50] Verified work/wudata_02.trr
[17:51:50] Verified work/wudata_02.xtc
[17:51:50] Verified work/wudata_02.edr
[17:51:51] Completed 210010 out of 250000 steps  (84%)
[17:58:31] Completed 212500 out of 250000 steps  (85%)
[18:05:04] Completed 215000 out of 250000 steps  (86%)
[18:11:40] Completed 217500 out of 250000 steps  (87%)
[18:18:10] Completed 220000 out of 250000 steps  (88%)
[18:24:41] Completed 222500 out of 250000 steps  (89%)
[18:31:11] Completed 225000 out of 250000 steps  (90%)
[18:37:44] Completed 227500 out of 250000 steps  (91%)
[18:44:16] Completed 230000 out of 250000 steps  (92%)
[18:50:48] Completed 232500 out of 250000 steps  (93%)
[18:57:23] Completed 235000 out of 250000 steps  (94%)
[19:03:55] Completed 237500 out of 250000 steps  (95%)
[19:10:27] Completed 240000 out of 250000 steps  (96%)
[19:16:58] Completed 242500 out of 250000 steps  (97%)
[19:23:30] Completed 245000 out of 250000 steps  (98%)
[19:30:04] Completed 247500 out of 250000 steps  (99%)
[19:36:32] Completed 250000 out of 250000 steps  (100%)
[19:36:33] DynamicWrapper: Finished Work Unit: sleep=10000
[19:36:43] 
[19:36:43] Finished Work Unit:
[19:36:43] - Reading up to 21189024 from "work/wudata_02.trr": Read 21189024
[19:36:43] trr file hash check passed.
[19:36:43] - Reading up to 27683928 from "work/wudata_02.xtc": Read 27683928
[19:36:43] xtc file hash check passed.
[19:36:43] edr file hash check passed.
[19:36:43] logfile size: 183651
[19:36:43] Leaving Run
[19:36:45] - Writing 49201355 bytes of core data to disk...
[19:36:46]   ... Done.
[19:39:31] - Shutting down core
[19:39:31] 
[19:39:31] Folding@home Core Shutdown: FINISHED_UNIT
[19:40:05] CoreStatus = 64 (100)
[19:40:05] Unit 2 finished with 78 percent of time to deadline remaining.
[19:40:05] Updated performance fraction: 0.811962
[19:40:05] Sending work to server


[19:40:05] + Attempting to send results
[19:40:05] - Reading file work/wuresults_02.dat from core
[19:40:05]   (Read 49201355 bytes from disk)
[19:40:05] Connecting to http://171.67.108.24:8080/
[19:43:34] Posted data.
[19:43:34] Initial: 0000; - Uploaded at ~228 kB/s
[19:43:35] - Averaged speed for that direction ~222 kB/s
[19:43:35] + Results successfully sent
[19:43:35] Thank you for your contribution to Folding@Home.
[19:43:35] + Number of Units Completed: 2

[19:45:05] - Warning: Could not delete all work unit files (2): Core file absent
[19:45:05] Trying to send all finished work units
[19:45:05] + No unsent completed units remaining.
[19:45:05] + Closed connections
[19:45:05] + Paused after finishing unit
[19:45:05] Press Enter to continue, Ctrl-C to exit...
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: Project 2671 Hangs at Random

Post by tear »

If an application takes the whole system down, it's not the application's fault but the system's (hardware/OS) -- sorry.

I'd suggest checking the hardware (memtest86+ and some cpu burning* app aren't a bad start) and OS (running FAH
in a tty/without X11; if there's a kernel panic you should see it).

*) I wouldn't know which one to recommend so I'll leave that to other folks


tear

EDIT: typo
One man's ceiling is another man's floor.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 2671 Hangs at Random

Post by bruce »

. . . some CPU burning app . . .
We normally recommend StressCPU2 because it uses an SSE instruction mix that is VERY similar to what GROMACS uses. Other CPU-burn software is good too, but the "best" test is one that mimics your normal applications very closely. [We often find people who have a "completely stable" machine (based on conventional software for testing overclocking stability) that suddenly becomes unstable when running FAH.]
HendricksSA
Posts: 339
Joined: Fri Jun 26, 2009 4:34 am

Re: Project 2671 Hangs at Random

Post by HendricksSA »

I've been checking things for a week now and have arrived where I started. I ran StressCPU2, memtest86+, and a host of other diagnostics for hours and never recorded a single error. I'm convinced the hardware is sound, as the only things that ever hung were Project 2671 runs. I've reinstalled Fedora 11 and the folding software and downloaded all new cores/clients. The first Project 2671 WU hung 3 hours in. I restarted and the system ran for 3 days before it hung again with another Project 2671 WU. Does anyone else have a problem with occasional hangs? Since I'm pretty new to folding, perhaps this is normal reliability? A lot of very smart people can be found here ... does anyone else have some other suggestion I could try? Since I don't like the machine running for hours while it does nothing, I'm ready to boot into Windows and try that version of SMP. Based on what I read there is no performance loss and I can resume running my GPU client. This will be the very first time I can remember Windows being more stable than Linux!

Tear - I can boot the machine into text mode but how do I start my wireless network (enter the password, etc.)?
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for aprox. 70K PPD
Location: Salem. OR USA

Re: Project 2671 Hangs at Random

Post by P5-133XL »

Typically, if a WU fails in different spots then it is HW, and if it fails in the same spot then it is the WU. I can't remember a recent case of someone complaining about an SMP WU bringing down a machine that turned out not to be HW related. Those are my general rules as to problems.

Now to your specific problem. You've done the standard stress testing, which is very good. However, if the problem can wait 3 days before showing up then hours may not be long enough -- i.e. stress testing the CPU is really not that much more stressful than folding. Memtest86+ is a very good memory tester but not perfect. I have run into problems with RAM that Memtest86+ did not detect and that only replacing the sticks with a different brand solved, or required a subtle change in RAM timing to solve.

Some other things. You have not specified whether you are overclocking. If so, then just plain stop. If that fixes it, then you know something. Don't OC till your chronic problems have been fixed, and then be cautious. Run a motherboard-monitor type program that can write to a log and raise an alarm. Then monitor temps and voltages. Do note that some motherboards tend to give rather inaccurate readings, so an odd reading here does not inherently mean something is wrong, but normal readings can generally be trusted. Next, try underclocking both the CPU and RAM (separately), which can sometimes fix the problem and thereby indicate where the problem is. Also, if you have multiple sticks of RAM, drop down to a single stick, because RAM is much more stable with a single stick.
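As a rough illustration of the "monitor that writes to a log" idea, here is a minimal Python sketch. It assumes the kernel exposes temperatures under /sys/class/thermal (not every board or kernel does; lm_sensors may be needed instead), it does not cover voltages or alarms, and the function names and the log file name are made up for this example. Each line is forced to disk so that the last reading before a hard freeze survives the reboot.

```python
import glob
import os
import time


def read_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                # sysfs reports the value in millidegrees Celsius
                temps[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return temps


def log_once(logfile, base="/sys/class/thermal"):
    """Append one timestamped line of readings and force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    fields = " ".join(f"{zone}={temp:.1f}C" for zone, temp in read_temps(base).items())
    with open(logfile, "a") as f:
        f.write(f"{stamp} {fields}\n")
        f.flush()
        os.fsync(f.fileno())  # so the last line survives a hard freeze


def monitor(logfile, interval_s=30, base="/sys/class/thermal"):
    """Log readings forever at a fixed interval; stop with Ctrl-C."""
    while True:
        log_once(logfile, base)
        time.sleep(interval_s)
```

Running something like monitor("temps.log") in a second terminal while the client folds would leave a timestamped trail. The last line written before a freeze gives both the approximate time of the hang and the temperatures just before it.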

One of the things you did not do is supply the FAHLog file. Are the hangs on the same WU? If so, perhaps we can move you on to a different WU on the theory that your problems are actually specific to the WU.
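To answer the "same WU, same spot?" question from the log itself, something like the following Python sketch could help. It relies only on line formats visible in the log posted in this thread ("--- Opening Log file [...]", "Project: ... (Run, Clone, Gen)", "Completed N out of M steps", "Core Shutdown: FINISHED_UNIT"). The function name is invented for illustration. A session that was stopped by hand, or one that is still running, looks the same as a hang to this script.

```python
import re

SESSION_RE = re.compile(r"^--- Opening Log file \[(.+?)\]")
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
FINISHED = "Core Shutdown: FINISHED_UNIT"


def find_unfinished_sessions(log_text):
    """Return one dict per client session that ended with a WU still in progress."""
    sessions = []
    current = None
    for line in log_text.splitlines():
        m = SESSION_RE.match(line.strip())
        if m:
            # A new "Opening Log file" banner means the client was (re)started.
            current = {"opened": m.group(1), "wu": None,
                       "last_step": None, "in_progress": False}
            sessions.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first banner
        m = PROJECT_RE.search(line)
        if m:
            # (project, run, clone, gen) identifies the work unit
            current["wu"] = tuple(int(g) for g in m.groups())
            current["last_step"] = None
            current["in_progress"] = True
            continue
        m = STEP_RE.search(line)
        if m:
            current["last_step"] = int(m.group(1))
            current["in_progress"] = True
            continue
        if FINISHED in line:
            current["in_progress"] = False
    return [s for s in sessions if s["in_progress"]]
```

Calling find_unfinished_sessions() on the contents of the client log and comparing the "wu" and "last_step" values across the returned sessions shows whether the stops cluster on one WU and one step (pointing at the WU) or are scattered (pointing at hardware or the OS).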

Just as a side note, you are incorrect that there is no performance loss in moving from Linux to Windows. Linux allows for A2 core WUs, which run 2x as fast as A1 or A0 cores. There are no A2 core WUs for Windows, so you will lose productivity running Windows over Linux. However, running the GPU client can easily make up or even exceed the productivity loss.
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: Project 2671 Hangs at Random

Post by tear »

Hey Hendricks,
HendricksSA wrote:Tear - I can boot the machine into text mode but how do I start my wireless network (enter the password, etc.)?
I don't think we need to go that far... yet (NetworkManager apparently doesn't do much without its X11 counterpart).

I'd start with running the client from within a virtual terminal (a "text console"). Even when your Xserver is running you can
hit Ctrl+Alt+F1 to switch to a VT. Then just log in, change to the client directory, launch it, leave it running and see what gives.

On second thought -- you might want to run setterm -blank 0 -powersave off -powerdown 0 after you log in
(in a VT, that is). This is to make sure Linux doesn't blank/powerdown the display (we want to see all the BadThings(TM))

Other notes:
-- To switch back to X11 (when in text mode) try Alt+F6 (or F7, or F8); typically Xserver runs off either of these virtual terminals
-- When you experience the "hang" initial steps would be to:
  a) determine if console is responsive -- just hit <Enter>
  b) determine if it's possible to switch to other virtual terminals (Alt+F2, Alt+F3, etc.) and if it is
  c) determine if it's possible to log in using any other virtual terminal

Let me know if you have any questions or doubts.

tear

EDIT: grammar
Last edited by tear on Wed Sep 09, 2009 3:10 pm, edited 1 time in total.
One man's ceiling is another man's floor.
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: Project 2671 Hangs at Random

Post by tear »

Hendricks,


One more question -- do you use any out-of-kernel modules? (nvidia, fglrx, wifi, ndiswrapper or alike)

tear
One man's ceiling is another man's floor.
HendricksSA
Posts: 339
Joined: Fri Jun 26, 2009 4:34 am

Re: Project 2671 Hangs at Random

Post by HendricksSA »

P5 - I'm running the SPD-generated settings on the 4x2GB Kingston memory. I can try some different memory arrangements since I've not seen a WU need the full 8GB. This machine is stock with nothing overclocked. The hangs occurred with different 2671 WUs at random points. It may be coincidence that the hangs all happened with Project 2671 because that is mostly what I get on assignment.

Tear - I think I'll give the VT a try first to see if I can get any error indication and determine the computer state at the hang. As far as I know, there are no out-of-kernel modules loaded. The system currently has an ATI 4550 card and I don't think fglrx was loaded on install.

Everyone else - I'll work on the suggestions above, but if you have any insight into Linux hangs I would love to hear from you. The only symptom I can report is that the computer freezes completely, leaving me no keyboard or mouse. I've left the screen on but never seen any error, and I've not found anything in the logs except for Gnome complaining about trying to focus a non-focusing window. I assume enough folks are running Gnome to eliminate that as a problem, and I'll report back after I hang a couple of times. Thanks for your help. The support here is terrific!