
Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Fri Apr 17, 2009 7:58 am
by alpha754293
console:

  11507  11508   90.7    0.1010 1218.0474      0.1010
  11511  11513   90.0    0.1090 6237.3096      0.1090
  10432  10433   90.0    0.6249   5.4285      0.1090
  10737  10738   96.1    0.1010  30.4538      0.1010
  10739  10740   91.5    0.1090 302.0507      0.1090
  10741  10742   90.0    0.1090   0.1145      0.1090
  10747  10748  104.1    0.1090  68.0290      0.1090
  11468  11469   90.0    0.0960 462873657344.0000      0.0960
  10718  10719   90.0    0.1090 378178.4688      0.1090
  10720  10721   90.0    0.1090 17374290.0000      0.1090
  10731  10732   940.0000      0.1090
  11183  11184   90.0    0.1090 7616069.0000      0.1090
  11183  11185   90.0    0.1090 7696306.5000      0.1090
  11183  11186   90.0    0.1090 105853104.0000      0.1090
  11531  11532   90.0    0.1090 4661726720.0000      0.1090
  11540  11541   90.0    0.1090 8426486.0000      0.1090
  11540  11542   90.0    0.1090 2739091.0000      0.1090
  11540  11543   90.0    0.1090 2589888.7500      0.1090
  11205  11206   90.0    0.1010 158923.4375      0.1010
  11209  11210   90.0    0.1090 17115131904.0000      0.1090
  11221  11222   90.1    0.1080 17645.2617      0.1080
  10428  10430   90.0    0.1090 414103776.0000      0.1090
  10428  10431   90.0    0.1090 263261984.0000      0.1090
  11191  11192   90.0    0.1090   0.2592      0.1090
  11193  11194   90.0    0.1090 6155499.0000      0.1090
  11195  11196   90.0    0.1090 3899.2480      0.1090
  11195  11197   90.0    0.1090 242316.6875      0.1090
  11195  11198   90.3    0.1090 3736.0793      0.1090
  11199  11200   90.0    0.1090 13970.0    0.1080   1.4129      0.1080
  11476  11477   90.0    0.1090   0.8880      0.1090
  10916  10917   90.0    6.1665 18290.9004      0.1090
  10916  10918   90.3    0.7193 3040.1311      0.1090
  10924  10925   90.0    0.1090   0.2046      0.1090
  10889  10891   90.0    0.1090   0.2621      0.1090
  10928  10929   90.0    0.1090   1.5060      0.1090
  10928  10930   90.0    0.1090   1.1622      0.1090
  10928  10931  135.3    0.1090  12.7603      0.1090
  17923  17924   90.0    0.1090 8001254588416.0000      0.1090
  17923  17925   90.0    0.1090 8673276461056.0000      0.1090
  17926  17928   90.0    0.1090 29901.8418      0.1090
   9557   9558   90.0    0.1090 134667810373632.0000      0.1090
   9562   9563   90.0    0.0960 20130426.0000      0.0960
   9533   9534   90.0    0.1090 6937528320.0000      0.1090
   9535   9536   90.0    6.1850  11.0297      0.1090
   9535   9537   90.0    0.6439  10.3555      0.1090
   9566   9567   90.0    0.1010   4.7135      0.1010
   9568   9569   92.3    0.1090 460.6151938304.0000      0.1090
  11199  11201   90.0    0.1090 3061614336.0000      0.1090
  11199  11202   90.0    0.1090 3221064704.0000      0.1090
   9088   9089   90.0    0.1090 30829046.0000      0.1090
   6109   6111   90.0    0.1090 308462.3125      0.1090
   6081   6082  114.8    0.1090  29.4639      0.1090
   6081   6083   90.0    0.1090  44.8540      0.1090
   6091   6092   90.0    0.1010 5483485696.0000      0.1010
   9503   9504   90.0    0.1010   0.1231      0.1010
   9494   9495   90.0    0.1090   1.0212      0.1090
   9497   9499   90.0    0.1090   0.1328      0.1090
   9535   9536   90.0    6.1850  11.0297      0.1090
   9535   9537   90.0    0.6439  10.3555      0.1090
   9611   9612   90.0    0.1090 9272.8311      0.1090
   9613   9614   90.0    0.1090 913488.6250      0.1090
   9912   9913   90.0    0.1010 36000.8086      0.1010
   9914   9915   90.1    0.1090 5919.3345      0.1090
   9916   9917   90.0    0.1090   3.1461      0.1090
   9918   9920   90.0    0.1090 142456960.0000      0.1090
   8      0.1090
   9965   9966   90.0    1.6912 57287892795392.0000      0.1090
   9965   9967   90.0    5.3603 5418242801664.0000      0.1090
   9570   9571   90.0    0.1090 85040.4219      0.1090
   9570   9572   90.0    0.1090 133734.0156      0.1090
   9573   9574   90.0    0.0960 100542.0781      0.0960
   9548   9549   90.0    0.1080 313591936.0000      0.1080
   9586   9587   90.0    0.1080 2043.7418      0.1080
   9961   9963   90.0    0.1056   2.4203      0.1090
   9961   9964   90.0    0.1055   5.7834      0.1090
  10126  10127   90.0    0.1090 11263.8652      0.1090
  10132  10133   90.0    0.1090 784976314368.0000      0.1090
  11521  11522   90.0    0.1080 62173.8008      0.1080
  11523  11524   90.0    0.1080 89890.5703      0.1080
  10110  10111   90.0    0.1090 9946460.0000      0.1090
  10114  10115   90.0    0.1090 28189482.0000      0.1090
  10114  10116   90.0    0.1090 39309264.0000      0.1090
  10114  10117   90.0    0.1090 1939676928.0000      0.1090
  10122  10123   90.0    0.1010 111759922   9923   90.0    0.1090 1887121.7500
    0.1090
   9922   9924   90.0    0.1090 16118791.0000      0.1090
   9922   9925   90.0    0.1090 1870720.1250      0.1090
   9961   9963   90.0    0.1056   2.4203      0.1090
   9961   9964   90.0    0.1055   5.7834      0.1090
   9965   9966   90.0    1.6912 57287892795392.0000      0.1090
   9965   9967   90.0    5.3603 5418242801664.0000      0.1090
   9900   9901   90.0    0.1090 327638.3438      0.1090
   9906   9907   90.0    0.1090 33412993024.0000      0.1090
   9906   9908   90.0    0.1090 420976416.0000      0.1090
   9906   9909   90.0    0.1090 438786304.0000      0.1090
  10395  10396   90.0    0.1090 282811904.0000      0.1090
  10395  10397   90.0    0.1090 7810912256.0000      0.1090
  10415  10416   90.0    0.1010 13966586880.0000      0.1010
  11515  11516   90.0    1.1019 851388334080.0000      0.1080
  11517  11518   90.0    0.1080 2450566144.0000      0.1080
  11533  11534   90.0    0.1090 21386052.0000      0.1090
  11533  11535   90.0    0.4592256.0000      0.1010
  10135  10136   90.0    0.1090 1572972.1250      0.1090
  10135  10137   90.0    0.1090 31245062.0000      0.1090
  11428  11429   90.0    0.1090  30.0699      0.1090
  11430  11432   90.0    0.1076 323900.0938      0.1090
  11446  11447   90.0    0.1010   0.4010      0.1010
  10767  10768   90.0    0.1010  14.8766      0.1010
  10769  10770   90.0    0.1090 1430351.6250      0.1090
  10773  10774   90.0    0.1099 235.5556      0.1090
  10773  10775   90.0    0.1092 133.5641      0.1090
  10777  10778   90.1    0.1109 5538.2310      0.1090
  10777  10780   90.1    0.1123 5112.7407      0.1090
  10866  10867   90.1    0.1090 8504.9092      0.1090
  10866  10868   90.0    0.1090 86305.6797      0.1090
  10869  10870   90.0    0.1090 94151060226048.0000      0.1090
  10874  10875   90.6    0.0980 713.7846      0.1010
  18599  18601 [22:29:52]
[22:29:52] Folding@home Core Shutdown: INTERRUPTED
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[cli_1]: aborting job:
Fatal error in MPI_Sendrecv: Error message texts are not available
1090 29365602.0000      0.1090
  11533  11536   90.0    0.1090 27522300.0000      0.1090
  11225  11226   92.3    0.1010  83.6208      0.1010
  11227  11228  119.6    0.1090  37.4717      0.1090
  11245  11246   97.6    0.1010  65.4615      0.1010
  10399  10400   90.0    0.1080   7.4338      0.1080
  10401  10402   90.0    0.1010   3.9936      0.1010
  10912  10913   90.0    0.1090 10882544640.0000      0.1090
  10912  10915   90.0    0.1090 8097246720.0000      0.1090
  11229  11231   90.0    0.1090 10088.7090      0.1090
  11241  11242   90.0    0.1080 425.1463      0.1080
  10916  10917   90.0    6.1665 18290.9004      0.1090
  10916  10918   90.3    0.7193 3040.1311      0.1090
  11235  11236   90.0    0.1080   0.6362      0.1080
   6146   6147   90.0    0.1090      inf      0.1090
   6146   6148   90.0    0.1090      inf      0.1090
  16856  16857   90.0    0.1090 13890.7510      0.1090
  16856  16858   90.0    0.1090 740824.5625      0.1090
  16856  16859   90.0    0.1090 15129.7598      0.1090
  16860  16861   90.0    0.5767   2.6036      0.1090
  16860  16863   90.0    0.5611   3.3388      0.1090
   9511   9512   90.0    0.1090   0.2825      0.1090
   9522   9523   90.0    0.6179  71.8703      0.1090
   9527   9528   90.0    0.0960 24089.6934      0.0960
  11570  11571   90.0    0.1010 11387090944.0000      0.1010
  11574  11576   90.0    0.1090 176664016.0000      0.1090
  11577  11578   90.0    0.1090 20964362.0000      0.1090
  11577  11579   90.0    0.1090 1154052.1250      0.1090
   9458   9459   90.0    0.1090 3697016320.0000      0.1090
   9460   9461   90.0    0.1090 323697376.0000      0.1090
   9544   9545   90.0    0.1080 366.7988      0.1080
   9546   9547   90.0    0.1228 210.7410      0.1080
   9448   9449   90.0    0.1090 51478572040192.0000      0.1090
  10362  10363   90.0    0.1090 49031.4414      0.1090
  10362  10364   90.0    0.1090 862333.1250      0.1090
  11563  11564   90.0    0.1090 84529352.0000      0.1090
  11563  11565   90.0    0.1090 219270720.0000      0.1090
  11566  11567   90.0    0.1090 170128.4062      0.1090
  11436  11437   92.6    0.1080 170.6472      0.1080
  11438  11439   90.0    0.1080 919.2798      0.1080
  11442  11443   90.0    1.8197 6133788442624.0000      0.1080
  11557  11558   90.0    0.1090 86302793728.0000      0.1090
  11557  11559   90.0    0.1090 228376444928.0000      0.1090
  11560  11561   90.2    0.1090 4121.8032      0.1090
  11560  11562   90.0    0.1090 314.4964      0.1090
  10372  10373   90.0    0.1010   5.7342      0.1010
  11271  11272   90.0    0.1090 1027834511360.0000      0.1090
  10404  10405   90.1    0.1080 9998.7832      0.1080
  10406  10407   90.1    0.1080 10392.3613      0.1080
  10408  10409   90.0    0.1080 440681.7500      0.1080
  10410  10411   90.0    0.1080 2929.8613      0.1080
  10773  10774   90.0    0.1099 235.5556      0.1090
  10773  10775   90.0    0.1092 133.5641      0.1090
  10777  10778   90.1    0.1109 5538.2310      0.1090
  10777  10780   90.1    0.1123 5112.7407      0.1090
  11403  11404   90.0    0.1090   0.5451      0.1090
  10833  10835   90.0    0.1090 11743.8682      0.1090
  10836  10837   90.0    0.1090 1643869765632.0000      0.1090
  10838  10839   90.0    0.1090 591859.6875      0.1090
  10842  10843   90.0    0.1090 222.3573      0.1090
  10842  10844   90.0    0.1090 2001.6852      0.1090
  21333  21334   90.0    0.1010 779.8967      0.1010
  21361  21362   90.0    0.1010 186.1879      0.1010
  21380  21381   90.0    0.1010 413.2797      0.1010
  21382  21383   90.0    0.1090 1861380736.0000      0.1090
  21387  21388   93.6    0.1090  98.1719      0.1090
  21387  21389   90.0    0.1090   0.2395      0.1090
  16834  16836   90.0    0.1090   0.8005      0.1090
  16903  16904   90.0    0.1090   0.2729      0.1090
  16868  16869   90.0    0.1090   0.1427      0.1090
  16875  16876   90.1    0.2562 6929438.0000      0.1090
  16875  16877   90.0    0.2389 16139660.0000      0.1090
  16875  16878   90.0    3.6470 80175960.0000      0.1090
  16900  16901   90.0    0.1090 132.0579      0.1090
  16926  16928   90.0    6.8607      inf      0.1090
  16926  16929   90.0    0.4548      inf      0.1090
  16915  16917   90.0    0.1010 2213165036669501440.0000      0.1010
  19375  19376   90.0    0.1090 6521.2188      0.1090
  19377  19378   90.9    0.1090 1765.4371      0.1090
  16898  16899   90.0    0.1090  39.7362      0.1090
  16922  16923   90.0    0.1090 138264657920.0000      0.1090
  16926  16927   90.0    7.3677      inf      0.1090
  16939  16940   90.0    0.1010 143587200.0000      0.1010
  19344  19345   90.0    1.1613 3451187625984.0000      0.1010
  19373  19374   90.0    0.1010 255087263744.0000      0.1010
   9474   9476  100.3    0.1090  40.0976      0.1090
   9477   9478   90.0    0.1090 523090656.0000      0.1090
   9425   9426   90.0    0.1090 38469388.0000      0.1090
   9425   9427   90.0    0.1090 440345440.0000      0.1090
   9437   9438   90.0    0.1090   0.3569      0.1090
   9445   9446   90.0    0.1090 12148.5596      0.1090
   9445   9447   90.0    0.1090 83781.6250      0.1090
  10213  10214   90.0    0.1080 22220398.0000      0.1080
  10325  10326   90.0    0.1090   0.1797      0.1090
  10331  10333   90.0    0.1090 1315228.7500      0.1090
  11393  11394   90.0    0.1010 1063676352.0000      0.1010
  11399  11400  105.4    0.1090  24.6318      0.1090
  10366  10367   90.1    0.1090 2670.1626      0.1090
  10366  10368   90.0    0.1090 26159.8047      0.1090
  10366  10369   90.4    0.1090 2409.2385      0.1090
  11306  11307   90.0    0.1090   7.2163      0.1090
  11321  11322   90.0    0.1010 28121464045568.0000      0.1010
  10796  10797   90.0    0.1090 13387873.0000      0.1090
  10796  10798   90.0    0.1090 708735.2500      0.1090
  10796  10799   90.0    0.1090 761121.3750      0.1090
  11311  11312   90.0    0.1090 556509.4375      0.1090
  11315  11318   90.0    0.1090 12710876.0000      0.1090
  11289  11290   90.0    0.1090 31915.2090      0.1090
  11289  11291   90.0    0.1090 25513.1953      0.1090
  11289  11292   90.0    0.1090 32479.4082      0.1090
  11293  11294   90.0    0.1090 12752091136.0000      0.1090
  11293  11295   90.0    0.1090 8772718592.0000      0.1090
  11331  11332   90.0    0.1090 259137.7812      0.1090
  10804  10805   90.0    1.2262 43453996072960.0000      0.1010
  21351  21353   93.2    0.1090 146.2578      0.1090
  21356  21357   90.0    0.1010 103628021760.0000      0.1010
  21335  21336   90.0    0.1090  18.3684      0.1090
  21349  21350   90.0    0.1090   0.2558      0.1090
  16123  16124   90.1    0.4215   0.1866      0.1090
  19423  19424   90.0    0.1090   0.4744      0.1090
  19400  19401   90.0    0.1090 5104192512.0000      0.1090
  16870  16871   90.0    0.1090   4.6788      0.1090
  16413  16414   90.0    0.1080   0.4302      0.1080
  16953  16954   90.0    0.1010      inf      0.1010
  10252  10253   90.0    0.1090   0.1349      0.1090
  10234  10235  117.2    0.1090  10.2347      0.1090
  10236  10237  114.7    0.1090  10.4483      0.1090
  10246  10247   90.0    0.1010 174792.2188      0.1010
  10223  10225   90.0    0.1090 116356384.0000      0.1090
  10232  10233   90.0    0.1010 66916.0000      0.1010
  11387  11388   94.2    5.5828 162.4945      0.1090
  10241  10243   90.0    0.1010   3.3205      0.1010
  11353  11354   90.0    0.1090   0.1128      0.1090
  11373  11374   90.0    0.1090 539.9991      0.1090
  11383  11384   90.0    0.1010 939639552.0000      0.1010
  11325  11327   90.0    0.1090 17274683523072.0000      0.1090
  11328  11329   90.0    0.1090  19.6159      0.1090
  11328  11330   90.0    0.1090   3.8571      0.1090
  10808  10810   90.0    0.1090 390407.2500      0.1090
  16108  16109   90.0    0.1090   0.6361      0.1090
  16117  16118   90.0    0.1010   0.2649      0.1010
  16119  16120   90.0    6.4284      inf      0.1090
  16485  16486   90.0    0.1090      inf      0.1090
  16485  16487   90.0    0.1090      inf      0.1090
  16485  16488   90.0    0.1090      inf      0.1090
  16011  16013   90.0    0.6472   1.8559      0.1090
  16011  16014   90.0    5.8190   8.1873      0.1090
  16437  16438   90.0    0.1090 114509.9141      0.1090
  16372  16373   90.0    4.9838      inf      0.1090
  16372  16374   90.0    0.9407      inf      0.1090
  16375  16376   90.0    0.5886 157.9605      0.1090
  16375  16377   90.0    0.6026 210.9202      0.1090
  16378  16379   90.0    0.4482  46.7614      0.1090
  16378  16380   90.0    0.4103  64.4903      0.1090
  15966  15967   90.0    0.1010 252.8868      0.1010
  15214  15215   90.0    0.1092 492287840.0000      0.1092
  16559  16561   90.0    0.1090   3.2826      0.1090
  15970  15972   90.0    0.1090   0.3832      0.1090
  15973  15975   90.0    0.1090  49.5639      0.1090
  15982  15983   90.0    0.1010   5.4620      0.1010
  15982  15984   90.0    0.1010   6.5533      0.1010
  15982  15985   90.0    0.1010 258.9774      0.1010
  16572  16573   90.0    0.7999 56818.2070      0.1090
  16572  16575   90.0    0.7275 64815.4688      0.1090

t = 4696.902 ps: Water molecule starting at atom 131728 can not be settled.
Check for bad contacts and/or reduce the timestep.
[0]0:Return code = 102
[0]1:Return code = 1
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[22:29:57] CoreStatus = 66 (102)
[22:29:57] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[22:29:57] Killing all core threads

Folding@Home Client Shutdown.
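
The six-column rows in the console output above look like the table GROMACS prints in a LINCS warning (atom 1, atom 2, angle, previous length, current length, constraint length). The header is cut off in the paste, so that reading is an assumption. Under that assumption, a minimal Python sketch for picking out the constraints whose current length has blown up could look like this; the function names and the factor of 10 are illustrative choices, not part of any F@H or GROMACS tool.

```python
import math

def parse_rows(text):
    """Parse rows of: atom1 atom2 angle previous current constraint.

    Lines that do not have exactly six numeric fields are skipped, which
    drops the rows mangled by interleaved output from several processes.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue
        try:
            atom1, atom2 = int(parts[0]), int(parts[1])
            angle, previous, current, constraint = (float(p) for p in parts[2:])
        except ValueError:
            continue  # e.g. two numbers fused together such as "460.6151938304.0000"
        rows.append((atom1, atom2, angle, previous, current, constraint))
    return rows

def blown_up(rows, factor=10.0):
    """Return rows whose current length is infinite or far above the constraint length."""
    return [r for r in rows if math.isinf(r[4]) or r[4] > factor * r[5]]

# A few rows copied from the console output above, including two garbled ones.
sample = """\
  10741  10742   90.0    0.1090   0.1145      0.1090
  11468  11469   90.0    0.0960 462873657344.0000      0.0960
   6146   6147   90.0    0.1090      inf      0.1090
   9568   9569   92.3    0.1090 460.6151938304.0000      0.1090
  10731  10732   940.0000      0.1090
"""

rows = parse_rows(sample)
bad = blown_up(rows)
for atom1, atom2, _, _, current, constraint in bad:
    print(f"atoms {atom1}-{atom2}: current {current} vs constraint {constraint}")
```

Skipping malformed lines rather than trying to repair them keeps the sketch from inventing values for rows that the interleaved output destroyed.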
FAHlog

[16:52:13] 
[16:52:13] *------------------------------*
[16:52:13] Folding@Home Gromacs SMP Core
[16:52:13] Version 2.06 (Tue Mar 31 08:29:45 PDT 2009)
[16:52:13] 
[16:52:13] Preparing to commence simulation
[16:52:13] - Ensuring status. Please wait.
[16:52:13] Files status OK
[16:52:14] - Expanded 4836370 -> 24038369 (decompressed 497.0 percent)
[16:52:14] Called DecompressByteArray: compressed_data_size=4836370 data_size=24038369, decompressed_data_size=24038369 diff=0
[16:52:14] - Digital signature verified
[16:52:14] 
[16:52:14] Project: 2671 (Run 11, Clone 32, Gen 9)
[16:52:14] 
[16:52:15] Assembly optimizations on if available.
[16:52:15] Entering M.D.
[16:52:24] (Run 11, Clone 32, Gen 9)
[16:52:24] 
[16:52:24] Entering M.D.
[17:01:08] Completed 2500 out of 250000 steps  (1%)
[17:09:42] Completed 5000 out of 250000 steps  (2%)
[17:18:16] Completed 7500 out of 250000 steps  (3%)
[17:26:49] Completed 10000 out of 250000 steps  (4%)
[17:35:23] Completed 12500 out of 250000 steps  (5%)
[17:43:57] Completed 15000 out of 250000 steps  (6%)
[17:52:31] Completed 17500 out of 250000 steps  (7%)
[18:01:04] Completed 20000 out of 250000 steps  (8%)
[18:09:38] Completed 22500 out of 250000 steps  (9%)
[18:18:12] Completed 25000 out of 250000 steps  (10%)
[18:26:35] - Autosending finished units... [April 16 18:26:35 UTC]
[18:26:35] Trying to send all finished work units
[18:26:35] + No unsent completed units remaining.
[18:26:35] - Autosend completed
[18:26:46] Completed 27500 out of 250000 steps  (11%)
[18:35:19] Completed 30000 out of 250000 steps  (12%)
[18:43:53] Completed 32500 out of 250000 steps  (13%)
[18:52:26] Completed 35000 out of 250000 steps  (14%)
[19:01:00] Completed 37500 out of 250000 steps  (15%)
[19:09:34] Completed 40000 out of 250000 steps  (16%)
[19:18:07] Completed 42500 out of 250000 steps  (17%)
[19:26:42] Completed 45000 out of 250000 steps  (18%)
[19:35:15] Completed 47500 out of 250000 steps  (19%)
[19:43:49] Completed 50000 out of 250000 steps  (20%)
[19:52:23] Completed 52500 out of 250000 steps  (21%)
[20:00:58] Completed 55000 out of 250000 steps  (22%)
[20:09:32] Completed 57500 out of 250000 steps  (23%)
[20:18:06] Completed 60000 out of 250000 steps  (24%)
[20:26:40] Completed 62500 out of 250000 steps  (25%)
[20:35:13] Completed 65000 out of 250000 steps  (26%)
[20:43:47] Completed 67500 out of 250000 steps  (27%)
[20:52:21] Completed 70000 out of 250000 steps  (28%)
[21:00:55] Completed 72500 out of 250000 steps  (29%)
[21:09:29] Completed 75000 out of 250000 steps  (30%)
[21:18:04] Completed 77500 out of 250000 steps  (31%)
[21:26:38] Completed 80000 out of 250000 steps  (32%)
[21:35:12] Completed 82500 out of 250000 steps  (33%)
[21:43:46] Completed 85000 out of 250000 steps  (34%)
[21:52:20] Completed 87500 out of 250000 steps  (35%)
[22:00:54] Completed 90000 out of 250000 steps  (36%)
[22:09:28] Completed 92500 out of 250000 steps  (37%)
[22:18:02] Completed 95000 out of 250000 steps  (38%)
[22:26:37] Completed 97500 out of 250000 steps  (39%)
[22:29:52] 
[22:29:52] Folding@home Core Shutdown: INTERRUPTED
[22:29:57] CoreStatus = 66 (102)
[22:29:57] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[22:29:57] Killing all core threads

Folding@Home Client Shutdown.
restarting...
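
To put a number on how fast the WU was progressing before the crash, the "Completed N out of M steps" lines in the FAHlog can be turned into a rate. The following is a rough Python sketch, not an F@H tool: the function names are made up for illustration, and it assumes all timestamps fall on the same day (no wrap past midnight), which holds for the log above.

```python
import re
from datetime import datetime

# Matches lines such as "[17:09:42] Completed 5000 out of 250000 steps  (2%)".
PROGRESS = re.compile(r"\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps")

def progress_points(log_text):
    """Return (time, step) pairs for every progress line in the log."""
    points = []
    for match in PROGRESS.finditer(log_text):
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        points.append((stamp, int(match.group(2))))
    return points

def seconds_per_step(points):
    """Average wall-clock seconds per step between the first and last point."""
    (t_first, s_first), (t_last, s_last) = points[0], points[-1]
    return (t_last - t_first).total_seconds() / (s_last - s_first)

# Three lines copied from the FAHlog above.
sample = """\
[17:09:42] Completed 5000 out of 250000 steps  (2%)
[17:18:16] Completed 7500 out of 250000 steps  (3%)
[22:26:37] Completed 97500 out of 250000 steps  (39%)
"""

points = progress_points(sample)
rate = seconds_per_step(points)
print(f"{rate * 2500:.0f} s per 2500 steps")
```

On these three lines the rate works out to roughly 514 seconds per 2500 steps, which matches the spacing of the timestamps in the log.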

Re: Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Fri Apr 17, 2009 12:16 pm
by toTOW
There's no report for this WU in the DB yet.

I don't know what's going on with your machines, but there is definitely something wrong ... you shouldn't see so many errors ... :?

Re: Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Fri Apr 17, 2009 12:19 pm
by alpha754293
I have no idea what the current statistics are because I don't really monitor the system very often. Usually, I don't see the problem until I refresh the FAH stats page at ExtremeOverclocking, and then I would check FahMon, and then I'd check the consoles.

When you run the system headless, you don't really have a sense of what it is or isn't doing.

Ambient room temp is currently 292.7 K, so that shouldn't be the problem.

I've run several versions of Prime and they all ran fine. So I don't know...

*edit*

I'm trying to remember if I had this many problems with the 2.01 core. 2.04 seemed to have more problems. With 2.06 I've already seen the WUs freeze a few times, and I have had to send a SIGKILL (9) to the core since the normal kill or pkill doesn't work.
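
The "try a normal kill first, then SIGKILL" routine described above can be scripted. This is a minimal Python sketch for a POSIX system: `stop_process` and the stand-in child are hypothetical, and a real setup would have to locate the frozen core's PID itself (for example with pkill) rather than start the process from Python.

```python
import signal
import subprocess
import sys

def stop_process(proc, grace_seconds=5.0):
    """Ask a child process to exit, then force it if it does not comply."""
    proc.terminate()                      # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()                       # sends SIGKILL on POSIX
        proc.wait()
        return "killed"

# Stand-in for a frozen core: a child process that ignores SIGTERM.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

def demo():
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    proc.stdout.readline()                # wait until the child ignores SIGTERM
    outcome = stop_process(proc, grace_seconds=0.5)
    proc.stdout.close()
    return outcome, proc.returncode

if __name__ == "__main__":
    print(demo())
```

Waiting for the child's "ready" line before signalling avoids a race in which SIGTERM arrives before the child has installed its handler.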

Re: Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Fri Apr 17, 2009 12:31 pm
by toTOW
Prime is definitely the worst stability tester for FAH ... it's not pushing the hardware hard enough.

I think your systems are mainly servers ... am I correct ? (that's why I think it's not a stability issue ... but that might be a bad memory stick ... it might happen even in professional parts ...)

Re: Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Fri Apr 17, 2009 12:34 pm
by alpha754293
toTOW wrote:Prime is definitely the worst stability tester for FAH ... it's not pushing the hardware hard enough.

I think your systems are mainly servers ... am I correct ? (that's why I think it's not a stability issue ... but that might be a bad memory stick ... it might happen even in professional parts ...)
Well... I have hammered it with 120 hours of CFD and 125 hours of FEA, and it came through both with no errors.

And yes, it's a server.

But it also passed memtest86 even with ECC disabled.

Re: Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Fri Apr 17, 2009 12:41 pm
by toTOW
That's definitely a strange behavior ... at least we're pretty sure it isn't hardware related.

How is your network set up ? Are you sure that you don't have some random IP resets ?

Re: Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Fri Apr 17, 2009 12:54 pm
by alpha754293
toTOW wrote:That's definitely a strange behavior ... at least we're pretty sure it isn't hardware related.

How is your network set up ? Are you sure that you don't have some random IP resets ?
system -> 16-port Netgear GS116 GbE switch -> Netgear WGR614 wireless router (I think) -> cable modem.

The host has lo and eth0. I think IPv6 is disabled entirely.

As far as I know, no. I can continuously ping somewhere, but it shouldn't matter because all of the data I/O should be going through the loopback if I understand it correctly.

I also don't know if the IP stack would report a wire being disconnected (at least to console), and I don't think that F@H has a way of reporting network or connectivity issues.
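
On Linux, one way to see whether an interface currently has a link, independent of what F@H reports, is to read its `operstate` file under `/sys/class/net`. The sketch below assumes that sysfs layout; `link_state` is a hypothetical helper, and `sysfs_root` is only a parameter so the function can be tried against a scratch directory.

```python
from pathlib import Path

def link_state(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's operstate for an interface, e.g. 'up' or 'down'.

    Returns None when the interface (or sysfs itself) is not present.
    """
    state_file = Path(sysfs_root) / iface / "operstate"
    try:
        return state_file.read_text().strip()
    except OSError:
        return None

if __name__ == "__main__":
    # The two interfaces mentioned in the post.
    for name in ("lo", "eth0"):
        print(name, link_state(name))
```

Polling this from a cron job or a small loop and logging changes with a timestamp would show whether any link drops line up with the times the core dies.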

Re: Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Fri Apr 17, 2009 1:03 pm
by toTOW
I don't know how Linux handles this, but under Windows, disconnecting a wire (or wireless connection), or changing the IP (DHCP renewal for instance) on a network interface also resets the loopback interface :(

Did you notice a regular period on the issue you're having (which might be related to an IP renewal or something similar) ? Did you try to assign a static IP to your machines (in the Netgear wireless router configuration) ?

Re: Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Fri Apr 17, 2009 1:10 pm
by alpha754293
toTOW wrote:I don't know how Linux handles this, but under Windows, disconnecting a wire (or wireless connection), or changing the IP (DHCP renewal for instance) on a network interface also resets the loopback interface :(

Did you notice a regular period on the issue you're having (which might be related to an IP renewal or something similar) ? Did you try to assign a static IP to your machines (in the Netgear wireless router configuration) ?
Not as far as I know.

I think the DHCP lease period is 7 days. But even then, the system just picks up the same IP anyway (which I've come to accept as "pseudo-static", because the stupid Netgear router doesn't have a way to release it from its DHCP server, even if a release command is issued).

The server never leaves the building, and the only time the IPs get reassigned is if there's a power failure (for the whole house, since the router is on a different floor than the server).

Apparently seg faults are related to memory, memory addressing, and memory read/write issues. My current working theory is that the F@H client isn't truly 64-bit throughout because, as I recall, only parts of it are. Therefore, with my system running SLES10 SP2 x64 and 16 GB of RAM, it might not be properly addressing all of the memory available to the system, even though the client is configured to use 16003 MB (the maximum available according to memstat).

http://en.wikipedia.org/wiki/Segmentation_fault
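
The 64-bit theory above can be checked directly, since every ELF binary records its class in the fifth byte of the file (1 for 32-bit, 2 for 64-bit). Here is a small Python sketch of that check; the helper names are made up for illustration, and the path in the usage comment is a placeholder, not the real location of the client or core.

```python
def elf_class(header: bytes) -> str:
    """Classify a binary from the first five bytes of its file.

    Byte 4 of an ELF header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit.
    """
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")

def binary_class(path):
    """Read just enough of a file on disk to classify it."""
    with open(path, "rb") as handle:
        return elf_class(handle.read(5))

# Usage with a placeholder path (substitute the actual client or core binary):
#     print(binary_class("path/to/client-binary"))

# A 32-bit process can address at most 2**32 bytes, i.e. 4 GiB,
# no matter how much RAM the machine has.
print(2**32 // 2**30, "GiB")
```

If any of the binaries involved report 32-bit, that process is capped at 4 GiB of address space regardless of the 16003 MB setting, though that alone would not explain a segmentation fault.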

Re: Project: 2671 (Run 11, Clone 32, Gen 9) seg fault

Posted: Sun Apr 19, 2009 5:05 pm
by susato
In the end both alpha and another donor completed this WU for full credit.