Project: 2669 (Run 5, Clone 196, Gen 86) seg fault

alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Project: 2669 (Run 5, Clone 196, Gen 86) seg fault

Post by alpha754293 »

console:

[15:43:35]
[15:43:35] *------------------------------*
[15:43:35] Folding@Home Gromacs SMP Core
[15:43:35] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[15:43:35]
[15:43:35] Preparing to commence simulation
[15:43:35] - Ensuring status. Please wait.
[15:43:44] - Looking at optimizations...
[15:43:44] - Working with standard loops on this execution.
[15:43:44] - Files status OK
[15:43:45] - Expanded 4836062 -> 23981533 (decompressed 495.8 percent)
[15:43:45] Called DecompressByteArray: compressed_data_size=4836062 data_size=23981533, decompressed_data_size=23981533 diff=0
[15:43:45] - Digital signature verified
[15:43:45]
[15:43:45] Project: 2669 (Run 5, Clone 196, Gen 86)
[15:43:45]
[15:43:46] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=computenode
NNODES=4, MYRANK=2, HOSTNAME=computenode
NNODES=4, MYRANK=1, HOSTNAME=computenode
NNODES=4, MYRANK=3, HOSTNAME=computenode
NODEID=0 argc=20
NODEID=1 argc=20
NODEID=2 argc=20
NODEID=3 argc=20
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                 :-)  VERSION 4.0.99_development_20090307  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.


                                :-)  mdrun  (-:

Reading file work/wudata_03.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 64

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22878 system'
21750000 steps,  43500.0 ps (continuing from step 21500000,  43000.0 ps).
[15:43:55] Completed 0 out of 250000 steps  (0%)
[15:52:54] Completed 2500 out of 250000 steps  (1%)
[16:01:54] Completed 5000 out of 250000 steps  (2%)
[16:10:55] Completed 7500 out of 250000 steps  (3%)
[16:19:55] Completed 10000 out of 250000 steps  (4%)
[16:28:55] Completed 12500 out of 250000 steps  (5%)
[16:37:53] Completed 15000 out of 250000 steps  (6%)
[16:46:51] Completed 17500 out of 250000 steps  (7%)
[16:55:48] Completed 20000 out of 250000 steps  (8%)
[17:04:47] Completed 22500 out of 250000 steps  (9%)
[17:13:45] Completed 25000 out of 250000 steps  (10%)
[17:22:43] Completed 27500 out of 250000 steps  (11%)
[17:31:38] Completed 30000 out of 250000 steps  (12%)
[17:40:33] Completed 32500 out of 250000 steps  (13%)
[17:49:30] Completed 35000 out of 250000 steps  (14%)
[17:58:31] Completed 37500 out of 250000 steps  (15%)
[18:07:29] Completed 40000 out of 250000 steps  (16%)
[18:16:25] Completed 42500 out of 250000 steps  (17%)
[18:25:19] Completed 45000 out of 250000 steps  (18%)
[18:34:08] Completed 47500 out of 250000 steps  (19%)
[18:42:58] Completed 50000 out of 250000 steps  (20%)
[18:51:48] Completed 52500 out of 250000 steps  (21%)
[19:00:41] Completed 55000 out of 250000 steps  (22%)
[19:09:33] Completed 57500 out of 250000 steps  (23%)
[19:18:26] Completed 60000 out of 250000 steps  (24%)
[19:25:51] - Autosending finished units... [April 22 19:25:51 UTC]
[19:25:51] Trying to send all finished work units
[19:25:51] + No unsent completed units remaining.
[19:25:51] - Autosend completed
[19:27:19] Completed 62500 out of 250000 steps  (25%)
[19:36:12] Completed 65000 out of 250000 steps  (26%)
[19:45:04] Completed 67500 out of 250000 steps  (27%)
[19:53:56] Completed 70000 out of 250000 steps  (28%)
[20:02:50] Completed 72500 out of 250000 steps  (29%)
[20:11:44] Completed 75000 out of 250000 steps  (30%)
[20:20:37] Completed 77500 out of 250000 steps  (31%)
[20:29:32] Completed 80000 out of 250000 steps  (32%)
[20:38:25] Completed 82500 out of 250000 steps  (33%)
[20:47:17] Completed 85000 out of 250000 steps  (34%)
[20:56:10] Completed 87500 out of 250000 steps  (35%)
[21:05:04] Completed 90000 out of 250000 steps  (36%)
[21:14:00] Completed 92500 out of 250000 steps  (37%)
[21:22:55] Completed 95000 out of 250000 steps  (38%)
[21:31:51] Completed 97500 out of 250000 steps  (39%)
[21:40:47] Completed 100000 out of 250000 steps  (40%)
[21:49:43] Completed 102500 out of 250000 steps  (41%)
[21:58:38] Completed 105000 out of 250000 steps  (42%)
[22:07:34] Completed 107500 out of 250000 steps  (43%)
[22:16:29] Completed 110000 out of 250000 steps  (44%)
[22:25:25] Completed 112500 out of 250000 steps  (45%)
[22:34:20] Completed 115000 out of 250000 steps  (46%)
[22:43:16] Completed 117500 out of 250000 steps  (47%)
[22:52:12] Completed 120000 out of 250000 steps  (48%)
[23:01:08] Completed 122500 out of 250000 steps  (49%)
[23:10:04] Completed 125000 out of 250000 steps  (50%)
[23:19:01] Completed 127500 out of 250000 steps  (51%)
[23:27:56] Completed 130000 out of 250000 steps  (52%)
[23:36:52] Completed 132500 out of 250000 steps  (53%)
[23:45:48] Completed 135000 out of 250000 steps  (54%)
[23:49:56]
[23:49:56] Folding@home Core Shutdown: INTERRUPTED
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 102) - process 0
[cli_2]: aborting job:
Fatal error in MPI_Sendrecv: Error message texts are not available
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 1
[0]3:Return code = 0, signaled with Segmentation fault
[23:50:00] CoreStatus = 66 (102)
[23:50:00] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[23:50:00] Killing all core threads

Folding@Home Client Shutdown.
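The last few lines of the console output report how each of the four MPI processes exited. A small parser makes the pattern easier to read: rank 0 returned 102 (the code it passed to MPI_Abort), rank 2 returned 1, rank 1 was signaled with Quit, and rank 3 was killed by a segmentation fault. This is a minimal sketch that assumes the `[N]rank:Return code = X[, signaled with Y]` wording seen in this particular log (the leading `[N]` is treated as an opaque prefix); other MPI launchers word their exit reports differently, and the log alone does not show whether the segfault caused the abort or followed it.

```python
import re

# Matches lines such as "[0]3:Return code = 0, signaled with Segmentation fault".
# The leading "[0]" is treated as an opaque prefix (assumed to come from the MPI launcher).
EXIT_RE = re.compile(r"^\[\d+\](\d+):Return code = (\d+)(?:, signaled with (.+))?$")


def rank_exits(log_text):
    """Map each MPI rank to (return_code, signal_name or None)."""
    exits = {}
    for line in log_text.splitlines():
        m = EXIT_RE.match(line.strip())
        if m:
            exits[int(m.group(1))] = (int(m.group(2)), m.group(3))
    return exits


def crashed_ranks(exits, signal="Segmentation fault"):
    """Return the sorted list of ranks terminated by the given signal."""
    return sorted(rank for rank, (_, sig) in exits.items() if sig == signal)


if __name__ == "__main__":
    tail = """[0]0:Return code = 102
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 1
[0]3:Return code = 0, signaled with Segmentation fault"""
    exits = rank_exits(tail)
    for rank, (code, sig) in sorted(exits.items()):
        print(rank, code, sig or "-")
    print("segfaulted:", crashed_ranks(exits))
```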
FAHlog:

[15:43:35] 
[15:43:35] *------------------------------*
[15:43:35] Folding@Home Gromacs SMP Core
[15:43:35] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[15:43:35] 
[15:43:35] Preparing to commence simulation
[15:43:35] - Ensuring status. Please wait.
[15:43:44] - Looking at optimizations...
[15:43:44] - Working with standard loops on this execution.
[15:43:44] - Files status OK
[15:43:45] - Expanded 4836062 -> 23981533 (decompressed 495.8 percent)
[15:43:45] Called DecompressByteArray: compressed_data_size=4836062 data_size=23981533, decompressed_data_size=23981533 diff=0
[15:43:45] - Digital signature verified
[15:43:45] 
[15:43:45] Project: 2669 (Run 5, Clone 196, Gen 86)
[15:43:45] 
[15:43:46] Entering M.D.
[15:43:55] Completed 0 out of 250000 steps  (0%)
[15:52:54] Completed 2500 out of 250000 steps  (1%)
[16:01:54] Completed 5000 out of 250000 steps  (2%)
[16:10:55] Completed 7500 out of 250000 steps  (3%)
[16:19:55] Completed 10000 out of 250000 steps  (4%)
[16:28:55] Completed 12500 out of 250000 steps  (5%)
[16:37:53] Completed 15000 out of 250000 steps  (6%)
[16:46:51] Completed 17500 out of 250000 steps  (7%)
[16:55:48] Completed 20000 out of 250000 steps  (8%)
[17:04:47] Completed 22500 out of 250000 steps  (9%)
[17:13:45] Completed 25000 out of 250000 steps  (10%)
[17:22:43] Completed 27500 out of 250000 steps  (11%)
[17:31:38] Completed 30000 out of 250000 steps  (12%)
[17:40:33] Completed 32500 out of 250000 steps  (13%)
[17:49:30] Completed 35000 out of 250000 steps  (14%)
[17:58:31] Completed 37500 out of 250000 steps  (15%)
[18:07:29] Completed 40000 out of 250000 steps  (16%)
[18:16:25] Completed 42500 out of 250000 steps  (17%)
[18:25:19] Completed 45000 out of 250000 steps  (18%)
[18:34:08] Completed 47500 out of 250000 steps  (19%)
[18:42:58] Completed 50000 out of 250000 steps  (20%)
[18:51:48] Completed 52500 out of 250000 steps  (21%)
[19:00:41] Completed 55000 out of 250000 steps  (22%)
[19:09:33] Completed 57500 out of 250000 steps  (23%)
[19:18:26] Completed 60000 out of 250000 steps  (24%)
[19:25:51] - Autosending finished units... [April 22 19:25:51 UTC]
[19:25:51] Trying to send all finished work units
[19:25:51] + No unsent completed units remaining.
[19:25:51] - Autosend completed
[19:27:19] Completed 62500 out of 250000 steps  (25%)
[19:36:12] Completed 65000 out of 250000 steps  (26%)
[19:45:04] Completed 67500 out of 250000 steps  (27%)
[19:53:56] Completed 70000 out of 250000 steps  (28%)
[20:02:50] Completed 72500 out of 250000 steps  (29%)
[20:11:44] Completed 75000 out of 250000 steps  (30%)
[20:20:37] Completed 77500 out of 250000 steps  (31%)
[20:29:32] Completed 80000 out of 250000 steps  (32%)
[20:38:25] Completed 82500 out of 250000 steps  (33%)
[20:47:17] Completed 85000 out of 250000 steps  (34%)
[20:56:10] Completed 87500 out of 250000 steps  (35%)
[21:05:04] Completed 90000 out of 250000 steps  (36%)
[21:14:00] Completed 92500 out of 250000 steps  (37%)
[21:22:55] Completed 95000 out of 250000 steps  (38%)
[21:31:51] Completed 97500 out of 250000 steps  (39%)
[21:40:47] Completed 100000 out of 250000 steps  (40%)
[21:49:43] Completed 102500 out of 250000 steps  (41%)
[21:58:38] Completed 105000 out of 250000 steps  (42%)
[22:07:34] Completed 107500 out of 250000 steps  (43%)
[22:16:29] Completed 110000 out of 250000 steps  (44%)
[22:25:25] Completed 112500 out of 250000 steps  (45%)
[22:34:20] Completed 115000 out of 250000 steps  (46%)
[22:43:16] Completed 117500 out of 250000 steps  (47%)
[22:52:12] Completed 120000 out of 250000 steps  (48%)
[23:01:08] Completed 122500 out of 250000 steps  (49%)
[23:10:04] Completed 125000 out of 250000 steps  (50%)
[23:19:01] Completed 127500 out of 250000 steps  (51%)
[23:27:56] Completed 130000 out of 250000 steps  (52%)
[23:36:52] Completed 132500 out of 250000 steps  (53%)
[23:45:48] Completed 135000 out of 250000 steps  (54%)
[23:49:56] 
[23:49:56] Folding@home Core Shutdown: INTERRUPTED
[23:50:00] CoreStatus = 66 (102)
[23:50:00] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[23:50:00] Killing all core threads

Folding@Home Client Shutdown.
restarting client...
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Project: 2669 (Run 5, Clone 196, Gen 86) seg fault

Post by 7im »

For reference, it helps if you report your system OS and Specs when reporting problems.
http://fahwiki.net/index.php/Early_Unit ... rting_EUEs
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
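For readers wondering what "OS and Specs" means in practice, here is one hypothetical way to gather the basics on a Linux box before posting. It is a sketch, not the checklist from the wiki page linked above (whose content isn't reproduced here): the field names are made up for illustration, and RAM is read from `/proc/meminfo`, which exists only on Linux.

```python
import os
import platform


def collect_specs():
    """Gather basic OS and hardware facts for a problem report (hypothetical field names)."""
    specs = {
        "os": platform.system(),          # e.g. "Linux"
        "kernel": platform.release(),     # kernel release string
        "arch": platform.machine(),       # e.g. "x86_64"
        "logical_cpus": os.cpu_count(),   # may be None if it cannot be determined
    }
    # /proc/meminfo is Linux-specific; skip the RAM figure on other systems.
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    specs["mem_total_kib"] = int(line.split()[1])
                    break
    except OSError:
        pass
    return specs


if __name__ == "__main__":
    for key, value in collect_specs().items():
        print(f"{key}: {value}")
```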
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: Project: 2669 (Run 5, Clone 196, Gen 86) seg fault

Post by alpha754293 »

7im wrote: For reference, it helps if you report your system OS and Specs when reporting problems.
http://fahwiki.net/index.php/Early_Unit ... rting_EUEs
My system specs have already been posted in several places on this site. Use the search.

Also, quote:

"As noted above, there are several causes for WU to EUE. All true EUEs result in the core exiting with a Core status of 72 (114). Other abnormal exits characterised by different core statuses are not EUEs, but rather the symptom of another problem.
A list of EUE Types is available here"

I'm getting Core status 66 (102), so per the definition above (and here), what I'm getting is not an EUE.
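The status in the log, `CoreStatus = 66 (102)`, appears to be printed as a hexadecimal code followed by its decimal value: 0x66 is 102, and 0x72 (the EUE status quoted above) is 114. The decimal 102 is also the code rank 0 passed to `MPI_Abort(MPI_COMM_WORLD, 102)`. The sketch below checks that the two numbers agree and applies the quoted rule that only status 72 (114) is a true EUE. The hex-then-decimal reading is an inference from these numbers, not something the client documents in this thread.

```python
import re

# Per the quoted wiki text, true EUEs always exit with Core status 72 (114).
EUE_STATUS_HEX = "72"

# Matches e.g. "CoreStatus = 66 (102)"; assumed format: hex code, then decimal in parentheses.
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def parse_core_status(line):
    """Return (hex_code, decimal_code) from a FAHlog CoreStatus line, or None if absent."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    hex_code, dec_code = m.group(1).upper(), int(m.group(2))
    # Sanity check: the two renderings should describe the same number.
    if int(hex_code, 16) != dec_code:
        raise ValueError(f"hex {hex_code} does not match decimal {dec_code}")
    return hex_code, dec_code


def is_eue(line):
    """True only for the EUE status quoted above; every other status is a different problem."""
    parsed = parse_core_status(line)
    return parsed is not None and parsed[0] == EUE_STATUS_HEX


if __name__ == "__main__":
    line = "[23:50:00] CoreStatus = 66 (102)"
    print(parse_core_status(line), "EUE" if is_eue(line) else "not an EUE")
```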

On top of all that, this isn't the first time that my system has been reporting segmentation faults.

But since you asked, here you go:

Bootdata ok (command line is root=/dev/disk/by-id/scsi-35000cca001c46e41-part2 vga=0x317    resume=/dev/sda2 splash=silent)
Linux version 2.6.16.60-0.21-smp (geeko@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP Tue May 6 12:41:02 UTC 2008
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009a400 (usable)
 BIOS-e820: 000000000009a400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000cc000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000fbf70000 (usable)
 BIOS-e820: 00000000fbf70000 - 00000000fbf76000 (ACPI data)
 BIOS-e820: 00000000fbf76000 - 00000000fbf80000 (ACPI NVS)
 BIOS-e820: 00000000fbf80000 - 00000000fc000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000400000000 (usable)
DMI present.
ACPI: RSDP (v002 PTLTD                                 ) @ 0x00000000000f6a30
ACPI: XSDT (v001 PTLTD  	 XSDT   0x06040000  LTP 0x00000000) @ 0x00000000fbf7313b
ACPI: FADT (v003 AMD    HAMMER   0x06040000 PTEC 0x000f4240) @ 0x00000000fbf7328f
ACPI: SRAT (v001 AMD    HAMMER   0x06040000 AMD  0x00000001) @ 0x00000000fbf75bfc
ACPI: HPET (v001 AMD    HAMMER   0x06040000 PTEC 0x00000000) @ 0x00000000fbf75d74
ACPI: SSDT (v001 AMD-K8 AMD-ACPI 0x06040000  AMD 0x00000001) @ 0x00000000fbf75dac
ACPI: SSDT (v001 AMD-K8 AMD-ACPI 0x06040000  AMD 0x00000001) @ 0x00000000fbf75e49
ACPI: MADT (v001 PTLTD  	 APIC   0x06040000  LTP 0x00000000) @ 0x00000000fbf75ee6
ACPI: SPCR (v001 PTLTD  $UCRTBL$ 0x06040000 PTL  0x00000001) @ 0x00000000fbf75fb0
ACPI: DSDT (v001 AMD-K8  AMDACPI 0x06040000 MSFT 0x0100000e) @ 0x0000000000000000
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: PXM 2 -> APIC 4 -> Node 2
SRAT: PXM 2 -> APIC 5 -> Node 2
SRAT: PXM 3 -> APIC 6 -> Node 3
SRAT: PXM 3 -> APIC 7 -> Node 3
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 0-fc000000
SRAT: Node 1 PXM 1 100000000-200000000
SRAT: Node 2 PXM 2 200000000-300000000
SRAT: Node 3 PXM 3 300000000-400000000
NUMA: Using 32 for the hash shift.
Bootmem setup node 0 0000000000000000-00000000fc000000
Bootmem setup node 1 0000000100000000-0000000200000000
Bootmem setup node 2 0000000200000000-0000000300000000
Bootmem setup node 3 0000000300000000-0000000400000000
On node 0 totalpages: 1016840
  DMA zone: 2944 pages, LIFO batch:0
  DMA32 zone: 1013896 pages, LIFO batch:31
  Normal zone: 0 pages, LIFO batch:0
  HighMem zone: 0 pages, LIFO batch:0
On node 1 totalpages: 1034240
  DMA zone: 0 pages, LIFO batch:0
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 1034240 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:0
On node 2 totalpages: 1034240
  DMA zone: 0 pages, LIFO batch:0
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 1034240 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:0
On node 3 totalpages: 1034240
  DMA zone: 0 pages, LIFO batch:0
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 1034240 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:0
ACPI: PM-Timer IO Port: 0xc008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
Processor #2 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Processor #3 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
Processor #4 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
Processor #5 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
Processor #6 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
Processor #7 15:1 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfc000000] gsi_base[24])
IOAPIC[1]: apic_id 9, version 17, address 0xfc000000, GSI 24-27
ACPI: IOAPIC (id[0x0a] address[0xfc001000] gsi_base[28])
IOAPIC[2]: apic_id 10, version 17, address 0xfc001000, GSI 28-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Setting APIC routing to physical flat
ACPI: HPET id: 0x102282a0 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at fc400000 (gap: fc000000:2c00000)
SMP: Allowing 8 CPUs, 0 hotplug CPUs
Built 4 zonelists
Kernel command line: root=/dev/disk/by-id/scsi-35000cca001c46e41-part2 vga=0x317    resume=/dev/sda2 splash=silent
bootsplash: silent mode.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 14.318180 MHz WALL HPET GTOD HPET timer.
time.c: Detected 2405.912 MHz processor.
Console: colour dummy device 80x25
Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Checking aperture...
CPU 0: aperture @ 0 size 32 MB
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Nosave address range: 000000000009a000 - 000000000009b000
Nosave address range: 000000000009b000 - 00000000000a0000
Nosave address range: 00000000000a0000 - 00000000000cc000
Nosave address range: 00000000000cc000 - 0000000000100000
Nosave address range: 00000000fbf70000 - 00000000fbf76000
Nosave address range: 00000000fbf76000 - 00000000fbf80000
Nosave address range: 00000000fbf80000 - 00000000fc000000
Nosave address range: 00000000fc000000 - 00000000fec00000
Nosave address range: 00000000fec00000 - 00000000fee00000
Nosave address range: 00000000fee00000 - 00000000fee01000
Nosave address range: 00000000fee01000 - 00000000fff80000
Nosave address range: 00000000fff80000 - 0000000100000000
Memory: 16384044k/16777216k available (1983k kernel code, 326652k reserved, 904k data, 204k init)
Calibrating delay using timer specific routine.. 4818.06 BogoMIPS (lpj=9636126)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(2) -> Node 0 -> Core 0
checking if image is initramfs... it is
Freeing initrd memory: 3091k freed
 not found!
Using local APIC timer interrupts.
result 12530806
Detected 12.530 MHz APIC timer.
Booting processor 1/8 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4811.01 BogoMIPS (lpj=9622025)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(2) -> Node 0 -> Core 1
AMD Athlon(tm) or Opteron(tm) CPU-model unknown stepping 02
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff 0 cycles, maxerr 854 cycles)
Booting processor 2/8 APIC 0x2
Initializing CPU#2
Calibrating delay using timer specific routine.. 4811.04 BogoMIPS (lpj=9622080)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2(2) -> Node 1 -> Core 0
AMD Athlon(tm) or Opteron(tm) CPU-model unknown stepping 02
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff 136 cycles, maxerr 643 cycles)
Booting processor 3/8 APIC 0x3
Initializing CPU#3
Calibrating delay using timer specific routine.. 4811.00 BogoMIPS (lpj=9622007)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3(2) -> Node 1 -> Core 1
AMD Athlon(tm) or Opteron(tm) CPU-model unknown stepping 02
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff 139 cycles, maxerr 643 cycles)
Booting processor 4/8 APIC 0x4
Initializing CPU#4
Calibrating delay using timer specific routine.. 4811.00 BogoMIPS (lpj=9622010)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 4(2) -> Node 2 -> Core 0
AMD Athlon(tm) or Opteron(tm) CPU-model unknown stepping 02
CPU 4: Syncing TSC to CPU 0.
CPU 4: synchronized TSC with CPU 0 (last diff 139 cycles, maxerr 651 cycles)
Booting processor 5/8 APIC 0x5
Initializing CPU#5
Calibrating delay using timer specific routine.. 4811.03 BogoMIPS (lpj=9622075)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 5(2) -> Node 2 -> Core 1
AMD Athlon(tm) or Opteron(tm) CPU-model unknown stepping 02
CPU 5: Syncing TSC to CPU 0.
CPU 5: synchronized TSC with CPU 0 (last diff 20 cycles, maxerr 902 cycles)
Booting processor 6/8 APIC 0x6
Initializing CPU#6
Calibrating delay using timer specific routine.. 4811.07 BogoMIPS (lpj=9622148)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 6(2) -> Node 3 -> Core 0
AMD Athlon(tm) or Opteron(tm) CPU-model unknown stepping 02
CPU 6: Syncing TSC to CPU 0.
CPU 6: synchronized TSC with CPU 0 (last diff 2 cycles, maxerr 1538 cycles)
Booting processor 7/8 APIC 0x7
Initializing CPU#7
Calibrating delay using timer specific routine.. 4811.02 BogoMIPS (lpj=9622047)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 7(2) -> Node 3 -> Core 1
AMD Athlon(tm) or Opteron(tm) CPU-model unknown stepping 02
CPU 7: Syncing TSC to CPU 0.
CPU 7: synchronized TSC with CPU 0 (last diff -4 cycles, maxerr 1522 cycles)
Brought up 8 CPUs
testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
migration_cost=410,602
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Subsystem revision 20060127
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
Boot video device is 0000:01:06.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 5 10 *11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 *5 10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 5 *10 11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 5 10 *11)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.TP2P._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.G0PA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.G0PB._PRT]
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
hpet0: at MMIO 0xfed00000 (virtual 0xffffffffff5fe000), IRQs 2, 8, 0
hpet0: 3 32-bit timers, 14318180 Hz
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 4000000 size 65536 KB
PCI-DMA: using GART IOMMU.
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
PCI: Bridge: 0000:00:06.0
  IO window: 2000-2fff
  MEM window: fc100000-fdffffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:0a.0
  IO window: 3000-3fff
  MEM window: fe000000-fe0fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:0b.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1239099767.684:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
PCI: MSI quirk detected. PCI_BUS_FLAGS_NO_MSI set for subordinate bus.
PCI: MSI quirk detected. PCI_BUS_FLAGS_NO_MSI set for subordinate bus.
 0000:01:03.0: HCRESET not completed yet!
 0000:01:03.1: HCRESET not completed yet!
vesafb: framebuffer at 0xfd000000, mapped to 0xffffc20000080000, using 6144k, total 8128k
vesafb: mode is 1024x768x16, linelength=2048, pages=4
vesafb: scrolling: redraw
vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
bootsplash 3.1.6-2004/03/31: looking for picture...
bootsplash: silentjpeg size 47357 bytes
bootsplash: ...found (1024x768, 30410 bytes, v3).
Console: switching to colour frame buffer device 123x44
fb0: VESA VGA frame buffer device
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
RAMDISK driver initialized: 16 RAM disks of 128000K size 1024 blocksize
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard as /class/input/input0
input: PC Speaker as /class/input/input1
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
NET: Registered protocol family 2
IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
NET: Registered protocol family 1
ACPI wakeup devices: 
TP2P  USB USB1 USB2 G0PA G0PB 
ACPI: (supports S0 S1 S4 S5)
Freeing unused kernel memory: 204k freed
input: PS2++ Logitech Wheel Mouse as /class/input/input2
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SCSI subsystem initialized
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
    ide0: BM-DMA at 0x1020-0x1027, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0x1028-0x102f, BIOS settings: hdc:pio, hdd:DMA
Probing IDE interface ide0...
Probing IDE interface ide1...
hdd: DV-28E-N, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
Fusion MPT base driver 3.04.06-suse
Copyright (c) 1999-2007 LSI Corporation
Fusion MPT SPI Host driver 3.04.06-suse
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:02:04.0[A] -> GSI 24 (level, low) -> IRQ 169
mptbase: ioc0: Initiating bringup
ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
ACPI: PCI Interrupt 0000:02:04.0[A] -> GSI 24 (level, low) -> IRQ 169
scsi0 : ioc0: LSI53C1030 B2, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=169
  Vendor: HITACHI   Model: HUS151414VL3800   Rev: S3B0
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target0:0:0: Beginning Domain Validation
 target0:0:0: Ending Domain Validation
 target0:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 127)
SCSI device sda: 287140277 512-byte hdwr sectors (147016 MB)
sda: Write Protect is off
sda: Mode Sense: cb 00 10 08
SCSI device sda: drive cache: write back w/ FUA
SCSI device sda: 287140277 512-byte hdwr sectors (147016 MB)
sda: Write Protect is off
Losing some ticks... checking if CPU frequency changed.
sda: Mode Sense: cb 00 10 08
SCSI device sda: drive cache: write back w/ FUA
 sda: sda1 sda2
sd 0:0:0:0: Attached scsi disk sda
  Vendor: HITACHI   Model: HUS151414VL3800   Rev: S3B0
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target0:0:1: Beginning Domain Validation
sd 0:0:0:0: Attached scsi generic sg0 type 0
 target0:0:1: Ending Domain Validation
 target0:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 127)
SCSI device sdb: 287140277 512-byte hdwr sectors (147016 MB)
sdb: Write Protect is off
sdb: Mode Sense: cb 00 10 08
SCSI device sdb: drive cache: write back w/ FUA
SCSI device sdb: 287140277 512-byte hdwr sectors (147016 MB)
sdb: Write Protect is off
sdb: Mode Sense: cb 00 10 08
SCSI device sdb: drive cache: write back w/ FUA
 sdb: sdb1 sdb2
sd 0:0:1:0: Attached scsi disk sdb
sd 0:0:1:0: Attached scsi generic sg1 type 0
GSI 17 sharing vector 0xB1 and IRQ 17
ACPI: PCI Interrupt 0000:02:04.1[B] -> GSI 25 (level, low) -> IRQ 177
mptbase: ioc1: Initiating bringup
ioc1: LSI53C1030 B2: Capabilities={Initiator,Target}
ACPI: PCI Interrupt 0000:02:04.1[B] -> GSI 25 (level, low) -> IRQ 177
scsi1 : ioc1: LSI53C1030 B2, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=177
BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
Attempting manual resume
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdb2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 33559776k swap on /dev/disk/by-id/scsi-35000cca001c46ed3-part2.  Priority:-1 extents:1 across:33559776k
hw_random: AMD768 system management I/O registers at 0xC000.
hw_random hardware driver 1.0.0 loaded
tg3.c:v3.86b (April 2, 2008)
hdd: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 24 (level, low) -> IRQ 169
Fusion MPT misc device (ioctl) driver 3.04.06-suse
mptctl: Registered with Fusion MPT base driver
mptctl: /dev/mptctl @ (major,minor=10,220)
usbcore: registered new driver usbfs
eth0: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:e0:81:5f:1e:08
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[769f0000] dma_mask[64-bit]
ACPI: PCI Interrupt 0000:02:09.1[B] -> GSI 25 (level, low) -> IRQ 177
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
usbcore: registered new driver hub
shpchp: HPC vendor_id 1022 device_id 7460 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
USB Universal Host Controller Interface driver v2.3
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:e0:81:5f:1e:09
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
eth1: dma_rwctrl[769f0000] dma_mask[64-bit]
shpchp: HPC vendor_id 1022 device_id 7450 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7450 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
GSI 18 sharing vector 0xB9 and IRQ 18
ACPI: PCI Interrupt 0000:01:00.0[D] -> GSI 19 (level, low) -> IRQ 185
ohci_hcd 0000:01:00.0: OHCI Host Controller
ohci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
ohci_hcd 0000:01:00.0: irq 185, io mem 0xfc100000
usb usb1: new device found, idVendor=0000, idProduct=0000
usb usb1: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: OHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.16.60-0.21-smp ohci_hcd
usb usb1: SerialNumber: 0000:01:00.0
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 3 ports detected
ACPI: PCI Interrupt 0000:01:00.1[D] -> GSI 19 (level, low) -> IRQ 185
ohci_hcd 0000:01:00.1: OHCI Host Controller
ohci_hcd 0000:01:00.1: new USB bus registered, assigned bus number 2
ohci_hcd 0000:01:00.1: irq 185, io mem 0xfc101000
usb usb2: new device found, idVendor=0000, idProduct=0000
usb usb2: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: OHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.16.60-0.21-smp ohci_hcd
usb usb2: SerialNumber: 0000:01:00.1
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
GSI 19 sharing vector 0xC1 and IRQ 19
ACPI: PCI Interrupt 0000:01:03.0[A] -> GSI 17 (level, low) -> IRQ 193
PCI: VIA IRQ fixup for 0000:01:03.0, from 5 to 1
uhci_hcd 0000:01:03.0: UHCI Host Controller
uhci_hcd 0000:01:03.0: new USB bus registered, assigned bus number 3
uhci_hcd 0000:01:03.0: irq 193, io base 0x00002400
usb usb3: new device found, idVendor=0000, idProduct=0000
usb usb3: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: UHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.16.60-0.21-smp uhci_hcd
usb usb3: SerialNumber: 0000:01:03.0
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
GSI 20 sharing vector 0xC9 and IRQ 20
ACPI: PCI Interrupt 0000:01:03.1[B] -> GSI 16 (level, low) -> IRQ 201
PCI: VIA IRQ fixup for 0000:01:03.1, from 11 to 9
uhci_hcd 0000:01:03.1: UHCI Host Controller
uhci_hcd 0000:01:03.1: new USB bus registered, assigned bus number 4
uhci_hcd 0000:01:03.1: irq 201, io base 0x00002420
usb usb4: new device found, idVendor=0000, idProduct=0000
usb usb4: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: UHCI Host Controller
usb usb4: Manufacturer: Linux 2.6.16.60-0.21-smp uhci_hcd
usb usb4: SerialNumber: 0000:01:03.1
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
GSI 21 sharing vector 0xD1 and IRQ 21
ACPI: PCI Interrupt 0000:01:03.2[C] -> GSI 18 (level, low) -> IRQ 209
PCI: VIA IRQ fixup for 0000:01:03.2, from 10 to 1
ehci_hcd 0000:01:03.2: EHCI Host Controller
ehci_hcd 0000:01:03.2: new USB bus registered, assigned bus number 5
ehci_hcd 0000:01:03.2: irq 209, io mem 0xfc103000
ehci_hcd 0000:01:03.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb5: new device found, idVendor=0000, idProduct=0000
usb usb5: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb5: Product: EHCI Host Controller
usb usb5: Manufacturer: Linux 2.6.16.60-0.21-smp ehci_hcd
usb usb5: SerialNumber: 0000:01:03.2
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 4 ports detected
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
dm-netlink version 0.0.2 loaded
loop: loaded (max 8 devices)
AppArmor: AppArmor initialized
audit(1239114212.020:2):  info="AppArmor initialized" pid=2494
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
No dock devices found.
NET: Registered protocol family 17
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
audit(1239114221.420:3): operation="inode_permission" requested_mask="rw" denied_mask="r" name="/dev/tty10" pid=2677 profile="/sbin/syslog-ng"
audit(1239114223.948:4): audit_pid=3660 old=0 by auid=4294967295
mtrr: type mismatch for fd000000,800000 old: write-back new: write-combining
mtrr: type mismatch for fd000000,800000 old: write-back new: write-combining
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
FahCore_a2.exe[31672]: segfault at 00002aa885deded4 rip 0000000000436e25 rsp 00000000407fac60 error 6
FahCore_a2.exe[1320]: segfault at 00002aad04b1c408 rip 0000000000456792 rsp 00000000407facf0 error 6
FahCore_a2.exe[7070]: segfault at 00002aaaae4c9000 rip 0000000000436e3e rsp 00000000407fac60 error 6
FahCore_a2.exe[10727]: segfault at 0000000318328180 rip 00000000006517b9 rsp 00000000407fb2d0 error 4
FahCore_a2.exe[10730]: segfault at 00002aaab96f9510 rip 0000000000651806 rsp 00000000407fb2d0 error 4
FahCore_a2.exe[13325]: segfault at 00002aaaf6e21830 rip 0000000000436ddf rsp 00000000407fac60 error 4
FahCore_a2.exe[28010]: segfault at 00002aaecef6cc48 rip 00000000006548af rsp 00000000407fae40 error 4
FahCore_a2.exe[3203]: segfault at 0000000000000000 rip 0000000000518a88 rsp 00000000407fb560 error 6
FahCore_a2.exe[3350]: segfault at 0000000000000000 rip 0000000000000000 rsp 00000000407fc280 error 14
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Project: 2669 (Run 5, Clone 196, Gen 86) seg fault

Post by 7im »

Of course that's not an EUE. As you noted, it's a symptom of another problem, so the more information you can provide about your setup, the more helpful it is to those who might be inclined to offer assistance. I thought the EUE link might be a good guide to what info someone new to reporting WU errors might want to include.
alpha754293 wrote:...System specs are reported throughout the site somewhere. Use the search.
I'll be sure to pass that answer along to Pande Group. When you are asking volunteers for help, they tend to be more inclined towards helpfulness when you show useful details and less attitude. :(
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: Project: 2669 (Run 5, Clone 196, Gen 86) seg fault

Post by alpha754293 »

7im wrote:Of course that's not an EUE. As you noted, it's a symptom of another problem, so the more information you can provide about your setup, the more helpful it is to those who might be inclined to offer assistance. I thought the EUE link might be a good guide to what info someone new to reporting WU errors might want to include.
alpha754293 wrote:...System specs are reported throughout the site somewhere. Use the search.
I'll be sure to pass that answer along to Pande Group. When you are asking volunteers for help, they tend to be more inclined towards helpfulness when you show useful details and less attitude. :(
Well...it is a silly question that deserves a silly answer. As I said, this isn't the first time that my systems have produced a seg fault. Why don't you go look up some of those instead as I am quite certain that you're bound to find the hardware info somewhere in them.

(Of course, if the signature length limit were increased, I could have listed the system specs in my signature so that you'd always see them.)

The way I figure it, if moderators, admins, and the like can tell us "users" to use the search feature when we're looking for your help, the reverse is also true.

A simple search for "seg fault" with my name on it returned 21 matches alone, and a search for "segmentation fault" returned 41.

From viewtopic.php?f=19&t=8325&hilit=Opteron+880

"well..those previous corrupted runs...I don't know what happened. the system right now is pretty much a dedicated F@H machine, running SLES 10 SP2 x64 on a Tyan B4882-D barebones server with 4x AMD Opteron 880 and 16 GB of RAM and two 146 GB 15krpm U320 drives.

It's got an 850 W PSU in it, and running two "-smp 4" clients and a2 cores, it only draws about 510 W (measured with the killawatt thing)."

A search for "Opteron 880" returns 9 matches, all with my name on them. The first mention of the CPU (along with the rest of the system/hardware specs) went up about 3 months ago as of yesterday.

Plus, I've already posted the output from dmesg, so what more would you like?

P.S. I'm not actually asking for help on this.

Out of the 21 seg faults that I've encountered already, the most that people have been able to do is suggest running memtest and various other stability/hardware checks (all of which have been completed and passed without errors), or cross-reference some kind of database that I guess you guys use to see whether a WU was successfully returned to the server.

So far, there hasn't been much in the way of determining the root cause of the seg faults, other than that they happen, and that it sometimes takes two to three starts of the client (for it to pick up from where it left off) to finish the WU.

So, as far as "fixing" it goes, that's about the best that we've got so far.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2669 (Run 5, Clone 196, Gen 86) seg fault

Post by bruce »

alpha754293 wrote:P.S. I'm not actually asking for help on this.

Out of the 21 seg faults that I've encountered already, the most that people have been able to do is suggest running memtest and various other stability/hardware checks (all of which have been completed and passed without errors), or cross-reference some kind of database that I guess you guys use to see whether a WU was successfully returned to the server.

So far, there hasn't been much in the way of determining the root cause of the seg faults, other than that they happen, and that it sometimes takes two to three starts of the client (for it to pick up from where it left off) to finish the WU.

So, as far as "fixing" it goes, that's about the best that we've got so far.
A seg fault is not an error that has a simple explanation -- which is why it doesn't fall under the category of "EUE". The OS has detected that the program made a reference to an invalid memory location. It doesn't tell anybody WHY that happened.
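What the kernel does record is WHERE and HOW: each "segfault at ... rip ... rsp ... error N" line in the dmesg output above gives the faulting address, the instruction pointer (rip), and the x86 page-fault error code. The Python sketch below parses that line format and decodes the error-code bits as the x86 architecture defines them (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode, bit 3: reserved page-table bit, bit 4: instruction fetch). It assumes, as on x86_64 kernels of this era, that the code is printed in hexadecimal; the function names are hypothetical, made up for illustration. It still cannot tell you WHY the access was bad.

```python
import re
from typing import List, Optional

# Format of the x86_64 kernel segfault report seen in the dmesg output, e.g.
# FahCore_a2.exe[31672]: segfault at 00002aa885deded4 rip 0000000000436e25 rsp 00000000407fac60 error 6
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

# Low three page-fault error-code bits: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x1, "protection violation", "page not present"),
    (0x2, "write access", "read access"),
    (0x4, "user mode", "kernel mode"),
]
# Higher bits that are only reported when set.
PF_FLAGS = [
    (0x8, "reserved bit set in page table"),
    (0x10, "instruction fetch"),
]


def decode_error(code: int) -> List[str]:
    """Translate a page-fault error code into readable flags."""
    parts = [if_set if code & mask else if_clear for mask, if_set, if_clear in PF_BITS]
    parts += [name for mask, name in PF_FLAGS if code & mask]
    return parts


def parse_segfault(line: str) -> Optional[dict]:
    """Return the fields of a segfault line, or None if the line is not one."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    # Assumption: the kernel prints the error code in hex (printk "%lx").
    err = int(m.group("err"), 16)
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "error": err,
        "meaning": decode_error(err),
    }


if __name__ == "__main__":
    sample = ("FahCore_a2.exe[3350]: segfault at 0000000000000000 "
              "rip 0000000000000000 rsp 00000000407fc280 error 14")
    info = parse_segfault(sample)
    # prints: 0x14 ['page not present', 'read access', 'user mode', 'instruction fetch']
    print(hex(info["error"]), info["meaning"])
```

If the hex assumption holds, the "error 6" lines decode as user-mode writes to non-present pages, the "error 4" lines as user-mode reads, and the last line (rip 0, "error 14") as a user-mode instruction fetch from an unmapped page, i.e. a jump to a null address. That narrows down what kind of access went wrong, but not which of the causes below is behind it.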

If the memory reference was made because of a hardware failure in your memory subsystem, it's something that you can probably fix, hence the suggestion to run memtest. If the memory reference was made because of a strange set of values in the particular WU, it should be repeatable, so we have to search for others who have processed the same WU and perhaps report it as a bad WU. If the memory reference was made because of a software bug in the FahCore, the developers will need to figure out how to reproduce the problem, and there's a pretty good chance that running the same WU in their lab will NOT produce the same error, so they can't identify a bug that needs fixing.

Since the only one that you can do anything about is the first one, and you've apparently already exhausted the memtest scenario, all you can do is report it and we can work on the second one. Next time you post this sort of error, you might say in your first post that you're just reporting -- not looking for help -- and avoid contentious discussions like this one.