Project: 5765, 5766, 5767, and 5768 multiple errors

bishr1
Posts: 2
Joined: Sun Dec 28, 2008 6:21 pm

Project: 5765, 5766, 5767, and 5768 multiple errors

Post by bishr1 »

Across multiple machines with nVidia GPUs, I've had issues starting any work unit from projects 5765, 5766, 5767, and 5768. I did not log all of the failed work units, but here is a partial list (a small log-parsing sketch follows it):

Code:

Project: 5765 (Run 1, Clone 200, Gen 1)
Project: 5765 (Run 11, Clone 183, Gen 1)
Project: 5765 (Run 13, Clone 110, Gen 4)
Project: 5765 (Run 2, Clone 49, Gen 16)
Project: 5765 (Run 5, Clone 147, Gen 6)
Project: 5765 (Run 9, Clone 202, Gen 14)
Project: 5765 (Run 9, Clone 83, Gen 22)
Project: 5766 (Run 1, Clone 15, Gen 11)
Project: 5766 (Run 10, Clone 148, Gen 6)
Project: 5766 (Run 12, Clone 147, Gen 0)
Project: 5766 (Run 13, Clone 158, Gen 0)
Project: 5766 (Run 14, Clone 97, Gen 6)
Project: 5766 (Run 2, Clone 39, Gen 9)
Project: 5766 (Run 3, Clone 133, Gen 9)
Project: 5766 (Run 3, Clone 133, Gen 9)
Project: 5766 (Run 3, Clone 72, Gen 15)
Project: 5766 (Run 4, Clone 111, Gen 5)
Project: 5766 (Run 5, Clone 116, Gen 10)
Project: 5766 (Run 6, Clone 153, Gen 2)
Project: 5766 (Run 7, Clone 134, Gen 2)
Project: 5766 (Run 7, Clone 134, Gen 2)
Project: 5766 (Run 8, Clone 106, Gen 4)
Project: 5767 (Run 12, Clone 81, Gen 8)
Project: 5767 (Run 14, Clone 138, Gen 11)
Project: 5767 (Run 5, Clone 49, Gen 12)
Project: 5767 (Run 6, Clone 232, Gen 1)
Project: 5767 (Run 8, Clone 154, Gen 2)
Project: 5768 (Run 1, Clone 153, Gen 0)
Project: 5768 (Run 13, Clone 168, Gen 8)
Project: 5768 (Run 3, Clone 193, Gen 0)
Project: 5768 (Run 8, Clone 146, Gen 5)
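
For reference, a list like this can be pulled straight out of the client logs rather than copied by hand. The sketch below is a minimal, hypothetical Python script: it assumes the client log is named FAHlog.txt and that work units appear in the "Project: N (Run R, Clone C, Gen G)" form shown in the log further down, and it collects every work unit it sees, not only the failed ones.

Code:

import re

# Matches lines like: Project: 5765 (Run 9, Clone 202, Gen 14)
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

def wus_in_log(log_path="FAHlog.txt"):
    """Return the unique (project, run, clone, gen) tuples seen in a log."""
    seen = set()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = WU_RE.search(line)
            if match:
                seen.add(tuple(int(n) for n in match.groups()))
    return sorted(seen)

for project, run, clone, gen in wus_in_log():
    print(f"Project: {project} (Run {run}, Clone {clone}, Gen {gen})")
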
This is the log from one of the work units, but all of them ended at the same point with the same error.

Code:

[23:16:49] + Attempting to get work packet
[23:16:49] - Connecting to assignment server
[23:16:49] - Successful: assigned to (171.67.108.11).
[23:16:49] + News From Folding@Home: GPU folding beta
[23:16:49] Loaded queue successfully.
[23:16:50] + Closed connections
[23:16:50] 
[23:16:50] + Processing work unit
[23:16:50] Core required: FahCore_11.exe
[23:16:50] Core found.
[23:16:50] Working on queue slot 03 [December 28 23:16:50 UTC]
[23:16:50] + Working ...
[23:16:50] 
[23:16:50] *------------------------------*
[23:16:50] Folding@Home GPU Core - Beta
[23:16:50] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[23:16:50] 
[23:16:50] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[23:16:50] Build host: amoeba
[23:16:50] Board Type: Nvidia
[23:16:50] Core      : 
[23:16:50] Preparing to commence simulation
[23:16:50] - Looking at optimizations...
[23:16:50] - Created dyn
[23:16:50] - Files status OK
[23:16:50] - Expanded 46707 -> 252912 (decompressed 541.4 percent)
[23:16:50] Called DecompressByteArray: compressed_data_size=46707 data_size=252912, decompressed_data_size=252912 diff=0
[23:16:50] - Digital signature verified
[23:16:50] 
[23:16:50] Project: 5765 (Run 9, Clone 202, Gen 14)
[23:16:50] 
[23:16:51] Assembly optimizations on if available.
[23:16:51] Entering M.D.
[23:16:57] Working on Protein
[23:16:59] Client config found, loading data.
[23:16:59] Starting GUI Server
[23:16:59] mdrun_gpu returned 
[23:16:59] NANs detected on GPU
[23:16:59] 
[23:16:59] Folding@home Core Shutdown: UNSTABLE_MACHINE
[23:17:03] CoreStatus = 7A (122)
[23:17:03] Sending work to server
[23:17:03] Project: 5765 (Run 9, Clone 202, Gen 14)
[23:17:03] - Read packet limit of 540015616... Set to 524286976.
[23:17:03] - Error: Could not get length of results file work/wuresults_03.dat
[23:17:03] - Error: Could not read unit 03 file. Removing from queue.
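
For what it's worth, the "NANs detected on GPU" line suggests the core checks each step's output for non-finite numbers and aborts with UNSTABLE_MACHINE when any appear. The snippet below is only an illustrative sketch of that kind of guard; FahCore_11's actual check is internal to the core, and the function and data here are invented.

Code:

import math

def check_step_output(values):
    """Abort the work unit if the simulation produced non-finite numbers."""
    if any(math.isnan(v) or math.isinf(v) for v in values):
        # The real core reports this as CoreStatus = 7A / UNSTABLE_MACHINE.
        raise RuntimeError("NANs detected on GPU")

check_step_output([0.1, 0.2, float("nan")])  # raises RuntimeError
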
Across three dedicated folding machines with five GPUs, I am seeing the same errors. All machines are running at stock clock frequencies.

Code:

Machine 1
Vista x64 with all currently released patches
Video driver: nVidia 180.84
GPU2 System Tray Client 6.20
Intel Pentium D 915
2 GB RAM
nVidia 9800GT
nVidia 9800GT
nVidia nForce 570 SLI Motherboard

Machine 2
Vista x64 with all currently released patches
Video driver: nVidia 178.13
GPU2 System Tray Client 6.20
Intel Pentium D 915
2 GB RAM
nVidia 9600GSO
nVidia 9600GSO
nVidia nForce 570 SLI Motherboard

Machine 3
Vista x64 with all currently released patches
Video driver: nVidia 180.48
GPU2 System Tray Client 6.23
Intel Celeron E1200
2 GB RAM
nVidia 8800GTS 640MB
Intel P965 Motherboard
I do have the work files from the FAHlog above if needed (logfile_03.txt, wudata_03.dat, wudata_03.log, wuinfo_03.dat).
I have tried running the executable as administrator and with XP compatibility mode, with no change in behaviour. On one box I also moved an in-progress project 5770 work unit to the GPU that had just been marked unstable by a 5766 work unit. I did this by stopping both GPU clients and changing the shortcuts so each one started on the other GPU (sketched below). The 5770 work unit resumed and completed without error.
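
In case it helps anyone reproduce the swap: each client instance runs from its own directory, and the change was just exchanging the device index in each shortcut target. The paths and executable name below are made up for illustration, and I'm assuming the GPU2 client's -gpu flag:

Code:

"C:\FAH\GPU0\Folding@home.exe" -gpu 0   (changed to -gpu 1)
"C:\FAH\GPU1\Folding@home.exe" -gpu 1   (changed to -gpu 0)
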
At this point I've left all three machines off, as they keep getting assigned these units and stopping anyway. For the past three days I had been restarting the clients and randomly getting work units from other projects so I could continue folding. I had not had any issues with other projects prior to these.

Thanks.
-Randy
sortofageek
Site Admin
Posts: 3110
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix
Contact:

Re: Project: 5765, 5766, 5767, and 5768 multiple errors

Post by sortofageek »

I'm sorry to see you're having trouble with this series. I did take the time to look up every one of them, however, and each one has been completed successfully by at least one other donor. So it would seem the trouble is somewhere other than the WUs themselves.

Code:

Project: 5765 (Run 1, Clone 200, Gen 1)
Completed successfully by another donor.


Project: 5765 (Run 11, Clone 183, Gen 1)
Completed successfully by two other donors.


Project: 5765 (Run 13, Clone 110, Gen 4)
Completed successfully by two other donors.


Project: 5765 (Run 2, Clone 49, Gen 16)
Completed successfully by another donor.


Project: 5765 (Run 5, Clone 147, Gen 6)
Completed successfully by another donor.


Project: 5765 (Run 9, Clone 202, Gen 14)
Completed successfully by another donor.


Project: 5765 (Run 9, Clone 83, Gen 22)
Completed successfully by another donor.


Project: 5766 (Run 1, Clone 15, Gen 11)
Completed successfully by another donor.


Project: 5766 (Run 10, Clone 148, Gen 6)
Completed successfully by another donor.


Project: 5766 (Run 12, Clone 147, Gen 0)
Completed successfully by another donor.


Project: 5766 (Run 13, Clone 158, Gen 0)
Completed successfully by another donor.


Project: 5766 (Run 14, Clone 97, Gen 6)
Completed successfully by another donor.


Project: 5766 (Run 2, Clone 39, Gen 9)
Completed successfully by another donor.


Project: 5766 (Run 3, Clone 133, Gen 9)
Completed successfully by another donor.


Project: 5766 (Run 3, Clone 72, Gen 15)
Completed successfully by another donor.


Project: 5766 (Run 4, Clone 111, Gen 5)
Completed successfully by another donor.


Project: 5766 (Run 5, Clone 116, Gen 10)
Completed successfully by another donor.


Project: 5766 (Run 6, Clone 153, Gen 2)
Completed successfully by two other donors.


Project: 5766 (Run 7, Clone 134, Gen 2)
Completed successfully by another donor.


Project: 5766 (Run 7, Clone 134, Gen 2)
Completed successfully by another donor.


Project: 5766 (Run 8, Clone 106, Gen 4)
Completed successfully by another donor.


Project: 5767 (Run 12, Clone 81, Gen 8)
Completed successfully by another donor.


Project: 5767 (Run 14, Clone 138, Gen 11)
Completed successfully by two other donors.


Project: 5767 (Run 5, Clone 49, Gen 12)
Completed successfully by another donor.


Project: 5767 (Run 6, Clone 232, Gen 1)
Completed successfully by another donor.


Project: 5767 (Run 8, Clone 154, Gen 2)
Completed successfully by another donor.


Project: 5768 (Run 1, Clone 153, Gen 0)
Completed successfully by another donor.


Project: 5768 (Run 13, Clone 168, Gen 8)
Completed successfully by another donor.


Project: 5768 (Run 3, Clone 193, Gen 0)
Completed successfully by another donor.


Project: 5768 (Run 8, Clone 146, Gen 5)
Completed successfully by another donor.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 5765, 5766, 5767, and 5768 multiple errors

Post by bruce »

Please identify which GPU was used to process each of the WUs. There's a theory going around that certain WUs have trouble on certain GPUs, while the same WU might work fine on a different GPU. We need more information to find a pattern that might identify what's going on.
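
One way to assemble that mapping, assuming each GPU client instance runs in its own directory with its own FAHlog.txt (the directory names and card labels below are invented for illustration):

Code:

import re
from pathlib import Path

# Matches lines like: Project: 5765 (Run 9, Clone 202, Gen 14)
WU_RE = re.compile(r"Project: \d+ \(Run \d+, Clone \d+, Gen \d+\)")

# Map each client directory to the card its shortcut starts on.
CLIENTS = {"C:/FAH/GPU0": "9800GT #0", "C:/FAH/GPU1": "9800GT #1"}

for directory, gpu in CLIENTS.items():
    log = Path(directory) / "FAHlog.txt"
    if not log.exists():
        continue
    for line in log.read_text(errors="replace").splitlines():
        match = WU_RE.search(line)
        if match:
            print(f"{gpu}: {match.group(0)}")
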
bishr1
Posts: 2
Joined: Sun Dec 28, 2008 6:21 pm

Re: Project: 5765, 5766, 5767, and 5768 multiple errors

Post by bishr1 »

I've had this issue across all five of my nVidia cards, which span three different GPU models. One is an 8800GTS 640MB, two are 9600GSO 384MB, and the last two are 9800GT. A friend of mine is having the same issue with his GTX260. The only common thing between our setups is Vista x64. In my original post I've included a bit more of the specs on my three machines, including the client and driver versions used.

I just wanted to mention that I do not have any issues with the 5769-5772 units.

Thanks.

-Randy