Project: 16608 (Run 188, Clone 0, Gen 103)

Posted: Wed Dec 01, 2021 4:52 pm
by v00d00
A Guru Meditation error. The core crashed.

01:42:43:WU00:FS00:0x22:Project: 16608 (Run 188, Clone 0, Gen 103)
01:42:43:WU00:FS00:0x22:Unit: 0x00000000000000000000000000000000
01:42:43:WU00:FS00:0x22:Reading tar file core.xml
01:42:43:WU00:FS00:0x22:Reading tar file integrator.xml
01:42:43:WU00:FS00:0x22:Reading tar file state.xml
01:42:44:WU00:FS00:0x22:Reading tar file system.xml
01:42:45:WU00:FS00:0x22:Digital signatures verified
01:42:45:WU00:FS00:0x22:Folding@home GPU Core22 Folding@home Core
01:42:45:WU00:FS00:0x22:Version 0.0.18
01:42:45:WU00:FS00:0x22:  Checkpoint write interval: 62500 steps (5%) [20 total]
01:42:45:WU00:FS00:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
01:42:45:WU00:FS00:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
01:42:45:WU00:FS00:0x22:  Global context and integrator variables write interval: disabled
01:42:45:WU00:FS00:0x22:There are 4 platforms available.
01:42:45:WU00:FS00:0x22:Platform 0: Reference
01:42:45:WU00:FS00:0x22:Platform 1: CPU
01:42:45:WU00:FS00:0x22:Platform 2: OpenCL
01:42:45:WU00:FS00:0x22:  opencl-device 0 specified
01:42:45:WU00:FS00:0x22:Platform 3: CUDA
01:42:45:WU00:FS00:0x22:  cuda-device 0 specified
01:42:48:WU01:FS00:Upload 7.73%
01:42:54:WU01:FS00:Upload 17.50%
01:43:00:WU01:FS00:Upload 27.04%
01:43:06:WU01:FS00:Upload 36.59%
01:43:12:WU01:FS00:Upload 46.36%
01:43:18:WU01:FS00:Upload 55.91%
01:43:23:WU00:FS00:0x22:Attempting to create CUDA context:
01:43:23:WU00:FS00:0x22:  Configuring platform CUDA
01:43:24:WU01:FS00:Upload 65.23%
01:43:30:WU01:FS00:Upload 74.77%
01:43:36:WU01:FS00:Upload 83.86%
01:43:37:WU00:FS00:0x22:  Using CUDA and gpu 0
01:43:37:WU00:FS00:0x22:Completed 0 out of 1250000 steps (0%)
01:43:39:WU00:FS00:0x22:Checkpoint completed at step 0
01:43:42:WU01:FS00:Upload 93.41%
01:43:47:WU01:FS00:Upload complete
01:43:47:WU01:FS00:Server responded WORK_ACK (400)
01:43:47:WU01:FS00:Final credit estimate, 447704.00 points
01:43:47:WU01:FS00:Cleaning up
01:45:32:WU00:FS00:0x22:Completed 12500 out of 1250000 steps (1%)
~
02:10:09:WU00:FS00:0x22:Completed 175000 out of 1250000 steps (14%)
02:10:54:WU00:FS00:0x22:ERROR:Guru Meditation #5d0fca4e793897b.ac6044f93e95e795 (2524152.2524244) '00/01/positions.xtc'

Re: Project: 16608 (Run 188, Clone 0, Gen 103)

Posted: Sat Dec 04, 2021 3:00 pm
by toTOW
Guru Meditation errors are usually related to disks. They usually happen when a checkpoint gets corrupted or when the core can't access storage ...

It could also be something with the WU itself, since I can't find any record of it in the stats db yet ...

Re: Project: 16608 (Run 188, Clone 0, Gen 103)

Posted: Sun Dec 05, 2021 10:37 pm
by v00d00
It re-ran and dumped at the same point, so maybe it's the workunit. It was running on my RTX 2080 Ti at stock. The card has done quite a few workunits since without issue, apart from that other unit that crashed with a CUDA issue. Besides these, I've lost no workunits in maybe a year.

Anyway, I thought I'd report them. In reality, someone else might get it to finish, and if not, someone on the science side might be able to work out why.
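
For anyone who wants to check whether a rerun really dies at the same point, here is a minimal Python sketch that pulls the last "Completed N out of M steps" line seen before a Guru Meditation error. The function name and both regular expressions are my own assumptions, based only on the log lines quoted in this thread, not on any official FAHClient log format.

```python
import re

# Patterns are assumptions inferred from the log lines quoted in this thread,
# not an official FAHClient log specification.
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
ERROR_RE = re.compile(r"ERROR:Guru Meditation")


def last_step_before_error(log_lines):
    """Return (last_completed_step, total_steps) seen before the first
    Guru Meditation error. Returns None if no such error appears, or if
    no progress line preceded it."""
    last = None
    for line in log_lines:
        match = STEP_RE.search(line)
        if match:
            last = (int(match.group(1)), int(match.group(2)))
        elif ERROR_RE.search(line):
            return last
    return None


# Lines copied from the log posted above.
sample = [
    "01:45:32:WU00:FS00:0x22:Completed 12500 out of 1250000 steps (1%)",
    "02:10:09:WU00:FS00:0x22:Completed 175000 out of 1250000 steps (14%)",
    "02:10:54:WU00:FS00:0x22:ERROR:Guru Meditation #5d0fca4e793897b.ac6044f93e95e795 (2524152.2524244) '00/01/positions.xtc'",
]
print(last_step_before_error(sample))  # (175000, 1250000)
```

Running it over the log from each attempt and comparing the results would show whether both runs failed after the same reported step. It only sees the progress lines the core prints (every 1% here), so it cannot pin down the exact failing step.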

Re: Project: 16608 (Run 188, Clone 0, Gen 103)

Posted: Mon Dec 06, 2021 6:22 pm
by toTOW
The common thing in your two failures is that they both occurred on big WUs ... and the latest version of core 22 is pushing GPUs a little bit harder than earlier versions did.

I had to reduce overclocking on my 980 by 25 MHz to avoid instabilities (mostly NaNs).

Note: the WU reported in this thread has not been completed yet ...