Project: 18201 (Run 53421, Clone 4, Gen 2) dumped

Moderators: Site Moderators, FAHC Science Team

parkut
Posts: 364
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Project: 18201 (Run 53421, Clone 4, Gen 2) dumped

Post by parkut »

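This WU processed normally, but on completion the upload to the work server at 128.252.203.11 failed partway through. The client then retried against the collection server at 128.252.203.2, which answered WORK_QUIT (404), and the work unit was dumped.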


02:59:44:WU00:FS01:Starting
02:59:44:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.13/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 1286 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
02:59:44:WU00:FS01:Started FahCore on PID 1679
02:59:44:WU00:FS01:Core PID:1683
02:59:44:WU00:FS01:FahCore 0x22 started
02:59:44:WU00:FS01:0x22:*********************** Log Started 2021-06-20T02:59:44Z ***********************
02:59:44:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:59:44:WU00:FS01:0x22:       Core: Core22
02:59:44:WU00:FS01:0x22:       Type: 0x22
02:59:44:WU00:FS01:0x22:    Version: 0.0.13
02:59:44:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:59:44:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
02:59:44:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
02:59:44:WU00:FS01:0x22:       Date: Sep 19 2020
02:59:44:WU00:FS01:0x22:       Time: 01:10:35
02:59:44:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
02:59:44:WU00:FS01:0x22:     Branch: core22-0.0.13
02:59:44:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
02:59:44:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
02:59:44:WU00:FS01:0x22:             -funroll-loops -DOPENMM_GIT_HASH="\"189320d0\""
02:59:44:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
02:59:44:WU00:FS01:0x22:       Bits: 64
02:59:44:WU00:FS01:0x22:       Mode: Release
02:59:44:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
02:59:44:WU00:FS01:0x22:             <peastman@stanford.edu>
02:59:44:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 1679 -checkpoint 15
02:59:44:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
02:59:44:WU00:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100
02:59:44:WU00:FS01:0x22:************************************ libFAH ************************************
02:59:44:WU00:FS01:0x22:       Date: Sep 15 2020
02:59:44:WU00:FS01:0x22:       Time: 05:14:43
02:59:44:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
02:59:44:WU00:FS01:0x22:     Branch: HEAD
02:59:44:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
02:59:44:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
02:59:44:WU00:FS01:0x22:             -funroll-loops
02:59:44:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
02:59:44:WU00:FS01:0x22:       Bits: 64
02:59:44:WU00:FS01:0x22:       Mode: Release
02:59:44:WU00:FS01:0x22:************************************ CBang *************************************
02:59:44:WU00:FS01:0x22:       Date: Sep 15 2020
02:59:44:WU00:FS01:0x22:       Time: 05:11:04
02:59:44:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
02:59:44:WU00:FS01:0x22:     Branch: HEAD
02:59:44:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
02:59:44:WU00:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
02:59:44:WU00:FS01:0x22:             -funroll-loops -fPIC
02:59:44:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
02:59:44:WU00:FS01:0x22:       Bits: 64
02:59:44:WU00:FS01:0x22:       Mode: Release
02:59:44:WU00:FS01:0x22:************************************ System ************************************
02:59:44:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
02:59:44:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
02:59:44:WU00:FS01:0x22:       CPUs: 8
02:59:44:WU00:FS01:0x22:     Memory: 3.56GiB
02:59:44:WU00:FS01:0x22:Free Memory: 1.87GiB
02:59:44:WU00:FS01:0x22:    Threads: POSIX_THREADS
02:59:44:WU00:FS01:0x22: OS Version: 4.15
02:59:44:WU00:FS01:0x22:Has Battery: false
02:59:44:WU00:FS01:0x22: On Battery: false
02:59:44:WU00:FS01:0x22: UTC Offset: -4
02:59:44:WU00:FS01:0x22:        PID: 1683
02:59:44:WU00:FS01:0x22:        CWD: /var/lib/fahclient/work
02:59:44:WU00:FS01:0x22:************************************ OpenMM ************************************
02:59:44:WU00:FS01:0x22:   Revision: 189320d0
02:59:44:WU00:FS01:0x22:********************************************************************************
02:59:44:WU00:FS01:0x22:Project: 18201 (Run 53421, Clone 4, Gen 2)
02:59:44:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
02:59:44:WU00:FS01:0x22:Reading tar file core.xml
02:59:44:WU00:FS01:0x22:Reading tar file integrator.xml
02:59:44:WU00:FS01:0x22:Reading tar file state.xml
02:59:45:WU00:FS01:0x22:Reading tar file system.xml
02:59:46:WU00:FS01:0x22:Digital signatures verified
02:59:46:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:59:46:WU00:FS01:0x22:Version 0.0.13
02:59:46:WU00:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
02:59:46:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
02:59:46:WU00:FS01:0x22:  XTC frame write interval: 20000 steps (1.6%) [62 total]
02:59:46:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
02:59:46:WU00:FS01:0x22:There are 4 platforms available.
02:59:46:WU00:FS01:0x22:Platform 0: Reference
02:59:46:WU00:FS01:0x22:Platform 1: CPU
02:59:46:WU00:FS01:0x22:Platform 2: OpenCL
02:59:46:WU00:FS01:0x22:  opencl-device 0 specified
02:59:46:WU00:FS01:0x22:Platform 3: CUDA
02:59:46:WU00:FS01:0x22:  cuda-device 0 specified
03:00:00:WU00:FS01:0x22:Attempting to create CUDA context:
03:00:00:WU00:FS01:0x22:  Configuring platform CUDA
03:00:15:WU00:FS01:0x22:  Using CUDA and gpu 0
03:00:15:WU00:FS01:0x22:Completed 0 out of 1250000 steps (0%)
03:00:16:WU00:FS01:0x22:Checkpoint completed at step 0
03:03:18:WU00:FS01:0x22:Completed 12500 out of 1250000 steps (1%)
03:06:20:WU00:FS01:0x22:Completed 25000 out of 1250000 steps (2%)
03:06:23:WU00:FS01:0x22:Checkpoint completed at step 25000
.... normal steps omitted....
07:59:06:WU00:FS01:0x22:Checkpoint completed at step 1225000
08:02:08:WU00:FS01:0x22:Completed 1237500 out of 1250000 steps (99%)
08:05:11:WU00:FS01:0x22:Completed 1250000 out of 1250000 steps (100%)
08:05:11:WU00:FS01:0x22:Average performance: 9.46849 ns/day
08:05:12:WU00:FS01:0x22:Checkpoint completed at step 1250000
08:05:20:WU00:FS01:0x22:Saving result file ../logfile_01.txt
08:05:20:WU00:FS01:0x22:Saving result file checkpointIntegrator.xml
08:05:20:WU00:FS01:0x22:Saving result file checkpointState.xml
08:05:25:WU00:FS01:0x22:Saving result file positions.xtc
08:05:25:WU00:FS01:0x22:Saving result file science.log
08:05:25:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
08:05:26:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
08:05:26:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:18201 run:53421 clone:4 gen:2 core:0x22 unit:0x0000000400000002000047190000d0ad
08:05:26:WU00:FS01:Uploading 27.49MiB to 128.252.203.11
08:05:26:WU00:FS01:Connecting to 128.252.203.11:8080
08:05:58:WU00:FS01:Upload 0.91%
08:05:58:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
08:05:58:WU00:FS01:Trying to send results to collection server
08:05:58:WU00:FS01:Uploading 27.49MiB to 128.252.203.2
08:05:58:WU00:FS01:Connecting to 128.252.203.2:8080
08:06:04:WU00:FS01:Upload 11.82%
08:06:10:WU00:FS01:Upload 23.64%
08:06:16:WU00:FS01:Upload 36.15%
08:06:23:WU00:FS01:Upload 47.74%
08:06:29:WU00:FS01:Upload 60.24%
08:06:35:WU00:FS01:Upload 71.84%
08:06:41:WU00:FS01:Upload 84.11%
08:06:47:WU00:FS01:Upload 96.16%
08:06:49:WU00:FS01:Upload complete
08:06:49:WU00:FS01:Server responded WORK_QUIT (404)
08:06:49:WARNING:WU00:FS01:Server did not like results, dumping
08:06:49:WU00:FS01:Cleaning up
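
For anyone triaging a similar log, here is a minimal Python sketch that pulls the upload attempts and their outcomes out of lines shaped like the ones above. The regular expressions and the function name are assumptions inferred from this excerpt only, not a documented FAHClient log format.

```python
import re

# Patterns inferred from the log excerpt above; hypothetical, not an official format.
UPLOAD_RE = re.compile(r"Uploading [\d.]+\w+ to (\d+\.\d+\.\d+\.\d+)")
FAIL_RE = re.compile(r"Exception: (.+)$")
RESP_RE = re.compile(r"Server responded (\w+) \((\d+)\)")


def summarize_uploads(lines):
    """Return a list of (server_ip, outcome) tuples, one per upload attempt."""
    attempts = []
    for line in lines:
        m = UPLOAD_RE.search(line)
        if m:
            # A new upload attempt starts; its outcome is filled in later.
            attempts.append([m.group(1), "no outcome logged"])
            continue
        if not attempts:
            continue
        m = FAIL_RE.search(line)
        if m:
            attempts[-1][1] = m.group(1)
            continue
        m = RESP_RE.search(line)
        if m:
            attempts[-1][1] = f"{m.group(1)} ({m.group(2)})"
    return [tuple(a) for a in attempts]


sample = [
    "08:05:26:WU00:FS01:Uploading 27.49MiB to 128.252.203.11",
    "08:05:58:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed",
    "08:05:58:WU00:FS01:Uploading 27.49MiB to 128.252.203.2",
    "08:06:49:WU00:FS01:Server responded WORK_QUIT (404)",
]
for ip, outcome in summarize_uploads(sample):
    print(ip, "->", outcome)
```

Run against the four sample lines, it reports one failed transfer to 128.252.203.11 and one WORK_QUIT (404) from 128.252.203.2, matching the sequence in the log.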
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Project: 18201 (Run 53421, Clone 4, Gen 2) dumped

Post by JimboPalmer »

When your Work Unit finishes, your PC computes a checksum and includes it in the file it uploads. If the server does not compute the same checksum, it dumps the upload.
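
As a rough illustration of the checksum idea, here is a small Python sketch. It is only a sketch: SHA-256 and the function names are my assumptions, since I do not know which algorithm or code Folding@home actually uses.

```python
import hashlib


def checksum(payload: bytes) -> str:
    """Hex digest of the payload; SHA-256 is an assumed stand-in algorithm."""
    return hashlib.sha256(payload).hexdigest()


def server_accepts(payload: bytes, claimed: str) -> bool:
    """The receiver recomputes the digest and compares it with the one sent."""
    return checksum(payload) == claimed


results = b"example work unit results"
sent_digest = checksum(results)

print(server_accepts(results, sent_digest))         # intact upload
print(server_accepts(results + b"x", sent_digest))  # garbled in transit
```

A single changed byte is enough to make the two digests differ, which is why a garbled upload would be rejected.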

In an ideal world, the PC would try to upload multiple times, to be sure the data is not garbled by the internet, but so far as I know (I am just a user like you, not a developer) it only tries once.

https://en.wikipedia.org/wiki/Checksum
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Project: 18201 (Run 53421, Clone 4, Gen 2) dumped

Post by Joe_H »

Another possibility is that the connection between the WS and the CS was either not set up correctly or it had been interrupted at some point before your WU was returned. The upload first tried the WS, and then the CS where it was given the WORK_QUIT message. That could mean the CS did not have the information to recognize the WU as valid.
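
That sequence can be sketched roughly as follows. This is a hypothetical model of the behaviour visible in the log, not the actual FAHClient code: `send_to`, `TransferFailed`, the returned strings, and the use of `WORK_ACK` as the success reply are all assumptions made for illustration.

```python
class TransferFailed(Exception):
    """Hypothetical error: the connection dropped before the upload finished."""


def upload_results(send_to, work_server, collection_server):
    """Try the WS first, then the CS; return how the attempt ended.

    send_to(server) is a caller-supplied function that returns the server's
    response string or raises TransferFailed.
    """
    for server in (work_server, collection_server):
        try:
            response = send_to(server)
        except TransferFailed:
            continue  # move on to the next server
        if response == "WORK_ACK":  # assumed success reply
            return "accepted by " + server
        # Any other reply (e.g. WORK_QUIT) means the server rejected the WU.
        return "dumped after " + response + " from " + server
    return "all transfers failed"


def fake_send(server):
    # Mimics the log above: the WS transfer fails, the CS answers WORK_QUIT.
    if server == "128.252.203.11":
        raise TransferFailed()
    return "WORK_QUIT"


print(upload_results(fake_send, "128.252.203.11", "128.252.203.2"))
```

In this model a failed transfer falls through to the next server, but a completed upload that gets WORK_QUIT ends the attempt, which is what the log shows.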

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
mgetz
Posts: 57
Joined: Tue Aug 11, 2020 6:23 pm

Re: Project: 18201 (Run 53421, Clone 4, Gen 2) dumped

Post by mgetz »

Joe_H wrote: Another possibility is that the connection between the WS and the CS was either not set up correctly or it had been interrupted at some point before your WU was returned. The upload first tried the WS, and then the CS where it was given the WORK_QUIT message. That could mean the CS did not have the information to recognize the WU as valid.
Given the reported issues with 18202 (base points for WUs that should have gotten a bonus, no points, all sorts of weirdness), this wouldn't surprise me at all. 128.252.203.11 has been a bit of a problem of late.

viewtopic.php?f=18&t=37270

viewtopic.php?f=18&t=33072&p=352055#p352044
jjmiller
Scientist
Posts: 81
Joined: Fri Apr 09, 2021 4:43 pm

Re: Project: 18201 (Run 53421, Clone 4, Gen 2) dumped

Post by jjmiller »

Hi parkut, apologies. 128.252.203.11 was a bit grumpy last week due to high load and then a full storage cache. I was under the impression that things had been ironed out. I'll talk to the folks here who know this server better than I do.