p2675 r1 c21 g57 FAILED

Moderators: Site Moderators, FAHC Science Team

alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

p2675 r1 c21 g57 FAILED

Post by alpha754293 »

contents of logfile:

Code:

[17:57:22] + Attempting to send results [January 23 17:57:22 UTC]
[18:04:21] + Results successfully sent
[18:04:21] Thank you for your contribution to Folding@Home.
[18:04:21] + Starting local stats count at 1
[18:04:23] - Preparing to get new work unit...
[18:04:23] + Attempting to get work packet
[18:04:23] - Connecting to assignment server
[18:04:24] - Successful: assigned to (171.64.65.56).
[18:04:24] + News From Folding@Home: Welcome to Folding@Home
[18:04:24] Loaded queue successfully.
[18:04:41] + Closed connections
[18:04:41] 
[18:04:41] + Processing work unit
[18:04:41] Core required: FahCore_a2.exe
[18:04:41] Core found.
[18:04:41] Working on queue slot 00 [January 23 18:04:41 UTC]
[18:04:41] + Working ...
[18:04:41] 
[18:04:41] *------------------------------*
[18:04:41] Folding@Home Gromacs SMP Core
[18:04:41] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[18:04:41] 
[18:04:41] Preparing to commence simulation
[18:04:41] - Ensuring status. Please wait.
[18:04:42] Called DecompressByteArray: compressed_data_size=4840576 data_size=23999905, decompressed_data_size=23999905 diff=0
[18:04:42] - Digital signature verified
[18:04:42] 
[18:04:42] Project: 2675 (Run 1, Clone 21, Gen 57)
[18:04:42] 
[18:04:42] Assembly optimizations on if available.
[18:04:42] Entering M.D.
[18:04:51] (Run 1, Clone 21, Gen 57)
[18:04:51] 
[18:04:52] Entering M.D.
[18:14:18] Completed 5000 out of 250000 steps  (2%)
[18:18:57] Completed 7500 out of 250000 steps  (3%)
[18:23:36] Completed 10000 out of 250000 steps  (4%)
[18:28:15] Completed 12500 out of 250000 steps  (5%)
[18:32:54] Completed 15000 out of 250000 steps  (6%)
[18:37:33] Completed 17500 out of 250000 steps  (7%)
[18:42:11] Completed 20000 out of 250000 steps  (8%)
[18:46:51] Completed 22500 out of 250000 steps  (9%)
[18:51:30] Completed 25000 out of 250000 steps  (10%)
[18:56:10] Completed 27500 out of 250000 steps  (11%)
[19:00:50] Completed 30000 out of 250000 steps  (12%)
plus additional notes:

Code:

t = 28560.082 ps: Water molecule starting at atom 106441 can not be settled.
Check for bad contacts and/or reduce the timestep.
Wrote pdb files with previous and current coordinates

A list of missing interactions:
              Settle of  41132 missing     -1

-------------------------------------------------------
Program mdrun, VERSION 3.3.99_development_200800503
Source code file: domdec_top.c, line: 87

Software inconsistency error:
Some interactions seem to be assigned multiple times

-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 8

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[cli_2]: aborting job:
Fatal error in MPI_Allreduce: Error message texts are not available
[cli_4]: aborting job:
Fatal error in MPI_Allreduce: Error message texts are not available
[cli_6]: aborting job:
Fatal error in MPI_Allreduce: Error message texts are not available
Suggestions?
Last edited by alpha754293 on Sat Jan 24, 2009 12:09 am, edited 1 time in total.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: p2671 r1 c21 g57 FAILED

Post by bruce »

I don't know if it's related or not, but there's a fix in early testing that involves changes to something called "SETTLE". Hopefully it will be ready to release soon, but as with all new versions, there's no way to know until it happens.

Thanks for the report.
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: p2671 r1 c21 g57 FAILED

Post by alpha754293 »

bruce wrote:I don't know if it's related or not, but there's a fix in early testing that involves changes to something called "SETTLE". Hopefully it will be ready to release soon, but as with all new versions, there's no way to know until it happens.

Thanks for the report.
What should I do with the WU in the meantime though?

Purge? Keep? Dump? Save? etc.?

(I had to pick up two other WUs because I wanted to put a freeze on it and await further instructions.)
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: p2671 r1 c21 g57 FAILED

Post by bruce »

alpha754293 wrote:What should I do with the WU in the meantime though?

Purge? Keep? Dump? Save? etc.?
Your post doesn't include the information about what happened. Did that WU continue running, or did it have an Early_Unit_End? Most errors lead to an EUE and the WU is discarded. The client is supposed to either retry the WU or move on to the next WU. If that's what happened, all that is useful is the PRCG values. Assuming it's not due to a fault in your hardware, they should be able to reproduce it from the PRCG numbers together with the appropriate version of the software or use it as a test case on a new version.
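
For anyone who wants to answer that question straight from the log, here is a rough Python sketch. It assumes the log marks an early end with text such as EARLY_UNIT_END (matched in any case, with an underscore or a space); that marker and the function name `summarize_log` are assumptions for illustration and do not appear in the log posted above. The progress pattern is taken from the "Completed ... out of ... steps" lines in that log.

```python
import re

# Assumption: an early end is reported with text like "EARLY_UNIT_END".
# Adjust the pattern if your client or core words it differently.
EUE_PATTERN = re.compile(r"early[_ ]unit[_ ]end", re.IGNORECASE)

# Matches progress lines such as "Completed 30000 out of 250000 steps  (12%)".
PROGRESS_PATTERN = re.compile(r"Completed (\d+) out of (\d+) steps")


def summarize_log(lines):
    """Return (last_progress, eue_seen) for an iterable of log lines.

    last_progress is a (done, total) tuple of ints, or None if no
    progress line was found. eue_seen is True only if the early-end
    marker appears after the last progress line.
    """
    last_progress = None
    eue_seen = False
    for line in lines:
        match = PROGRESS_PATTERN.search(line)
        if match:
            last_progress = (int(match.group(1)), int(match.group(2)))
            eue_seen = False  # progress resumed, so an earlier marker is stale
        elif EUE_PATTERN.search(line):
            eue_seen = True
    return last_progress, eue_seen


if __name__ == "__main__":
    sample = [
        "[18:56:10] Completed 27500 out of 250000 steps  (11%)",
        "[19:00:50] Completed 30000 out of 250000 steps  (12%)",
    ]
    print(summarize_log(sample))
```

If the second value comes back False and the log simply stops, that matches the "nothing after the abort" situation rather than a reported EUE.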
bollix47
Posts: 2976
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: p2671 r1 c21 g57 FAILED

Post by bollix47 »

The title says P2671 and the log says P2675??
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: p2671 r1 c21 g57 FAILED

Post by alpha754293 »

bruce wrote:
alpha754293 wrote:What should I do with the WU in the meantime though?

Purge? Keep? Dump? Save? etc.?
Your post doesn't include the information about what happened. Did that WU continue running, or did it have an Early_Unit_End? Most errors lead to an EUE and the WU is discarded. The client is supposed to either retry the WU or move on to the next WU. If that's what happened, all that is useful is the PRCG values. Assuming it's not due to a fault in your hardware, they should be able to reproduce it from the PRCG numbers together with the appropriate version of the software or use it as a test case on a new version.
See the logs and the console output that I had included.

Skip that. I'll make it easier for you:

Quote:
t = 28560.082 ps: Water molecule starting at atom 106441 can not be settled.
Check for bad contacts and/or reduce the timestep.
Wrote pdb files with previous and current coordinates

A list of missing interactions:
Settle of 41132 missing -1

-------------------------------------------------------
Program mdrun, VERSION 3.3.99_development_200800503
Source code file: domdec_top.c, line: 87

Software inconsistency error:
Some interactions seem to be assigned multiple times

-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 8

gcq#0: Thanx for Using GROMACS - Have a Nice Day

[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[cli_2]: aborting job:
Fatal error in MPI_Allreduce: Error message texts are not available
[cli_4]: aborting job:
Fatal error in MPI_Allreduce: Error message texts are not available
[cli_6]: aborting job:
Fatal error in MPI_Allreduce: Error message texts are not available


Beyond what's in the console outputs, I still have the files for the WU. Other than that, I don't know anything more about it.

The error console output wasn't written to the logfile so I had to pull it off the console itself and dump it into a logfile by itself.
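
One way to avoid copying text off the console by hand is to start the client through a small wrapper that echoes everything and also writes it to a file. The Python sketch below is only an illustration: `run_and_tee` and the file name `console.log` are made-up names, and it assumes the error text reaches the client's stdout or stderr (child processes normally inherit those streams, but that is not guaranteed for every launcher).

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, echo its combined stdout/stderr, and append it to log_path.

    Returns the exit code of the process.
    """
    with open(log_path, "a", encoding="utf-8", errors="replace") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the same stream
            text=True,
            errors="replace",
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # still show it on the console
            log.write(line)
            log.flush()  # keep the file current in case of an abrupt abort
        return proc.wait()


if __name__ == "__main__":
    # Stand-in command; replace it with however you normally start the client.
    demo = [sys.executable, "-c", "import sys; sys.stderr.write('demo error\\n')"]
    run_and_tee(demo, "console.log")
```

Flushing after every line is deliberate: if the job is killed the way this one was, whatever was printed last should already be on disk.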

So, all the information that I have (that I know how to read) is all there.

If there's any other place that you would like me to look specifically, I can definitely do that.

On the other hand...if you can tell me how I can reduce the time step in order to settle the water molecule at t=28560.082 ps, let me know. Thanks.

(still holding on that WU...and await further instructions.)
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: p2671 r1 c21 g57 FAILED

Post by alpha754293 »

bollix47 wrote:The title says P2671 and the log says P2675??
oh....hehe. whoops! my bad.

misread it from FahMon.

a very appropriate "D'OH!" a la Homer Simpson seems in order here.

*dons dunce cap*. Yay to dyslexia? Or fast optical substitutions?
bollix47
Posts: 2976
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: p2671 r1 c21 g57 FAILED

Post by bollix47 »

You should be able to fix the title by editing the first post of the thread. If not, then a mod would have to fix it.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: p2671 r1 c21 g57 FAILED

Post by bruce »

alpha754293 wrote:Skip that. I'll make it easier for you:

Quote:[cli_6]: aborting job:
Fatal error in MPI_Allreduce: Error message texts are not available
I was specifically interested in what happened after the "[cli_6]: aborting job" messages, which you still didn't post. Please don't assume I cannot/do not read.
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: p2671 r1 c21 g57 FAILED

Post by alpha754293 »

bruce wrote:
alpha754293 wrote:Skip that. I'll make it easier for you:

Quote:[cli_6]: aborting job:
Fatal error in MPI_Allreduce: Error message texts are not available
I was specifically interested in what happened after the "[cli_6]: aborting job" messages, which you still didn't post. Please don't assume I cannot/do not read.
Well...everything that I've got is there.

So unless there's another place that I need to look specifically, I don't have any more information than what I have already posted.

(The WU terminated well before the MPI kill.)

It says "Halting parallel program mdrun on CPU 0 out of 8"

It didn't give me any more messages than that, and according to the process/task listing, all cores stopped. No information about how "cleanly" they stopped. But it was stopped.

MPI_Abort(MPI_COMM_WORLD) returned -1.
those multiple cli might be because I have 4 terminals up. *shrug* dunno.

no idea.

*edit*

I THINK that the answer that you're looking for is....

nothing.

The CLI window stops at the last line of that message and there's no prompt or anything like that right now.

And the cores were no longer listed in my process/task list.
Last edited by alpha754293 on Sat Jan 24, 2009 12:14 am, edited 1 time in total.
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: p2671 r1 c21 g57 FAILED

Post by alpha754293 »

bollix47 wrote:You should be able to fix the title by editing the first post of the thread. If not, then a mod would have to fix it.
change complete, but I don't think that the change would propagate through the rest of the posts that are already here. *shrug* dunno.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: p2675 r1 c21 g57 FAILED

Post by 7im »

As a point of reference, there is a preference to use the full WU format (as seen in the fahlog) in the title:

Project: 2675 (Run 1, Clone 21, Gen 57)

And it's just so darn easy to cut and paste from the log, and so much easier to read, and it helps prevent any typing mistakes. ;)

See this thread as an example: http://foldingforum.org/viewtopic.php?f=19&t=8075
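
If cutting and pasting by hand is still too much work, the line can be pulled out of the log automatically. This is a minimal Python sketch that assumes the log contains the line exactly in the form quoted above; the names `find_wu_ids` and `format_wu` are invented for the example.

```python
import re

# Matches the full WU line as it appears in the log, e.g.
# "Project: 2675 (Run 1, Clone 21, Gen 57)".
WU_LINE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def find_wu_ids(log_text):
    """Return the distinct (project, run, clone, gen) tuples in log_text, in order."""
    seen = []
    for match in WU_LINE.finditer(log_text):
        ids = tuple(int(group) for group in match.groups())
        if ids not in seen:
            seen.append(ids)
    return seen


def format_wu(ids):
    """Render a (project, run, clone, gen) tuple in the full log format."""
    project, run, clone, gen = ids
    return f"Project: {project} (Run {run}, Clone {clone}, Gen {gen})"


if __name__ == "__main__":
    text = "[18:04:42] Project: 2675 (Run 1, Clone 21, Gen 57)\n"
    for ids in find_wu_ids(text):
        print(format_wu(ids))
```

Taking the numbers from the log rather than retyping them is the point: it rules out the kind of 2671/2675 slip seen earlier in this thread.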
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: p2675 r1 c21 g57 FAILED

Post by alpha754293 »

7im wrote:As a point of reference, there is a preference to use the full WU format (as seen in the fahlog) in the title:

Project: 2675 (Run 1, Clone 21, Gen 57)

And it's just so darn easy to cut and paste from the log, and so much easier to read, and it helps prevent any typing mistakes. ;)

See this thread as an example: http://foldingforum.org/viewtopic.php?f=19&t=8075
yea...I'm lazy. What can I say? I was reading the project number from the FahMon.

Besides I don't think that it really matters or makes much of a difference.

*edit*
BTW, in the unitinfo.txt, it lists it in a similar fashion as I do here. (Except that it's in all uppercase letters, and there are no spaces), but other than that, it's the same.
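
Since the short form carries the same four numbers, turning it into the full format is mechanical. The Python sketch below assumes the short form looks like the thread title ("p2675 r1 c21 g57") or the same thing in uppercase with no spaces, as described above; the exact unitinfo.txt layout is not shown in this thread, and `expand_prcg` is a made-up helper name.

```python
import re

# Accepts "p2675 r1 c21 g57", "P2675R1C21G57", and similar variants.
# Anything after the gen number (such as "FAILED") is ignored.
COMPACT = re.compile(r"^\s*p(\d+)\s*r(\d+)\s*c(\d+)\s*g(\d+)", re.IGNORECASE)


def expand_prcg(short):
    """Convert a short PRCG string to the full format used in the log."""
    match = COMPACT.match(short)
    if match is None:
        raise ValueError(f"not a recognizable PRCG string: {short!r}")
    project, run, clone, gen = (int(group) for group in match.groups())
    return f"Project: {project} (Run {run}, Clone {clone}, Gen {gen})"


if __name__ == "__main__":
    print(expand_prcg("p2675 r1 c21 g57 FAILED"))
    print(expand_prcg("P2675R1C21G57"))
```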
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: p2675 r1 c21 g57 FAILED

Post by alpha754293 »

P.S. The WU got purged in the process of benchmarking while it was awaiting instructions.
bollix47
Posts: 2976
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: p2675 r1 c21 g57 FAILED

Post by bollix47 »

alpha754293 wrote:Besides I don't think that it really matters or makes much of a difference.
It does make a difference. The format that 7im is suggesting is the one the mods need in order to check the WU database, and when it isn't used they have to edit it before they can do a proper search for the WU.