Peter Love wrote:
> Hi Steve, as you know we see this also. You've done more investigation
> than us so we look forward to any suggestions :-)
Suggestions; umm... yes, well... I guess I'm just throwing this on the
floor so everyone can kick it around. As you can see, I'm in two minds.
I don't mind little bugs as long as I know about them. Maybe it's
sufficient to make this a "known issue"? Meanwhile, my opinion on how it
should work is below.
Here's one position: the "contract" between the CE and the TORQUE
server is that the promised stagein files are actually there, apart from
proxy files, which may disappear if they get stale while the job is
queued. This is the status quo, which leads to the W-state problem.
Fixes include qdel'ing any job whose proxy has gone stale. The
"status quo" for some is to do this manually, from time to time.
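
For illustration only, that manual clean-up might be scripted roughly as
below. This is a minimal sketch, not a tested tool. It assumes the job list
arrives as `qstat -f`-style text (a "Job Id:" line followed by indented
"name = value" attributes, each value on one line), and that the path of each
staged file on the CE is whatever follows the last colon of a stagein entry.
The function names are hypothetical, and the sketch only builds the qdel
command lines rather than running them.

```python
import os


def parse_jobs(qstat_text):
    """Split qstat -f style text into {job_id: {attribute: value}}.

    Assumption: each attribute sits on a single line; folded
    continuation lines are simply ignored.
    """
    jobs = {}
    current = None
    for line in qstat_text.splitlines():
        if line.startswith("Job Id:"):
            current = jobs.setdefault(line.split(":", 1)[1].strip(), {})
        elif current is not None and " = " in line:
            name, value = line.split(" = ", 1)
            current[name.strip()] = value.strip()
    return jobs


def stale_proxy_jobs(jobs, exists=os.path.exists):
    """Return ids of W-state jobs with at least one missing stagein file."""
    doomed = []
    for job_id, attrs in jobs.items():
        stagein = attrs.get("stagein")
        if attrs.get("job_state") != "W" or not stagein:
            continue
        # Assumed entry form: local_file@host:path_on_ce (comma separated).
        paths = [spec.rsplit(":", 1)[-1] for spec in stagein.split(",")]
        if any(not exists(path) for path in paths):
            doomed.append(job_id)
    return doomed


def qdel_commands(job_ids):
    """Build one qdel argument list per job; nothing is executed here."""
    return [["qdel", job_id] for job_id in job_ids]
```

Printing the result of qdel_commands() and reviewing it by hand before
deleting anything keeps this close to the existing manual practice.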
But here's another position: the "contract" between the CE and the
TORQUE server is that all the promised stagein files are actually there.
If some component of the CE fails to put down, or later removes, some
stagein file (for any reason) then that contract is broken. The "fix" is
to make the CE fulfill its contract, i.e. do not remove a stagein file
of any type until the job associated with the file is absolutely
finished. That's how I think the system should work; i.e. if the CE
tells the WN that the file is supposed to be there (expired or not) then
it should be there. Having said that, it's pointless (yet fairly
harmless) to keep looping the job through Q/R/W/Q/R/W ... forever,
as if the proxy file might someday show up. It won't.
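
To make the second position concrete, here is a minimal sketch of the check
a CE-side clean-up could apply before removing any stagein file. It is an
illustration under stated assumptions, not a description of what the CE
actually does: the job records are plain dictionaries invented for this
example, the function names are hypothetical, and "absolutely finished" is
taken to mean state "C" or no longer listed at all.

```python
# Assumption: "C" is the only state that counts as finished; a job that
# has left the server's list entirely is also treated as finished.
FINISHED_STATES = {"C"}


def files_still_promised(jobs):
    """Collect every stagein path referenced by a job that is not finished."""
    promised = set()
    for job in jobs:
        if job["state"] not in FINISHED_STATES:
            promised.update(job["stagein_paths"])
    return promised


def safe_to_remove(candidates, jobs):
    """Return only the candidate paths that no unfinished job depends on."""
    promised = files_still_promised(jobs)
    return [path for path in candidates if path not in promised]
```

Under this rule an expired proxy that a queued job still references is left
in place, so the file the WN was told about is always there.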
Steve Jones [log in to unmask]
System Administrator office: 220
High Energy Physics Division tel (int): 42334
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2334
University of Liverpool http://www.liv.ac.uk/physics/hep/