Ewan MacMahon wrote:
> I'd have thought the obvious thing would be to have the CE
> tell the batch system that a job is dead when the CE knows
> that it's dead - i.e. clean up the input files /and/ qdel the
> job.
>
Yes - the problem is solved if the CE cleans up fully (getting rid of
stale stageins and explicitly terminating the job). Ironically, the
problem is also solved if the CE does nothing, because a stale proxy is
just another input file (as far as a job on a worker node is concerned).
The problem arises only if the CE detects the stale proxy, gets rid of
it, and leaves the job itself undisturbed.
So the question becomes - is a CE expected to "interfere" with jobs in
its queue that authenticated properly when they arrived, but which
(after a queue delay) ultimately have stale proxies? (Sorry for the
tortuous prose there - queues and batch processing do not lend
themselves to elegant language.) If yes, then the job should be qdel'd.
If not, then no detection of stale proxies should be done.
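
To make the three cases concrete, here is a rough Python sketch. It is only a toy model of the argument above, not code from any real CE or batch system; the names CeAction, Outcome, apply_action and choose_action are all made up for illustration, and nothing here actually calls qdel.

```python
from dataclasses import dataclass
from enum import Enum


class CeAction(Enum):
    """Hypothetical labels for what a CE might do about a stale proxy."""
    FULL_CLEANUP = "remove stale stageins and qdel the job"
    DO_NOTHING = "leave the stale proxy and the job alone"
    PARTIAL_CLEANUP = "remove the stale proxy but leave the job queued"


@dataclass
class Outcome:
    job_in_queue: bool
    proxy_file_present: bool

    @property
    def consistent(self) -> bool:
        # The broken case is a job still queued whose proxy file has vanished.
        return not (self.job_in_queue and not self.proxy_file_present)


def apply_action(action: CeAction) -> Outcome:
    """Model the state left behind by each CE behaviour (assumed, simplified)."""
    if action is CeAction.FULL_CLEANUP:
        return Outcome(job_in_queue=False, proxy_file_present=False)
    if action is CeAction.DO_NOTHING:
        return Outcome(job_in_queue=True, proxy_file_present=True)
    return Outcome(job_in_queue=True, proxy_file_present=False)


def choose_action(ce_may_interfere: bool) -> CeAction:
    """If the CE is expected to interfere, clean up fully; otherwise do nothing."""
    return CeAction.FULL_CLEANUP if ce_may_interfere else CeAction.DO_NOTHING


if __name__ == "__main__":
    for action in CeAction:
        print(action.name, "consistent:", apply_action(action).consistent)
```

Whichever way the policy question is answered, choose_action never returns PARTIAL_CLEANUP, which is the only case that leaves the queue and the input files out of step.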
Have a nice weekend!
Steve
--
Steve Jones
System Administrator             office: 220
High Energy Physics Division     tel (int): 42334
Oliver Lodge Laboratory          tel (ext): +44 (0)151 794 2334
University of Liverpool          http://www.liv.ac.uk/physics/hep/