On 11/08/2012 03:23 PM, Andreas Gellrich wrote:
> Hi *,
> We migrated our CREAM-CEs to EMI-2/SL58, most recent version.
>
> We noticed that the CREAM-CEs sometimes resubmit jobs which have
> already finished.
>
> This is visible since the worker node which receives a resubmitted
> job cannot download (gridftp) the input sandbox because it was
> deleted by the CREAM-CE after the end of the (first) job. The job ends
> up as W (waiting) in torque and stays there forever.
Hi Andreas,
This looks like a familiar issue. Maybe the job comes through some WMS
and winds up on a CREAM server. The CREAM server hands it to some batch
server (torque). The job gets queued. The proxy expires, and CREAM
removes the files. The job gets to the front of the queue and runs. The
job can't load its files (e.g. proxy). The batch system (torque) puts
the job in W state. The batch system holds it for a while, then tries it
again. So the job just goes round and round. Please see
https://ggus.eu/tech/ticket_show.php?ticket=72506
But it is odd that you have not seen it before. Maybe it's something new.
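
To check how many jobs are stuck in this loop, something like the sketch below might help. It assumes the default torque `qstat` column layout (job id first, state second to last, queue last); `waiting_jobs` is a made-up helper name, not part of torque or CREAM, and the layout should be checked against your own `qstat` output before relying on it.

```python
from __future__ import annotations


def waiting_jobs(qstat_output: str) -> list[str]:
    """Return the IDs of jobs whose state column is W.

    Assumes the default `qstat` layout:
    Job id, Name, User, Time Use, S, Queue.
    """
    waiting = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # A job row has at least: id, name, user, time, state, queue.
        if len(fields) < 6:
            continue
        # In the assumed layout the state is the second-to-last column.
        # The header and separator rows never carry "W" there.
        if fields[-2] == "W":
            waiting.append(fields[0])
    return waiting
```

Feed it the text printed by `qstat` on the batch server; the job IDs it returns are the candidates to compare against the CREAM logs.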
Steve
> There are also hints that jobs are resubmitted although the first
> submission is still running.
>
> This showed up immediately after we put the EMI-2-CREAM-CE into
> operation. We do not see and have never seen this with the last
> remaining glite-CREAM-CE.
>
> Any ideas or similar observations?
>
> Thanx
> Andreas
>
> # Andreas Gellrich
> # DESY IT / Grid Computing
> # 2b/317, Notkestr. 85, D-22607 Hamburg, +49 40 8998 2732
--
Steve Jones
System Administrator office: 220
High Energy Physics Division tel (int): 42334
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2334
University of Liverpool http://www.liv.ac.uk/physics/hep/