On 13/06/12 17:30, Kashif Mohammad wrote:
> Hi John
>
> It is a longstanding issue with CREAM/PBS and Stephen opened a
> detailed ticket (https://ggus.eu/tech/ticket_show.php?ticket=72506 )
> but it is not fixed yet. At Oxford we regularly kill jobs which
> are either in W state or in Q state but assigned to a WN.
>
> for job in $(qstat | grep " Q " | cut -d. -f1); do if qstat -f "${job}" | grep -q exec; then qdel -p "${job}"; fi; done
>
> It will kill any job which is in Q state but assigned to a WN.
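
For anyone who wants something easier to audit than the one-liner above, here is a minimal sketch of the same cleanup split into small shell functions. It assumes the default `qstat` listing, where the job state is the second-to-last column, and it matches on `exec_host` in the `qstat -f` output, which is slightly narrower than the original `grep exec`. The function names (`queued_job_ids`, `has_exec_host`, `stuck_jobs`, `purge_stuck_jobs`) are hypothetical helpers, not part of Torque or CREAM.

```shell
# Sketch only: find Torque/PBS jobs that are queued (state Q) but already
# have an execution host recorded, and optionally purge them.

# Print the numeric IDs of all jobs in state Q.
# Assumption: default `qstat` layout, state in the second-to-last column.
queued_job_ids() {
    qstat | awk 'NF >= 2 && $(NF-1) == "Q" { split($1, id, "."); print id[1] }'
}

# Succeed if the full record of job $1 mentions an exec_host attribute.
has_exec_host() {
    qstat -f "$1" | grep -q 'exec_host'
}

# Print the IDs of queued jobs that are already assigned to a worker node.
stuck_jobs() {
    for job in $(queued_job_ids); do
        if has_exec_host "$job"; then
            echo "$job"
        fi
    done
}

# Purge every stuck job, using the same `qdel -p` call as the one-liner.
purge_stuck_jobs() {
    for job in $(stuck_jobs); do
        qdel -p "$job"
    done
}
```

Calling `stuck_jobs` on its own lists the candidates without touching them, which gives a chance to review the list before running `purge_stuck_jobs`.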
>
> One of the issues we have noticed is that jobs from lower-priority
> VOs/users sometimes have to stay in the queue long enough for their
> proxies to expire, and CREAM doesn't handle this situation properly.
Not more proxy problems :-(.
I know you raised the possibility of this happening at a previous
operations meeting - but I wasn't aware anyone had done tests to
demonstrate this as a problem in addition to the WMS not renewing proxies.
Can I have a GGUS ticket number, please?
Thanks,
Chris
>
> Cheers
> Kashif
>
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:[log in to unmask]] On Behalf Of John Hill
> Sent: 13 June 2012 16:37
> To: [log in to unmask]
> Subject: Cleaning up the PBS/Torque queues
>
> While investigating the recent supposed CVMFS and analysis job issues at
> Cambridge, I came across PBS errors in /var/log/messages on the WNs
> which reported copy errors when getting files from the CREAM Sandbox
> area. Further digging has identified these as old pilot jobs (some from
> August last year!) which are still lurking in the PBS queue and are
> being periodically restarted. "showq" indicates that we have about 3500
> of these relic jobs. I was wondering whether there is a
> recommended way to tidy up the queue?
>
> John