Hi Kashif,
Thanks for the suggestion - I was going to do something like this,
but wanted to make sure that there weren't any nasty gotchas.
Cheers,
John
On 13/06/2012 17:30, Kashif Mohammad wrote:
> Hi John
>
> It is a longstanding issue with CREAM/PBS, and Stephen opened a detailed ticket (https://ggus.eu/tech/ticket_show.php?ticket=72506), but it is not fixed yet. At Oxford we regularly kill jobs which are either in W state or in Q state but already assigned to a WN.
>
> for job in $(qstat | grep " Q " | cut -d. -f1); do if qstat -f "${job}" | grep -q exec; then qdel -p "${job}"; fi; done
>
> It will kill any job which is in Q state but assigned to a WN.
>
> One issue we have noticed is that jobs from lower-priority VOs/users sometimes have to stay in the queue long enough for their proxies to expire, and CREAM doesn't handle this situation properly.
>
> Cheers
> Kashif
>
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:[log in to unmask]] On Behalf Of John Hill
> Sent: 13 June 2012 16:37
> To: [log in to unmask]
> Subject: Cleaning up the PBS/Torque queues
>
> While investigating the recent supposed CVMFS and analysis job issues at
> Cambridge, I came across PBS errors in /var/log/messages on the WNs
> which reported copy errors when getting files from the CREAM Sandbox
> area. Further digging has identified these as old pilot jobs (some from
> August last year!) which are still lurking in the PBS queue and are
> being periodically restarted. "showq" indicates that we have about 3500
> of these relic jobs. I was wondering whether there was a
> recommended way to tidy up the queue?
>
> John
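
For anyone wanting to try Kashif's one-liner more cautiously, here is a minimal sketch that splits it into two shell functions and defaults to a dry run. It is not a tested site recipe. The function names and the DRY_RUN variable are hypothetical, and it assumes that "qstat -f" reports an "exec_host" attribute for jobs that have been assigned to a WN (a narrower match than the plain "exec" used in the one-liner).

```shell
#!/bin/bash
# Sketch only: find Torque/PBS jobs that are in state Q but already have
# an execution host recorded, then report or purge them.
# Assumption: "qstat -f <id>" prints an "exec_host" line for such jobs.

# Print the numeric IDs of queued jobs that have an exec_host set.
stuck_queued_jobs() {
    local job
    for job in $(qstat | grep " Q " | cut -d. -f1); do
        if qstat -f "$job" | grep -q "exec_host"; then
            echo "$job"
        fi
    done
}

# Purge the stuck jobs with "qdel -p".
# With DRY_RUN=1 (the default in this sketch) the IDs are only reported.
purge_stuck_jobs() {
    local job
    for job in $(stuck_queued_jobs); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would purge $job"
        else
            qdel -p "$job"
        fi
    done
}
```

After sourcing the file, calling purge_stuck_jobs lists the candidate jobs without touching them. Once the list looks right, "DRY_RUN=0 purge_stuck_jobs" performs the actual purge.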