Hi All,
I got a bit carried away writing a script to track down whose jobs were
killing my cluster, and have ended up with a script for querying PBS that may
be of use to others.
It's still a little rough and ready but seems to work for me. If anyone else
thinks it may be useful, I'll put in some time tidying it up.
Essentially it slurps up the output of 'qstat -f $(qselect -s R)' and
'pbsnodes -a', sticks it into a couple of in-memory database tables (jobInfo
and nodeInfo), and then runs an SQL query you supply against those tables.
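To make the idea concrete, here is a minimal sketch of the same approach in Python. It is not the pbsquery script itself: it swaps in Python's built-in sqlite3 module for Perl DBI/AnyData, and the sample text, the attribute names picked out of it, and the helper names (parse_qstat, kb_to_bytes, build_job_table) are all assumptions made for illustration. Real 'qstat -f' output also wraps long lines, which this sketch ignores.

```python
import re
import sqlite3

# Illustrative sample only: a trimmed imitation of 'qstat -f' output.
SAMPLE_QSTAT = """\
Job Id: 101.pbs.example.org
    Job_Owner = alice@ui01.example.org
    egroup = atlas
    resources_used.vmem = 4000000kb
    exec_host = node01/0

Job Id: 102.pbs.example.org
    Job_Owner = bob@ui01.example.org
    egroup = cms
    resources_used.vmem = 1500000kb
    exec_host = node02/3
"""


def parse_qstat(text):
    """Split 'qstat -f'-style text into one dict of attributes per job."""
    jobs = []
    current = None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            current = {"id": line.split(":", 1)[1].strip()}
            jobs.append(current)
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            current[key] = value
    return jobs


def kb_to_bytes(value):
    """Convert a string such as '4000000kb' to bytes; None if it does not match."""
    match = re.fullmatch(r"(\d+)kb", value or "")
    return int(match.group(1)) * 1024 if match else None


def build_job_table(jobs):
    """Load parsed jobs into an in-memory table named after the post's jobInfo."""
    conn = sqlite3.connect(":memory:")
    # "group" is a reserved word in SQLite, so it is quoted here.
    conn.execute(
        'CREATE TABLE jobInfo '
        '(id TEXT, user TEXT, "group" TEXT, vmem INTEGER, execHost TEXT)'
    )
    for job in jobs:
        conn.execute(
            "INSERT INTO jobInfo VALUES (?, ?, ?, ?, ?)",
            (
                job["id"],
                job.get("Job_Owner", "").split("@")[0],  # drop the submit host
                job.get("egroup"),
                kb_to_bytes(job.get("resources_used.vmem")),
                job.get("exec_host", "").split("/")[0],  # drop the core index
            ),
        )
    return conn


conn = build_job_table(parse_qstat(SAMPLE_QSTAT))
rows = conn.execute(
    'SELECT id, user, "group", vmem FROM jobInfo '
    "WHERE vmem > 3000000000 ORDER BY vmem"
).fetchall()
for row in rows:
    print(row)
```

In a real run the sample string would be replaced by the captured output of the qstat command above.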
For example you can run something like:
    pbsquery --query 'select id,user,group,vmem from jobInfo
        where vmem > 3000000000 order by mem'
or
    pbsquery --query 'select id,user,vmem,execHost,availmem from jobInfo, nodeInfo
        where jobInfo.execHost = nodeInfo.name AND vmem > 5000000000
        AND availmem < 1000000000 order by availmem'
and if you are better with SQL than me, I'm sure you can do more.
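For anyone wanting to see what the join between the two tables amounts to, here is a rough sketch in Python. Again, this is not pbsquery: it uses Python's sqlite3 rather than AnyData, the 'pbsnodes -a' sample is an invented imitation, the job rows are made-up values inserted directly, and the helper names (parse_pbsnodes, kb_to_bytes) are hypothetical. It assumes availmem appears as one of the comma-separated key=value items on the node's status line.

```python
import sqlite3

# Illustrative sample only: a trimmed imitation of 'pbsnodes -a' output.
SAMPLE_PBSNODES = """\
node01
     state = job-exclusive
     np = 8
     status = opsys=linux,availmem=800000kb,totmem=16000000kb

node02
     state = free
     np = 8
     status = opsys=linux,availmem=9000000kb,totmem=16000000kb
"""


def parse_pbsnodes(text):
    """Return one dict per node; items on the status line become extra keys."""
    nodes = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = {"name": line.strip()}
            nodes.append(current)
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            if key == "status":
                for item in value.split(","):
                    sub_key, _, sub_value = item.partition("=")
                    current[sub_key] = sub_value
            else:
                current[key] = value
    return nodes


def kb_to_bytes(value):
    """Convert a string such as '800000kb' to bytes; None if there is no kb suffix."""
    if value and value.endswith("kb"):
        return int(value[:-2]) * 1024
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodeInfo (name TEXT, availmem INTEGER)")
conn.execute("CREATE TABLE jobInfo (id TEXT, user TEXT, vmem INTEGER, execHost TEXT)")

for node in parse_pbsnodes(SAMPLE_PBSNODES):
    conn.execute(
        "INSERT INTO nodeInfo VALUES (?, ?)",
        (node["name"], kb_to_bytes(node.get("availmem"))),
    )

# Made-up job rows, standing in for parsed 'qstat -f' output.
conn.executemany(
    "INSERT INTO jobInfo VALUES (?, ?, ?, ?)",
    [
        ("101", "alice", 6000000000, "node01"),
        ("102", "bob", 7000000000, "node02"),
        ("103", "carol", 1000000000, "node01"),
    ],
)

# Big jobs sitting on nodes that are nearly out of memory.
rows = conn.execute(
    "SELECT id, user, vmem, execHost, availmem FROM jobInfo, nodeInfo "
    "WHERE jobInfo.execHost = nodeInfo.name "
    "AND vmem > 5000000000 AND availmem < 1000000000 "
    "ORDER BY availmem"
).fetchall()
for row in rows:
    print(row)
```

Only job 101 comes back here: it is the one large job whose node has less than 1000000000 bytes available.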
It's written in Perl and requires perl-DBI from the SL repo and
perl-DBD-AnyData from EPEL.
A few warnings: the AnyData SQL dialect is quite basic, so a lot of fancier
SQL is not possible. It can also use quite a lot of memory, especially if
you do joins, so take care if you run it on the pbs_server. At the moment
the list of node properties is hard-coded, so you'll have to edit the script
to change it.
If it looks like it could be useful for people, I'll polish it up a bit and
work on adding a few more features, such as dynamic node property lists. I'd
also quite like to add info about queued jobs and the output of
'diagnose -p', so you can get priority info including the user and group.
Ultimately, it would be nice to add an interactive Perl shell so you can
process the results of the queries further.
You can get the current version from:
http://hepunx.rl.ac.uk/~brew/code/pbsquery
Yours,
Chris (don't blame me if it breaks your cluster) Brew