fyi.
-------- Original Message --------
Subject: qtop: yet another tool to struggle with torque and PBS family systems
Date: Wed, 1 Sep 2010 01:20:29 +0200
To: <[log in to unmask]>
Hi,
it is not uncommon for shepherds of PBS-family based clusters to wander around
in the system, trying to understand where users' jobs and site resources graze.
Or, you may just try to understand if you are being hit by something like a
bug. (*)
Fortunately,
you are not alone in this world, and others have the same troubles as you do ;-).
Here is a script I wrote to try to get better control over any torque or pbs
instance:
https://twiki.cscs.ch/twiki/bin/view/DECH/QTOP
It provides a brief summary of your PBS and Nodes status, along with a job matrix.
In case you ask, the CPU ids are the ones reported by the command pbsnodes -a,
so you may wish to check that its output seems reasonable, before trying qtop.
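One way to sanity-check the pbsnodes -a output before running qtop is sketched below. This is not part of qtop; it is a minimal illustration that assumes the usual torque layout (an unindented node name followed by indented "key = value" lines, with jobs listed as "core/jobid"). The sample text, the node and job names, and the helper names parse_pbsnodes, core_ids and suspicious are all made up for the example.

```python
# Illustrative sample only: real pbsnodes -a output carries many more
# attributes, and the layout may differ between torque/PBS versions.
SAMPLE = """\
node01
     state = free
     np = 4
     jobs = 0/101.server, 1/101.server, 2/102.server

node02
     state = down
     np = 4
"""


def parse_pbsnodes(text):
    """Return {node_name: {attribute: value}} from pbsnodes -a style text."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None              # a blank line ends a node record
            continue
        if not line[0].isspace():
            # an unindented line starts a new node record
            current = nodes.setdefault(line.strip(), {})
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return nodes


def core_ids(attrs):
    """Extract the core ids from a 'jobs' attribute such as '0/101.server'."""
    ids = []
    for entry in attrs.get("jobs", "").split(","):
        core, sep, _job = entry.strip().partition("/")
        if sep and core.isdigit():
            ids.append(int(core))
    return ids


def suspicious(nodes):
    """List (node, core_id, np) where a core id is not below the node's np."""
    problems = []
    for name, attrs in nodes.items():
        np_value = attrs.get("np", "0")
        np_count = int(np_value) if np_value.isdigit() else 0
        for cid in core_ids(attrs):
            if cid >= np_count:
                problems.append((name, cid, np_count))
    return problems


if __name__ == "__main__":
    parsed = parse_pbsnodes(SAMPLE)
    for name, attrs in parsed.items():
        print(name, attrs.get("state"), attrs.get("np"), core_ids(attrs))
    print("suspicious entries:", suspicious(parsed))
```

On a real cluster, the text could be taken from subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout instead of SAMPLE.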
It can be particularly useful if you assign colors to your scheduler's policy
groups,
so that you can visually check if your policy is honored; be prepared for
surprises.
Generally, I hope it can help you to keep the entropy of a torque system low,
by giving a fast overview of what is going on.
enjoy,
Fotis
(*)
http://www.supercluster.org/pipermail/torqueusers/2010-August/011198.html
-------- Original Message --------
For subscribers of lcg-rollout it may be interesting that the default qtop.colormap
is tuned for plenty of common pool account names found in WLCG/EGEE clusters,
e.g. red for ATLAS, green for CMS, yellow for OPS, blue for EGEE*VOs and so on.
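The idea of mapping pool accounts to colors can be sketched roughly as follows. This does not reproduce the actual qtop.colormap format, which is not shown here; the prefix table, the account names and the helpers color_for and paint are hypothetical, and only the three color assignments named above are used.

```python
# Hypothetical prefix table: pool accounts are assumed to start with the
# VO name (e.g. "atlas001"). This is NOT the real qtop.colormap syntax.
COLORMAP = [
    ("atlas", "red"),
    ("cms", "green"),
    ("ops", "yellow"),
]

# Standard ANSI foreground color escape codes.
ANSI = {
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
}
RESET = "\033[0m"


def color_for(account, default="default"):
    """Return the color name for an account, matched by prefix."""
    lowered = account.lower()
    for prefix, color in COLORMAP:
        if lowered.startswith(prefix):
            return color
    return default


def paint(account):
    """Wrap the account name in its ANSI color, or return it unchanged."""
    code = ANSI.get(color_for(account))
    return f"{code}{account}{RESET}" if code else account


if __name__ == "__main__":
    for name in ("atlas001", "cms042", "ops001", "localuser"):
        print(paint(name))
```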
enjoy,
Fotis