One of our worker nodes (tbgen01) has had close to zero load for the
last 24 hours, yet it has constantly had active jobs. It is currently
executing "globus-url-copy". I am wondering how many of these "active but
doing nothing" jobs there are, and whether there is any way to let other PBS
jobs come in while such "inactive" jobs are running.
See the cyan line at http://pptb01.physics.ox.ac.uk/graphs/load.gif (not yet
dynamically updated) for the load profile for the last several days.
Oxford active jobs can be viewed at:
http://tbce01.physics.ox.ac.uk/cgi-bin/lsh/qstat
And the state of the Worker Nodes is summarised at:
http://tbce01.physics.ox.ac.uk/cgi-bin/lsh/pbsnodes
In other news, I am working on a script which can run from the command line
or as a CGI, and which returns various status indicators as text, XML, HTML,
or in "single field mode". The current state of this can be seen at:
http://tbce01.physics.ox.ac.uk/cgi-bin/lsh/
Simply add one of the commands to the end of the URL and append options
after that to see what it does. I _believe_ it is quite safe, given the
filtering I do on all input parameters, and the mapping of a command name to
an executable via a hash table. I'd be happy for any feedback. I'm hoping
to come up with something which ties this in with RRDtool and Nagios for
Grid site monitoring -- Yes, I know, Yet Another Grid Monitoring Project...
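
For anyone who wants to comment on the approach, here is a minimal sketch of
the idea of mapping a command name to an executable via a hash table and
filtering the options. It is written in Python purely for illustration and is
not the actual script: the command names, the executable paths, the option
pattern, and the function name build_argv are all hypothetical.

```python
import re

# Hypothetical whitelist: command name -> fixed argv prefix.
# The paths and the "-a" flag are assumptions for illustration only.
COMMANDS = {
    "qstat": ["/usr/bin/qstat"],
    "pbsnodes": ["/usr/bin/pbsnodes", "-a"],
}

# Assumed filtering rule: an option is a short token of letters, digits,
# underscores or dots, optionally preceded by a single dash.
SAFE_OPTION = re.compile(r"-?[A-Za-z0-9_.]{1,32}")


def build_argv(path_info):
    """Turn a URL suffix such as '/qstat/-f' into an argv list.

    Raises ValueError if the command is not in the whitelist or if any
    option fails the filter.
    """
    parts = [p for p in path_info.split("/") if p]
    if not parts:
        raise ValueError("no command given")
    name, options = parts[0], parts[1:]
    if name not in COMMANDS:
        # The user-supplied name is only ever used as a dictionary key,
        # never as a path or as part of a shell command line.
        raise ValueError("unknown command")
    for opt in options:
        if not SAFE_OPTION.fullmatch(opt):
            raise ValueError("rejected option")
    return COMMANDS[name] + options
```

With these assumptions, a suffix of "/qstat/-f" gives
["/usr/bin/qstat", "-f"], while an unknown command or an option containing
shell metacharacters is refused. The resulting list would then be executed
directly as an argument vector, without going through a shell.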
Ian.
--
Ian Stokes-Rees
Particle Physics, Oxford http://www-pnp.physics.ox.ac.uk/~stokes/