Hi everyone,
Following on from this week's meeting, I've added new monitoring targets to
the MonAMI instance that is running on my dCache head node and pool/door
nodes.
On the head node I am still monitoring the number of srmGet, srmPut and
srmCopy requests, and am now also checking that the SRM, httpd, dcap and admin
processes are listening on the relevant ports. Have a look at the plots in
our Ganglia here (look for titles beginning dcache-*):
http://mon.epcc.ed.ac.uk/ganglia/?r=hour&c=ScotGrid-Edinburgh&h=srm.epcc.ed.ac.uk
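If anyone wants to try a similar "is it listening?" check by hand, here is a minimal sketch in Python. It is not how MonAMI does it; it simply tries to open a TCP connection to each port. The port numbers are assumptions (common dCache defaults) and "localhost" is a placeholder, so adjust both to match your own installation.

```python
import socket

# Assumed port numbers (common dCache defaults); adjust to your own setup.
HEAD_NODE_PORTS = {
    "srm": 8443,
    "httpd": 2288,
    "dcap": 22125,
    "admin": 22223,
}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; run this on (or point it at) the head node.
    for name, port in HEAD_NODE_PORTS.items():
        state = "listening" if is_listening("localhost", port) else "NOT listening"
        print(f"{name:6s} port {port}: {state}")
```

Note that a successful connection only shows that something is accepting connections on the port, not that the service behind it is healthy.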
On the pool/door nodes I am looking at the number of TCP connections in
each of the CLOSE_WAIT, CONNECTING, DISCONNECTING and ESTABLISHED states, as
well as checking whether the gridftp and gsidcap processes are listening on
the relevant ports. If you want to see what it looks like, have a look at the
plots here:
http://mon.epcc.ed.ac.uk/ganglia/?r=hour&c=ScotGrid-Edinburgh&h=pool1.epcc.ed.ac.uk
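For a rough idea of how the per-state connection counts could be obtained on a Linux node, here is a small Python sketch that parses /proc/net/tcp. The kernel state codes are standard, but the grouping of kernel states into CONNECTING and DISCONNECTING is my own assumption for illustration and may not match exactly what MonAMI reports.

```python
import os
from collections import Counter

# Linux kernel TCP state codes, as found in the "st" column of /proc/net/tcp.
KERNEL_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

# Assumed grouping into the four categories plotted; anything else is "OTHER".
GROUPS = {
    "ESTABLISHED": "ESTABLISHED",
    "CLOSE_WAIT": "CLOSE_WAIT",
    "SYN_SENT": "CONNECTING",
    "SYN_RECV": "CONNECTING",
    "FIN_WAIT1": "DISCONNECTING",
    "FIN_WAIT2": "DISCONNECTING",
    "TIME_WAIT": "DISCONNECTING",
    "LAST_ACK": "DISCONNECTING",
    "CLOSING": "DISCONNECTING",
}


def count_states(proc_net_tcp_text):
    """Count connections per group from text in /proc/net/tcp format."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as "0:"; skip the header.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        kernel_state = KERNEL_STATES.get(fields[3].upper(), "UNKNOWN")
        counts[GROUPS.get(kernel_state, "OTHER")] += 1
    return counts


if __name__ == "__main__":
    # Only IPv4 sockets are listed here; /proc/net/tcp6 holds the IPv6 ones.
    if os.path.exists("/proc/net/tcp"):
        with open("/proc/net/tcp") as f:
            for group, n in sorted(count_states(f.read()).items()):
                print(f"{group}: {n}")
```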
More work still has to be done to extend the monitoring to more
targets, but this is a start. I would like to use MonAMI to report the
status of the existing targets to Nagios, so that alerts can be
raised when certain thresholds are exceeded (e.g. when the number of
CLOSE_WAIT connections gets too high, due to the bug in dCache).
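To illustrate the kind of threshold logic I have in mind, here is a minimal Python sketch that maps a measured value onto the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL). This is a standalone illustration and not the way MonAMI talks to Nagios; the function name and the threshold values (50 and 200) are hypothetical and would need tuning for a real site.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def nagios_status(label, value, warn, crit):
    """Return (exit_code, status_line) for a value checked against thresholds."""
    if value >= crit:
        code, word = CRITICAL, "CRITICAL"
    elif value >= warn:
        code, word = WARNING, "WARNING"
    else:
        code, word = OK, "OK"
    # Text after "|" is performance data in the form label=value;warn;crit.
    line = f"{label} {word} - {value} connections | {label}={value};{warn};{crit}"
    return code, line


if __name__ == "__main__":
    # Hypothetical thresholds, for illustration only.
    code, line = nagios_status("close_wait", 120, warn=50, crit=200)
    print(line)
```

A real plugin would finish by exiting with the returned code, for example via sys.exit(code), so that Nagios can pick up the state.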
You can contact Paul Millar or me if you want more information. I will
update the wiki entry with this new information:
https://www.gridpp.ac.uk/wiki/MonAMI_dCache_plugin
Cheers,
Greig
--
=======================================================================
Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
School of Physics, University of Edinburgh, James Clerk Maxwell Building
TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
=======================================================================