Hi,
On 08/09/2011 13:10, Stuart Purdie wrote:
>> Are you using a custom nagios check command for the file count in
>> registry.npudir? A quick check of the default ones didn't seem to have that
>> functionality and I can always knock one up but if someone else already
>> has...
> Yep. Well, I say _we_ - really that was Dave Crook's work, and I'll leave it to him to expand on that.
I run it as an NRPE command - I've posted the bash script below. It runs
ls | wc --lines on the npudir and compares the count against a pair of
thresholds (warning and critical). It's come in handy - we saw svr014
ramping up again and caught it in time to let it clear itself out in a
few minutes.

#!/bin/bash
# Define exit statuses
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
crit_threshold=400
warn_threshold=5
# Count the entries currently sitting in the npudir
entries=$(ls /var/glite/blah/user_blah_job_registry.bjr/registry.npudir/ | wc --lines)
if [ $entries -gt $crit_threshold ]; then
echo "$entries found: npudir entries critical, more than $crit_threshold"
exit $STATE_CRITICAL
elif [ $entries -gt $warn_threshold ]; then
echo "$entries found: npudir entries warning, more than $warn_threshold"
exit $STATE_WARNING
else
echo "All OK"
exit $STATE_OK
fi
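
If anyone wants to reuse this on a box with a different path or thresholds, here is a rough sketch of the same check wrapped in a function. This is not what we run in production: the function name check_npudir and its three arguments (directory, warning threshold, critical threshold) are made up for illustration, and the UNKNOWN branch for an unreadable directory is an addition the script above does not have.

```shell
#!/bin/bash
# Sketch only: check_npudir and its arguments are illustrative names,
# not part of the script above.

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_npudir DIR WARN CRIT
# Prints a one-line status message and returns the matching Nagios status.
check_npudir() {
    local dir=$1 warn=$2 crit=$3 entries

    # Assumption: a missing or unreadable directory should be UNKNOWN
    # rather than being counted as zero entries and reported as OK.
    if [ ! -d "$dir" ] || [ ! -r "$dir" ]; then
        echo "UNKNOWN: cannot read $dir"
        return $STATE_UNKNOWN
    fi

    # Same ls | wc count as before; tr strips any padding some wc builds add.
    entries=$(ls "$dir" | wc -l | tr -d ' ')

    if [ "$entries" -gt "$crit" ]; then
        echo "CRITICAL: $entries npudir entries, more than $crit"
        return $STATE_CRITICAL
    elif [ "$entries" -gt "$warn" ]; then
        echo "WARNING: $entries npudir entries, more than $warn"
        return $STATE_WARNING
    fi

    echo "OK: $entries npudir entries"
    return $STATE_OK
}
```

To use it as a plugin you would end the script with a call such as check_npudir followed by the npudir path and the two thresholds (5 and 400 to match the values above), then exit $? so NRPE sees the status.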
Cheers
Dave