Hi Chris,
I guess you'll have seen this, but just to note the deliberate mistake in
the code I sent round: the $WARN_CRITICAL should have been $STATE_WARNING.
Sorry about that.
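Why the typo mattered: in the version originally sent round, WARN_CRITICAL is never defined, so `exit $WARN_CRITICAL` expands to a bare `exit`. A bare `exit` returns the status of the last command run, which is the successful `echo`, so NRPE would report 0 (OK) instead of 1 (WARNING). The minimal sketch below shows this; the helper names `with_typo` and `with_fix` are hypothetical and exist only for the demonstration.

```shell
#!/bin/bash
# Hypothetical helpers: each runs the warning branch of the check in a
# child shell and returns that shell's exit status.

with_typo() {
    # WARN_CRITICAL is unset, so `exit $WARN_CRITICAL` becomes a bare `exit`,
    # which reuses the status of the last command (the echo, i.e. 0).
    bash -c 'unset WARN_CRITICAL; echo "6 found: npudir entries warning"; exit $WARN_CRITICAL'
}

with_fix() {
    # With the intended variable the shell exits 1, which Nagios reads as WARNING.
    bash -c 'STATE_WARNING=1; echo "6 found: npudir entries warning"; exit $STATE_WARNING'
}

with_typo; echo "exit status with the typo: $?"
with_fix;  echo "exit status after the fix: $?"
```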
Cheers,
Dave
On 08/09/2011 13:49, David Crooks wrote:
> Hi,
>
> On 08/09/2011 13:10, Stuart Purdie wrote:
>>> Are you using a custom Nagios check command for the file count in
>>> registry.npudir? A quick check of the default ones didn't seem to have that
>>> functionality, and I can always knock one up, but if someone else already
>>> has...
>> Yep. Well, I say _we_ - really that was Dave Crooks' work, and I'll leave it to him to expand on that.
> I run it as an NRPE command; I've posted the bash script below. It runs
> ls | wc --lines on the npudir and compares the count against a pair of
> thresholds (warning and critical). It's come in handy: we saw svr014
> ramping up again and caught it in time to let it clear itself out in a
> few minutes.
>
> #!/bin/bash
> # Nagios check: count the entries in the BLAH registry npudir
>
> # Define exit statuses (Nagios plugin conventions)
>
> STATE_OK=0
> STATE_WARNING=1
> STATE_CRITICAL=2
> STATE_UNKNOWN=3
>
> crit_threshold=400
> warn_threshold=5
>
> # ls prints one line per directory entry, so the line count is the entry count
> entries=$(ls /var/glite/blah/user_blah_job_registry.bjr/registry.npudir/ | wc --lines)
>
> if [ "$entries" -gt "$crit_threshold" ]; then
>     echo "$entries found: npudir entries critical, more than $crit_threshold"
>     exit $STATE_CRITICAL
> elif [ "$entries" -gt "$warn_threshold" ]; then
>     echo "$entries found: npudir entries warning, more than $warn_threshold"
>     exit $STATE_WARNING
> else
>     echo "All OK"
>     exit $STATE_OK
> fi
>
> Cheers
> Dave
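A possible variation, sketched under assumptions rather than taken from the script in use: as written, the check reports "All OK" if the npudir is missing, because ls prints its error to stderr, wc counts 0 lines, and the STATE_UNKNOWN it defines is never used. The hypothetical `check_npudir` function below takes the directory and thresholds as optional arguments (defaulting to the path and values above) and returns UNKNOWN when the directory cannot be read. The function name, argument order and message wording are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical variant of the npudir check, not the script in production.
# Usage: check_npudir.sh [DIR] [WARN] [CRIT]

# Nagios plugin exit statuses
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

check_npudir() {
    local npudir=${1:-/var/glite/blah/user_blah_job_registry.bjr/registry.npudir}
    local warn_threshold=${2:-5}
    local crit_threshold=${3:-400}
    local entries

    # A missing or unreadable directory would otherwise count as 0 entries
    # and report OK, so flag it as UNKNOWN instead.
    if [ ! -d "$npudir" ] || [ ! -r "$npudir" ]; then
        echo "UNKNOWN: cannot read $npudir"
        return $STATE_UNKNOWN
    fi

    # Same counting approach as the original: one line of ls output per entry.
    entries=$(ls "$npudir" | wc -l)

    if [ "$entries" -gt "$crit_threshold" ]; then
        echo "CRITICAL: $entries npudir entries, more than $crit_threshold"
        return $STATE_CRITICAL
    elif [ "$entries" -gt "$warn_threshold" ]; then
        echo "WARNING: $entries npudir entries, more than $warn_threshold"
        return $STATE_WARNING
    fi
    echo "OK: $entries npudir entries"
    return $STATE_OK
}

# The script's exit status is the function's return value, which is what
# NRPE hands back to Nagios.
check_npudir "$@"
```

Called with no arguments it uses the same path and thresholds as the original. Note that `-gt` is a strict comparison, so exactly 400 entries still reports WARNING rather than CRITICAL.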