Alessandra wrote a script for this; the link is in her email to me:
I put the script in the repository. It is really stupid! It greps for
the node names and counts how many times a node appears in a day. If the
count is above 100 it prints a warning. The number can be changed; we
decided to be conservative, but sometimes pilots confuse things.
https://www.sysadmin.hep.ac.uk/svn/fabric-management/lcg_ce/black-holes-finder.sh
Alessandra Forti
NorthGrid Technical Coordinator
University of Manchester
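
For readers who cannot reach the repository, the approach described in the
email (match node names in a log, count matches per node per day, warn above
100) might be sketched roughly as follows. This is not Alessandra's script:
the log format (each line starting with a YYYY-MM-DD date), the function
names and the sample hostnames are all assumptions made for illustration.

```python
from collections import Counter

THRESHOLD = 100  # per node per day; the email notes this number can be changed


def count_node_hits(log_lines, node_names):
    """Count how many log lines mention each node, grouped by day.

    Assumption (not from the original script): every line begins with a
    YYYY-MM-DD date followed by whitespace and then the rest of the entry.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split(maxsplit=1)
        if len(parts) < 2:
            continue  # skip blank or date-only lines
        day, rest = parts
        for node in node_names:
            # Plain substring match, similar in spirit to grep; use full
            # hostnames so that "node1" does not also match "node10".
            if node in rest:
                counts[(day, node)] += 1
    return counts


def find_black_holes(log_lines, node_names, threshold=THRESHOLD):
    """Return sorted (day, node, count) tuples whose count exceeds threshold."""
    counts = count_node_hits(log_lines, node_names)
    return sorted(
        (day, node, n) for (day, node), n in counts.items() if n > threshold
    )


if __name__ == "__main__":
    # Made-up sample data: node02 picks up 150 jobs in one day, node01 only 5.
    sample = ["2009-10-12 job started on node02.example.org"] * 150
    sample += ["2009-10-12 job started on node01.example.org"] * 5
    nodes = ["node01.example.org", "node02.example.org"]
    for day, node, n in find_black_holes(sample, nodes):
        print(f"WARNING: {node} appeared {n} times on {day}")
```

A node that keeps failing jobs immediately will show up far more often per
day than a healthy one, which is why a simple count can flag it. As the email
says, pilot jobs can also inflate the count, so a warning is a hint to
investigate rather than proof of a black hole.
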
Cheers Pete
--
----------------------------------------------------------------------
Peter Gronbech Senior Systems Manager and Tel No. : 01865 273389
SouthGrid Technical Co-ordinator Fax No. : 01865 273418
Department of Particle Physics,
University of Oxford,
Keble Road, Oxford OX1 3RH, UK E-mail : [log in to unmask]
----------------------------------------------------------------------
-----Original Message-----
From: Testbed Support for GridPP member institutes
[mailto:[log in to unmask]] On Behalf Of Simon George
Sent: 12 October 2009 16:35
To: [log in to unmask]
Subject: black hole node detection
I vaguely recall hearing that someone had automated the detection of
"black hole" nodes, i.e. worker nodes that have a problem so that jobs
that start on them immediately fail and end. The node therefore sucks in
all queued jobs pretty quickly. I haven't been able to find it with
Google or the tbsupport archive. Anyone out there know what I am looking
for?
Thanks,
Simon