The GOC report this morning takes a different form. I have recently
concentrated on setting up the static version of MapCenter developed by
Franck Bonnassieux at Lyon to correctly reflect the state of services which
the LCG1 sites are currently providing, and I now want to test, with your
help, whether this is providing useful information.
In case you are not familiar with MapCenter, this is what it does. A host
may be tested in three ways: the usual ping, a port scan of specified ports,
and a variety of other checks grouped under URLs (the last of these is not
used by the LCG1 GOC at present). Each check results in a pass or a fail, and
MapCenter presents the current status of all hosts in a variety of displays.
The status of a host is derived from the results of all its tests: all
tests passed is shown by a green 'OK'; all tests failed by a red 'X'; and a
mixture of results by a brown dot. Individual hosts are grouped in sites,
and the status of a site is represented by the best and worst state of the
hosts in that site. Sites similarly are grouped into countries whose state
is represented by the best and worst state of the sites in that country.
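
For anyone who wants the rule spelled out, here is a minimal sketch in Python of the roll-up just described. It is not MapCenter's own code: the names State, host_state and group_state are hypothetical, and the ordering OK < MIXED < FAILED used to pick "best" and "worst" is my assumption.

```python
from enum import IntEnum


class State(IntEnum):
    """Host states, ordered from best to worst (ordering is an assumption)."""
    OK = 0      # all tests passed (green 'OK')
    MIXED = 1   # some tests passed, some failed (brown dot)
    FAILED = 2  # all tests failed (red 'X')


def host_state(results):
    """Derive a host's state from its test results (True means passed)."""
    results = list(results)
    if not results:
        raise ValueError("host has no test results")
    if all(results):
        return State.OK
    if not any(results):
        return State.FAILED
    return State.MIXED


def group_state(states):
    """Return (best, worst) for a site's hosts or a country's sites."""
    states = list(states)
    if not states:
        raise ValueError("group has no members")
    return min(states), max(states)


# Illustrative site with two made-up hosts.
site = {
    "ce.example.org": [True, True, True],
    "se.example.org": [True, False, False],
}
best, worst = group_state(host_state(r) for r in site.values())
print(best.name, worst.name)  # OK MIXED
```

A country would be treated the same way, taking the best of its sites' best states and the worst of their worst states.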
Have a look at http://mapcentre.rl.ac.uk/fullview.html which shows this
hierarchy of states. The nature and result of the individual tests are shown
at the right of each host name. Any test can either pass (shown in green)
or fail (shown in red), and the text shows either 'icmp' (ping test) or the
name of the service (port scan test) which was scanned.
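
The port scan test can be approximated by a plain TCP connect. The following is a rough sketch only, not how MapCenter itself does it: port_open, scan_host and the 5 second timeout are hypothetical, and the port numbers are simply those quoted in this report.

```python
import socket

# Service ports as quoted in this report.
SERVICE_PORTS = {
    "mds": 2135,     # ldap
    "gsiftp": 2811,
    "logd": 9002,
    "proxy": 7512,
}


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def scan_host(host, services, timeout=5.0):
    """Map each service name to pass (True) or fail (False) for one host."""
    return {name: port_open(host, port, timeout)
            for name, port in services.items()}
```

Calling scan_host("se.example.org", SERVICE_PORTS) with a real hostname in place of the made-up one would give one pass or fail per service, which corresponds to the green and red service names in the display.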
Other views present this information in different ways, best explored for
yourself, but one further view will be of interest to sysadmins. Click on
the hostname in the view referenced above and you will see details of the
tests performed on that host and below that a history showing all the recent
changes of state of that host (called Alarms History). This history will
show when tests failed and when they started working again (to a resolution
of 10 minutes).
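
To illustrate what such a history amounts to, here is a small sketch that reduces a series of periodic test results to its changes of state. I am assuming the 10 minute resolution comes from a test being sampled every 10 minutes; alarm_history and the sample data are hypothetical and not taken from MapCenter.

```python
from datetime import datetime, timedelta


def alarm_history(samples):
    """Reduce time-ordered (timestamp, passed) samples to changes of state.

    The first sample is always recorded as the initial state.
    """
    history = []
    last = None
    for when, passed in samples:
        if last is None or passed != last:
            history.append((when, "UP" if passed else "DOWN"))
            last = passed
    return history


# Made-up samples taken every 10 minutes.
start = datetime(2000, 1, 1, 9, 0)
step = timedelta(minutes=10)
results = [True, True, False, False, True]
samples = [(start + i * step, ok) for i, ok in enumerate(results)]

for when, state in alarm_history(samples):
    print(when.strftime("%H:%M"), state)
```

With sampling like this, a failure shorter than one interval can be missed altogether, which is worth bearing in mind when reading the history.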
Please send comments to the list on the general usefulness of these displays
and on the correctness of the entries for individual sites.
OK, so what is MapCenter showing this morning?
Prague:    all services up
IN2P3:     not yet operational
           (when a test has never been passed since the last restart it is
           shown in purple, and the state by a blue '?')
FZK:       the SE mds (ldap) service is not responding (port 2135)
           the CE logd service is not responding (port 9002)
           (the LCFG and UI states are not of interest and these states are
           not propagated higher, shown by the '<-?' symbol at the right)
Budapest:  all services up (but see the history of some hosts, Gergo)
INFN:      the SE gsiftp and mds services are not responding (ports 2811
           and 2135)
ICEPP:     all services up
SINP:      all services currently down; history shows they come and go
Krakow:    all services up
Barcelona: all services up (the LCFG server should not propagate up - a
           bug?)
CERN:      I've tried to divide the hosts into production and testbed;
           perhaps Emanuele could check this and let me know if it is right
           before I comment.
Taiwan:    all services up
RAL:       the SE gsiftp and mds services are not responding (ports 2811
           and 2135) (fluctuating history)
           the PROXY service is not responding (port 7512)
BNL:       not yet operational
FNAL:      all services up
HTH
Trevor
Dr Trevor Daniels
c/o CCLRC eSC Department            Phone: +44 (0)1235 778093
Rutherford Appleton Laboratory      Fax:   +44 (0)1235 446626
Chilton, DIDCOT, Oxon, OX11 0QX, UK
The contents of this email are sent in confidence for the use of the
intended recipient only. If you are not one of the intended recipients do
not take action on it or show it to anyone else, but return this email to
the sender and delete your copy of it.