Hi Alessandra, David,
There are some requirements for documentation in this thread, which I'll
extract below. Your comment fits exactly with what I was saying earlier.
On 02/12/2014 03:56 PM, Alessandra Forti wrote:
>
> > I check here now and again and if all is well, that's it:
> > https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi
>
> gridppnagios at Oxford while perhaps still useful will not be used for
> the availability anymore.
Up to now, I have used gridppnagios at Oxford instead of proper
requirements. If there are changes, I support the following.
a) experiments to decide what availability requirements are important
for them, then
b) experiments to document how the metrics to measure those requirements
are computed, and
c) experiments to declare the requirements to sites so that sites know
what to provide and
d) experiments to declare where the gathered metrics will be displayed.
The last step (d) is necessary for several reasons.
1) Sites need to be able to operationally check that they are complying
with the availability requirements and
2) Experiments will check the same data that the sites use (to avoid
ambiguity) and
3) Sites may challenge a metric if it is inaccurately gathered.
> The availability is now calculated on the experiments' SAM tests.
So they must have already done steps (a) and (b). I'd like to document
what the availability requirements are, how they are measured, and where
the measurements are displayed.
> There are experiments nagios instances (always linked from the
> monitoring pages). Unfortunately these nagios instances are not
> guaranteed; they depend on the experiments' manpower. WLCG and some of
> the experiments wanted to ditch them because they were not used. A
> nagios API to import the results from the SUM dashboard in a site
> local nagios is also in the planning so these experiment nagios
> instances might become less important in the future but it is still in
> development at PIC. Without nagios and API the monitoring to look at
> is the one linked by David.
That all relates to steps (c) and (d). The sites need to know what to
provide and what to check to see that they are providing it.
In summary, for good documentation of this change, we really need to ask
the experiments to confirm what availability requirements exist, how
they are measured, and where we can read them to make sure they are right.
Then I need to put that in a document on the wiki for all to see.
Cheers,
Steve
--
Steve Jones
System Administrator office: 220
High Energy Physics Division tel (int): 42334
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2334
University of Liverpool http://www.liv.ac.uk/physics/hep/