Hi,
This is the explanation from Gstat's help:
RGMAService_RChkFilter
RGMAService Check Filter: checks if RGMA services are running properly
Column Name: rgma
Values: . - RGMA Service not found for site
ok - RGMA Services are up
other - RGMA error found
This test queries the RGMA 'Service' and 'ServiceStatus' tables for a
list of RGMA services and their current status. The plugin
checks that all the services are "up" and that the last update
timestamp is not older than 10 minutes. A failure of either
check raises an alert.
Services found in the 'Service' table with a MeasurementDate older
than 1 week are ignored and assumed to be decommissioned.
Gstat assumes that these service status entries are updated every minute, so
if it finds entries older than 10 minutes it raises a "note" alert. The
checks are done against UTC time.
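
As a rough illustration of the check described in the help text, here is a minimal Python sketch. It is not the Gstat plugin code: the ServiceRow shape, the check_site function, the "error" result name and the "is not up!" message are hypothetical. Only the 10-minute and 1-week thresholds, the "." / "ok" / "note" values and the "has not been updated for 600!" wording come from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=10)         # threshold from the help text
DECOMMISSIONED_AFTER = timedelta(weeks=1)   # threshold from the help text


@dataclass
class ServiceRow:
    """Hypothetical row shape joining 'Service' and 'ServiceStatus'."""
    uri: str
    up: str                  # "y" as seen in the output; anything else is treated as down
    status_time: datetime    # ServiceStatus measurement time, in UTC
    service_time: datetime   # Service measurement date, in UTC


def check_site(rows, now):
    """Return (result, alerts) for one site, comparing everything in UTC."""
    # Services not refreshed for over a week are assumed decommissioned.
    live = [r for r in rows if now - r.service_time <= DECOMMISSIONED_AFTER]
    if not live:
        return ".", []

    result, alerts = "ok", []
    for r in live:
        name = r.uri.rsplit("/", 1)[-1]
        if r.up != "y":
            alerts.append(f"{name} is not up!")   # hypothetical wording
            result = "error"                       # hypothetical result name
        elif now - r.status_time > STALE_AFTER:
            seconds = int(STALE_AFTER.total_seconds())
            alerts.append(f"{name} has not been updated for {seconds}!")
            if result == "ok":
                result = "note"
    return result, alerts
```

With a status row measured at 19:15:21 UTC and a reference time of 19:28 UTC, the entry is more than 10 minutes old, so this sketch returns "note" with the same alert text as shown on Gstat.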
The error message below was shown on Gstat at 19:28 UTC. R-GMA appears to
be functioning properly for this site, but the delay between the query and
the time the check is made seems to be too long. So instead of
comparing against the current time, the check should compare against the
time of the query. I will modify the code to do this.
Thanks for bringing up this issue.
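
The intended change can be sketched as follows. This is only an illustration of the idea, not the actual Gstat modification: the function names and the run_query / clock parameters are hypothetical. The reference time is captured when the query is issued, so any delay between the query and the evaluation no longer counts against the services.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=10)  # threshold from the help text


def utc_now():
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def is_stale(measurement_time, reference_time):
    """True if the measurement is more than 10 minutes older than the reference."""
    return reference_time - measurement_time > STALE_AFTER


def check_against_query_time(run_query, clock=utc_now):
    """Evaluate staleness relative to the time the query was issued.

    run_query is a hypothetical callable returning (name, measurement_time)
    pairs; clock is injectable so the behaviour can be tested.
    """
    query_time = clock()   # captured before the query, not at evaluation time
    rows = run_query()
    return {name: is_stale(ts, query_time) for name, ts in rows}
```

For example, a tuple measured at 19:15:21 is not stale relative to a query issued at 19:16:00, but it is stale if compared against 19:28:00, which is the situation seen in the output below.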
Alastair,
I also noticed that the ServiceStatus latest query returns no entries for
any site except Taiwan-LCG2. This is also reflected
on the R-GMA browser: http://lcgic01.gridpp.rl.ac.uk:8080/R-GMA/index.html
However, the continuous+old queries are able to produce more results. Maybe
there is something wrong with the latest secondary producer now.
BR,
Min
MEASUREMENT TIME     UP  SERVICE URI                                                             MESSAGE
2005-05-21 19:15:21  y   http://lcg00128.grid.sinica.edu.tw:8080/R-GMA/ArchiverServlet           mem usage: 49%
2005-05-21 19:15:22  y   http://lcg00128.grid.sinica.edu.tw:8080/R-GMA/ConsumerServlet           mem usage: 49%
2005-05-21 19:15:23  y   http://lcg00128.grid.sinica.edu.tw:8080/R-GMA/CanonicalProducerServlet  mem usage: 49%
2005-05-21 19:15:23  y   http://lcg00128.grid.sinica.edu.tw:8080/R-GMA/DBProducerServlet         mem usage: 49%
2005-05-21 19:15:24  y   http://lcg00128.grid.sinica.edu.tw:8080/R-GMA/LatestProducerServlet     mem usage: 49%
2005-05-21 19:15:24  y   http://lcg00128.grid.sinica.edu.tw:8080/R-GMA/BrowserServlet            mem usage: 49%
2005-05-21 19:15:24  y   http://lcg00128.grid.sinica.edu.tw:8080/R-GMA/StreamProducerServlet     mem usage: 49%
ALERT MESSAGES
ArchiverServlet has not been updated for 600!
ConsumerServlet has not been updated for 600!
CanonicalProducerServlet has not been updated for 600!
DBProducerServlet has not been updated for 600!
LatestProducerServlet has not been updated for 600!
BrowserServlet has not been updated for 600!
StreamProducerServlet has not been updated for 600!
test:: edg-rgma -c "latest select ServiceStatus.URI, ServiceStatus.up,
ServiceStatus.message, ServiceStatus.MeasurementDate,
ServiceStatus.MeasurementTime from ServiceStatus, Service where
ServiceStatus.URI=Service.URI and Service.site='Taiwan-LCG2'"
-----Original Message-----
From: LHC Computer Grid - Rollout
On Behalf Of Alastair Duncan
Sent: Friday, May 20, 2005 3:18 PM
Subject: Re: [LCG-ROLLOUT] ALERT MESSAGES - RGMA
I suspect that this is a problem with the GIIS scripts and timezones.
All sites seem to have this same alert. The service status tuples are
being published OK. The servicetool inserts the tuple Measurement Time in
UTC (GMT). Looking at the service status table in the browser on lcgic01,
the latest times all look good, i.e. within the last hour. So the fact
that the alert is raised leads me to suspect the scripts that are doing
the time comparison. I think that they may be comparing the published
time with the local time where the script is run (I'm assuming here that
this is in Taiwan), so the times may be way out. This is just a guess, as
I haven't seen the scripts.
Alastair
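
To illustrate the kind of mismatch suspected here, the following Python sketch shows what would happen if a UTC timestamp were compared with a local wall-clock time. It is purely hypothetical: the UTC+8 offset for the script host and the variable names are assumptions, and it says nothing about what the Gstat scripts actually do.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration only: the script host runs on UTC+8.
SCRIPT_HOST_TZ = timezone(timedelta(hours=8))

# Measurement time as published, in UTC but without timezone information.
published = datetime(2005, 5, 20, 10, 29, 14)

# The real moment of the check: one minute after the measurement.
now_utc = datetime(2005, 5, 20, 10, 30, 14, tzinfo=timezone.utc)

# Wrong: compare against the host's local wall-clock time.
local_wall_clock = now_utc.astimezone(SCRIPT_HOST_TZ).replace(tzinfo=None)
wrong_age = local_wall_clock - published

# Right: interpret the published time as UTC and compare in UTC.
right_age = now_utc - published.replace(tzinfo=timezone.utc)

print("age using local time:", wrong_age)
print("age using UTC:       ", right_age)
```

Under these assumptions a one-minute-old tuple would appear to be eight hours and one minute old, which would trip any threshold measured in minutes.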
On Fri, 2005-05-20 at 11:53, Juan Jose Pardo Navarro wrote:
> You can see the GIIS:
> http://goc.grid.sinica.edu.tw/gstat/
>
> UAM-LCG2
>
>
> RGMAService Check
> Result: note
>
> MEASUREMENT TIME     UP  SERVICE URI                                                   MESSAGE
> 2005-05-20 10:29:14  y   http://grid013.ft.uam.es:8080/R-GMA/BrowserServlet            mem usage: 27%
> 2005-05-20 10:29:54  y   http://grid013.ft.uam.es:8080/R-GMA/StreamProducerServlet     mem usage: 24%
> 2005-05-20 10:29:45  y   http://grid013.ft.uam.es:8080/R-GMA/LatestProducerServlet     mem usage: 24%
> 2005-05-20 10:29:34  y   http://grid013.ft.uam.es:8080/R-GMA/DBProducerServlet         mem usage: 26%
> 2005-05-20 10:29:15  y   http://grid013.ft.uam.es:8080/R-GMA/CanonicalProducerServlet  mem usage: 27%
> 2005-05-20 10:30:05  y   http://grid013.ft.uam.es:8080/R-GMA/ArchiverServlet           mem usage: 24%
> 2005-05-20 10:29:24  y   http://grid013.ft.uam.es:8080/R-GMA/ConsumerServlet           mem usage: 26%
>
> ALERT MESSAGES
> BrowserServlet has not been updated for 600!
> StreamProducerServlet has not been updated for 600!
> LatestProducerServlet has not been updated for 600!
> DBProducerServlet has not been updated for 600!
> CanonicalProducerServlet has not been updated for 600!
> ArchiverServlet has not been updated for 600!
> ConsumerServlet has not been updated for 600!
>
>
> test:: edg-rgma -c "latest select ServiceStatus.URI, ServiceStatus.up,
> ServiceStatus.message, ServiceStatus.MeasurementDate,
> ServiceStatus.MeasurementTime from ServiceStatus, Service where
> ServiceStatus.URI=Service.URI and Service.site='UAM-LCG2'"
>
>
>
> On Fri, 20-05-2005 at 12:47, Alastair Duncan wrote:
> > I believe that there is a script being run which checks the
> > date and time of the tuples inserted into the Archiver for service and
> > service status. As the servicetool by default updates service data every
> > hour, it is probably checking that the latest value is within the
> > last hour. It can be assumed that some service data is out of date for
> > some reason.
> >
> > Which machine is giving these alert messages? Is there any other
> > information associated with it?
> >
> > Alastair
> >
> > On Fri, 2005-05-20 at 11:14, Juan Jose Pardo Navarro wrote:
> > > Hi,
> > >
> > > I see:
> > >
> > > ALERT MESSAGES
> > > ArchiverServlet has not been updated for 600!
> > > ConsumerServlet has not been updated for 600!
> > > CanonicalProducerServlet has not been updated for 600!
> > > DBProducerServlet has not been updated for 600!
> > > LatestProducerServlet has not been updated for 600!
> > > StreamProducerServlet has not been updated for 600!
> > > BrowserServlet has not been updated for 600!
> > >
> > > What do these alert messages mean?
> > >
> > > Thanks for everything.