Hi Andrew, Pete etc,
This sounds very similar to something that Hugh Tallini did before he went
on holiday a week or so ago. We didn't advertise it because we wanted to
do some more testing (which it needs) and make the output more easily
understandable. Currently this tests every queue accessible from the RB
and, if there is an associated storage element close by, it will test the
ability to write to and read from it. The output from this tool can be
seen at http://www.hep.ph.ic.ac.uk/~dguser/Qstatus.html (and is linked
from the RB status page
http://www.hep.ph.ic.ac.uk/~dguser/diagnostics.html).
I am afraid that there might have been some duplication of effort here.
Hugh is still on holiday, so I am not sure what to suggest. We should
also have told the obvious people (Andrew, Steve, Gav etc.) what we were
doing, to avoid duplication... sorry.
I should also point out that the RB monitoring pages have had a few
problems recently (they work fine provided the II has not jammed, but
become a mess if it has). I will try to sort them out in the near future.
For now, if they look a mess, then something (probably the II) is not
working.
All the best,
david
On Tue, 18 Feb 2003, Peter Clarke wrote:
> Andy
>
> Well done, as always.
>
> Yet again something useful done without needing a strategic
> planning meeting or a competitive tender judged by the PPARC
> Grid steering committee. I don't know... not playing the game.
>
> Seriously, a couple of things:
>
> 1) I assume you are in contact with Dave_C. There will be someone
> at IC working full time on resource usage monitoring. I presume
> this is something they could use.
>
> 2) Of all the black dots on the map, who will be the last
> to get filled in? I'm not sure if it's the sites' problem,
> or you just haven't done them yet, but it's a challenge
> to all black dots nevertheless.
>
> Pete
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]]On Behalf Of Andrew McNab
> Sent: 18 February 2003 15:30
> To: [log in to unmask]
> Subject: GridPP Monitoring
>
>
> Hi,
>
> Last week (after a bit of prompting) I hacked together a stop-gap
> monitoring map, which checks accessibility through the IC and EDG RBs, as
> well as via plain Globus in the way Gavin's does already. This is just
> intended as an interim measure, and gives us time to sort out a better
> solution whilst still having something to use in the meantime.
>
> The map itself is at http://www.gridpp.ac.uk/map/ (I'm still adding
> the names for each site and doing some tidying up, and not all sites'
> details are correct yet; see below.)
>
> The notes at http://www.gridpp.ac.uk/map/notes.html explain how it
> works, and I've pasted them on to the end of this email. However, to
> get things going I need two things from each site:
>
> Your current preferred Globus gatekeeper hostname (if you're running the
> EDG software, this is your CE.)
>
> Your choice of site "label" (eg RAL-PRO) if it's different to the one
> I've got in the table below the map. (Sites that didn't already have
> a label currently have their .ac.uk third-level domain name.)
>
> It is also time to mail ukhepgrid with an encouraging announcement about
> the Testbed and an invitation to get sites online. I'd like to do that as
> soon as 1.4.4 settles down (eg once the CERN RB gets going again,
> tomorrow?) and then try to get as many green stars on the new map as we
> can in the next 2-3 weeks.
>
> Cheers,
>
> Andrew
>
> GridPP Monitoring notes
>
> The GridPP Monitoring map page is an extension of Gavin's Green Dot Map
> system. The new system is a simple way of checking the accessibility of
> GridPP sites via Globus, via the GridPP Resource Broker at Imperial and
> via the EDG RB at CERN.
>
> Jobs are submitted via the three alternative routes and, if they execute
> successfully, call back to the GridPP webserver via HTTP with a job ID
> number. A script on the website periodically rebuilds the map and the
> table below it, based on the time of the last successful callback.
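A minimal sketch of the rebuild step, assuming each HTTP callback's job ID has already been mapped back to a (site label, route, callback time) record. The record layout and the route names "globus", "gridpp_rb" and "edg_rb" are hypothetical, not taken from the actual GridPP script. It keeps only the newest successful callback per site and route, which is all the map needs:

```python
from datetime import datetime

def latest_callbacks(records):
    """Collapse callback records into the most recent time per (site, route).

    records: iterable of (site_label, route, callback_time) tuples, where
    route is a hypothetical name such as "globus", "gridpp_rb" or "edg_rb".
    """
    latest = {}
    for site, route, when in records:
        key = (site, route)
        # Keep only the newest successful callback for each site/route pair.
        if key not in latest or when > latest[key]:
            latest[key] = when
    return latest

# Example: two Globus callbacks from one site; only the later one is kept.
records = [
    ("RAL-PRO", "globus", datetime(2003, 2, 18, 14, 0)),
    ("RAL-PRO", "globus", datetime(2003, 2, 18, 15, 0)),
    ("RAL-PRO", "edg_rb", datetime(2003, 2, 18, 14, 30)),
]
latest = latest_callbacks(records)
```

Storing only the latest time per pair means the periodic rebuild does not need to rescan old callbacks.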
>
> On the map, the colour and shape of each site's marker indicates its
> status:
>
> * Black dot - no responses from site.
> * Red dot - no responses within the timeout period (1 hr).
> * Amber dot - Globus responses within timeout, but none via Resource
> Brokers.
> * Green dot - Response via GridPP RB within timeout, but none via EDG.
> * Green star - Response via EDG RB within timeout; GridPP RB and
> Globus status ignored.
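The marker rules above can be read as a precedence order: EDG RB first, then GridPP RB, then plain Globus. Here is a minimal sketch of that decision, assuming a per-site dict of last callback times keyed by hypothetical route names, and reading "within timeout" as "no more than 1 hour ago". This is one interpretation of the notes, not the map script itself:

```python
from datetime import datetime, timedelta

# Timeout period from the notes; "within timeout" is read here as <= 1 hour.
TIMEOUT = timedelta(hours=1)

def marker(last, now, timeout=TIMEOUT):
    """Return the map marker for one site.

    last: dict mapping a route name ("globus", "gridpp_rb", "edg_rb";
    hypothetical names) to the datetime of the site's last successful
    callback via that route. Routes that have never called back are absent.
    """
    def recent(route):
        when = last.get(route)
        return when is not None and now - when <= timeout

    if not last:
        return "black dot"    # no responses from the site at all
    if recent("edg_rb"):
        return "green star"   # EDG RB works; other routes ignored
    if recent("gridpp_rb"):
        return "green dot"    # GridPP RB works, EDG RB does not
    if recent("globus"):
        return "amber dot"    # plain Globus works, neither RB does
    return "red dot"          # responses exist, but all older than the timeout

now = datetime(2003, 2, 18, 16, 0)
print(marker({"globus": now - timedelta(minutes=10)}, now))
```

With only a recent Globus callback, the example prints "amber dot".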
>
> Globus job submission is done to a fixed list of Globus Gatekeepers, one
> per site. These should be the same machine as the EDG Computing
> Element. If you change your CE hostname, please tell us so we can update
> the list.
>
> GridPP and EDG RB submission is done using site name Environment labels,
> which each site must define for itself in its LCFG site-cfg.h file. If you
> are setting up a site, you may use the label in the table or choose
> something else; if you do, please tell us or you won't go green! (If you think
> it likely your site may join the Development as well as the Production
> testbed - ie that you will run two sets of machines - then you may prefer
> to use -PRO as a suffix, eg RAL-PRO.)
>
> The jobs are run using Andrew's UKHEP certificate, so sites need to have
> "/O=Grid/O=UKHEP/OU=hep.man.ac.uk/CN=Andrew McNab" in their
> grid-mapfile. (I'm in the GridPP testbed and the EDG iteam and wpsix VOs,
> so this isn't normally a problem once you have the EDG software
> installed.)
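Sites that are not sure whether the monitoring certificate is mapped could check their grid-mapfile (typically /etc/grid-security/grid-mapfile). Here is a minimal sketch, assuming the usual layout of one quoted DN followed by the local account(s) per line, with "#" comment lines. The function name is hypothetical, and DNs containing quote characters are not handled:

```python
import shlex

MONITOR_DN = "/O=Grid/O=UKHEP/OU=hep.man.ac.uk/CN=Andrew McNab"

def dn_mapped(gridmap_text, dn=MONITOR_DN):
    """Return the local account field mapped to dn, or None if dn is absent."""
    for line in gridmap_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # shlex.split keeps the quoted DN (which contains spaces) as one field.
        fields = shlex.split(line)
        if fields and fields[0] == dn:
            return fields[1] if len(fields) > 1 else ""
    return None

sample = '# local entries\n"/O=Grid/O=UKHEP/OU=hep.man.ac.uk/CN=Andrew McNab" .gridpp\n'
print(dn_mapped(sample))
```

In practice you would pass in the contents of the site's own grid-mapfile; a None result means the monitoring jobs will be refused.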
>
> To tell us about updates or problems, please mail [log in to unmask]
>