On Fri, 15 Mar 2002 13:26:54 -0000, Traylen, SM (Steve)
<[log in to unmask]> wrote:

> Is there some information somewhere about what
> the ukmap on the gridpp site is looking at,
>
> RAL appears to be down this afternoon, not sure
> why though.
>
> http://ukmap.gridpp.ac.uk:8001/
>
>   Steve
>
>--
>Steve Traylen
>[log in to unmask]

Hi Steve,

The underlying Java program tries to send a GRAM job request to the
gatekeeper in question; in RAL's case, that's csf.rl.ac.uk.
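
As a rough illustration only (this is not the map's code, which is Java and
submits a real GRAM job), a much weaker check is simply whether the
gatekeeper's TCP port accepts a connection at all. The Python sketch below
assumes the conventional gatekeeper port, 2119; the function name is made up
for the example, and a successful connect says nothing about whether job
submission would actually work.

```python
import socket

# Conventional Globus gatekeeper port; an assumption of this sketch,
# since a site may run its gatekeeper elsewhere.
GATEKEEPER_PORT = 2119


def gatekeeper_reachable(host, port=GATEKEEPER_PORT, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only shows that something is listening on the port. It does not
    authenticate and does not submit a job.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, gatekeeper_reachable("csf.rl.ac.uk") would probe RAL's
gatekeeper with the default port and a 10 second timeout.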

The overnight trace is patchy, indicating that job submission was failing
intermittently during the night.
There appear to be many reasons why this can happen, few of which the
Globus Toolkit (TM) software traps well enough for you to recognize
them. Using the CoG API gives you more information, but not always enough.

The load average increased during this time, which I've seen before and which
can be a symptom of disconnected globus-jobmanagers hanging about on the system;
the jobmanagers don't appear to have a proper timeout, so they just hang around
on the gatekeeper machine if the connection to the client dies.
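
To show what a "proper timeout" could look like in general terms, here is a
minimal Python sketch of a server-side loop that gives up when its client
goes silent. It is purely hypothetical and is not how the Globus jobmanager
is written; the function name and the 300 second default are invented for
the example.

```python
import socket


def serve_until_idle(conn, idle_timeout=300.0):
    """Read from `conn` until the peer closes or is silent for `idle_timeout` seconds.

    Returns "closed" or "timed out" so the caller can log why it exited.
    """
    conn.settimeout(idle_timeout)
    try:
        while True:
            data = conn.recv(4096)
            if not data:
                # An empty read means the peer closed the connection cleanly.
                return "closed"
            # A real service would handle the client's message here.
    except socket.timeout:
        # Nothing arrived within idle_timeout: stop instead of lingering.
        return "timed out"
    finally:
        conn.close()
```

With something along these lines, a process whose client has vanished would
exit after the idle period rather than sitting on the machine indefinitely.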

-- but the real reason it stayed red was that my machine crashed ;-)
Sorry.. the system has been restarted now.

cheers,
gav

PS. Has the QMUL gatekeeper changed, or is it really down? The gatekeeper
list is not updated from LDAP (and there's no effort here to code that up just
now), so any changes in site gatekeeper configuration (e.g. moving the
gatekeeper) should be sent to me, please!

PPS. Any other sites currently white, please contact me as well.