Hi Dave
Thanks for sharing the findings.
All
It seems to me that problems with WNs are becoming more common. Are the
recent cases a coincidence, evidence that tests are getting stricter, or
something else? Whatever the reason, it is good practice to improve
monitoring of WNs for potential problems. As background to a general
move to improve monitoring please see the monitoring talks (14:00-15:40)
at last week's GDB:
http://indico.cern.ch/conferenceDisplay.py?confId=8484
Thanks,
Jeremy
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of David Robson
> Sent: 11 June 2007 13:10
> To: [log in to unmask]
> Subject: Re: EFDA-JET failing SAM CE tests
>
> EFDA-JET has now passed the last 65 SAM CE tests. The problem was
> traced to a faulty WN.
>
> Thanks to Yves and Stephen for their assistance.
>
>
> Dave
>
>
>
> David Robson wrote:
> > I can confirm that PBS is transferring the outputs and the errors to
> > the CE, and that NFS is working fine on all nodes. We seem to have
> > passed the last SAM CE test, so the problem seems to be sporadic.
> >
> > Dave
> >
> >
> > Burke, S (Stephen) wrote:
> >> Testbed Support for GridPP member institutes
> >>> [mailto:[log in to unmask]] On Behalf Of Yves Coppens said:
> >>> - Got a job held event, reason: Globus error 158: the job manager
> >>> could not lock the state lock file
> >>>
> >>
> >> Maybe check NFS mounts?
> >>
> >> Stephen
> >>
> >>
> >