I wonder if this might be affecting us as well here.
We are getting
Status info for the Job : https://boswachter.nikhef.nl:9000/n55nav7KgILFWRkoIoJD8Q
Current Status: Aborted
Status Reason: Cannot plan: BrokerHelper: Problems querying the information service boswachter.nikhef.nl
reached on: Fri Jan 28 10:56:41 2005
as well as "no compatible resource" messages when jobs are being
submitted. This does not happen all the time; I think the frequency is
something like a few percent of all jobs. Maybe some rogue GIIS is not
responding quickly enough, or is polluting the II with wrong info?
It used to be that you could take down the II by putting in info that
was sufficiently corrupt.
While investigating the above messages I did see stuff like the
following:
grep -A 2 defin /opt/lcg/var/bdii/tmp/stderr.log
ldap_add: Undefined attribute type
additional info: GlueUIDEngine: attribute type undefined
ldif_record() = 17
ber_flush: 670 bytes to sd 3
--
ldap_add: Undefined attribute type
additional info: GlueUIDEngine: attribute type undefined
ldif_record() = 17
ber_flush: 368 bytes to sd 3
--
ldap_add: Undefined attribute type
additional info: GlueHostReconocedorPatrones: attribute type undefined
ldif_record() = 17
ber_flush: 746 bytes to sd 3
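
In case it helps anyone else chasing this, here is a rough sketch (plain Python, not part of any LCG tool) that tallies which attribute names are being rejected, so it is easier to see whether one or two attributes account for most of the errors. It assumes the log lines look exactly like the excerpt above; the names count_undefined_attributes and UNDEFINED_ATTR are made up for illustration, and the sample text is just the grep output pasted in.

```python
import re
from collections import Counter

# Matches lines such as:
#   additional info: GlueUIDEngine: attribute type undefined
# (format assumed from the stderr.log excerpt above)
UNDEFINED_ATTR = re.compile(r"additional info:\s*(\S+): attribute type undefined")


def count_undefined_attributes(log_text):
    """Return a Counter mapping each rejected attribute name to its count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = UNDEFINED_ATTR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: the grep output quoted in this message.
SAMPLE = """\
ldap_add: Undefined attribute type
additional info: GlueUIDEngine: attribute type undefined
ldif_record() = 17
ber_flush: 670 bytes to sd 3
--
ldap_add: Undefined attribute type
additional info: GlueUIDEngine: attribute type undefined
ldif_record() = 17
ber_flush: 368 bytes to sd 3
--
ldap_add: Undefined attribute type
additional info: GlueHostReconocedorPatrones: attribute type undefined
ldif_record() = 17
ber_flush: 746 bytes to sd 3
"""

# Print the rejected attributes, most frequent first.
for name, n in count_undefined_attributes(SAMPLE).most_common():
    print(n, name)
```

To run it against the real log, read the contents of /opt/lcg/var/bdii/tmp/stderr.log into a string and pass that to count_undefined_attributes instead of SAMPLE.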
On Fri, 2005-01-28 at 13:10, Burke, S (Stephen) wrote:
> LHC Computer Grid - Rollout
> > On Behalf Of Maarten Litmaath said:
> > It turns out that both RBs were using the same BDII, which got
> > overloaded due to a large number of job submissions with complex
> > JDLs each requiring a lot of BDII searches.
>
> That suggests that the search isn't done very efficiently, can't it do one
> search and cache the result?
>
> I've also noticed that matching a JDL with an input file is now taking
> several minutes, is that down to replica manager queries or BDII queries?
>
> Stephen