Restart the BDII and see if the error goes away:
service nordugrid-arc-ldap-infosys restart
Ste
On 2015-12-16 12:14, Winnie Lacesso wrote:
> Dear All,
>
> Our ARC-CE has been failing gridppnagios tests for about 13hrs now with
> this error:
>
> emi.cream.glexec.CREAMCE-JobState-/ops/Role=pilot CRITICAL 12-16-2015
> 11:54:13 0d 0h 6m 25s 2/2 CRITICAL: [Waiting->Cancelled
> [timeout/dropped]]
> 'BrokerHelper: no compatible resources'.
> https://lcglb01.gridpp.rl.ac.uk:9000/e5kEIXe_3hZHhYCKOw90PQ
>
> (Sometimes it's lcglb02.gridpp.rl.ac.uk or a hep.ic.ac.uk node instead,
> so the problem is not specific to lcglb01.gridpp.rl.ac.uk.)
>
> emi.cream.glexec.CREAMCE-JobSubmit-/ops/Role=pilot CRITICAL 12-16-2015
> 11:54:13 0d 13h 16m 47s 2/2 CRITICAL: [Waiting->Cancelled
> [timeout/dropped]]
> 'BrokerHelper: no compatible resources'.
> https://lcglb01.gridpp.rl.ac.uk:9000/e5kEIXe_3hZHhYCKOw90PQ
>
> It's the only UK ARC-CE doing that, so it's not an "upstream" problem.
>
> lcgce01 bdii & all the things we check *seem* healthy. I'm not that
> familiar yet with ARC-CE; my colleague, who is more familiar with it,
> says the errors I point out from various logfiles look like "normal
> errors" (sigh, need to get acquainted with "normal" ARC-CE errors...)
>
> The ARC-CE itself is full of running jobs & has lots queued.
>
> Contradictorily (sp?!), it's fully green in
> https://gridppnagios.physics.ox.ac.uk/myegi monitoring!!
>
> One suggestion was that the ops jobs are just not getting a slot and are
> timing out because the CE is too busy, that is, "no compatible resources"
> = "lcgce01 has no jobslot free for you, ops nagios test, sorry!"
>
> But wouldn't that show up as fails in
> gridppnagios.physics.ox.ac.uk/myegi monitoring?
>
> Pointers/advice/enlightenment most welcome
> WinnieL / ARC-CE noob