Hi Winnie
SAM Nagios runs two sets of tests against an ARC CE. The first set, named org.nordugrid.*, is submitted directly to the ARC CE. lcgce01.phy.bris is passing these tests, and the availability and reliability calculation is based on them. The second set is the emi.cream* glexec tests, which are submitted through the WMS. Your ARC CE is failing the glexec tests, and it looks as though the WMS is not finding the resource. You are unlikely to find anything in the logs, because the jobs are not reaching your ARC CE. Could you try submitting a few jobs through the WMS to your ARC CE and see whether they succeed?
The EGI page is showing green because the glexec tests are not used in the availability and reliability calculation.
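If it helps, here is a rough sketch of how a WMS test submission could be prepared. It only builds a minimal JDL pinned to one CE and prints the two gLite WMS client commands to run; it assumes a UI with the gLite WMS clients installed and a valid VOMS proxy. The CE identifier is a placeholder (use whatever your CE publishes in the BDII), and the file names and helper function names are just illustrative.

```python
import shlex


def build_jdl(ce_id):
    """Return a minimal JDL that pins a trivial job to a single CE."""
    lines = [
        'Executable = "/bin/hostname";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        'OutputSandbox = {"std.out", "std.err"};',
        # Restrict matchmaking to the CE under test.
        'Requirements = other.GlueCEUniqueID == "%s";' % ce_id,
    ]
    return "\n".join(lines) + "\n"


def list_match_command(jdl_path):
    """Command that asks the WMS which resources match the JDL."""
    return ["glite-wms-job-list-match", "-a", jdl_path]


def submit_command(jdl_path, id_file):
    """Command that submits the job and stores the job ID in id_file."""
    return ["glite-wms-job-submit", "-a", "-o", id_file, jdl_path]


if __name__ == "__main__":
    # Placeholder: replace with the CE ID published for your ARC CE.
    ce_id = "CE_ID_FROM_BDII"
    print("# Save this as test.jdl:")
    print(build_jdl(ce_id))
    print("# Then run:")
    print(shlex.join(list_match_command("test.jdl")))
    print(shlex.join(submit_command("test.jdl", "jobids.txt")))
```

Running the list-match step first is probably the quicker check: if it also reports no matching resources, that would point at what the WMS sees of your CE rather than at the CE itself.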
Cheers
Kashif
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Winnie Lacesso
> Sent: 16 December 2015 12:14
> To: [log in to unmask]
> Subject: Help to debug ARC-CE?
>
> Dear All,
>
> Our ARC-CE has been failing gridppnagios tests about 13hrs now with error
>
> emi.cream.glexec.CREAMCE-JobState-/ops/Role=pilot CRITICAL 12-16-2015
> 11:54:13 0d 0h 6m 25s 2/2 CRITICAL: [Waiting->Cancelled [timeout/dropped]]
> 'BrokerHelper: no compatible resources'.
> https://lcglb01.gridpp.rl.ac.uk:9000/e5kEIXe_3hZHhYCKOw90PQ
>
> (or it's lcglb02.gridpp.rl.ac.uk or a hep.ic.ac.uk node, so it's not
> lcglb01.gridpp.rl.ac.uk)
>
> emi.cream.glexec.CREAMCE-JobSubmit-/ops/Role=pilot CRITICAL 12-16-2015
> 11:54:13 0d 13h 16m 47s 2/2 CRITICAL: [Waiting->Cancelled
> [timeout/dropped]]
> 'BrokerHelper: no compatible resources'.
> https://lcglb01.gridpp.rl.ac.uk:9000/e5kEIXe_3hZHhYCKOw90PQ
>
> It's the only UK ARC-CE doing that so it's not "upstream" problem.
>
> lcgce01 bdii & all things we check *seem* healthy. I'm not that familiar yet
> with ARC-CE, my colleague who is more familiar with it says the errors I point
> out from various logfiles look like "normal errors" (sigh, need to get
> acquainted with "normal" ARC-CE errors...)
>
> The ARC-CE itself is full of running jobs & has lots queued.
>
> Contradictorily (sp?!), it's fully green in
> https://gridppnagios.physics.ox.ac.uk/myegi monitoring!!
>
> One suggestion was that the ops jobs are just not getting a slot & timing out,
> the CE is too busy, that is, "no compatible resources" = "lcgce01 has no
> jobslot free for you, ops nagios test, sorry!"
>
> But, wouldn't that show up as fails in gridppnagios.physics.ox.ac.uk/myegi
> monitoring?
>
> Pointers/advice/enlightenment most welcome WinnieL / ARC-CE noob