Dear All,
You can see what is and is not in the IC RB by looking at the monitoring page:
http://www.hep.ph.ic.ac.uk/~dguser/diagnostics.html
This will also give you links to whatever is being published in the
LDAP.
So, for example, you can see that epcf36.ph.bham.ac.uk and
epcf37.ph.bham.ac.uk are in, but they only support Atlas and Alice (which is
up to them). So neither Owen nor I can run jobs there (although Andrew McNab
can).
> Birmingham: Not being returned as a site for a dg-job-list-match on a
> simple "Hello World" job. Is this site included in the IC RB?
>
> UCL: It doesn't look like the GIIS is advertising the RunTimeEnvironment
> "UCL" (given on the www.gridpp.ac.uk/map webpage as the site label).
> This will mean jobs submitted with the jdl included
>
> Requirements = IsMember(other.RunTimeEnvironment,"UCL")
Is this how the map does it, by requiring a RunTimeEnvironment rather than
doing an II search on CE names?
>
> return no matching resources. Does the site-cfg.h file need updating to
> include this RunTimeEnvironment?
>
I have looked at some of the other amber sites (I have been specifying a
specific queue in case there is a RunTimeEnvironment problem):
Queue                                              Status
tuber5.phy.bris.ac.uk:2119/jobmanager-pbs-bseq     OutputReady
pc31.hep.ucl.ac.uk:2119/jobmanager-pbs-workq       OutputReady
heplnx11.pp.rl.ac.uk:2119/jobmanager-pbs-S         OutputReady
hepgrid2.ph.liv.ac.uk:2119/jobmanager-pbs-workq    problems
farm003.hep.phy.cam.ac.uk:2119/jobmanager-pbs-Sq   problems
tbce01.physics.ox.ac.uk:2119/jobmanager-pbs-workq  problems
bfa.hep.ph.ic.ac.uk:2119/jobmanager-pbs-bulk       OutputReady
(Only included as a check)
For Oxford I found the job failed with the following messages in the
logging info:
Event Type = JobMatch
dg_jobId =
https://gm03.hep.ph.ic.ac.uk:7846/155.198.216.137/14104528637531?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Tue Mar 4 14:19:52 2003
Job Match Destination =
tbce01.physics.ox.ac.uk:2119/jobmanager-pbs-workq
Host Name = gm03
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId =
https://gm03.hep.ph.ic.ac.uk:7846/155.198.216.137/14104528637531?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Tue Mar 4 14:22:45 2003
Job Accept New Id = 38392.
Job Accept Source = ResourceBroker
Host Name = gm03
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId =
https://gm03.hep.ph.ic.ac.uk:7846/155.198.216.137/14104528637531?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Tue Mar 4 14:22:45 2003
Job Transfer Dest = JobSubmissionService
Job Transfer Result = OK
Host Name = gm03
Source Program = ResourceBroker
---
Event Type = JobFail
Job Fail Action = 0
dg_jobId =
https://gm03.hep.ph.ic.ac.uk:7846/155.198.216.137/14104528637531?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Tue Mar 4 14:23:31 2003
Job Fail Reason = authentication with the remote server failed
Host Name = gm03
Source Program = JobSubmissionService
Jobs to Cambridge and Liverpool follow the same pattern. This also
matches what was being seen at SLAC yesterday.
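In case it helps anyone compare failures across sites, here is a rough sketch in Python (not part of the EDG tools) that pulls the JobFail reason out of logging info saved as text. It assumes the layout shown above: "Field = value" lines, events separated by "---", and a long value such as dg_jobId wrapped onto the following line. The function names parse_events and failures and the SAMPLE text are mine, for illustration only.

```python
import re

# A shortened copy of the logging info above, used only as sample input.
SAMPLE = """\
Event Type = JobTransfer
dg_jobId =
https://gm03.hep.ph.ic.ac.uk:7846/155.198.216.137/14104528637531?gm03.hep.ph.ic.ac.uk:7771
Job Transfer Result = OK
---
Event Type = JobFail
Job Fail Action = 0
dg_jobId =
https://gm03.hep.ph.ic.ac.uk:7846/155.198.216.137/14104528637531?gm03.hep.ph.ic.ac.uk:7771
Job Fail Reason = authentication with the remote server failed
Source Program = JobSubmissionService
"""


def parse_events(text):
    """Split logging info into a list of {field: value} dicts, one per event."""
    events = []
    for block in re.split(r"^---[ \t]*$", text, flags=re.MULTILINE):
        fields = {}
        key = None
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            # Assumption: a field whose value is empty has been wrapped,
            # so the next non-blank line is taken as its value.
            if key is not None and fields[key] == "":
                fields[key] = line
                continue
            name, sep, value = line.partition("=")
            if sep:
                key = name.strip()
                fields[key] = value.strip()
        if fields:
            events.append(fields)
    return events


def failures(events):
    """Return (job id, failure reason) for every JobFail event."""
    return [
        (e.get("dg_jobId", ""), e.get("Job Fail Reason", ""))
        for e in events
        if e.get("Event Type") == "JobFail"
    ]


if __name__ == "__main__":
    for job_id, reason in failures(parse_events(SAMPLE)):
        print(job_id)
        print("  ", reason)
```

Running the same thing over the Cambridge and Liverpool logging info would show quickly whether the failure reason is identical at all three sites.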
Any ideas?
All the best,
david