Earlier, to see whether our RB was in shape after some suggestions that it was not,
I sent a job to each of the queues.
There are 57 queues in total and 42 jobs were successful, which is pretty good.
For the following CEs that failed, I tried the simplest test I could to show
whether a fault was still there.
1) golias25.farm.particle.cz
2) hik-lcg-ce.fzk.de
3) lcg-ce.usc.cesga.es
4) lhc01.sinp.msu.ru
5) grid003.ft.uam.es
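
A rough sketch of how the same trivial probe could be looped over a list of
CEs. The function name probe_ces and the PROBE_CMD variable are made up for
illustration; the only real command assumed is globus-job-run, called in the
same "host /bin/pwd" form used by hand below. Some sites need an explicit
jobmanager suffix on the contact string, which this sketch does not add.

```shell
#!/bin/bash
# Hypothetical helper: run a trivial job on each CE and report pass/fail.
# PROBE_CMD defaults to globus-job-run; override it to dry-run the loop.
PROBE_CMD="${PROBE_CMD:-globus-job-run}"

probe_ces() {
    local ce
    for ce in "$@"; do
        # /bin/pwd is about the cheapest job that still goes through the gatekeeper.
        if "$PROBE_CMD" "$ce" /bin/pwd >/dev/null 2>&1; then
            echo "OK   $ce"
        else
            echo "FAIL $ce"
        fi
    done
}
```

Something like "probe_ces golias25.farm.particle.cz hik-lcg-ce.fzk.de" would
then print one OK/FAIL line per host. It only reports the exit status, so the
error text still has to be read by hand.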
#######################################################################################
1) Destination: golias25.farm.particle.cz:2119/jobmanager-lcgpbs-short
Status Reason: Job RetryCount (3) hit
globus-job-run golias25.farm.particle.cz/jobmanager-lcgpbs /bin/pwd : fails
with no output. Best guess is that the unchallenged ssh from the WN back to the CE
does not work.
#######################################################################################
2) Destination: hik-lcg-ce.fzk.de:2119/jobmanager-lcgpbs-infinite
Status Reason: 7 authentication failed: GSS Major Status: Authentication Failed GSS Minor Status Error Chain: init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization init_sec_context
globus-job-run hik-lcg-ce.fzk.de /bin/pwd , fails
Had a quick look at the CRLs at
http://grid.fzk.de/ca/gridka-crl.pem
http://grid.fzk.de/ca/fzk-crl.pem
Both CRLs look to have expired today.
$ openssl crl -in gridka-crl.pem -noout -nextupdate
nextUpdate=Sep 12 14:19:25 2003 GMT
$ openssl crl -in fzk-crl.pem -noout -nextupdate
nextUpdate=Sep 12 14:19:19 2003 GMT
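
The check above could be scripted roughly as follows. This is only a sketch:
the function names crl_next_update and is_past are invented here, and it
assumes GNU date (for "date -d") alongside the same openssl crl command
shown above.

```shell
#!/bin/bash
# Print a CRL's nextUpdate timestamp without the "nextUpdate=" prefix.
crl_next_update() {
    openssl crl -in "$1" -noout -nextupdate | sed 's/^nextUpdate=//'
}

# Succeed if the given timestamp is at or before the reference time.
# $1: timestamp such as "Sep 12 14:19:25 2003 GMT"
# $2: optional reference time in epoch seconds (defaults to now)
is_past() {
    local when now
    when=$(date -u -d "$1" +%s) || return 2   # assumes GNU date
    now=${2:-$(date -u +%s)}
    [ "$when" -le "$now" ]
}
```

Calling is_past "$(crl_next_update gridka-crl.pem)" should then succeed once
the CRL is past its nextUpdate time, which is when certificates from that CA
typically start failing authentication.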
######################################################################################
3) Destination: lcg-ce.usc.cesga.es:2119/jobmanager-lcgpbs-long
Status Reason: Got a job held event, reason: Globus error 3: an I/O operation failed
which is a new one on me.
Fork jobs are okay, but
globus-job-run lcg-ce.usc.cesga.es:2119/jobmanager-lcgpbs /boot/pwd
GRAM Job failed because an I/O operation failed (error code 3)
Don't know.
#################################################################################
4) Destination: lhc01.sinp.msu.ru:2119/jobmanager-lcgpbs-infinite
Status Reason: Job RetryCount (3) hit
Same as 3)
################################################################################
5) Destination: grid003.ft.uam.es:2119/jobmanager-lcgpbs-long
Status Reason: Job RetryCount (3) hit
Looks to be okay now.
--
Steve Traylen
http://www.gridpp.ac.uk/