Earlier, to check whether our RB was in good shape (after some suggestions
that it was not), I sent a job to each of the queues.

57 queues in total, of which 42 were successful, which is pretty good.

For the following sites that failed, I ran the simplest test I could think of
to show whether the fault was still there.

1) golias25.farm.particle.cz
2) hik-lcg-ce.fzk.de
3) lcg-ce.usc.cesga.es
4) lhc01.sinp.msu.ru
5) grid003.ft.uam.es
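
A rough sketch of how the same simple test could be repeated across the
failed CEs in one go. The function names and the PROBE_CMD variable are
made up for illustration; only the globus-job-run invocation comes from the
tests below, and a valid grid proxy is assumed.

```shell
#!/bin/sh
# Hypothetical helper: run one trivial job on each CE and report the outcome.
# PROBE_CMD is an assumption, so the loop can be dry-run without a grid proxy.
PROBE_CMD=${PROBE_CMD:-globus-job-run}

probe_ce() {
    # $1 is the CE contact string, e.g. a hostname or host/jobmanager-lcgpbs.
    if "$PROBE_CMD" "$1" /bin/pwd >/dev/null 2>&1; then
        echo "$1 OK"
    else
        echo "$1 FAILED"
    fi
}

probe_all() {
    # Probe every contact string given on the command line, one per line.
    for ce in "$@"; do
        probe_ce "$ce"
    done
}

# Example call (commented out so nothing is submitted by accident):
# probe_all golias25.farm.particle.cz hik-lcg-ce.fzk.de lcg-ce.usc.cesga.es
```

This only reports pass or fail; the error text still has to be read by hand
for each site.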

#######################################################################################
1) Destination:       golias25.farm.particle.cz:2119/jobmanager-lcgpbs-short
   Status Reason:     Job RetryCount (3) hit

   globus-job-run golias25.farm.particle.cz/jobmanager-lcgpbs /bin/pwd : Fails

   with no output. My best guess is that unchallenged (passwordless) ssh from
   the WN back to the CE does not work.
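
   If that guess is right, something along these lines, run on a WN by a
   site admin, might confirm it. This is only a sketch: check_wn_to_ce and
   SSH_CMD are hypothetical names, and it assumes OpenSSH, where BatchMode=yes
   makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# Hypothetical check, to be run on a worker node (WN).
# SSH_CMD is an assumption so the check can be exercised without a real CE.
SSH_CMD=${SSH_CMD:-ssh}

check_wn_to_ce() {
    # $1 is the CE hostname. BatchMode=yes disables password prompts, so a
    # non-zero exit here means the login was challenged or otherwise failed.
    if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=10 "$1" /bin/true; then
        echo "unchallenged ssh to $1 works"
    else
        echo "unchallenged ssh to $1 FAILED"
    fi
}

# Example call (commented out):
# check_wn_to_ce golias25.farm.particle.cz
```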


#######################################################################################
2) Destination:       hik-lcg-ce.fzk.de:2119/jobmanager-lcgpbs-infinite
   Status Reason:     7 authentication failed: GSS Major Status: Authentication Failed GSS Minor Status Error Chain:  init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization init_sec_context


   globus-job-run hik-lcg-ce.fzk.de /bin/pwd : also fails

   Had a quick look at the CRLs at
   http://grid.fzk.de/ca/gridka-crl.pem
   http://grid.fzk.de/ca/fzk-crl.pem

   Both CRLs look to have expired today.

   $ openssl crl -in gridka-crl.pem -noout -nextupdate
   nextUpdate=Sep 12 14:19:25 2003 GMT

   $ openssl crl -in fzk-crl.pem -noout -nextupdate
   nextUpdate=Sep 12 14:19:19 2003 GMT
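
   The comparison against the current time can be scripted. A minimal
   sketch, assuming GNU date (for the -d option); crl_expired is a
   hypothetical helper name, and the openssl flags are the ones used above.

```shell
#!/bin/sh
# Sketch: decide whether a CRL's nextUpdate time has passed.
# Assumes GNU date (for -d); crl_expired is a hypothetical helper name.
crl_expired() {
    # $1: the line printed by "openssl crl -noout -nextupdate"
    # $2: reference time in epoch seconds (defaults to the current time)
    next=${1#nextUpdate=}
    next_epoch=$(date -u -d "$next" +%s) || return 2
    now_epoch=${2:-$(date -u +%s)}
    # Exit status 0 means the CRL has expired, 1 means it is still valid.
    [ "$now_epoch" -ge "$next_epoch" ]
}

# Example call (commented out, needs the downloaded CRL file):
# crl_expired "$(openssl crl -in gridka-crl.pem -noout -nextupdate)" && echo expired
```

   An exit status of 2 means the date could not be parsed.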


######################################################################################

3) Destination:       lcg-ce.usc.cesga.es:2119/jobmanager-lcgpbs-long
   Status Reason:     Got a job held event, reason: Globus error 3: an I/O operation failed

   which is a new one on me.

   Fork jobs run okay, but

   globus-job-run lcg-ce.usc.cesga.es:2119/jobmanager-lcgpbs /boot/pwd
   GRAM Job failed because an I/O operation failed (error code 3)
   I don't know the cause.

#################################################################################

4) Destination:       lhc01.sinp.msu.ru:2119/jobmanager-lcgpbs-infinite
   Status Reason:     Job RetryCount (3) hit

   Same as 3)

################################################################################
5)  Destination:       grid003.ft.uam.es:2119/jobmanager-lcgpbs-long
    Status Reason:     Job RetryCount (3) hit

    Looks to be okay now.





--
Steve Traylen
http://www.gridpp.ac.uk/