Hi Dug,
> **********************************************************************
> LOGGING INFORMATION:
> [...]
> ---
> Event: Done
> - Arrived = Wed Jul 29 17:22:29 2009 BST
> - Exit code = 1
> - Host = svr022.gla.scotgrid.ac.uk
> - Reason = Job got an error while in the CondorG queue.
Normally that error has a more detailed explanation in the preceding record,
but not this time. Wiki:
http://goc.grid.sinica.edu.tw/gocwiki/Job_got_an_error_while_in_the_CondorG_queue
> - Source = LogMonitor
> - Src instance = unique
> - Status code = FAILED
> - Timestamp = Wed Jul 29 17:22:28 2009 BST
> - User = [...]
> ---
The job then got resubmitted to a different CE at the same site and failed
like this:
> ---
> Event: Transfer
> - Arrived = Wed Jul 29 17:23:00 2009 BST
> - Dest host = unavailable
> - Dest instance = /var/glite/logmonitor/CondorG.log/CondorG.1248818045.log
> - Dest jobid = unavailable
> - Destination = LRMS
> - Host = svr022.gla.scotgrid.ac.uk
> - Reason = 10 data transfer to the server failed
> - Result = FAIL
> - Source = LogMonitor
> - Src instance = unique
> - Timestamp = Wed Jul 29 17:22:59 2009 BST
> - User = [...]
> ---
Wiki:
http://goc.grid.sinica.edu.tw/gocwiki/10_data_transfer_to_the_server_failed
Another resubmission to the original CE (but a different queue) then failed
for yet another reason:
> ---
> Event: Transfer
> - Arrived = Wed Jul 29 17:23:30 2009 BST
> - Dest host = unavailable
> - Dest instance = /var/glite/logmonitor/CondorG.log/CondorG.1248818045.log
> - Dest jobid = unavailable
> - Destination = LRMS
> - Host = svr022.gla.scotgrid.ac.uk
> - Reason = 7 authentication failed: GSS Major Status:
> Unexpected Gatekeeper or Service Name GSS Minor Status Error Chain: init.c:499:
> globus_gss_assist_init_sec_context_async: Error during context initialization
> - Result = FAIL
> - Source = LogMonitor
> - Src instance = unique
> - Timestamp = Wed Jul 29 17:23:30 2009 BST
> - User = [...]
> ---
Wiki:
http://goc.grid.sinica.edu.tw/gocwiki/7_authentication_failed
A final resubmission was pending for a while and then the job got canceled.
Might the CEs have had some issues during this interval?
If not, it would be good to know the output of "voms-proxy-info -all"
at the time the job was submitted. In particular: was the lifetime of
the VOMS attributes shorter than the lifetime of the proxy itself?
Using such proxies is asking for trouble:
https://savannah.cern.ch/bugs/index.php?28167
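
If it helps, here is a rough Python sketch of that check. It takes the text
printed by "voms-proxy-info -all" and reports whether any VOMS attribute
expires before the proxy itself. It assumes the usual layout: a "timeleft"
line in HH:MM:SS form for the proxy, followed by one "timeleft" line inside
each "=== VO ... extension information ===" block. The function names are
made up for illustration, and the parsing may need adjusting for your
version of the VOMS clients.

```python
import re

# Matches lines such as "timeleft  : 11:59:53" (assumed HH:MM:SS layout).
_TIMELEFT = re.compile(r"\s*timeleft\s*:\s*(\d+):(\d+):(\d+)\s*$")


def parse_timeleft(text):
    """Return (proxy_seconds, [attribute_seconds, ...]).

    'text' is assumed to be the output of "voms-proxy-info -all".
    proxy_seconds is None if no proxy "timeleft" line was found.
    """
    proxy = None
    attributes = []
    in_vo_block = False
    for line in text.splitlines():
        # A line starting with "===" is assumed to open a VO extension block.
        if line.startswith("==="):
            in_vo_block = True
            continue
        match = _TIMELEFT.match(line)
        if not match:
            continue
        hours, minutes, seconds = (int(part) for part in match.groups())
        total = hours * 3600 + minutes * 60 + seconds
        if in_vo_block:
            attributes.append(total)
        elif proxy is None:
            proxy = total
    return proxy, attributes


def attributes_expire_first(text):
    """True if any VOMS attribute has less time left than the proxy."""
    proxy, attributes = parse_timeleft(text)
    return proxy is not None and any(a < proxy for a in attributes)
```

To use it, capture the command output at submission time and pass the text
to attributes_expire_first(); a True result means the proxy is of the
troublesome kind described in the bug above.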