Hi,
Recently my site began to experience the following problem.
When a job is submitted to my site with command:
edg-job-submit --vo dteam -r ce.egee.man.poznan.pl:2119/jobmanager-lcgpbs-short hostname.jdl
the edg-job-status command shows the following output:
Current Status: Ready
Status Reason: unavailable
The job stays in the "Ready" state until the status changes to:
Current Status: Aborted
Status Reason: Job RetryCount (3) hit
From the edg-job-get-logging-info output I assume that the job is not even
successfully submitted to the LRMS:
Event: Transfer
- destination = LRMS
- result = FAIL
- source = LogMonitor
- timestamp = Wed Mar 30 10:43:11 2005
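For anyone who wants to pick out such failed events from a longer logging-info dump, here is a minimal sketch. It assumes only the "Event:" / "- key = value" layout shown above; the function name failed_events and the sample string are hypothetical and not part of any EDG tool.

```python
def failed_events(text):
    """Return one dict per event block whose 'result' field is FAIL.

    Assumes each block starts with 'Event: <name>' followed by
    '- key = value' lines, as in the excerpt above.
    """
    events, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Event:"):
            # Start a new event block
            current = {"event": line.split(":", 1)[1].strip()}
            events.append(current)
        elif line.startswith("-") and "=" in line and current is not None:
            # Split only on the first '=' so values may contain '='
            key, value = line.lstrip("- ").split("=", 1)
            current[key.strip()] = value.strip()
    return [e for e in events if e.get("result") == "FAIL"]


# Sample input copied from the excerpt above
sample = """Event: Transfer
- destination = LRMS
- result = FAIL
- source = LogMonitor
- timestamp = Wed Mar 30 10:43:11 2005
"""

for e in failed_events(sample):
    print(e["event"], e.get("source"), "->", e.get("destination"))
```

This only filters text that has already been saved from the command; it does not call edg-job-get-logging-info itself.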
In the globus-gatekeeper log files I find:
Failed reading length 0
GSS authentication failure
globus_gss_assist token :3: read failure: Connection closed
Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003
--
On my CE I can see globus-jobmanager processes handling the submitted
job, but they seem to hang. There is no activity on the WNs.
FYI:
I have valid CA rpms installed (0.27).
Until recently everything was working fine (the last successful job
submission was on March 24th).
I found no reasonable explanation for this problem. Does anyone know the
reason for such behaviour?
Thanks,
Piotr
--
Piotr Siwczak <[log in to unmask]>
System Administrator
Poznan Supercomputing and Networking Center
Supercomputing Department
(www.eu-egee.org <[log in to unmask]>)