Hello,
Last Friday I tried to submit jobs to the new EDG site in Glasgow.
Although my jobs ran successfully, they were systematically declared as
Failed, as you can see in the job logging information attached below.
I followed the progress of my jobs in real time with globus-job-run,
and found that the beginning is fine (job submitted, transferred to the RB,
matched, transferred to the CE, even Scheduled) until the job starts.
The minute the job starts, the status reported by dg-job-status goes to
Done, even though the job is still alive and running.
My guess is that the Failed status arises because the job is still running,
so its output is not yet available.
Has anybody (on the CC list) already experienced this?
Best regards,
Frederic
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job : https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
---
Event Type = JobAccept
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:00:50 2003
Job Accept New Id = RB assigned ID
Job Accept Source = UserInterface
Host Name = gm03
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/OU=hep.phy.cam.ac.uk/CN=Frederic Brochu
Logging Level = System
Date (UTC) = Fri May 16 17:01:37 2003
Job Transfer Dest = ResourceBroker/gm03.hep.ph.ic.ac.uk:7771
Job Transfer Result = OK
Host Name = gppui04.gridpp.rl.ac.uk
Source Program = UserInterface
---
Event Type = JobMatch
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:01:40 2003
Job Match Destination = ce0-gla.scotgrid.ac.uk:2119/jobmanager-pbs-gridqs
Host Name = gm03
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:01:41 2003
Job Accept New Id = 1372.
Job Accept Source = ResourceBroker
Host Name = gm03
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:01:41 2003
Job Transfer Dest = JobSubmissionService
Job Transfer Result = OK
Host Name = gm03
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/OU=hep.phy.cam.ac.uk/CN=Frederic Brochu
Logging Level = System
Date (UTC) = Fri May 16 17:07:28 2003
Job Accept New Id = https://ce0-gla.scotgrid.ac.uk:33001/3416/1053104843/
Job Accept Source = gm03.hep.ph.ic.ac.uk
Host Name = ce0-gla.scotgrid.ac.uk
Source Program = GlobusJobmanager
---
Event Type = JobScheduled
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/OU=hep.phy.cam.ac.uk/CN=Frederic Brochu
Logging Level = System
Date (UTC) = Fri May 16 17:07:29 2003
Host Name = ce0-gla.scotgrid.ac.uk
Source Program = GlobusJobmanager
Job Scheduled Reason = initial,jobid=89614.masternode
---
Event Type = JobTransfer
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:07:34 2003
Job Transfer Dest = ce0-gla.scotgrid.ac.uk:2119/jobmanager-pbs
Job Transfer Result = OK
Host Name = gm03
Source Program = JobSubmissionService
---
Event Type = JobDone
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/OU=hep.phy.cam.ac.uk/CN=Frederic Brochu
Logging Level = System
Date (UTC) = Fri May 16 17:08:04 2003
Host Name = ce0-gla.scotgrid.ac.uk
Source Program = GlobusJobmanager
---
Event Type = JobRun
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:12:56 2003
Job Run Node = ce0-gla.scotgrid.ac.uk
Host Name = gm03
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:12:57 2003
Job Run Node = ce0-gla.scotgrid.ac.uk:2119/jobmanager-pbs-gridqs
Host Name = gm03
Source Program = ResourceBroker
---
Event Type = JobFail
Job Fail Action = 0
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:12:57 2003
Job Fail Reason = Cannot read JobWrapper output, both from Condor and from Maradona.
Host Name = gm03
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:12:57 2003
Job Transfer Dest = ResourceBroker
Job Transfer Result = OK
Host Name = gm03
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:12:58 2003
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = gm03
Source Program = ResourceBroker
---
Event Type = JobAbort
Job Abort Reason = Failure while executing job wrapper.
dg_jobId = https://gm03.hep.ph.ic.ac.uk:7846/130.246.183.172/17004969387201?gm03.hep.ph.ic.ac.uk:7771
Certificate Subject = /O=Grid/O=UKHEP/CN=host/gm03.hep.ph.ic.ac.uk
Logging Level = System
Date (UTC) = Fri May 16 17:12:58 2003
Host Name = gm03
Source Program = ResourceBroker
**********************************************************************
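
The ordering problem described above (a JobDone event logged before the
first JobRun event) can be checked mechanically. The following is only a
minimal sketch in Python, assuming the logging dump keeps the layout shown
above: "Key = Value" lines, with events separated by "---". The function
names parse_events and done_before_run are hypothetical helpers, not part of
any EDG tool, and SAMPLE is a shortened excerpt of three events from the log
above.

```python
from datetime import datetime

# Shortened excerpt of the logging dump above (three events only).
SAMPLE = """\
Event Type = JobScheduled
Date (UTC) = Fri May 16 17:07:29 2003
Source Program = GlobusJobmanager
---
Event Type = JobDone
Date (UTC) = Fri May 16 17:08:04 2003
Source Program = GlobusJobmanager
---
Event Type = JobRun
Date (UTC) = Fri May 16 17:12:56 2003
Source Program = JobSubmissionService
"""

# Timestamp layout used in the "Date (UTC)" field (English day/month names).
DATE_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_events(text):
    """Split the dump on '---' separators and return one dict per event."""
    events = []
    for block in text.split("\n---"):
        fields = {}
        for line in block.splitlines():
            # Split on the first ' = ' only, so values containing '=' survive.
            key, sep, value = line.partition(" = ")
            if sep:
                fields[key.strip()] = value.strip()
        # Keep only blocks that look like complete events.
        if "Event Type" in fields and "Date (UTC)" in fields:
            fields["when"] = datetime.strptime(fields["Date (UTC)"], DATE_FORMAT)
            events.append(fields)
    return events


def done_before_run(events):
    """Return True if a JobDone event is timestamped before the first JobRun."""
    done = [e["when"] for e in events if e["Event Type"] == "JobDone"]
    run = [e["when"] for e in events if e["Event Type"] == "JobRun"]
    return bool(done and run) and min(done) < min(run)


events = parse_events(SAMPLE)
for e in sorted(events, key=lambda e: e["when"]):
    print(e["when"].isoformat(), e["Event Type"], e["Source Program"])
print("JobDone logged before JobRun:", done_before_run(events))
```

Run against the full dump, the same check would compare the JobDone event
from GlobusJobmanager with the JobRun events from JobSubmissionService and
ResourceBroker. It only reports the ordering and does not explain its cause.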