Hi Luís,
>>An example job seems to start OK, but is found finished about 15 minutes later:
>>
>>-----------------------------------------------------------------------------
>>Event: Running
>>- Arrived = Sun Jul 12 01:19:21 2009 CEST
>>- Host = wms208.cern.ch
>>- Node = gt2 axon-g01.ieeta.pt:2119/jobmanager-lcgpbs
>>- Source = LogMonitor
>>- Src instance = unique
>>- Timestamp = Sun Jul 12 01:19:21 2009 CEST
>>- User = /DC=ch/DC=cern/OU=Organic Units/OU=[...]
>> ---
>>Event: Done
>>- Arrived = Sun Jul 12 01:34:07 2009 CEST
>>- Exit code = 1
>>- Host = wms208.cern.ch
>>- Reason = File not available.Cannot read JobWrapper output,
>> both from Condor and from Maradona.
>>- Source = LogMonitor
>>-----------------------------------------------------------------------------
>>
>>Does your batch system kill SAM jobs after 15 minutes?
>>
>
>
> I don't see why it should do that. I've not changed anything in the batch
> system.
I found other jobs that failed after about 100 seconds or less.
Anyway, the last failure was in the early hours of Tuesday, and
your CE has been passing the SAM tests for many hours now.
Was something changed, e.g. was a bad WN removed?