Hello,
I managed to figure this one out but forgot to close the thread.
Checking bhist for the jobs revealed that they were being killed for
hitting the memory limit. It appears that the admins who set up the
cluster deprecated the "#BSUB -M X" job option in favour of "#BSUB -R
'rusage[mem=X]'". Altering our local LSF submit wrapper script to take
this into account fixed things.
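
For anyone who finds this thread later, here is a minimal sketch of that
kind of rewrite, in Python. It is illustrative only and not the actual
wrapper script: the function name is hypothetical, and it assumes the
memory value can be copied across unchanged. The units of the two options
may differ depending on how the cluster is configured, so check that
locally before relying on it.

```python
import re

# Matches a directive line such as "#BSUB -M 2000" in a submit script.
_M_OPTION = re.compile(r"^(\s*#BSUB\s+)-M\s+(\S+)\s*$")


def rewrite_memory_directive(script_text: str) -> str:
    """Replace each '#BSUB -M X' line with "#BSUB -R 'rusage[mem=X]'".

    Hypothetical helper: the value X is copied as-is (no unit conversion).
    """
    out = []
    for line in script_text.splitlines():
        m = _M_OPTION.match(line)
        if m:
            # Keep the original "#BSUB " prefix, swap only the option itself.
            line = f"{m.group(1)}-R 'rusage[mem={m.group(2)}]'"
        out.append(line)
    return "\n".join(out)
```

All other lines, including other #BSUB directives, pass through untouched.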
Thanks for getting back to me, and have a good holiday, everyone.
Matt
Massimo Sgaravatto - INFN Padova wrote:
> Looks like the job was killed
>
> What do the glite-ce-cream.log* files say for this job?
>
> What does bhist say for this job?
>
>
> Cheers, Massimo
>
>
>
> On Tue, 21 Dec 2010, Matt Doidge wrote:
>
>> Hello,
>> On our CREAM CE we're seeing a number of ATLAS software installation
>> jobs fail with the user seeing the error:
>>
>> Terminated
>> Master process killed
>>
>> We can also see this error in the job's StandardError. In the job's
>> StandardOutput I can see the interesting last 2 lines:
>> job exit status = 1
>> jw exit status = 2
>>
>> I'm having trouble figuring out what's going on here. I don't see
>> anything out of the ordinary in the logs (but then I'm still very much
>> a CREAM noob). Other jobs from the same user ran without this error.
>> I'm not even sure if this is a CREAM error or a batch system problem,
>> and Google wasn't any help either (which always worries me!). Has
>> anyone seen this before? Or perhaps have an idea of what else to look
>> for? At first I thought this was some sort of proxy error but the user
>> jobs are young (less than an hour old) and some of his jobs are
>> getting through.
>>
>> We're running glite-ce-cream-1.12.3-1 on top of an lsf-7.0.6-1 batch
>> system running on a separate node.
>>
>> Thanks in advance,
>> Matt
>>
>
> \|||/
> -----------0oo----( o o )----oo0-------------------
> (_)
> INFN Sezione di Padova
> Via Marzolo, 8
> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
> Tel: ++39 0498275908 Skype: massimo.sgaravatto
> Fax: ++39 0498275952