Olivier van der Aa wrote:
We finally found the problem. We could reproduce the Maradona error
by running the SFT ourselves through our RB, following the
instructions at:
http://goc.grid.sinica.edu.tw/gocwiki/SAM_Submission_Framework
The SAM tests seem to use 1.8GB of virtual memory. We had a limit on
virtual memory, so the job was killed after the publishing was done.
We still have to understand why the number reported by SGE is so
high.
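
As a rough illustration of that kind of check (not something we
actually ran): a minimal Python sketch, assuming a Linux worker node,
that pulls the Vm* lines out of a /proc/<pid>/status dump and compares
the peak virtual size against a limit. The function names, the sample
text and the limit values are made up for the example. VmPeak counts
all mapped address space, touched or not, which is one way a virtual
figure can sit far above the resident one (VmRSS).

```python
import re


def parse_vm_fields(status_text):
    """Return {field: size_in_kB} for the Vm* lines of a /proc/<pid>/status dump."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(Vm\w+):\s+(\d+)\s+kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def exceeds_limit(status_text, limit_kb):
    """True if the peak virtual size (VmPeak) is above limit_kb."""
    return parse_vm_fields(status_text).get("VmPeak", 0) > limit_kb


# Hypothetical sample: the numbers are invented for illustration only.
SAMPLE_STATUS = """Name:\tsam-wrapper
VmPeak:\t 1887436 kB
VmSize:\t 1850000 kB
VmRSS:\t   42000 kB
"""

if __name__ == "__main__":
    vm = parse_vm_fields(SAMPLE_STATUS)
    print("peak virtual kB:", vm["VmPeak"], "resident kB:", vm["VmRSS"])
    # Assumed 1.5 GB limit, chosen only to show the comparison.
    print("over limit:", exceeds_limit(SAMPLE_STATUS, 1500000))
```

On a live node the text would come from reading /proc/<pid>/status for
the job's process instead of the sample string.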
Cheers, Olivier.
> Peter Love wrote:
>> Ahh, the infamous Maradona error. This is an annoying problem and
>> difficult to detect, as it is a symptom of a bad WN/batch system.
>> You'll need to dredge the WN and CE batch system logs looking for the
>> bad node. Your SFTs probably pass because they land on good nodes.
>> Check here for possible causes:
>> http://grid-deployment.web.cern.ch/grid-deployment/eis/docs/Maradona
>>
>
> The problem is that even when we run with only one node, our own
> submissions work without problems but SAM still gets a Maradona
> error.
>
> We use shared home directories and we don't use ssh for copy back.
>
>> First check that WN-CE passwordless ssh is OK. Check cpu-used, as
>> black-hole nodes stick out there like a sore thumb. This class of
>> problem prompted our move towards a stateful config system (cfengine).
>>
> Olivier.
>
>> Peter
>>
>> Olivier van der Aa wrote:
>>> Dear All,
>>>
>>> We are having problems getting the SAM tests to run cleanly on our
>>> new CE (ce00.hep.ph.ic.ac.uk).
>>>
>>> The SAM tests show OK for each individual test
>>> http://tinyurl.com/yjhnso but the Logging and Bookkeeping shows a
>>> Maradona error (http://tinyurl.com/ydgv8b).
>>>
>>> We have submitted through the RB that SAM uses (gdrb02.cern.ch)
>>> without any problem. We have mapped ourselves as ops on the CE and
>>> that works fine.
>>>
>>> We have biomed jobs running fine on the cluster...
>>>
>>> Any ideas?
>>>
>>> Cheers, Olivier.
>>> --
>>> - O. van der Aa - Imperial College London -
>>> - LT2 Technical Coordinator -
>>> - tel: +442075947810, +442071005426 -
>>> - fax: +442078238830 -
>>> - http://surl.se/agtu -