Peter Love wrote:
> Ahh, the infamous maradona error. This is an annoying problem and
> difficult to detect as it is a symptom of a bad WN/batch system. You'll
> need to dredge WN and CE batch system logs looking for the bad node.
> Your SFTs probably pass because they land on good nodes. Check here for
> possible causes:
> http://grid-deployment.web.cern.ch/grid-deployment/eis/docs/Maradona
>
The problem is that even when we run with only one node, we can still
submit jobs ourselves without any problem, yet SAM gets a maradona error.
We use shared home directories and we don't use ssh for copy back.
> First check would be that WN-CE passwordless ssh is OK. Check cpu-used, as
> black-hole nodes stick out there like a sore thumb. This class of problem
> prompted our move towards a stateful config system (cfengine).
>
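For the cpu-used check suggested above, a rough sketch of how one might
flag black-hole nodes. It assumes the batch accounting data has already
been reduced to (node, cpu seconds) pairs, one per finished job; the
function name, the thresholds and the node names are all made up for
illustration and are not taken from any batch system.

```python
from collections import defaultdict


def find_black_hole_nodes(records, max_cpu_seconds=5.0,
                          min_jobs=10, min_fraction=0.9):
    """Return sorted node names where nearly every job used almost no CPU.

    records: iterable of (node_name, cpu_seconds) pairs, one per finished
    job. This input shape is an assumption; it has to be extracted from
    the real accounting logs first.
    """
    total = defaultdict(int)   # jobs seen per node
    short = defaultdict(int)   # jobs at or below the CPU threshold per node
    for node, cpu_seconds in records:
        total[node] += 1
        if cpu_seconds <= max_cpu_seconds:
            short[node] += 1

    suspects = []
    for node, n in total.items():
        # Ignore nodes with too few jobs to judge.
        if n >= min_jobs and short[node] / n >= min_fraction:
            suspects.append(node)
    return sorted(suspects)


if __name__ == "__main__":
    # Hypothetical sample data: wn02 finishes every job almost instantly.
    sample = [("wn01", 3600.0)] * 12 + [("wn02", 0.5)] * 15
    print(find_black_hole_nodes(sample))
```

A node that this flags is only a candidate; its WN and CE logs still
need to be read to confirm why its jobs die so quickly.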
Olivier.
> Peter
>
> Olivier van der Aa wrote:
>> Dear All,
>>
>> We are having problems getting the SAM tests to run cleanly on our new CE
>> (ce00.hep.ph.ic.ac.uk).
>>
>> The SAM tests show OK for each individual test
>> (http://tinyurl.com/yjhnso), but the Logging and Bookkeeping shows a
>> maradona error (http://tinyurl.com/ydgv8b).
>>
>> We have submitted through the RB that SAM uses (gdrb02.cern.ch) and had no
>> problems at all. We have also mapped ourselves as ops on the CE, and that
>> works fine.
>>
>> We have biomed jobs running fine on the cluster...
>>
>> Any ideas?
>>
>> Cheers, Olivier.
>> --
>> - O. van der Aa - Imperial College London -
>> - LT2 Technical Coordinator -
>> - tel: +442075947810, +442071005426 -
>> - SIP: [log in to unmask] -
>> - fax: +442078238830 -
>> - http://surl.se/agtu -