:-)
Feel free to contribute:
https://twiki.cern.ch/twiki/bin/view/LCG/BestErrorMessages
:-)
Gergo
Graeme Stewart wrote:
> Ah, so it says I/O error, but this means it's out of memory. Right...
> I'll now chalk up the hours Mark and I spent trying to debug file
> transfers to the CE as gaining a deeper understanding of the Zen of
> Globus error messages. Or is that actually Alice in Wonderland?
>
> `When I use an error message,' Humpty Dumpty said, in rather a scornful
> tone, `it means just what I choose it to mean -- neither more nor less.'
>
> `The question is,' said Alice, `whether you can make error messages mean
> so many different things.'
>
> `The question is,' said Humpty Dumpty, `which is to be master -- that's
> all.'
>
> Alice was too much puzzled to say anything; so after a minute Humpty
> Dumpty began again. `They've a temper, some of them -- particularly
> Globus errors: they're the proudest -- batch system errors you can do
> anything with, but not Globus errors -- however, I can manage the whole
> lot of them! Impenetrability! That's what I say!'
>
> `Would you tell me please,' said Alice, `what that means?'
>
> Graeme
>
> PS. https://savannah.cern.ch/bugs/index.php?25048 offers a path to
> atonement, if not enlightenment...
>
> On 26 Mar 2007, at 16:17, Maarten Litmaath wrote:
>
>> Maarten Litmaath wrote:
>>
>>> Mark Nelson wrote:
>>>> Hello
>>>>
>>>> I have a problem with my site. It had been working perfectly
>>>> for months, but this morning at ~1am it suddenly stopped accepting
>>>> jobs. The following have been checked:
>>>>
>>>> 1. /tmp and /scratch (TMP directory for grid jobs) are not full
>>>> 2. Home directories are not full and there are no issues with quotas.
>>>> 3. Can submit to the batch system via qsub on the CE.
>>>> 4. edg-job-submit submits job.
>>>> 5. edg-job-status returns the following error -
>>>> "cannot plan: BrokerHelper: no compatible resource."
>>> For the "ops" VO SAM reports this:
>>> Globus error 3: an I/O operation failed
>>> Did you look at this Wiki entry:
>>> http://goc.grid.sinica.edu.tw/gocwiki/Globus_error_3
>>> In particular, what is the current memory usage on your CE?
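
One quick way to answer the memory question on a Linux CE is to read /proc/meminfo directly. The helper below is a minimal sketch, assuming the usual Linux /proc/meminfo layout with MemTotal, MemFree and SwapFree reported in kB; the name mem_summary is hypothetical and not part of any Globus or LCG tool.

```shell
#!/bin/sh
# mem_summary (hypothetical helper): print total and free RAM plus free swap,
# converted from kB to MB. Assumes the Linux /proc/meminfo format.
# An optional argument points it at a saved copy of the file instead.
mem_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free  = $2 }
        /^SwapFree:/ { swap  = $2 }
        END {
            printf "MemTotal=%dMB MemFree=%dMB SwapFree=%dMB\n",
                   total / 1024, free / 1024, swap / 1024
        }' "$meminfo"
}
```

Running `mem_summary` with no argument reports the live values; MemFree alone understates what is really available because of page cache, so a low figure is a hint rather than proof.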
>>
>> Indeed, you have a huge number of globus-job-manager processes stuck
>> like this:
>>
>> ----------------------------------------------------------------------------------
>> atlas032 9591 0.1 0.1 5008 2844 ? S 15:58 0:00 globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type lcgpbs -rdn jobmanager-lcgpbs -machine-type unknown -publish-jobs
>> atlas032 9615 0.5 0.2 7612 6060 ? S 15:59 0:00  \_ /usr/bin/perl /opt/globus/libexec/globus-job-manager-script.pl -m lcgpbs -f /tmp/gram_EHLVFj -c cache_cleanup
>> ----------------------------------------------------------------------------------
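
To see how many of these job managers each grid account has piled up, the ps output can be summarised per user. This is a sketch only, assuming input in the form produced by `ps -eo user=,pid=,args=` (user, PID, then the command line); the function name count_jobmanagers is hypothetical.

```shell
#!/bin/sh
# count_jobmanagers (hypothetical helper): read "user pid args..." lines and
# print how many globus-job-manager processes each account owns, busiest
# account first. Only lines whose command (third field) is globus-job-manager,
# with or without a leading path, are counted, so child processes such as the
# /usr/bin/perl cache_cleanup script are ignored.
count_jobmanagers() {
    awk '$3 ~ /(^|\/)globus-job-manager$/ { n[$1]++ }
         END { for (u in n) print n[u], u }' |
        sort -k1,1nr -k2,2
}
```

Typical use would be `ps -eo user=,pid=,args= | count_jobmanagers`; taking input on stdin also lets it run against a saved ps listing.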
>>
>>
>> Your home directories are NFS-automounted: were there any problems
>> with NFS or the automounter recently?
>>
>> Can you do the following:
>>
>> lsof -p 9615 > lsof.out
>> strace -p 9615 -o strace.out
>>
>> Interrupt the strace after some time and send me the output of both
>> commands.
>>
>> Please leave a few of those stuck processes around (e.g. one per grid
>> account) and kill the rest.
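
Selecting which processes to kill while sparing one per account can be scripted. The sketch below is a dry run: it only prints PIDs and kills nothing. It assumes input from `ps -eo user=,pid=,args=`, and the name pids_to_kill is hypothetical. Keeping the lowest PID of each account is an arbitrary choice (PIDs wrap, so it is not guaranteed to be the oldest process).

```shell
#!/bin/sh
# pids_to_kill (hypothetical helper): read "user pid args..." lines and print
# the PIDs of globus-job-manager processes that could be killed, keeping the
# lowest-PID one of each account for inspection with lsof/strace.
# Child processes (e.g. the perl cache_cleanup scripts) are not selected.
pids_to_kill() {
    awk '$3 ~ /(^|\/)globus-job-manager$/ { print $1, $2 }' |
        sort -k1,1 -k2,2n |
        awk 'seen[$1]++ { print $2 }'   # skip the first PID seen per user
}
```

After reviewing the list from `ps -eo user=,pid=,args= | pids_to_kill`, it could be passed to `xargs kill` (GNU xargs also accepts `-r` to do nothing on empty input).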
>
> --
> Dr Graeme Stewart - http://wiki.gridpp.ac.uk/wiki/User:Graeme_stewart
> GridPP DM Wiki - http://wiki.gridpp.ac.uk/wiki/Data_Management
> ScotGrid - http://www.scotgrid.ac.uk/ http://scotgrid.blogspot.com/