On 8 Jul 2009, at 17:03, Maarten Litmaath wrote:
> Hi Stuart,
>
>>>> I've got an odd scenario - I'm in the process of testing out
>>>> some software, and I'm finding that the jobs are hanging on
>>>> the worker node.
>>>> I did some investigation, and it appears that glite-wms-job-
>>>> submit places the zipped input sandbox in the correct place,
>>>> but it remains there in an archive - i.e. it's not getting
>>>> unpacked.
>
> Could be this one:
>
> https://savannah.cern.ch/bugs/index.php?46235
Well, it _could_ be, but the notes made about it made it sound
probabilistic in nature, which doesn't match the behavior I've seen.
It's consistent for me (at least in end behavior - I can't rule out
two separate problems causing zipped ISB to tank at this stage),
seems independent of the order of the input sandbox (which results in
slightly different sized archives, as one might expect from
compression dictionaries), and shows the same behavior across almost
all the permutations of UI, WMS, LB and CE (2 of each, with the WMS,
LB and CE notionally identical).
Back of envelope sums indicate that if it is a probabilistic failure,
the probability is likely at least 0.9something. Of course, I could
just be having a particularly unlucky day.
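For what it's worth, the sum is roughly the following. This is only a
sketch: it assumes each job fails independently with the same
probability p, and the run lengths used are hypothetical, not the
actual number of jobs I submitted. If n jobs in a row all fail, that
has probability p**n, so at a given confidence level p cannot be much
smaller than (1 - confidence)**(1/n).

```python
def min_failure_probability(consecutive_failures: int,
                            confidence: float = 0.95) -> float:
    """Lower bound on the per-job failure probability p.

    Assumes independent jobs with the same p. A run of n failures has
    probability p**n; requiring p**n >= 1 - confidence gives the bound.
    """
    if consecutive_failures < 1:
        raise ValueError("need at least one observed job")
    return (1.0 - confidence) ** (1.0 / consecutive_failures)


# Hypothetical run lengths, purely for illustration.
for n in (10, 30, 50):
    print(n, round(min_failure_probability(n), 3))
```

With a few tens of failures in a row and no successes, the bound
lands around 0.9, which is where the figure above comes from.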
(My gut feeling is that, somehow, I'm including something in the
sandbox that's breaking things, but I've not found that by successive
elimination yet).
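The elimination could be done by halving rather than one file at a
time. A minimal sketch, assuming exactly one file is responsible and
that the failure is deterministic; find_culprit, the file names and
the fails() check are all hypothetical stand-ins (in practice fails()
would mean "submit a job with this sandbox and see whether it hangs",
and the job script itself would have to be in every subset).

```python
from typing import Callable, List, Sequence


def find_culprit(files: Sequence[str],
                 fails: Callable[[List[str]], bool]) -> str:
    """Bisect `files` down to the one file that makes `fails` return True.

    Assumes a single culprit and a deterministic failure.
    """
    candidates = list(files)
    if not fails(candidates):
        raise ValueError("full sandbox does not fail; nothing to bisect")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the failure.
        candidates = half if fails(half) else candidates[len(half):]
    return candidates[0]


# Stand-in check: pretend "data.bin" is the file that breaks things.
sandbox = ["run.sh", "config.txt", "data.bin", "lib.tar", "notes.txt"]
print(find_culprit(sandbox, lambda subset: "data.bin" in subset))
```

This needs about log2(N) submissions for N files instead of N, but it
gives a misleading answer if two files only break things together.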
Which brings me back to my original question - if it was SIGSEGVing
while unpacking the ISB, would that be logged somewhere, or is there
at least an option to make it log or report such problems? If so,
where/what is it?