Do we need to disable checkpoint at all?

On Mon, Jul 11, 2011 at 10:10 AM, Alessandra Forti wrote:

>> That'll be because the same proxy is shared over a group of jobs - when
>> the pilot factory submits jobs in batches, it does this to
>> reduce demand on the proxy renewals, and to save copying the same file
>> multiple times.  (I think it's hard links on the backend
>> that are used...)
>>
>
>
> On my system, the proxies that disappeared this weekend all belonged to
> jobs that had been queued for too long.
>
> cheers
> alessandra
>
>
>
> On 11/07/2011 10:03, Stuart Purdie wrote:
>
>> On 11 Jul 2011, at 09:58, Stephen Jones wrote:
>>
>>  Ben Waugh wrote:
>>>
>>>> Looks like at the moment I've got a lot of missing stagein files, but
>>>> they are job wrapper scripts, not proxies. Is this normal?
>>>>
>>>> Job Id: 277752.lcg-ce04.hep.ucl.ac.uk - missing stagein:
>>>> /opt/glite/var/cream_sandbox/atlaspil/_C_UK_O_eScience_OU_Glasgow_L_Compserv_CN_graeme_stewart_atlas_Role_pilot_Capability_NULL_pilatl04/76/CREAM768345533/CREAM768345533_jobWrapper.sh
>>>> On 08/07/11 16:59, Stephen Jones wrote:
>>>>
>>> It's senseless. We may have situations where (a) the proxy is gone (b)
>>> the job wrapper is gone (c) job wrapper and proxy are both gone. The script
>>> lists all missing stagein files. Your output shows that your proxy IS
>>> available, yet the job wrapper isn't (to prove it, you could put this
>>> "patch" into the script to explicitly list the stagein files that _are_
>>> there).
>>>
>> That'll be because the same proxy is shared over a group of jobs - when
>> the pilot factory submits jobs in batches, it does this to reduce demand on
>> the proxy renewals, and to save copying the same file multiple times.  (I
>> think it's hard links on the backend that are used...)
>>
>> It's likely that another job in the same group is still running,
>> but that this one died.
>>
>
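For anyone wanting to try the check Stephen describes (listing the stagein files that _are_ there as well as the missing ones), here is a minimal sketch. It assumes you can pass a job's stagein paths in as a list; the actual script discussed in this thread isn't shown, so `classify_stagein` is a hypothetical name, not part of any gLite/CREAM or Torque tool. It also reports each present file's hard-link count, which would show a shared proxy if, as Stuart guesses, the backend uses hard links.

```python
import os
import sys
from typing import Iterable, List, Tuple


def classify_stagein(paths: Iterable[str]) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Split stage-in paths into (present, missing).

    Hypothetical helper. Present entries carry the file's hard-link count:
    a count above 1 means the same inode is reachable from several paths,
    e.g. one proxy shared by a batch of pilot jobs (assuming hard links).
    """
    present: List[Tuple[str, int]] = []
    missing: List[str] = []
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # File is gone: this is what the original script reports.
            missing.append(path)
        else:
            # File exists: record it together with its hard-link count.
            present.append((path, st.st_nlink))
    return present, missing


if __name__ == "__main__":
    # Hypothetical usage: pass one job's stagein file paths as arguments.
    present, missing = classify_stagein(sys.argv[1:])
    for path, nlink in present:
        print(f"present (links={nlink}): {path}")
    for path in missing:
        print(f"missing: {path}")
```

If the proxy shows links=2 or more while the job wrapper is missing, that fits Stuart's explanation: the proxy survives because another job in the same batch still holds a link to it.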