JISCMail - TB-SUPPORT Archives

Hi John,

These seem very sensible suggestions indeed ...

All the best,
david

Gordon, JC (John) wrote:
> The stageout issues take me back to the early days of LEP when CSF was
> developing (yes, OK, I am old). Nodes started another job when the
> previous one entered its output phase and was copying output across the
> network. I don't think WMS can do much to push another job. This issue
> is bigger than just staging the sandbox back to WMS; jobs often need to
> send their data elsewhere and not everyone has async solutions for this.
> The options I see are:-
> 
> a) local batch system - LSF can start more jobs when CPU load drops.
> Obviously a risk if jobs stall at the start when staging in. What can
> other batch systems do?
> 
> b) Pilot jobs - obviously they can know enough to start another job at
> the appropriate time but launching payloads other than serially
> introduces opportunities for interference and difficulties in cleaning
> up.
> 
> John
> 
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes [mailto:TB-
>> [log in to unmask]] On Behalf Of Graeme Stewart
>> Sent: 19 October 2008 11:01
>> To: [log in to unmask]
>> Subject: Re: [Fwd: Jobs idling on transfers..]
>>
>> On Sun, Oct 19, 2008 at 11:04 AM, Coles, J (Jeremy)
>> <[log in to unmask]> wrote:
>>> Hi Graeme
>>>
>>>>> Which VO are the jobs running under?
>>>> Unless I'm mistaken Kostas has pulled out code from the RB/WMS job
>>>> epilogue wrapper. So the VO is not really relevant.
>>> I think it is relevant from a user education standpoint, rather than
>>> simply one of catching inefficient jobs at the batch system.
>> No it's not. If it's user education that would be teaching them "don't
>> use the WMS, it's rubbish and it can't get your job outputs back to
>> you..."
>>
>> :-)
>>
>> Graeme
>>
>> --
>> Dr Graeme Stewart              http://www.physics.gla.ac.uk/~graeme/
>> Department of Physics and Astronomy, University of Glasgow, Scotland