JISCMail - TB-SUPPORT Archives

Hi Graeme,

They both point at the same space.

ta

cheers
alessandra

Graeme Stewart wrote:
> On Wed, Jul 22, 2009 at 14:05, Duncan Rand wrote:
>   
>> Alessandra Forti wrote:
>>     
>>> Hi Graeme,
>>>
>>>       
>>>> ANALY_MANC1: Bad. Build job problems, which seem to be on stage-out
>>>> after the code compiles, e.g.,
>>>> http://panda.cern.ch:25980/server/pandamon/query?job=1016502474.
>>>>
>>>>
>>>>         
>>> All data server links are saturated.
>>>
>>>       
>>>> ANALY_MANC2: Fair. Running out of local stage space? "Error details:
>>>> pilot: Too little space left on local disk to run job: 2050048 kB
>>>> (need > 2097152 kB)" - maybe need to clean up disks or run fewer jobs
>>>> on these nodes? Otherwise ok.
>>>>
>>>>
>>>>         
>>> Indeed, this is a recurring problem that we will solve when we upgrade to
>>> SL5. We have only two CPUs per node, yet only 20 GB of scratch. It used to
>>> be enough, but these analyses seem to copy far more than 10 GB per job.
>>>       
>> We now get Torque to create a per-job temporary directory, which is cleaned
>> up at the end of the job:
>>
>> http://www.clusterresources.com/torquedocs21/users/2.2files.shtml#tmpdir
>>     
>
> For historical reasons (remember the EDG RB?) the pre-pilot wrapper
> script looks first for EDG_WL_SCRATCH, then TMPDIR:
>
> if [ -n "$EDG_WL_SCRATCH" ]; then
>     cd "$EDG_WL_SCRATCH"
> elif [ -n "$TMPDIR" ]; then
>     cd "$TMPDIR"
> fi
>
> I think this is now an anachronism and I will reverse the order, but
> be kind to me if in the next few hours the pilot wrapper does the
> wrong thing...
>
> Graeme
>
>   
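
For reference, the reversed lookup Graeme describes might look roughly like the sketch below. This is not the actual pilot wrapper: the function name pick_scratch_dir, the extra directory-existence checks, and the fallback to the current directory are all assumptions added for illustration.

```shell
# Hypothetical helper (not from the real wrapper): print the scratch
# directory to use, preferring TMPDIR over the legacy EDG_WL_SCRATCH.
pick_scratch_dir() {
    if [ -n "${TMPDIR:-}" ] && [ -d "$TMPDIR" ]; then
        # Per-job directory, e.g. one created by the batch system.
        printf '%s\n' "$TMPDIR"
    elif [ -n "${EDG_WL_SCRATCH:-}" ] && [ -d "$EDG_WL_SCRATCH" ]; then
        # Legacy variable, now only consulted as a fallback.
        printf '%s\n' "$EDG_WL_SCRATCH"
    else
        # Neither variable is usable: stay where we are.
        pwd
    fi
}

scratch_dir=$(pick_scratch_dir)
cd "$scratch_dir" || echo "could not change to $scratch_dir" >&2
```

On a site where the batch system sets TMPDIR to a per-job directory, this ordering would pick that directory up even if EDG_WL_SCRATCH is still defined on the worker node.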

-- 
No man ever steps in the same river twice, for it's not the same river and he's not the same man. (Heraclitus)

Northgrid Tier2 Technical Coordinator
http://www.hep.manchester.ac.uk/computing/tier2