Hi,
David Groep posted this from our sysadmin account -- and it bounced.
Unfortunately the problem did not magically disappear upon bounce, so
here it is again.
J "super glue and hose clamps" T
Subject: NIKHEF RB hardware problems: jobs lost
From: "NIKHEF NDPF Grid System Admin Group (David Groep)" <[log in to unmask]>
Date: Thu, 28 Jul 2005 18:31:35 +0200
To: [log in to unmask]
CC: NDPF sysadmin <[log in to unmask]>, Willem van Leeuwen <[log in to unmask]>,
    [log in to unmask], LHC Computer Grid - Rollout <[log in to unmask]>
Hi Antonio, and others,
We should not have looked at that RB -- I have the "bad eye" or
something like that ...
Currently the disk is dying before my eyes and giving various
hardware errors. I'll have to physically replace the disk and
insert a new one now. The RB should be back in 30 minutes or so,
but all your jobs will have disappeared.
To all others using the NIKHEF RB bosheks.nikhef.nl: your jobs
are lost and you will have to resubmit (also the
output of any running jobs will be lost).
I'm quite sorry for the inconvenience.
Cheers,
DavidG.
NIKHEF NDPF Grid System Admin Group (David Groep) wrote:
> Hi Antonio,
>
> We will monitor the RB carefully ... do you periodically retrieve
> the output from the output sandboxes, so that - if needed - we can
> expire old sandboxes after, say, a two-week period?
>
> Cheers,
> DavidG.
>
> [log in to unmask] wrote:
>
>>> Number: 1149
>>
>>
>> Dear administrator,
>>
>> I am writing to you because you are the contact for the
>> administration of a Biomed RB.
>> You know that Biomed is in the middle of its data challenge these
>> days, and they might be using your RB. This means that they may be
>> submitting hundreds of jobs through your RB, and this can make the
>> size of the sandbox dir increase in comparison with the values you
>> are used to (although their jobs only output 2/3 MB of data through
>> the sandbox, there is a great number of jobs).
>>
>> Biomed has experienced several RBs having problems with disk space
>> in that directory. Please be especially careful with the monitoring
>> of disk space for this directory while the data challenge lasts.
>>
>> The list of Biomed RBs follows.
>>
>> Site : CEA-DAPNIA-SACLAY
>> URI : gram://node04.datagrid.cea.fr:7772
>>
>> Site : CGG-LCG2
>> URI : gram://rb1.egee.fr.cgg.com:7772
>>
>> Site : HG-01-GRNET
>> URI : gram://rb.isabella.grnet.gr:7772
>>
>> Site : IN2P3-LAPP
>> URI : gram://lappgrid07.in2p3.fr:7772
>>
>> Site : NIKHEF-ELPROD
>> URI : gram://bosheks.nikhef.nl:7772
>>
>> Site : RAL-LCG2
>> URI : gram://lcgrb01.gridpp.rl.ac.uk:7772
>>
>> Site : Taiwan-LCG2
>> URI : gram://lcg00124.grid.sinica.edu.tw:7772
>>
>> Thank you very much.
>>
>> Antonio.
>>
>>
>>
>> -- _______________________________________
>> Antonio Delgado Peris
>> _______________________________________
>> IT Division - GD Group - EIS section
>> CERN. CH-1211 Genève 23 (Switzerland)
>> Office: 28-R-017
>> Tel.: +41 22 76 72227
>> Email: [log in to unmask]
>> _______________________________________
>>
>>> Fix:
>>
>>
>>
>>
>> Unknown
>
>
>
>
--
David Groep
** National Institute for Nuclear and High Energy Physics, PDP/Grid group **
** Room: H1.56 Phone: +31 20 5922179, PObox 41882, NL-1009DB Amsterdam NL **