Hi
I think Andrew S has become rather cynical after a few too many problematic tranches of disk servers. :D
To correct him:
It was the Viglen 07 AMD (not 08) generation that had the higher failure rate last year. This was indeed fixed with a firmware update. The Stream Line 08 generation had a problem the year before with incorrectly handling single drive failures. Before that we had the 05 generation, which was RAID 5 only and had a batch of dodgy disk drives. I have probably missed a few problems, but I haven't worked at RAL that long!
I would also dispute the assertion that load makes a significant difference to the failure rate. Some of our busiest service classes at RAL, such as atlasScratchDisk, are made up of our oldest disk servers and do not have a significantly different failure rate. I think there is some evidence to suggest that extreme load can be a factor in failure rates, but sensible precautions, such as putting things into read-only mode when rebuilding and actually making sure you deploy enough disk servers to a service class to handle the load, are sufficient to avoid these extremes. Obviously with DPM and at other sites you don't have the faff with small service classes that RAL has. The fact that problematic disk servers will have more problems when used more does not mean we can extrapolate this to normal disk servers.
Alastair
On 19 Jan 2012, at 10:46, Ewan MacMahon wrote:
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes
>> On Behalf Of Andrew Sansum
>>
>> Any substantial increase in failure rate beyond the 3 year supplier
>> warranty almost inevitably signals the likely demise of the generation
>> from production service and consequently by definition a bath tub shaped
>> curve. With younger generations we'd seek resolution, firmware updates or
>> a batch replacement of the media. Ho hum - it goes with the territory
>> unfortunately and we carry spare capacity to cope with these
>> eventualities.
>>
> It strikes me that that must be quite expensive. We’ve been getting a
> lot of kit recently from Dell, and they seem to offer the (quite cheap)
> option of a five year warranty on things, which we've been taking.
> If there are problems caused by kit being out of warranty but still in
> use, have you thought about just specifying longer warranties? As we've
> already covered, the drives should carry five year manufacturer cover,
> so it's not that much of a risk for the intermediate supplier to do the
> same - essentially they'd just be passing that on.
>
> Ewan