Hi Winnie,
We have an R510, bought in January 2012 as part of our GRID storage
pool. In May 2012 we started to have problems with disk 0:0:8 being
ejected. The disk was replaced but the problem persisted. Eventually we
had the PERC H700 card replaced - this resolved the problem. In July
2013 we had two separate disk failures (disks 0:0:0 and 0:0:7) - these
seemed to be provoked by a power cycle (a firmware update was done at
the same time, so I suppose that might have been the trigger rather than
the power cycle?). Replacing the disks resolved the problem. In all
these cases the disk was ejected promptly (there was not a long period
when errors were being logged).
John
On 13/02/2014 11:54, Winnie Lacesso wrote:
> Greetings all,
>
> In 2011 Bristol bought, as an LCG VM-hosting box:
> Dell R510, Intel Xeon E5620 (4 x 2.4GHz), 24GB RAM, 9 x 300GB SAS 15K
> It has a PERC H700 hardware RAID controller that presents those 9 disks as
> a 2TB RAID6 volume (/sda); it hosts the site-bdii VM, APEL, 2 x CREAM-CE,
> squid, etc.
>
> Does anyone else have a box like this? Has anyone had any disk errors
> on it, requiring warranty disk replaced (supplied by Dell)?
> How many disk errors so far?
>
> Starting in May 2013, disk 0:0:3 on Bristol's R510 logged major errors, & was
> replaced in Aug 2013 (hotswap hardware RAID) under warranty from Dell.
> (This was when I found out that logwatch - which I do read once a week -
> *ignores* the Dell Server Administrator error messages about bad disk or
> other error, logged in /var/log/messages - which I don't (didn't) look at
> much. I've since added to logwatch so it reports those errors.)
>
> In Oct 2013 disk 0:0:2 logged a few errors, then more in Dec. We got a
> replacement from Dell in Dec & swapped it in just before Christmas.
> (The 9 disks are 0:0:0 to 0:0:8)
>
> Then starting mid-Jan 2014 disk 0:0:4 logged a few errors, & has
> continued to log errors with slowly increasing frequency.
> This time Dell is suggesting that there may be some problem other than
> just a disk (since the "bad disk" errors are getting strangely
> frequent). They say we need to shut the server down, create some
> Microsoft-bootable USB thing, boot the server from that, & update
> the firmware on all the drives.
>
> (I'm not inclined to do this... if we have to all I can say is, it better
> not wreck ANYTHING on the vm-hosting box!)
>
>
> Has anyone else got a Dell R510 that has had similar issues & has either
> had this advice from Dell, or done it*? If so, was the outcome good?
>
> Bristol PP is in the market for another server, & the above experience
> makes me want a DNUK, not a Dell....
>
> * or even "done anything like it"? (with a good outcome)
>
> Grateful for advice
>
> Winnie Lacesso / 55% HPC Storage Admin, 20% Particle Physics, 25% SysOps
> HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK
> University of Bristol
>