On 31/03/11 14:15, Peter Grandi wrote:
[ ... ]
> This thread on the XFS mailing list about 3ware and WD green issues:
>
> http://oss.sgi.com/archives/xfs/2011-02/msg00226.html
Just noticed a generic Adaptec HA comment in the Lustre mailing list:
http://lists.lustre.org/pipermail/lustre-discuss/2011-April/015428.html
>>> aacraid: Host adapter abort request (0,0,0,0)
>>> aacraid: Host adapter reset request. SCSI hang ?
>>> AAC: Host adapter BLINK LED 0xef
>>> AAC0: adapter kernel panic'd ef.
> We have ~ 60 servers with these Adaptec controllers, and
> found this problem just to happen from time to time.
> Upgrade of the aacraid module wouldn't help. We had
> contacts to Adaptec, but they had no clue either. Only
> good thing is it seems that this adapter panic happens in
> an instant, halting the machine, but has no prior phase of
> degradation: the controller doesn't start leaving out
> every second bit or just writing the '1's and not the '0's
> or ... - so whatever data has made it to the disks before
> the crash seems to be quite sensible.
The poster is from GSI, and judging from an earlier presentation on
the same storage system:
http://hepix.caspur.it/storage/hep_pdf/2009/Fall/w.shoen-hepixfall2009-lustre-gsi.pdf
it seems that they are using 1TB WD Green drives.
BTW random resets in HAs seem to be a widespread issue; I have seen
them in Mellanox IB HAs and 3ware SATA HAs. WD Green drives seem
to trigger them more than others, but I have seen them happen
with other drives too.