Hi James,
Yes, they are "2TB Western Digital RE4 Green Power Hard Disk Drive". Is
it simple to update the drive firmware? I guess I should push this back
to the vendor if possible, but it might be quicker to do it ourselves if
it doesn't require too much expertise.
Cheers,
Ben
On 07/03/11 11:59, James Thorne wrote:
> Hi Ben.
>
> What drives do you have attached to the Adaptec card? We had some
> problems with WD 2TB green power drives and Adaptec controllers and
> ended up updating the drive firmware. I know CERN have recently had
> some problems with their Adaptec controllers too.
>
> James.
>
> On 7 March 2011 11:37, Ben Waugh wrote:
>> Hi Storage Experts,
>>
>> In the absence of any well-known procedure for burning in or stress testing
>> file servers, I thought I would try a naive approach and see what happened.
>> Now I have problems but don't know how they have arisen or whether I am
>> simply making unreasonable demands on the system.
>>
>> My naive test procedure involves simply copying a lot of bytes from
>> /dev/zero onto multiple filesystems on our new RAID servers. So basically I
>> create one 60 TB partition on each RAID, make it into an LVM physical
>> volume, create a volume group on top of that, and then divide it into six
>> or so logical volumes, creating an XFS filesystem on each. Then I start
>> writing to these in parallel as follows:
>> dd if=/dev/zero of=/mnt/data/temp1/testfile bs=1M &
>> dd if=/dev/zero of=/mnt/data/temp2/testfile bs=1M &
>> etc.
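
For reference, the volume layout described above could be sketched roughly as follows. This is only an illustration, not the exact commands that were run: the partition name (/dev/sdb1), the volume group name (vg_data), the per-volume size and the helper functions run and setup_volumes are all assumptions. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. Hypothetical names: /dev/sdb1, vg_data, temp1..tempN.
# With DRY_RUN=1 (the default) commands are printed, never executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_volumes PARTITION VGNAME COUNT SIZE
setup_volumes() {
    part="$1"     # partition on the RAID array (assumed name)
    vg="$2"       # volume group name (assumed)
    count="$3"    # number of logical volumes to create
    size="$4"     # size of each logical volume, e.g. 9T (assumed)

    run pvcreate "$part"            # make the partition an LVM physical volume
    run vgcreate "$vg" "$part"      # create a volume group on top of it
    i=1
    while [ "$i" -le "$count" ]; do
        run lvcreate -L "$size" -n "temp$i" "$vg"   # carve out one logical volume
        run mkfs.xfs "/dev/$vg/temp$i"              # XFS filesystem on it
        run mkdir -p "/mnt/data/temp$i"
        run mount "/dev/$vg/temp$i" "/mnt/data/temp$i"
        i=$((i + 1))
    done
}
```

Calling "setup_volumes /dev/sdb1 vg_data 6 9T" prints the plan for six volumes; setting DRY_RUN=0 would execute it, which destroys any data on the partition.
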
>>
>> This does not make any allowance for possible file-size limits, but I would
>> have hoped at least for a graceful exit with a helpful error message.
>> Instead, one of the servers has stopped writing to the disks and displays an
>> impressive variety of errors in /var/log/messages, starting with:
>>
>> Mar 7 08:23:58 nfs2 kernel: aacraid: Host adapter abort request (0,0,1,0)
>> Mar 7 08:23:58 nfs2 kernel: aacraid: Host adapter abort request (0,0,1,0)
>> Mar 7 08:24:56 nfs2 last message repeated 188 times
>> Mar 7 08:24:56 nfs2 kernel: aacraid: Host adapter reset request. SCSI hang ?
>> Mar 7 08:24:56 nfs2 kernel: sd 0:0:1:0: SCSI error: return code = 0x08000002
>> Mar 7 08:24:56 nfs2 kernel: sdb: Current: sense key: Hardware Error
>> Mar 7 08:24:56 nfs2 kernel: Add. Sense: Internal target failure
>> Mar 7 08:24:56 nfs2 kernel:
>> Mar 7 08:24:56 nfs2 kernel: end_request: I/O error, dev sdb, sector 53707122737
>> Mar 7 08:24:56 nfs2 kernel: I/O error in filesystem ("dm-6") meta-data dev dm-6 block 0x28001a68f ("xlog_iodone") error 5 buf count 2048
>>
>> This is a SuperMicro server, running SL5, with an Adaptec RAID controller.
>>
>> Any suggestions? My inclination is to try reconfiguring the RAID from
>> scratch and designing a test procedure that limits file sizes to say 1 TB,
>> but if this is indicative of a real underlying problem then maybe someone
>> here can say so. One of the messages does say "Hardware Error" but how
>> conclusive is this?
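
A size-limited version of the test mentioned above might look something like this. It is a sketch under assumptions, not a recommended burn-in procedure: the function names write_test_file and run_parallel are made up for illustration, and bs=1M and conv=fsync assume GNU dd. Each writer stops after a fixed number of MiB and reports its own exit status, so a failure on one filesystem is at least flagged explicitly.

```shell
#!/bin/sh
# Sketch only: bounded parallel writes with per-writer status reporting.

# write_test_file DIR SIZE_MIB
# Write SIZE_MIB mebibytes of zeros to DIR/testfile and report the result.
write_test_file() {
    dir="$1"
    size_mib="$2"
    out="$dir/testfile"
    # count= bounds the file size; conv=fsync flushes data to disk before dd exits
    if dd if=/dev/zero of="$out" bs=1M count="$size_mib" conv=fsync; then
        echo "OK: wrote $size_mib MiB to $out"
    else
        status=$?
        echo "FAILED: dd exited with status $status writing $out" >&2
        return "$status"
    fi
}

# run_parallel SIZE_MIB DIR...
# Start one writer per directory in the background, then wait for all of them.
run_parallel() {
    size_mib="$1"
    shift
    pids=""
    for dir in "$@"; do
        write_test_file "$dir" "$size_mib" &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed of $# writers failed"
    [ "$failed" -eq 0 ]
}
```

For example, "run_parallel 1048576 /mnt/data/temp1 /mnt/data/temp2" would write 1 TiB to each of the two filesystems. This only catches errors that dd itself sees; controller-level problems like the ones logged above would still need to be checked in /var/log/messages.
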
>>
>> Cheers,
>> Ben
>>
>> --
>> Dr Ben Waugh Tel. +44 (0)20 7679 7223
>> Dept of Physics and Astronomy Internal: 37223
>> University College London
>> London WC1E 6BT
>>
>
>
>
--
Dr Ben Waugh Tel. +44 (0)20 7679 7223
Dept of Physics and Astronomy Internal: 37223
University College London
London WC1E 6BT