Andrew McNab wrote:
> Greig A Cowan wrote:
>>
>> Just to clarify, I don't have a paper on resilient dCache, I was
>> talking about sharing an SRM between geographically separate sites.
>>
>> However, I have just created this page:
>>
>> http://www.gridpp.ac.uk/wiki/Resilient_dCache
>>
>> that documents my experiences with resilient dCache over the past couple
>> of days. It's only running on a single box with ~25GB of storage so it's
>> not exactly at the scale of running across an entire batch farm.
>> Unfortunately we don't have spare clusters lying around, but it's a
>> start.
>>
>> Comments/questions welcome.
>
> Does dCache's SRM check that the box hosting the pool is online
> when the SRM answers a query about one of its files? ie is the
Nope, dCache will happily return TURLs for files which "exist" in its
pnfs virtual filesystem but have no corresponding data file on a
running pool. It's optimistic that way: it could be that the file will
be retrieved from an HSM, or that a pool containing the data file will
shortly start up and make everything okay.
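
To make the behaviour concrete, here is a toy model of it. This is only a
sketch: the dictionaries, function names, paths and door hostname are all
made up for illustration and are not dCache's actual API. The first
function hands out a TURL for anything in the namespace; the second is a
hypothetical stricter variant that also checks the pool is up.

```python
# Toy model only: every name below is hypothetical, not dCache's API.

# Namespace entry -> pool that holds the data file.
namespace = {
    "/pnfs/example/file1": "pool_a",
    "/pnfs/example/file2": "pool_b",
}

# Current state of each pool; pool_b's host is offline.
pool_online = {"pool_a": True, "pool_b": False}

DOOR = "gsiftp://door.example.org"  # made-up transfer door


def optimistic_turl(path):
    """Return a TURL for any file in the namespace, ignoring pool state."""
    if path not in namespace:
        return None
    return DOOR + path


def checked_turl(path):
    """Hypothetical variant: only return a TURL if the pool is online."""
    pool = namespace.get(path)
    if pool is None or not pool_online.get(pool, False):
        return None
    return DOOR + path


print(optimistic_turl("/pnfs/example/file2"))  # a TURL, though pool_b is down
print(checked_turl("/pnfs/example/file2"))     # None
```

Note the stricter variant would also refuse files that could still be
staged from an HSM or whose pool is about to come back, which is the
trade-off behind the optimistic behaviour.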
> issue about not using resilient dCache just that a box/pool
> could go offline, or that plus the danger that the SRM will be
> falsely claiming to have files that are now offline?
>
> Clearly, having a few percent less storage online than you have
> in the racks isn't the end of the world (even if there are files
> on them), but _reporting_ that you have files on those inaccessible
> disks leads to job failures.
>
It's not just a few percent: only 1/n of the storage allocated to
resilient pools will be available, where n is the number of replicas
desired.
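
As a rough worked example of the 1/n figure, assuming every file is kept
in exactly n copies and ignoring any other overhead (the function name is
just for illustration):

```python
def usable_capacity_gb(raw_gb, replicas):
    """Usable space when every file is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_gb / replicas


# The ~25GB test box from earlier in the thread, with 2 replicas per file.
print(usable_capacity_gb(25, 2))  # 12.5
```

So with two replicas half the raw disk is usable, and with three only a
third.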
Derek