On Thu, Jul 06, 2006 at 09:32:51AM +0100, Greig A Cowan wrote:
> Hi Andrew,
>
> > Does dCache's SRM check that the box hosting the pool is online
> > when the SRM answers a query about one of its files? ie is the
> > issue about not using resilient dCache just that a box/pool
> > could go offline, or that plus the danger that the SRM will be
> > falsely claiming to have files that are now offline?
>
> Like Derek has said already, if dCache can't get a file from an online
> pool, it expects to be able to get it from tape or for an offline pool
> to become available again.
What is annoying is that dCache, even when it knows that all pools are
online and knows very well that we don't have tape, will happily
report that the file is there and wait forever for the tape (what bloody
tape?) to deliver the file.
> > Clearly, having a few percent less storage online than you have
> > in the racks isn't the end of the world (even if there are files
> > on them), but _reporting_ that you have files on those inaccessible
> > disks leads to job failures.
>
> The current computing models are such that jobs are sent to the sites
> where the data for that job resides. The location of this data is held
> within the file catalogs. If files become unavailable on your SRM due to
> some component failure then the file catalogs are not updated, so jobs
> that refer to that file will still be sent to your site and will most
> likely fail when they cannot access the data. To compensate for this I
> would say that you need some sort of inbuilt storage resiliency. This may
> be through using a RAID 5 with hot spares on your set of disk servers, or
> having some system in place which spreads file replicas across the disks
> on your WNs.
That's the price you pay for having two independent databases keeping
information about the same data. Sooner or later they *will* get out of
sync. We *are* going to lose files; whether it's a hardware or a software
error isn't important. At the moment there is no way to sync the databases,
and I don't expect there ever to be one either :(
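
To make the mismatch concrete: a consistency check between the two would
amount to a set comparison. This is only a minimal sketch, assuming that both
the file catalog and the storage namespace can be dumped as plain lists of
file names. The function name and the dump format are hypothetical and are
not part of dCache, the SRM or any catalog.

```python
def compare_catalogs(catalog_entries, storage_entries):
    """Compare two listings of file names and report where they disagree.

    Both arguments are assumed to be iterables of file names, e.g. taken
    from a (hypothetical) dump of the file catalog and of the site storage.
    """
    catalog = set(catalog_entries)
    storage = set(storage_entries)
    return {
        # In the catalog but not on storage: jobs sent here for these fail.
        "lost": sorted(catalog - storage),
        # On storage but unknown to the catalog: space nobody can use.
        "dark": sorted(storage - catalog),
    }


if __name__ == "__main__":
    # Made-up file names, purely for illustration.
    report = compare_catalogs(["/a", "/b", "/c"], ["/b", "/c", "/d"])
    print(report["lost"])
    print(report["dark"])
```

Entries under "lost" are the ones that make jobs fail at the site; entries
under "dark" only waste disk.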
Kostas