>> That's the price you pay for having two independent databases keeping
>> information about the same data. Sooner or later they *will* get out of
>> sync. We *are* going to lose files; whether it's a hardware or a software
>> error isn't important. At the moment there is no way to sync the databases and
>> I don't expect it to ever happen either :(
>
> I agree that it is likely that the SRM databases and file catalogs will
> get out of sync, but this is the system that we have to work with at the
> moment. I think sites (where their resources allow) are already trying to
> minimise the effect of failures by having some form of resiliency built in
> (database backups, RAID, file replicas, redundant power supplies...).
>
The LFC can get out of sync with the SE, and the SE's database can get
out of sync with the underlying storage system. These are ancient
problems - i.e. going back to EDG :-) and even then we agreed that the
best we can do is to make the systems resilient.
Until recently the lcg-* tools didn't behave well when the first replica
was missing, and that's obviously a bug (which is now fixed, according to
Graeme).
We know we need recipes for making SEs themselves more resilient (cf item
1.3 on yesterday's agenda), and for synchronising with higher level
services (cf item 1.9, ibidem). It's not obvious that *we* (i.e. GridPP)
should necessarily do it - it depends on how badly we need the solutions.
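To make the synchronisation problem concrete, here is a minimal sketch of
the comparison such a recipe would need, assuming we can get a plain list
of storage paths out of both the catalogue and the SE. The function name,
the "lost"/"dark" labels and the example paths are all made up for
illustration; nothing here is an existing LFC or SRM interface.

```python
def compare_listings(catalogue_paths, storage_paths):
    """Report entries present in one listing but not the other.

    Both arguments are iterables of path strings (hypothetical dumps from
    the file catalogue and from the SE namespace).
    """
    catalogue = set(catalogue_paths)
    storage = set(storage_paths)
    return {
        # Registered in the catalogue but missing from the SE.
        "lost": sorted(catalogue - storage),
        # Present on the SE but unknown to the catalogue.
        "dark": sorted(storage - catalogue),
    }


# Hypothetical dumps, for illustration only.
lfc_dump = ["/grid/atlas/file1", "/grid/atlas/file2", "/grid/atlas/file3"]
se_dump = ["/grid/atlas/file1", "/grid/atlas/file3", "/grid/atlas/orphan"]

report = compare_listings(lfc_dump, se_dump)
print(report)
```

The same comparison would apply one level down, between the SE's own
database and the underlying storage system. Getting consistent dumps from
both sides is the hard part, and this sketch does not address it.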
-j