Just to reply on the situation with backups: we do have quite a lot of redundancy.

* We have the replay logs for the RLS application server, from which we can recreate the state at any given point (down to single-operation granularity). We have these for the entire duration of production RLS services at CERN (> 1.5 years).
* We have standard Oracle backups of the entire database.
* During the data challenges (and perhaps still), we took Oracle dumps of the database every 30 minutes.

AFAIK, we've never been asked by a user to do a recovery (at least during the year I was directly part of the service team). Also, even during the data challenges, other parts of the experiment software (e.g. RefDB/PubDB for CMS) stored the information as well. As a final level of redundancy, this would enable them to rerun the production RLS entry insertions from any given point in time, if necessary.

Maria Girone, the RLS Service Manager, can give more information, I'm sure. Rest assured that, since we tried to set up a 24x7 service, maintaining data integrity was something we spent a lot of time working on.

Cheers,
James.

-----Original Message-----
From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On Behalf Of Burke, S (Stephen)
Sent: Tuesday, January 18, 2005 12:58 PM
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] [ATLAS-LCG] Disk failure at Prague

LHC Computer Grid - Rollout [mailto:[log in to unmask]] On Behalf Of Jules Wolfrat said:
> I accept your point, but you can't expect sysadmins to deal with
> this situation; they can never tell whether a validated action is wanted
> or unwanted. And I wonder whether you can ever restore the RLS at a
> user's request, for the reasons mentioned before: the loss of changes
> between the time of the backup and the time of the restore.

I've tried to avoid being too explicit on a semi-public mailing list, but I guess I have to be (no security through obscurity).
LCG is living on borrowed time when it comes to hackers. We have many security holes, and the main thing protecting us is simply that hackers haven't yet got around to noticing us; sooner or later they will, and we'll be in trouble! Probably the biggest hole, as things stand, is the total lack of security on the catalogues, which means that any hacker can do anything they like with almost no effort.

I think the minimum that can be done is to keep catalogue backups for a reasonable length of time. I agree that restoring would be quite tricky, but it wouldn't be that hard to take the union of all the records in the current and backed-up states, and then go through and remove the ones which don't have a physical file at the endpoint. Certainly it would be a lot better than finding that everything has been corrupted and there is no way back...

Stephen
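The union-and-filter recovery Stephen suggests above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the RLS schema or any real RLS tool: catalogue entries are modelled as hypothetical (LFN, PFN) pairs, and `file_exists` is a hypothetical callback standing in for whatever check would actually be made against the storage endpoint.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical record type: one catalogue entry mapping a logical file
# name (LFN) to one physical replica (PFN). Not the real RLS schema.
Record = Tuple[str, str]


def merge_and_validate(
    current: Iterable[Record],
    backups: Iterable[Iterable[Record]],
    file_exists: Callable[[str], bool],
) -> Set[Record]:
    """Union the current catalogue with every backup, then keep only
    records whose physical file is still present at its endpoint."""
    # Step 1: union of all records seen in the current state and backups.
    union: Set[Record] = set(current)
    for snapshot in backups:
        union.update(snapshot)

    # Step 2: drop records pointing at a physical file that is gone.
    # Each PFN is checked only once, even if several LFNs reference it.
    exists_cache: Dict[str, bool] = {}
    kept: Set[Record] = set()
    for lfn, pfn in union:
        if pfn not in exists_cache:
            exists_cache[pfn] = file_exists(pfn)
        if exists_cache[pfn]:
            kept.add((lfn, pfn))
    return kept


# Toy data (hypothetical): one entry was deleted and a bogus one added.
backup = [("lfn:a", "srm://se1/a"), ("lfn:b", "srm://se1/b")]
current = [("lfn:a", "srm://se1/a"), ("lfn:x", "srm://se1/bogus")]
on_disk = {"srm://se1/a", "srm://se1/b"}
restored = merge_and_validate(current, [backup], lambda p: p in on_disk)
print(sorted(restored))
```

As Stephen notes, this is not a complete answer: a bogus entry that points at a file which really exists would survive the filter, and an entry deliberately removed since the backup would come back if its file is still on disk.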