Hello,
I've made extensive use of the single-user mode of postgres whilst
attempting to repair our databases, and I couldn't find any glaring
errors or problems looking into these things, but then again I wouldn't
necessarily have done, as the postgres backend to pnfs has always been
something opaque and mysterious to me.
After much deliberation we've decided to go with the rollback option;
our best "customers" would rather have us back in action and needing
to be reseeded than sitting around being useless. I'm going to keep
the bust database we have and see if something can be salvaged from
it, maybe some entries can be translated over? Maybe I'm just being
overly optimistic. I'll do the rollback after lunch, to give time for
people to object if they think it's a very very bad idea or present me
with other options, but right now I don't think we have any.
One thing's for certain, I'll never blindly assume that my backups are
working fine again.
Thanks for the help, Greig,
Matt
On 07/12/2007, Greig A Cowan <[log in to unmask]> wrote:
>
> Matt,
>
> Have you considered the approach of running in single-user mode with just
> one of the broken databases?
>
> http://www.postgresql.org/docs/8.1/static/app-postgres.html
>
> postgres -D /var/lib/pgsql/data/ data1
>
> This might give you some ability to try and fix things.
>
> As you will know, I've also submitted another ticket to dCache support.
>
> Greig
>
>
>
> On Fri, 7 Dec 2007, Matt Doidge wrote:
>
> > Hello,
> >
> > Just in case the worst happens and we can't salvage pnfs from our
> > current postgres and have to use the 3.5-month-old backup, are there
> > any guidelines on how to go about sorting out the horrid mess that
> > would leave? There are guidelines for sorting things out from the point
> > of view of a pool snuffing it, but nothing for database failures
> > (largely because such extreme database-related errors shouldn't happen
> > with regular backups in place).
> >
> > The other piece of advice I need is: at what point should I just give
> > up and start dusting off the old backup? We've been down since
> > Tuesday. How much downtime is a 3.5-month data rollback worth? Maybe I
> > should put this question to the major VOs we support (i.e. ATLAS)?
> >
> > As you can tell I'm a little confused and overwhelmed, and mightily frustrated.
> >
> > cheers,
> > Matt
> >
> > On 06/12/2007, Matt Doidge <[log in to unmask]> wrote:
> >> Hello,
> >>
> >> All my empty files are in place. Postgres is back up and running; I
> >> can connect to it and poke around. However, there could have been
> >> data loss, so when pnfs looks into postgres for its gubbins it may
> >> be seeing nothing but gobbledygook and thus be unable to initialise
> >> properly. That's just a theory, though, and I really hope it's wrong.
> >>
> >> I'll see if I can dig up a spare node to see if I can get it to work
> >> elsewhere, but I'm not sure we've got any spare machines lying about
> >> the place. It's worth a try; at the moment I'm just banging my head
> >> against a wall, which isn't getting the job done sadly.
> >>
> >> If anyone knows of any postgres queries I could issue to test how
> >> postgres looks from pnfs's point of view, that would be great.
> >>
> >> cheers,
> >> Matt
> >> On 06/12/2007, Greig Alan Cowan <[log in to unmask]> wrote:
> >>> You've got an empty file corresponding to each database in this directory?
> >>>
> >>> /opt/pnfsdb/pnfs/databases
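Greig's question can be turned into a quick check. This is a minimal Python sketch, assuming (as the thread implies) that each pnfs database should have an empty placeholder file named after it in /opt/pnfsdb/pnfs/databases. The helper name `check_db_placeholders` and the example database names are hypothetical; take the real names from your own `mdb show` output.

```python
import os
from typing import Dict, Iterable

# Directory taken from the thread; adjust if your pnfs install lives elsewhere.
DB_DIR = "/opt/pnfsdb/pnfs/databases"


def check_db_placeholders(db_dir: str, names: Iterable[str]) -> Dict[str, str]:
    """Return 'ok', 'missing' or 'not empty' for each expected database file.

    Hypothetical helper: assumes every pnfs database should have an empty
    placeholder file with the same name in db_dir.
    """
    results = {}
    for name in names:
        path = os.path.join(db_dir, name)
        if not os.path.isfile(path):
            results[name] = "missing"
        elif os.path.getsize(path) != 0:
            results[name] = "not empty"
        else:
            results[name] = "ok"
    return results


if __name__ == "__main__":
    # Example names only; use the Name column from your own `mdb show`.
    for db, status in check_db_placeholders(DB_DIR, ["admin", "data1"]).items():
        print(f"{db}: {status}")
```

A file that is present and empty only tells you the placeholder is there; it says nothing about the state of the data in postgres itself.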
> >>>
> >>> Are you sure that postgres is back up and running? Can you really
> >>> connect to it?
> >>>
> >>> I don't know what causes the enabled (x) output, but it probably implies
> >>> a problem with postgres. I've only seen it once before.
> >>>
> >>> Greig
> >>>
> >>> On 06/12/07 08:35, Matt Doidge wrote:
> >>>> Thanks for the reply Greig,
> >>>>
> >>>> The /opt/pnfsdb/pnfs/info files all seem present and correct. The
> >>>> output of mdb show is much the same as usual, except that the
> >>>> status column for each database reads "enabled (x)". It should be
> >>>> noted that all the bust databases are also the larger ones; of the
> >>>> databases that work, only babar has any significant amount of data.
> >>>>
> >>>> Posting to both users and support was a little cheeky, but I'm trying
> >>>> to maximise coverage in the hopes of maximising my chances of finding
> >>>> a solution that doesn't involve losing 3 months of data. Desperate
> >>>> times call for desperate postings!
> >>>>
> >>>> cheers,
> >>>> Matt
> >>>>
> >>>> On 06/12/2007, Greig A Cowan <[log in to unmask]> wrote:
> >>>>> Hi Matt,
> >>>>>
> >>>>> You've probably checked this already, but what are the contents of
> >>>>>
> >>>>> /opt/pnfsdb/pnfs/info
> >>>>>
> >>>>> There should be a file for each database, with contents like:
> >>>>>
> >>>>> $ cat D-0000
> >>>>> admin:0:r:enabled:/opt/pnfsdb/pnfs/databases/admin
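The info-file format shown above is colon-separated, so it is easy to read all the D-* files in one go and eyeball them together. This is a sketch only: the field names below are guesses matched to the `mdb show` columns (Name, ID, Type, Status, Path), and `parse_info_line` / `read_info_dir` are hypothetical helpers, not part of pnfs.

```python
import os
from typing import List, NamedTuple


class InfoEntry(NamedTuple):
    # Field names are assumptions based on the `mdb show` columns.
    name: str
    db_id: int
    db_type: str
    status: str
    path: str


def parse_info_line(line: str) -> InfoEntry:
    """Parse one line like 'admin:0:r:enabled:/opt/pnfsdb/pnfs/databases/admin'."""
    # maxsplit=4 keeps any ':' inside the path as part of the last field.
    parts = line.strip().split(":", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected info line: {line!r}")
    name, db_id, db_type, status, path = parts
    return InfoEntry(name, int(db_id), db_type, status, path)


def read_info_dir(info_dir: str) -> List[InfoEntry]:
    """Read the first line of every D-* file in info_dir, sorted by file name."""
    entries = []
    for fname in sorted(os.listdir(info_dir)):
        if not fname.startswith("D-"):
            continue
        with open(os.path.join(info_dir, fname)) as fh:
            entries.append(parse_info_line(fh.readline()))
    return entries
```

For example, `for e in read_info_dir("/opt/pnfsdb/pnfs/info"): print(e.db_id, e.name, e.status, os.path.exists(e.path))` lists every database with its status and whether its path exists; an entry whose status is not `enabled`, or whose path is absent, would be worth a closer look.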
> >>>>>
> >>>>> Also, what does this command give you? It should be something like
> >>>>> the output below.
> >>>>>
> >>>>> $ /opt/pnfs/tools/mdb show
> >>>>> ID Name Type Status Path
> >>>>> ----------------------------------------------
> >>>>> 0 admin r enabled (r) /opt/pnfsdb/pnfs/databases/admin
> >>>>> 1 data1 r enabled (r) /opt/pnfsdb/pnfs/databases/data1
> >>>>> ...
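To spot databases in the odd "enabled (x)" state Matt describes without reading the table by eye, the `mdb show` output can be parsed. This is a sketch assuming whitespace-separated columns laid out exactly as in the example above, and treating "enabled (r)" as healthy purely because that is what Greig's example shows; the meaning of the letter in brackets is not documented here. `parse_mdb_show` and `suspicious` are hypothetical names.

```python
from typing import List, Tuple

# Baseline taken from Greig's example output; an assumption, not a pnfs spec.
HEALTHY_STATUS = "enabled (r)"


def parse_mdb_show(output: str) -> List[Tuple[int, str, str, str, str]]:
    """Return (id, name, type, status, path) rows from `mdb show` text."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip the header, the dashed separator, "..." and blank lines.
        if len(tokens) < 5 or not tokens[0].isdigit():
            continue
        db_id, name, db_type = int(tokens[0]), tokens[1], tokens[2]
        path = tokens[-1]
        # The status column can contain a space, e.g. "enabled (r)".
        status = " ".join(tokens[3:-1])
        rows.append((db_id, name, db_type, status, path))
    return rows


def suspicious(output: str) -> List[str]:
    """Names of databases whose status differs from HEALTHY_STATUS."""
    return [name for _, name, _, status, _ in parse_mdb_show(output)
            if status != HEALTHY_STATUS]
```

On the server you could feed it live output with `subprocess.run(["/opt/pnfs/tools/mdb", "show"], capture_output=True, text=True).stdout` (Python 3.7 or later).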
> >>>>>
> >>>>> Also, note that you should typically post to just one of the
> >>>>> user-forum or support@dcache (the developers get a bit narky when
> >>>>> people post to both ;) )
> >>>>>
> >>>>> Cheers,
> >>>>> Greig
> >>>>>
> >>>
> >>
> >
>
> --
> ========================================================================
> Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
> School of Physics, University of Edinburgh, James Clerk Maxwell Building
>
> TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
> ========================================================================
>