On Fri, 7 Dec 2007 11:21:24 +0000
Matt Doidge <[log in to unmask]> wrote:
> Hello,
>
> I've made extensive use of the single-user mode of postgres whilst
> attempting to repair our databases, and I couldn't find any glaring
> errors or problems looking into these things, but then again I wouldn't
> necessarily have done as the postgres backend to pnfs has always been
> something opaque and mysterious to me.
>
> After much deliberation we've decided to go with the rollback option;
> our best "customers" would rather have us back in action and needing
> to be reseeded than sitting around being useless. I'm going to keep
> the bust database we have and see if something can be salvaged from
> it, maybe some entries can be translated over? Maybe I'm just being
> overly optimistic. I'll do the rollback after lunch, to give time for
> people to object if they think it's a very very bad idea or present me
> with other options, but right now I don't think we have any.
>
> One thing's for certain, I'll never blindly assume that my backups are
> working fine again.
>
> Thanks for the help Greig,
>
> Matt
Hi Matt
Do keep the pool databases, as we might just be able to recover PNFS
from them in the normal way. We only have a few experts in this area
and they are currently dealing with Tier 1 space manager issues. It is
really a case of priorities: IN2P3 and FZK have both had space
manager issues at the same time, and they are higher priorities than
Lancaster as they have more users.
This area of code is slightly complicated by the pool databases now
being databases, so scripts will have to be written afresh for Matt,
even though Matt's problem is with PNFS, which is unchanged in dCache
1.8.
Regards
Owen
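
On Matt's point about never blindly assuming the backups are working: a
minimal sketch in Python of a check that could run from cron against the
dump files. The function name `backup_problems`, its thresholds and any
path passed to it are illustrative assumptions, not part of pnfs or
dCache. It only checks that a file exists, is not empty and is recent; it
does not prove that the dump can actually be restored.

```python
import os
import time


def backup_problems(path, max_age_days=1.0, min_bytes=1, now=None):
    """Return a list of human-readable problems with the backup file at `path`.

    An empty list means the file exists, holds at least `min_bytes` bytes and
    was modified within the last `max_age_days` days. All names and thresholds
    here are hypothetical defaults chosen for illustration.
    """
    now = time.time() if now is None else now
    if not os.path.isfile(path):
        return [f"{path}: missing"]

    problems = []
    st = os.stat(path)
    if st.st_size < min_bytes:
        problems.append(f"{path}: only {st.st_size} bytes")

    # Age is measured from the file's last modification time.
    age_days = (now - st.st_mtime) / 86400.0
    if age_days > max_age_days:
        problems.append(f"{path}: {age_days:.1f} days old")
    return problems
```

Calling `backup_problems()` on each expected dump file and mailing any
non-empty result would at least flag a backup job that has silently
stopped producing output.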
>
> On 07/12/2007, Greig A Cowan <[log in to unmask]> wrote:
> >
> > Matt,
> >
> > Have you considered the approach of running in single-user mode with just
> > one of the broken databases?
> >
> > http://www.postgresql.org/docs/8.1/static/app-postgres.html
> >
> > postgres -D /var/lib/pgsql/data/ data1
> >
> > This might give you some ability to try and fix things.
> >
> > As you will know, I've also submitted another ticket to dCache support.
> >
> > Greig
> >
> >
> >
> > On Fri, 7 Dec 2007, Matt Doidge wrote:
> >
> > > Hello,
> > >
> > > Just in case the worst happens and we can't salvage pnfs from what's
> > > our current postgres and have to use the 3.5 month old backup, would
> > > > there be guidelines as to how to go about sorting out the horrid mess
> > > > that would leave? There are guidelines to sort things out from the point
> > > of view of a pool snuffing it, but nothing for database failures
> > > (largely as such extreme database related errors shouldn't happen due
> > > to regular backups being in place).
> > >
> > > > The other piece of advice I need is: at what point should I just give
> > > up and start dusting off the old back-up? We've been down since
> > > Tuesday. How much downtime is a 3.5 month data rollback worth? Maybe I
> > > should put this question to the major VOs we support (aka atlas)?
> > >
> > > As you can tell I'm a little confused and overwhelmed, and mightily frustrated.
> > >
> > > cheers,
> > > Matt
> > >
> > > On 06/12/2007, Matt Doidge <[log in to unmask]> wrote:
> > >> Hello,
> > >>
> > > >> All my empty files are in place. Postgres is back up and running; I
> > > >> can connect to it and poke around. However there could have been
> > > >> data loss, and so when pnfs looks into postgres for its gubbins all it
> > > >> is perhaps seeing is gobbledygook, and thus it is not able to initialise
> > > >> properly. However that's just a theory that I so hope is wrong.
> > >>
> > >> I'll see if I can dig up a spare node to see if I can get it to work
> > >> elsewhere, but I'm not sure we've got any spare machines laying about
> > >> the place. It's worth a try, at the moment I'm just banging my head
> > >> against a wall, which isn't getting the job done sadly.
> > >>
> > >> If anyone knows of any postgres queries I could issue that would test
> > >> how postgres is looking to pnfs then that would be great.
> > >>
> > >> cheers,
> > >> Matt
> > >> On 06/12/2007, Greig Alan Cowan <[log in to unmask]> wrote:
> > > >>> You've got an empty file corresponding to each database in this directory?
> > >>>
> > >>> /opt/pnfsdb/pnfs/databases
> > >>>
> > >>> Are you sure that postgres is back up and running? Can you really
> > >>> connect to it?
> > >>>
> > >>> I don't know what causes the enabled (x) output, but it probably implies
> > >>> a problem with postgres. I've only seen it once before.
> > >>>
> > >>> Greig
> > >>>
> > >>> On 06/12/07 08:35, Matt Doidge wrote:
> > >>>> Thanks for the reply Greig,
> > >>>>
> > > >>>> The /opt/pnfsdb/pnfs/info files all seem present and correct. The
> > > >>>> output of mdb show gives much the same as usual, except the status
> > > >>>> column for each reads "enabled (x)". It should be noted that all the
> > > >>>> bust databases are also all the larger ones; of the databases that
> > > >>>> work, only babar has any significant amount of data.
> > >>>>
> > >>>> Posting to both users and support was a little cheeky, but I'm trying
> > >>>> to maximise coverage in the hopes of maximising my chances of finding
> > > >>>> a solution that doesn't involve losing 3 months of data. Desperate
> > >>>> times call for desperate postings!
> > >>>>
> > >>>> cheers,
> > >>>> Matt
> > >>>>
> > >>>> On 06/12/2007, Greig A Cowan <[log in to unmask]> wrote:
> > >>>>> Hi Matt,
> > >>>>>
> > >>>>> You've probably checked this already, but what are the contents of
> > >>>>>
> > >>>>> /opt/pnfsdb/pnfs/info
> > >>>>>
> > >>>>> There should be a file for each database, with contents like:
> > >>>>>
> > >>>>> $ cat D-0000
> > >>>>> admin:0:r:enabled:/opt/pnfsdb/pnfs/databases/admin
> > >>>>>
> > >>>>> Also what does this command give you? It should be something like that
> > >>>>> below.
> > >>>>>
> > >>>>> $ /opt/pnfs/tools/mdb show
> > > >>>>> ID  Name   Type  Status       Path
> > > >>>>> ----------------------------------------------
> > > >>>>> 0   admin  r     enabled (r)  /opt/pnfsdb/pnfs/databases/admin
> > > >>>>> 1   data1  r     enabled (r)  /opt/pnfsdb/pnfs/databases/data1
> > > >>>>> ...
> > >>>>>
> > >>>>> Also, you should note that you should typically just post to the
> > >>>>> user-forum or support@dcache (the developers get a bit narky when people
> > >>>>> post to both ;) )
> > >>>>>
> > >>>>> Cheers,
> > >>>>> Greig
> > >>>>>
> > >>>
> > >>
> > >
> >
> > --
> > ========================================================================
> > Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
> > School of Physics, University of Edinburgh, James Clerk Maxwell Building
> >
> > TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
> > ========================================================================
> >
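
The checks Greig describes earlier in the thread (one info file per
database with a line like
`admin:0:r:enabled:/opt/pnfsdb/pnfs/databases/admin`, and an empty file
for each database under /opt/pnfsdb/pnfs/databases) could be scripted
roughly as follows. This is a sketch only: the field names (name, ID,
type, status, path) are inferred from the column headings of `mdb show`
and are an assumption, as are the names `InfoEntry`, `parse_info_line`
and `check_entry`.

```python
import os
from typing import NamedTuple


class InfoEntry(NamedTuple):
    # Field names are assumptions inferred from the `mdb show` column headings.
    name: str
    db_id: int
    db_type: str
    status: str
    path: str


def parse_info_line(line):
    """Parse one info-file line such as
    'admin:0:r:enabled:/opt/pnfsdb/pnfs/databases/admin'."""
    parts = line.strip().split(":", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    name, db_id, db_type, status, path = parts
    return InfoEntry(name, int(db_id), db_type, status, path)


def check_entry(entry):
    """Return a list of problems found for one parsed info entry."""
    problems = []
    if entry.status != "enabled":
        problems.append(f"{entry.name}: status is {entry.status!r}")
    # The thread expects a file for each database at the recorded path.
    if not os.path.isfile(entry.path):
        problems.append(f"{entry.name}: no file at {entry.path}")
    return problems
```

Running `check_entry(parse_info_line(line))` over the first line of each
file in /opt/pnfsdb/pnfs/info would list any database whose entry is
malformed, disabled or missing its file. It says nothing about the state
of the data inside postgres itself.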