Matt,
Have you considered running postgres in single-user mode against just
one of the broken databases?
http://www.postgresql.org/docs/8.1/static/app-postgres.html
postgres -D /var/lib/pgsql/data/ data1
This might give you some ability to try and fix things.
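
A cautious way to go about it, as a rough sketch only: the single-user backend cannot share a data directory with a running postmaster, so it is worth checking for the postmaster.pid lock file before trying the command above. The script below is a dry run that prints the command rather than executing it. PGDATA_DIR and single_user_cmd are names made up for this sketch; the data directory and database name are the ones from the command above.

```shell
#!/bin/sh
# Dry-run sketch: print the single-user command for one database.
# PGDATA_DIR and single_user_cmd are hypothetical names for this example.
PGDATA_DIR="${PGDATA_DIR:-/var/lib/pgsql/data/}"

single_user_cmd() {
    db="$1"
    # A running postmaster keeps a lock file named postmaster.pid in the
    # data directory. A stale copy can also survive a crash, so treat a
    # hit as "go and check by hand", not as proof the server is up.
    if [ -e "${PGDATA_DIR%/}/postmaster.pid" ]; then
        echo "postmaster.pid found in $PGDATA_DIR: check the postmaster is stopped" >&2
        return 1
    fi
    # Print the command instead of running it, so it can be reviewed first.
    echo "postgres -D $PGDATA_DIR $db"
}

single_user_cmd data1
```

Run as the postgres user, the printed command starts a session that reads SQL from standard input; end-of-file (Ctrl-D) ends it.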
As you will know, I've also submitted another ticket to dCache support.
Greig
On Fri, 7 Dec 2007, Matt Doidge wrote:
> Hello,
>
> Just in case the worst happens and we can't salvage pnfs from our
> current postgres and have to use the 3.5 month old backup, are
> there guidelines on how to go about sorting out the horrid mess
> that would leave? There are guidelines for sorting things out from the
> point of view of a pool snuffing it, but nothing for database failures
> (largely because such extreme database-related errors shouldn't happen
> when regular backups are in place).
>
> The other piece of advice I need is: at what point should I just give
> up and start dusting off the old backup? We've been down since
> Tuesday. How much downtime is a 3.5 month data rollback worth? Maybe I
> should put this question to the major VOs we support (aka atlas)?
>
> As you can tell I'm a little confused and overwhelmed, and mightily frustrated.
>
> cheers,
> Matt
>
> On 06/12/2007, Matt Doidge wrote:
>> Hello,
>>
>> All my empty files are in place. Postgres is back up and running; I
>> can connect to it and poke around. However, there could have been
>> data loss, so when pnfs looks into postgres for its gubbins, perhaps
>> all it sees is gobbledygook and thus it can't initialise
>> properly. That's just a theory, though, and one I so hope is wrong.
>>
>> I'll see if I can dig up a spare node to see if I can get it to work
>> elsewhere, but I'm not sure we've got any spare machines lying about
>> the place. It's worth a try; at the moment I'm just banging my head
>> against a wall, which isn't getting the job done sadly.
>>
>> If anyone knows of any postgres queries I could issue that would test
>> how postgres is looking to pnfs then that would be great.
>>
>> cheers,
>> Matt
>> On 06/12/2007, Greig Alan Cowan wrote:
>>> You've got an empty file corresponding to each database in this directory?
>>>
>>> /opt/pnfsdb/pnfs/databases
>>>
>>> Are you sure that postgres is back up and running? Can you really
>>> connect to it?
>>>
>>> I don't know what causes the enabled (x) output, but it probably implies
>>> a problem with postgres. I've only seen it once before.
>>>
>>> Greig
>>>
>>> On 06/12/07 08:35, Matt Doidge wrote:
>>>> Thanks for the reply Greig,
>>>>
>>>> The /opt/pnfsdb/pnfs/info files all seem present and correct. The
>>>> output of mdb show is much the same as usual, except that the status
>>>> column for each database reads "enabled (x)". It should be noted that
>>>> the bust databases are also all the larger ones; of the databases
>>>> that work, only babar has any significant amount of data.
>>>>
>>>> Posting to both users and support was a little cheeky, but I'm trying
>>>> to maximise coverage in the hopes of maximising my chances of finding
>>>> a solution that doesn't involve losing 3 months of data. Desperate
>>>> times call for desperate postings!
>>>>
>>>> cheers,
>>>> Matt
>>>>
>>>> On 06/12/2007, Greig A Cowan wrote:
>>>>> Hi Matt,
>>>>>
>>>>> You've probably checked this already, but what are the contents of
>>>>>
>>>>> /opt/pnfsdb/pnfs/info
>>>>>
>>>>> There should be a file for each database, with contents like:
>>>>>
>>>>> $ cat D-0000
>>>>> admin:0:r:enabled:/opt/pnfsdb/pnfs/databases/admin
>>>>>
>>>>> Also what does this command give you? It should be something like that
>>>>> below.
>>>>>
>>>>> $ /opt/pnfs/tools/mdb show
>>>>> ID  Name   Type  Status       Path
>>>>> ----------------------------------------------
>>>>> 0   admin  r     enabled (r)  /opt/pnfsdb/pnfs/databases/admin
>>>>> 1   data1  r     enabled (r)  /opt/pnfsdb/pnfs/databases/data1
>>>>> ...
>>>>>
>>>>> Also, note that you should typically post to just one of the
>>>>> user-forum or support@dcache (the developers get a bit narky when people
>>>>> post to both ;) )
>>>>>
>>>>> Cheers,
>>>>> Greig
>>>>>
>>>
>>
>
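
The two checks from the quoted messages (one info file per database, and a file for each database under /opt/pnfsdb/pnfs/databases) could be scripted roughly as below. This is a sketch under assumptions: the field order name:id:type:status:path is inferred from the D-0000 example and the mdb show columns, the D-* file naming is assumed from that single example, and INFO_DIR and check_info_files are names made up here.

```shell
#!/bin/sh
# Sketch: for every info file, check that the database file it points at exists.
# Assumed line format (inferred from the thread): name:id:type:status:path
# INFO_DIR and check_info_files are hypothetical names for this example.
INFO_DIR="${INFO_DIR:-/opt/pnfsdb/pnfs/info}"

check_info_files() {
    missing=0
    for f in "$INFO_DIR"/D-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        # Split the first line of the info file on ':'.
        IFS=: read -r name id type state path < "$f"
        if [ -e "$path" ]; then
            echo "ok      $name ($id) $state $path"
        else
            echo "MISSING $name ($id) $state $path"
            missing=1
        fi
    done
    # Non-zero return if any referenced database file was absent.
    return $missing
}

check_info_files
```

This only confirms that the files pnfs expects are present; it says nothing about whether postgres itself holds sane data for them.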
--
=======================================================================
Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
School of Physics, University of Edinburgh, James Clerk Maxwell Building
TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
=======================================================================