Yes, I think this is your only option now. The developers are being less 
than helpful.

As for tidying things up, I think it's going to be messy. We should be 
able to use something like this to find the pnfsids that are not related 
to any file in the namespace.

http://www.sysadmin.hep.ac.uk/svn/fabric-management/dcache/pnfs/remove-orphan-files.sh
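
A rough sketch of the comparison, not a copy of that script: assuming you
can produce one list of the pnfsids held on the pools and another of the
pnfsids the restored namespace knows about, the orphans are simply the
difference between the two. The function names and the example IDs below
are made up for illustration.

```python
from typing import Iterable, List, Set


def normalise(pnfsids: Iterable[str]) -> Set[str]:
    """Strip whitespace, drop blank entries and upper-case the IDs.

    Upper-casing is an assumption, made so that the same ID written in a
    different case is not reported as an orphan.
    """
    return {p.strip().upper() for p in pnfsids if p.strip()}


def find_orphans(pool_ids: Iterable[str], namespace_ids: Iterable[str]) -> List[str]:
    """Return the pnfsids seen on pools but unknown to the namespace, sorted."""
    return sorted(normalise(pool_ids) - normalise(namespace_ids))


if __name__ == "__main__":
    # Hypothetical inputs: in practice these would be read from a listing
    # of the pool data directories and a dump of the namespace.
    pool_ids = [
        "000100000000000000001060",
        "000100000000000000001068",
        "000100000000000000001070",
    ]
    namespace_ids = [
        "000100000000000000001060",
    ]
    for pnfsid in find_orphans(pool_ids, namespace_ids):
        print(pnfsid)
```

Anything this prints is a candidate for removal only; I would check a few
by hand before deleting anything from the pools.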

Greig

On 07/12/07 11:21, Matt Doidge wrote:
> Hello,
> 
> I've made extensive use of the single-user mode of postgres whilst
> attempting to repair our databases, and I couldn't find any glaring
> errors or problems looking into these things, but then again I wouldn't
> necessarily have done as the postgres backend to pnfs has always been
> something opaque and mysterious to me.
> 
> After much deliberation we've decided to go with the rollback option,
> our best "customers" would rather have us back in action and needing
> to be reseeded than sitting around being useless. I'm going to keep
> the bust database we have and see if something can be salvaged from
> it, maybe some entries can be translated over? Maybe I'm just being
> overly optimistic. I'll do the rollback after lunch, to give time for
> people to object if they think it's a very very bad idea or present me
> with other options, but right now I don't think we have any.
> 
> One thing's for certain, I'll never blindly assume that my backups are
> working fine again.
> 
> Thanks for the help Greig,
> 
> Matt
> 
> On 07/12/2007, Greig A Cowan <[log in to unmask]> wrote:
>> Matt,
>>
>> Have you considered the approach of running in single-user mode with just
>> one of the broken databases?
>>
>> http://www.postgresql.org/docs/8.1/static/app-postgres.html
>>
>> postgres -D /var/lib/pgsql/data/ data1
>>
>> This might give you some ability to try and fix things.
>>
>> As you will know, I've also submitted another ticket to dCache support.
>>
>> Greig
>>
>>
>>
>> On Fri, 7 Dec 2007, Matt Doidge wrote:
>>
>>> Hello,
>>>
>>> Just in case the worst happens and we can't salvage pnfs from what's
>>> our current postgres and have to use the 3.5 month old backup, would
>>> there be guidelines as to how to go about sorting out the horrid mess
>>> that would leave? There are guidelines to sort things out from the point
>>> of view of a pool snuffing it, but nothing for database failures
>>> (largely as such extreme database related errors shouldn't happen due
>>> to regular backups being in place).
>>>
>>> The other piece of advice I need is, at what point should I just give
>>> up and start dusting off the old back-up? We've been down since
>>> Tuesday. How much downtime is a 3.5 month data rollback worth? Maybe I
>>> should put this question to the major VOs we support (aka atlas)?
>>>
>>> As you can tell I'm a little confused and overwhelmed, and mightily frustrated.
>>>
>>> cheers,
>>> Matt
>>>
>>> On 06/12/2007, Matt Doidge <[log in to unmask]> wrote:
>>>> Hello,
>>>>
>>>> All my empty files are in place. Postgres is back up and running; I
>>>> can connect to it and poke around. However, there could have been
>>>> data loss, so when pnfs looks into postgres for its gubbins, perhaps
>>>> all it sees is gobbledygook and it is thus unable to initialise
>>>> properly. However, that's just a theory that I so hope is wrong.
>>>>
>>>> I'll see if I can dig up a spare node to see if I can get it to work
>>>> elsewhere, but I'm not sure we've got any spare machines lying about
>>>> the place. It's worth a try, at the moment I'm just banging my head
>>>> against a wall, which isn't getting the job done sadly.
>>>>
>>>> If anyone knows of any postgres queries I could issue that would test
>>>> how postgres is looking to pnfs then that would be great.
>>>>
>>>> cheers,
>>>> Matt
>>>> On 06/12/2007, Greig Alan Cowan <[log in to unmask]> wrote:
>>>>> You've got an empty file corresponding to each database in this directory?
>>>>>
>>>>> /opt/pnfsdb/pnfs/databases
>>>>>
>>>>> Are you sure that postgres is back up and running? Can you really
>>>>> connect to it?
>>>>>
>>>>> I don't know what causes the enabled (x) output, but it probably implies
>>>>> a problem with postgres. I've only seen it once before.
>>>>>
>>>>> Greig
>>>>>
>>>>> On 06/12/07 08:35, Matt Doidge wrote:
>>>>>> Thanks for the reply Greig,
>>>>>>
>>>>>> The /opt/pnfsdb/pnfs/info files all seem present and correct. The
>>>>>> output of an mdb show gives much the same as usual, except the status
>>>>>> column for each reads "enabled (x)". It should be noted that all the
>>>>>> bust databases are also the larger ones; of the databases that work,
>>>>>> only babar has any significant amount of data.
>>>>>>
>>>>>> Posting to both users and support was a little cheeky, but I'm trying
>>>>>> to maximise coverage in the hopes of maximising my chances of finding
>>>>>> a solution that doesn't involve losing 3 months of data. Desperate
>>>>>> times call for desperate postings!
>>>>>>
>>>>>> cheers,
>>>>>> Matt
>>>>>>
>>>>>> On 06/12/2007, Greig A Cowan <[log in to unmask]> wrote:
>>>>>>> Hi Matt,
>>>>>>>
>>>>>>> You've probably checked this already, but what are the contents of
>>>>>>>
>>>>>>> /opt/pnfsdb/pnfs/info
>>>>>>>
>>>>>>> There should be a file for each database, with contents like:
>>>>>>>
>>>>>>> $ cat D-0000
>>>>>>> admin:0:r:enabled:/opt/pnfsdb/pnfs/databases/admin
>>>>>>>
>>>>>>> Also what does this command give you? It should be something like that
>>>>>>> below.
>>>>>>>
>>>>>>> $ /opt/pnfs/tools/mdb show
>>>>>>>     ID   Name         Type    Status       Path
>>>>>>>   ----------------------------------------------
>>>>>>>     0    admin         r     enabled (r)   /opt/pnfsdb/pnfs/databases/admin
>>>>>>>     1    data1         r     enabled (r)   /opt/pnfsdb/pnfs/databases/data1
>>>>>>> ...
>>>>>>>
>>>>>>> Also, note that you should typically just post to either the
>>>>>>> user-forum or support@dcache, not both (the developers get a bit
>>>>>>> narky when people post to both ;) )
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Greig
>>>>>>>
>> --
>> ========================================================================
>> Dr Greig A Cowan                         http://www.ph.ed.ac.uk/~gcowan1
>> School of Physics, University of Edinburgh, James Clerk Maxwell Building
>>
>> TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
>> ========================================================================
>>