JISCMail - LCG-ROLLOUT Archives

Subject: Re: Fwd: URGENT: update script of DPM failing
From: Gianfranco Sciacca <[log in to unmask]>
Reply-To: LHC Computer Grid - Rollout <[log in to unmask]>
Date: Tue, 13 Mar 2007 17:43:24 +0000
Content-Type: text/plain

We have just successfully restored the DPM functionality (upgraded to 
1.6.3) following Sophie's recommendations:

1) Ditch the database
2) Restore the most recent backup prior to the apt auto-update
3) Run the latest YAIM 3.0.0-38
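
Step 2 depends on picking the right dump. A minimal sketch of that selection, assuming the dumps are tracked as a mapping from file name to the time each was taken; the file names, the `latest_backup_before` helper, and the cutoff date are all hypothetical and only illustrate the idea:

```python
from datetime import datetime
from typing import Dict, Optional


def latest_backup_before(backups: Dict[str, datetime],
                         cutoff: datetime) -> Optional[str]:
    """Return the name of the newest backup taken strictly before `cutoff`.

    Returns None if no backup predates the cutoff.
    """
    # Keep only dumps that were taken before the auto-update ran.
    candidates = {name: taken for name, taken in backups.items()
                  if taken < cutoff}
    if not candidates:
        return None
    # Among those, pick the one with the most recent timestamp.
    return max(candidates, key=candidates.get)


# Hypothetical dump names and times, for illustration only.
dumps = {
    "dpm-dump-2007-03-07.sql": datetime(2007, 3, 7, 3, 0),
    "dpm-dump-2007-03-08.sql": datetime(2007, 3, 8, 3, 0),
    "dpm-dump-2007-03-10.sql": datetime(2007, 3, 10, 3, 0),
}
# Assumed time of the apt auto-update; replace with the real one from the logs.
auto_update = datetime(2007, 3, 9, 0, 0)
chosen = latest_backup_before(dumps, auto_update)
```

The comparison is strict, so a dump taken at exactly the time of the update is treated as suspect and skipped.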

As a side note, at least one site I know of successfully carried out a 
manual apt update and DPM upgrade by running YAIM 3.0.0-36.

cheers,
Gianfranco

Sophie Lemaitre wrote:
> Hello,
>
> We couldn't reproduce the database corruption:
>       - with a 1.5.10 DPM under load
>       - with APT auto-update to DPM 1.6.3
>       - without restarting the daemons
>
> So, could someone provide me ([log in to unmask]) with a dump of 
> his/her "corrupted" database (cns_db + dpm_db), so that we can 
> investigate the problem?
>
> Thanks a lot.
> Sophie
>
>> Could this be clarified? I had the apt auto-update on Friday and 
>> dpm-qryconf went nuts as a result. However, copying files to and from 
>> the DPM was not affected. Now, running the update script today (after 
>> having stopped all the services) fails. This failure was reported at 
>> the beginning of the thread as a sign of DB corruption.
>>
>> I have regular dumps of the DB, so restoring to working order should 
>> not be a big deal (needless to say, however, it could have been 
>> avoided, as commented in this very thread). It would be desirable to 
>> know how far back I have to go in order to retrieve the latest 
>> possible functional snapshot. Is the corruption likely to occur as 
>> the new rpm's are installed (perhaps triggered by an attempted 
>> transfer after that), or is the action of running the script likely 
>> to corrupt the lot?
>>
>> cheers,
>> Gianfranco
>>
>> Michel Jouvin wrote:
>>
>>> Sophie can confirm, but I think there is no risk of corruption if
>>> running the new server on the old db: it will just fail. The problem is
>>> running the update script with the service (old or new) running.
>>>
>>> Michel
>>>
>>> --On Saturday 10 March 2007 19:03 +0100 Debreczeni Gergely
>>> <[log in to unmask]> wrote:
>>>
>>>  
>>>
>>>> Hi !
>>>>
>>>> Just thinking aloud:
>>>>
>>>> The apt-autoupdate updated the rpms, but nothing was restarted.
>>>> (I've checked the .spec files).
>>>> So after the upgrade you had the new DPM libraries and files installed
>>>> but the old servers running. When we tested the upgrade script we
>>>> strictly followed the description: no hours passed between the rpm
>>>> upgrade and the database schema upgrade, and no data transfers ran on
>>>> the server in the meantime...
>>>>
>>>>   So in your case what probably happened is that the old server wanted
>>>> to load one of the new shared libraries during the night (because you
>>>> had an ongoing transfer), which is obviously a weird situation, and
>>>> that caused the DB corruption.
>>>>
>>>>   If, as you proposed, the rpm postinstall script had stopped the
>>>> service, then you would have woken up in the morning with some crashed
>>>> data transfers... (I dunno which one is better :-))
>>>>
>>>>   So, neither solution is perfect; personally
>>>> *I'm very much against apt-autoupdate*.
>>>> If I ran a production site, I would want to do the upgrade myself,
>>>> follow the output, and read the release notes carefully beforehand,
>>>> not only superficially.....
>>>>
>>>>   So, both sides need some improvement ;-)
>>>>
>>>> Best regards and good weekend,
>>>> Gergo
>>>>
>>>> PS: And of course, very probably, once the database is corrupted the
>>>> update script is not going to work...
>>>>
>>>>
>>>>
>>>> Adam Padee wrote:
>>>>   
>>>>> Sophie Lemaitre wrote:
>>>>>     
>>>>>> Wait, I agree only with the documentation change time and date.
>>>>>>
>>>>>> But starting and stopping the services is done by YAIM as needed.
>>>>>> This is also explained in the Wiki documentation (since the beginning)
>>>>>> as well as in the release notes.
>>>>>>
>>>>>>
>>>>>>         
>>>>>
>>>>> Well, you're right. Probably the best way to do it was to use YAIM. But,
>>>>> as I mentioned previously, my SE was upgraded by apt-autoupdate, which
>>>>> unfortunately doesn't run YAIM. When I woke up in the morning, my
>>>>> databases were already corrupt. So I had to deal with the problem
>>>>> manually. I don't mind updating things manually. But gLite adopted a
>>>>> continuous update model, which makes sense only with automatic update
>>>>> tools. I agree that some things cannot be done without manual
>>>>> intervention. But in such a case I would like to have it stated
>>>>> explicitly in the release notes that come to my mailbox. As updates are
>>>>> "continuous", I look at these notes only superficially, and unless I
>>>>> find something really serious, stated in capital letters, I let it go
>>>>> automatically. If I had to update all the nodes manually after every
>>>>> minor update, then the "continuous" update model = much more work than
>>>>> in the previous "release" model.
>>>>> In the update 16 release notes I see only "pay close attention to
>>>>> glite-CE and lcg-CE_torque". Nothing at all about reconfiguration of
>>>>> SE_dpm_mysql.
>>>>>
>>>>> I really don't like to repeat the discussion that already took
>>>>> place here in Sept'06 along with the openssh update. But I think that
>>>>> putting into the production repository packages that, without special
>>>>> treatment, may cause services to malfunction, when a lot of people use
>>>>> apt-autoupdate, is not a very good idea. I (partially) understand the
>>>>> openssh case, as it is an external package. But if the same thing
>>>>> happens with EGEE packages, which are not critical security updates, I
>>>>> begin to wonder what PPS is for.
>>>>>
>>>>>     
>>>>>> We are always happy to answer all GGUS tickets we get, so please send a
>>>>>> mail if you are "fighting", or not sure in which order to do what.
>>>>>>
>>>>>>
>>>>>>         
>>>>>
>>>>> I appreciate that, and I'm really grateful for the help I already
>>>>> received from the DPM team (for example with my problem with dpm-drain in
>>>>> ver 1.5.6), but GGUS tickets have to travel a very long way before they
>>>>> reach your desk. Usually they are sorted by the TPM shift, sent to the ROC,
>>>>> analyzed by ROC 1st line support, sent back to GGUS, and then assigned
>>>>> to your group. At least this is what happened with my previous
>>>>> ticket concerning DPM. When the harm is already done, and my site does
>>>>> not work, I don't think that going through GGUS is the quickest way to
>>>>> solve the problem.
>>>>>
>>>>> Cheers,
>>>>> Adam
>>>>>       
>>>>
>>>
>>>
>>>
>>>      *************************************************************
>>>      * Michel Jouvin                 Email : [log in to unmask] *
>>>      * LAL / CNRS                    Tel : +33 1 64468932        *
>>>      * B.P. 34                       Fax : +33 1 69079404        *
>>>      * 91898 Orsay Cedex                                         *
>>>      * France                                                    *
>>>      *************************************************************
>>>   
>>
>>
>>


-- 
Dr. Gianfranco Sciacca			Tel: +44 (0)20 7679 3044
Dept of Physics and Astronomy		Internal: 33044
University College London		D15 - Physics Building
London WC1E 6BT
