

Hi Gianfranco,

FYI, we are still investigating the cause of this problem with APT
auto-update.

If you are talking about the "error on rename", you should use the most
recent backup you made before the apt auto-update.

Please use glite-yaim-3.0.0-38 (to be released today), not
glite-yaim-3.0.0-36,
and then reconfigure the DPM node with YAIM.

Thank you, Sophie.

> Could this be clarified? I had the apt auto-update on Friday and
> dpm-qryconf went nuts as a result. However, copying files to and from
> the DPM was not affected. Now, running the update script today (after
> having stopped all the services) fails. This failure was reported at
> the beginning of the thread as a sign of DB corruption.
>
> I have regular dumps of the DB, so restoring it to working order should
> not be a big deal (although, needless to say, this could have been
> avoided, as discussed in this very thread). It would be useful to
> know how far back I have to go to retrieve the latest
> functional snapshot. Is the corruption likely to occur as the
> new RPMs are installed (perhaps triggered by a transfer attempted
> after that), or is running the script itself likely to corrupt
> the lot?
>
> cheers,
> Gianfranco
>
> Michel Jouvin wrote:
>
>> Sophie can confirm, but I think there is no risk of corruption if you run
>> the new server on the old DB: it will just fail. The problem is running
>> the update script with the service (old or new) running.
>>
>> Michel
>>
>> --On Saturday 10 March 2007 19:03 +0100 Debreczeni Gergely
>> <[log in to unmask]> wrote:
>>
>>  
>>
>>> Hi !
>>>
>>> Just thinking aloud:
>>>
>>> The apt-autoupdate updated the RPMs, but none of the services was
>>> restarted (I've checked the .spec files).
>>> So after the upgrade you had the new DPM libraries and files installed
>>> but the old servers still running. When we tested the upgrade script we
>>> strictly followed the procedure: no hours passed between
>>> the RPM upgrade and the database schema upgrade, and no data
>>> transfers took place on the server in the meantime...
>>>
>>>   So what probably happened in your case is that the old server tried to
>>> load one of the new shared libraries during the night (because you had an
>>> ongoing transfer), which is obviously a weird situation, and that caused
>>> the DB corruption.
>>>
>>>   If, as you proposed, the RPM postinstall script had stopped the service,
>>> then you would have woken up in the morning to some crashed data
>>> transfers... (I dunno which one is better :-))
>>>
>>>   So neither solution is perfect. Personally,
>>> *I'm very much against apt-autoupdate*.
>>> If I ran a production site, I would want to do the upgrade myself,
>>> follow the output, and read the release notes carefully beforehand,
>>> not just superficially.....
>>>
>>>   So, both sides need some improvement ;-)
>>>
>>> Best regards and good weekend,
>>> Gergo
>>>
>>> PS: And of course, once the database is corrupted, the
>>> update script is very probably not going to work...
>>>
>>>
>>>
>>> Adam Padee wrote:
>>>    
>>>
>>>> Sophie Lemaitre wrote:
>>>>      
>>>>
>>>>> Wait, I agree only with the documentation change time and date.
>>>>>
>>>>> But starting and stopping the services is done by YAIM as needed.
>>>>> This has been explained in the Wiki documentation (since the
>>>>> beginning) as well as in the release notes.
>>>>>
>>>>>
>>>>>         
>>>>
>>>> Well, you're right. Probably the best way to do it was to use YAIM. But,
>>>> as I mentioned previously, my SE was upgraded by apt-autoupdate, which
>>>> unfortunately doesn't run YAIM. When I woke up in the morning, my
>>>> databases were already corrupt, so I had to deal with the problem
>>>> manually. I don't mind updating things manually. But gLite adopted a
>>>> continuous update model, which makes sense only with automatic update
>>>> tools. I agree that some things cannot be done without manual
>>>> intervention, but in such cases I would like to have it stated
>>>> explicitly in the release notes that come to my mailbox. As updates are
>>>> "continuous", I look at these notes only superficially, and unless I
>>>> find something really serious, stated in capital letters, I let the
>>>> update proceed automatically. If I had to update all the nodes manually
>>>> after every minor update, then the "continuous" update model would mean
>>>> much more work than the previous "release" model.
>>>> In the update 16 release notes I see only "pay close attention to
>>>> glite-CE and lcg-CE_torque". Nothing at all about reconfiguration of
>>>> SE_dpm_mysql.
>>>>
>>>> I really don't want to repeat the discussion that already took
>>>> place here in Sept '06 about the openssh update. But I think that
>>>> putting packages into the production repository that, without special
>>>> treatment, may cause services to malfunction, when a lot of people use
>>>> apt-autoupdate, is not a very good idea. I (partially) understand the
>>>> openssh case, as it is an external package. But if the same thing
>>>> happens with EGEE packages that are not critical security updates, I
>>>> begin to wonder what the PPS is for.
>>>>
>>>>      
>>>>
>>>>> We are always happy to answer all the GGUS tickets we get, so please
>>>>> send a mail if you are "fighting", or are not sure in which order to
>>>>> do things.
>>>>>
>>>>>
>>>>>         
>>>>
>>>> I appreciate that, and I'm really grateful for the help I have already
>>>> received from the DPM team (for example with my dpm-drain problem in
>>>> version 1.5.6), but GGUS tickets have to travel a very long way before
>>>> they reach your desk. Usually they are sorted by the TPM shift, sent to
>>>> the ROC, analyzed by ROC 1st-line support, sent back to GGUS, and then
>>>> assigned to your group. At least this is what happened with my previous
>>>> ticket concerning DPM. When the harm is already done and my site does
>>>> not work, I don't think that going through GGUS is the quickest way to
>>>> solve the problem.
>>>>
>>>> Cheers,
>>>> Adam
>>>>       
>>>
>>
>>
>>
>>      *************************************************************
>>      * Michel Jouvin                 Email : [log in to unmask] *
>>      * LAL / CNRS                    Tel : +33 1 64468932        *
>>      * B.P. 34                       Fax : +33 1 69079404        *
>>      * 91898 Orsay Cedex                                         *
>>      * France                                                    *
>>      *************************************************************
>>   
>
>
>