Could this be clarified? I had the apt auto-update on Friday and
dpm-qryconf went nuts as a result. However, copying files to and from
the DPM was not affected. Now, running the update script today (after
having stopped all the services) fails. This failure was reported at the
beginning of this thread as a sign of DB corruption.
I have regular dumps of the DB, so restoring to working order should
not be a big deal (needless to say, however, it could have been avoided, as
commented in this very thread). It would be desirable to know how far
back I have to go in order to retrieve the latest possible functional
snapshot. Is the corruption likely to occur as the new RPMs are
installed (perhaps triggered by an attempted transfer after that), or is
the action of running the script likely to corrupt the lot?
cheers,
Gianfranco
Michel Jouvin wrote:
> Sophie can confirm, but I think there is no risk of corruption when running
> the new server on the old DB: it will just fail. The problem is running
> the update script while the service (old or new) is running.
>
> Michel
>
> --On Saturday, 10 March 2007 19:03 +0100, Debreczeni Gergely wrote:
>
>
>> Hi !
>>
>> Just thinking out loud:
>>
>> The apt-autoupdate updated the RPMs, but none of the services was restarted
>> (I've checked the .spec files).
>> So after the upgrade you had the new DPM libraries and files installed
>> but the old servers running. When we tested the upgrade script we
>> strictly followed the description, so no hours passed between
>> the RPM upgrade and the database schema upgrade, and no data
>> transfers ran on the server in the meantime...
>>
>> So in your case what probably happened is that the old server tried to
>> load one of the new shared libraries during the night (because you had an
>> ongoing transfer), which is obviously a weird situation, and that caused
>> the DB corruption.
>>
>> If, as you proposed, the RPM postinstall script had stopped the service,
>> then you would have woken up in the morning with some crashed data
>> transfers... (I dunno which one is better :-))
>>
>> So, neither solution is perfect; personally
>> *I'm very much against apt-autoupdate*.
>> If I ran a production site, I would want to do the upgrade
>> myself, follow the output, and
>> read the release notes carefully beforehand, not only superficially...
>>
>> So, both sides need some improvement ;-)
>>
>> Best regards and good weekend,
>> Gergo
>>
>> PS: And of course, once the database is corrupted, the
>> update script is very probably not going to work...
>>
>>
>>
>> Adam Padee wrote:
>>
>>> Sophie Lemaitre wrote:
>>>
>>>> Wait, I agree only with the documentation change time and date.
>>>>
>>>> But starting and stopping the services is done by YAIM as needed.
>>>> This is also explained in the Wiki documentation (since the beginning)
>>>> as well as in the release notes.
>>>>
>>>>
>>>>
>>> Well, you're right. Probably the best way to do it was to use YAIM. But,
>>> as I mentioned previously, my SE was upgraded by apt-autoupdate, which
>>> unfortunately doesn't run YAIM. When I woke up in the morning, my
>>> databases were already corrupt. So I had to deal with the problem
>>> manually. I don't mind updating things manually. But gLite adopted a
>>> continuous update model, which makes sense only with automatic update
>>> tools. I agree that some things cannot be done without manual
>>> intervention. But in such a case I would like to have it stated
>>> explicitly in the release notes that come to my mailbox. As updates are
>>> "continuous", I look at these notes only superficially, and unless I
>>> find something really serious, stated in capital letters, I let it go
>>> automatically. If I had to update all the nodes manually after every
>>> minor update, then the "continuous" update model = much more work than
>>> in the previous "release" model.
>>> In the update 16 release notes I see only "pay close attention to
>>> glite-CE and lcg-CE_torque". Nothing at all about reconfiguration of
>>> SE_dpm_mysql.
>>>
>>> I really don't like to repeat the discussion that already took
>>> place here in Sept '06 about the openssh update. But I think that
>>> putting packages into the production repository that may cause services
>>> to malfunction without special treatment, when a lot of people use
>>> apt-autoupdate, is not a very good idea. I (partially) understand the
>>> openssh case, as it is an external package. But if the same thing
>>> happens with EGEE packages, which are not critical security updates, I
>>> begin to wonder what PPS is for.
>>>
>>>
>>>> We are always happy to answer all GGUS tickets we get, so please send a
>>>> mail if you are "fighting", or not sure in which order to do what.
>>>>
>>>>
>>>>
>>> I appreciate that, and I'm really grateful for the help I already
>>> received from DPM team (for example with my problem with dpm-drain in
>>> ver 1.5.6), but GGUS tickets have to travel a very long way before they
>>> reach your desk. Usually they are sorted by TPM shift, sent to ROC,
>>> analyzed by ROC 1st line support, sent back to GGUS, and then assigned
>>> to your group. At least this is what has happened with my previous
>>> ticket concerning DPM. When the harm is already done, and my site does
>>> not work, I don't think that going through GGUS is the quickest way to
>>> solve the problem.
>>>
>>> Cheers,
>>> Adam
>>>
>
>
>
> *************************************************************
> * Michel Jouvin Email : [log in to unmask] *
> * LAL / CNRS Tel : +33 1 64468932 *
> * B.P. 34 Fax : +33 1 69079404 *
> * 91898 Orsay Cedex *
> * France *
> *************************************************************
>
--
Dr. Gianfranco Sciacca Tel: +44 (0)20 7679 3044
Dept of Physics and Astronomy Internal: 33044
University College London D15 - Physics Building
London WC1E 6BT