On 22 Nov 2011, at 14:06, Steve Traylen wrote:
> On Tue, Nov 22, 2011 at 10:59 AM, Andreas Gellrich
> <[log in to unmask]> wrote:
>> Due to internal changes of the authorization model (munge) and the job
>> database, this new release is incompatible with the previous version (2.3.6) and
>> requires a complete drain of the batch farm (in our case ~4800 job slots)
>> and simultaneous updates of the batch server, WNs, and CEs.
>>
>
> I don't believe a drain is needed unless you are using job arrays
> (gLite does not, your local users may).
> The authentication change is only from qsub to pbs_server
> so either a quick change of both sides or stopping new jobs coming
> in somehow should be enough.
I have direct experience of this.
We upgraded from 2.3.9 to 2.5.7 without draining. I'd advise installing and configuring munge first. Then run ce-disable-submission, suspend all jobs, update everywhere, _test qsub_ from a CE, unsuspend the jobs, and finally run ce-enable-submission.
We missed ce-disable-submission, and ended up with a few broken jobs that caused scheduling problems until cleaned up (Maui didn't like them).
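
In case it helps, the ordering above can be written down as a small checklist runner. This is only a sketch: the step names come from the procedure described above, but run_in_order, placeholder and UPGRADE_STEPS are hypothetical names of mine, and every action is a stub that you would replace with whatever your site actually runs. The point is that a failed step (for example the qsub test) stops everything after it, so jobs are not resumed on a broken setup.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_in_order(steps: List[Step]) -> List[str]:
    """Run steps in order, stop at the first failure, return the completed names."""
    done: List[str] = []
    for name, action in steps:
        print(f"-> {name}")
        if not action():
            print(f"FAILED at: {name}; later steps were not run")
            break
        done.append(name)
    return done


def placeholder() -> bool:
    """Hypothetical stub: replace with the real site-specific action."""
    return True


# Order taken from the procedure above; the actions are assumptions (stubs).
UPGRADE_STEPS: List[Step] = [
    ("install and configure munge", placeholder),
    ("ce-disable-submission", placeholder),
    ("suspend all jobs", placeholder),
    ("update everywhere (batch server, WNs, CEs)", placeholder),
    ("test qsub from a CE", placeholder),
    ("unsuspend jobs", placeholder),
    ("ce-enable-submission", placeholder),
]

if __name__ == "__main__":
    run_in_order(UPGRADE_STEPS)
```

With the stubs as they are, all seven steps run. If the "test qsub from a CE" action returned False, the runner would stop there and leave the jobs suspended and submission disabled.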