Hi,
I'm pretty wary of running yaim across all the worker nodes now after a
couple of instances of it causing torque to kill all the jobs.
I've not worked out the precise cause but running yaim on a worker node does
a stop and start of the pbs_mom service which seems to delete jobs on
occasion.
If anyone has any idea what's going on and how to stop it, I'm all ears.
Anyway, at RALPP we've got a normally commented-out stanza in cfengine to
switch the LCG_GFAL_INFOSYS envvar in /etc/profile.d/grid-env.sh (of course I
accidentally corrupted the file trying to uncomment it in a hurry on Sunday
morning, so it didn't help us much).
Actually if I get time tomorrow I might rewrite it to switch it based on the
presence of a control file and then revert it when we delete the file.
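Very roughly, the control-file logic would look something like the sketch
below. It's in Python rather than cfengine, just to show the idea. It assumes
grid-env.sh sets the variable on a single "export LCG_GFAL_INFOSYS=..." line,
which may not match the syntax of the real file, and the BDII hostnames are
made-up placeholders. It writes to a temporary file and renames it into place
so that an interrupted run can't leave a half-written file behind.

```python
import os
import re
import tempfile

# Placeholder values for illustration only; substitute the real BDII endpoints.
DEFAULT_BDII = "bdii-default.example.org:2170"
FALLBACK_BDII = "bdii-fallback.example.org:2170"

# Assumed line format: export LCG_GFAL_INFOSYS=<value>
_INFOSYS_LINE = re.compile(r'^([ \t]*export[ \t]+LCG_GFAL_INFOSYS=).*$',
                           re.MULTILINE)


def switch_infosys(text, value):
    """Return text with the LCG_GFAL_INFOSYS assignment set to value."""
    new_text, count = _INFOSYS_LINE.subn(
        lambda m: m.group(1) + '"' + value + '"', text)
    if count != 1:
        # Refuse to touch a file that does not look the way we expect.
        raise ValueError(
            "expected exactly one LCG_GFAL_INFOSYS line, found %d" % count)
    return new_text


def apply_switch(env_file, control_file):
    """Point env_file at the fallback BDII while control_file exists,
    and back at the default once it is deleted.
    Returns True if the file was rewritten, False if already correct."""
    value = FALLBACK_BDII if os.path.exists(control_file) else DEFAULT_BDII
    with open(env_file) as f:
        old = f.read()
    new = switch_infosys(old, value)
    if new == old:
        return False
    # Write the new contents next to the target, then rename over it.
    target_dir = os.path.dirname(os.path.abspath(env_file))
    fd, tmp = tempfile.mkstemp(dir=target_dir)
    with os.fdopen(fd, "w") as f:
        f.write(new)
    os.chmod(tmp, 0o644)
    os.replace(tmp, env_file)
    return True
```

Run periodically on each worker node, something like
apply_switch("/etc/profile.d/grid-env.sh", "/etc/use-fallback-bdii") would
flip the setting when the control file appears and revert it when the file is
deleted (the control file path is just an example).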
Yours,
Chris.
On 01/03/2010 19:16, "Ewan MacMahon" <[log in to unmask]> wrote:
>> -----Original Message-----
>> From: Wahid Bhimji [mailto:[log in to unmask]]
>>
>> How did you switch the top level BDII being used - was it just in the
>> WN sh script that runs - or a yaim rerun - or something else?
>>
> I did it by changing it in site-info.def and re-running YAIM, but
> there's probably a less sledgehammer approach.
>
> We've made some effort to ensure that we can re-YAIM most things
> without breaking stuff, so our cfengine setup runs it when the
> central copy of site-info.def changes. That makes this sort of
> reconfiguration a simple matter of tweaking the file in one place
> and waiting for everything to sort itself out.
>
> Ewan