Hi Matt,
I notice from the YAIM documentation that there is another locking method available. You could try that before messing around with Lustre, as a way of testing the issue:
> Method used for input proxy file locking; allowed values are flock, fcntl, disabled. Flock doesn't work on NFS, while fcntl may cause problems on older kernels.
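To see which of the two methods the Lustre mount actually supports, something like the following could be run on a WN. This is only a rough sketch: the function name try_locks is made up for illustration, and it uses Python's fcntl module rather than whatever glexec does internally, so it only approximates glexec's behaviour.

```python
import fcntl
import os
import tempfile


def try_locks(directory):
    """Try a flock-style and an fcntl-style lock on a scratch file in `directory`.

    Hypothetical helper for illustration; not part of glexec or YAIM.
    Returns a dict mapping method name to "ok" or "failed: <reason>".
    """
    results = {}
    # mkstemp opens the file read/write, which an exclusive fcntl lock requires.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for name, lock in (("flock", fcntl.flock), ("fcntl", fcntl.lockf)):
            try:
                # Non-blocking exclusive lock, released again straight away.
                lock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                lock(fd, fcntl.LOCK_UN)
                results[name] = "ok"
            except OSError as err:
                results[name] = "failed: %s" % err
    finally:
        os.close(fd)
        os.unlink(path)
    return results
```

Calling try_locks("/tmp") and then try_locks() on a writable directory under the Lustre mount should show whether flock alone fails there, in which case switching the locking method to fcntl would be a reasonable test.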
Thanks
Ewan
Ewan Steele
+44-(0)191-334-3527
On 5 Feb 2014, at 16:52, Matt Raso-Barnett wrote:
> On 05/02/14 16:40, Matt Raso-Barnett wrote:
>> I've found something else today on the WNs which looks to be perhaps the
>> problem.
>>
>> I turned on maximum log output for glexec on the WN earlier (somehow I
>> missed this variable when looking through /etc/glexec.conf before) and
>> immediately saw the following:
>>
>> glexec[51695] 20140205T145808Z: Reading in
>> GLEXEC_CLIENT_CERT='/mnt/lustre/grid/users/pilatl01/home_cream_445503617/cream_445503617.proxy'.
>>
>> glexec[51695] 20140205T145808Z: Could not lock file during reading of
>> proxy
>> /mnt/lustre/grid/users/pilatl01/home_cream_445503617/cream_445503617.proxy.
>> glexec[51695] 20140205T145808Z: Reading proxy failed.
>> glexec[51695] 20140205T145808Z: Failed to lock
>> $GLEXEC_CLIENT_CERT=/mnt/lustre/grid/users/pilatl01/home_cream_445503617/cream_445503617.proxy,
>> $GLEXEC_SOURCE_PROXY=(NULL) or destination proxy.
>>
>> I'm not sure yet why this is failing, but these messages are
>> occurring at the time the nagios check fails, so they are likely the reason.
>
> Sorry to reply to myself, but this definitely looks like it might be the issue for me -- testing flock fails when the lock file is written to our lustre file system, but works fine when it is written to a local disk like /tmp.
>
> It seems from some initial googling that I need to tweak the way we mount lustre to support flock.
>
> Does this sound familiar to anyone else (Chris W maybe)?
>
> Cheers,
> Matt