On 17:25 [Feb 05 2014], jeremy maris wrote:
> > On 05/02/14 16:40, Matt Raso-Barnett wrote:
> >
> > Sorry to reply to myself, but this definitely looks like it might be
> > the issue for me -- testing flock where it is writing the lock file
> > to our lustre file system fails, but writing out to a local disk
> > like /tmp works fine.
> >
> > It seems from some initial googling that I need to tweak the way we
> > mount lustre to support flock.
> >
> > Does this sound familiar to anyone else (Chris W maybe)?
> >
> > Cheers, Matt
>
> Not sure if flock is needed or not re glexec but I recently made it
> the default mount option for Lustre so that we could run HDF5
> parallel IO, which needs flock.
>
> Most nodes have had lustre remounted since then but not all,
> including the grid nodes. You'll need to dismount and then remount
> lustre when no jobs are running for it to take effect.

This ended up resolving the issue for us -- remounting Lustre on our
worker nodes with the flock option fixed it.
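
For anyone who wants to check their own nodes, here is a minimal sketch of
a flock test in Python. It is not the exact test used earlier in this
thread, and the helper name flock_works is made up for illustration. It
creates a temporary file in the given directory, tries to take an
exclusive non-blocking lock on it, and reports whether that succeeded.

```python
import fcntl
import os
import tempfile


def flock_works(directory):
    """Return True if flock() succeeds on a temporary file in `directory`.

    Hypothetical helper for illustration only; it just wraps the standard
    fcntl.flock() call and reports any OS-level error it sees.
    """
    # Create a scratch file on the file system we want to test.
    fd, path = tempfile.mkstemp(prefix="flock-test-", dir=directory)
    try:
        try:
            # Exclusive, non-blocking lock: fails immediately instead of hanging.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            print("flock failed in %s: %s" % (directory, exc))
            return False
        # Release the lock again before cleaning up.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling flock_works("/tmp") and then flock_works() on a directory under
the Lustre mount point (whatever that path is on your nodes) should show
the same difference described above: the local disk passes, and Lustre
only passes once it has been remounted with the flock option.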

Thanks everyone for all the help and suggestions, I've really
appreciated it!

Many thanks,
Matt