On Tue, 8 Mar 2005, Peter W. Draper wrote:
> On Mon, 7 Mar 2005, Peter W. Draper wrote:
>
> > Now for the actual news, I think HDS may have a long-standing UNIX file
> > mapping bug, of sorts. It turned out that I could fix the problem by
> > introducing an "msync()" call before HDS's call to "munmap()". Now reading
> > the POSIX description of mmap/msync/munmap, it's certainly not totally
> > clear that munmap will flush any modifications to disk, so armed with this
> > idea in mind I think I've tracked down the relevant discussion in
> > linux-nfs:
> >
> > http://marc.theaimsgroup.com/?l=linux-nfs&m=110062223707689&w=2
> >
> > Trond Myklebust is the chap who is in charge of NFS client, so he must be
> > authoritative.
> >
> > OK, so unless someone knows better I intend to go ahead and make the
> > necessary changes to HDS (one line, maybe two if I'm careful).
>
> I've changed my mind on this one, again. The final fix I've committed to
> HDS is to do an fsync() before the file is closed. This has the same
> effect as all the msync() calls. I've done it this way mostly for the
> expected impact on performance, together with an effect that I think may
> mean there is a bug in NFS after all. You're supposed to be able to do
> asynchronous msync()'s, but when I switch to that mode from synchronous
> updates the data still gets lost. So I'm guessing it's probably better to
> do one final sync when the file is closed (HDS does this only when needed)
> than potentially every time an unmap happens. Keep an eye out for
> performance degradation anyway.
>
> I'll try to make a bug report about NFS, but clearly a patch now is a
> better option.

Final update on this issue. I exchanged email with Trond and he's quite
clear about what we should be doing to follow standard behaviour, namely
what I describe above, but with the twist of keeping both the
msync(MS_ASYNC) and fsync() calls, rather than just having the final
fsync() alone.

So the right sequence of actions is to msync(MS_ASYNC) the mapped segment
before unmapping it, which causes any dirty pages to be marked for return
to disk, and then to fsync() the file when it is closed to force the dirty
pages to actually be written. I've made changes to both branches of HDS,
committed them, and tested that nothing seems to be broken on Linux,
Solaris, Tru64 and Cygwin. OS X is currently a little odd as the
msync(MS_ASYNC) call is broken there (this should be fixed in 10.4), so it
has to be changed to msync(MS_SYNC) after all. Since we haven't had mapped
data access on OS X before there shouldn't be a performance hit (actually
Cygwin didn't have mapped data until these changes either).
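
For reference, the sequence can be sketched in C roughly as follows. This
is only a minimal illustration, not the actual HDS change: the function
name write_mapped() and its arguments are hypothetical, and error handling
is reduced to a single return code. On OS X before 10.4, MS_ASYNC would be
swapped for MS_SYNC as noted above.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not HDS code): write `len` bytes of `data` to
 * `path` through a shared file mapping.
 * Returns 0 on success, -1 on any failure. */
int write_mapped(const char *path, const void *data, size_t len)
{
    if (len == 0)
        return -1;              /* mmap() rejects zero-length mappings */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = -1;

    /* Size the file so that the whole mapping is backed by it. */
    if (ftruncate(fd, (off_t)len) == 0) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(p, data, len);   /* modify the mapped segment */

            /* Step 1: before unmapping, ask for the dirty pages to be
             * scheduled for write-back. */
            int ok = (msync(p, len, MS_ASYNC) == 0);
            if (munmap(p, len) != 0)
                ok = 0;

            /* Step 2: before closing, force the data out to disk. */
            if (ok && fsync(fd) == 0)
                rc = 0;
        }
    }

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A call such as write_mapped("test.dat", "hello", 5) should leave a 5-byte
file behind. The fsync() is issued once per file rather than once per
unmap, which is the performance consideration discussed earlier in the
thread.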

Hopefully all this should banish the problems we've seen with mapped data
over NFS.

Peter.