More bad news. I've spent the day looking, on and off, into what started
as a locally reported CCDPACK problem, but has turned into what looks a
lot like the old NFS and HDS MAP_SHARED problem, namely that HDS files
mapped over NFS are getting corrupted (I don't think we've seen this
since the 2.2 days).

In this case the symptom is that the mapped data is not written back to
the server machine correctly. Confusingly, some or all of it remains
cached locally, so on the local machine you see the expected data, or
just parts of it (which is where I came in; the user was somewhat
confused at this point)! However, when you look at the file from the
remote machine itself, the data arrays are actually filled mostly with
zeros.

As with the old MAP_SHARED problem, if you set HDS_MAP to 0 everything
starts working correctly (as does copying to local files, but that's not
always an option).

So far this is only reproducible on Fedora Core 2 installations (I've
checked 6 machines in various states) running kernel 2.6.9 or later. I'm
still fairly clueless about what the exact problem is, or how to solve it
properly: my only FC3 installation, which has the same kernel as the
machine the problem originally surfaced on, is working OK, as are all the
RHEL and RH machines I've tried. I can't find any reports of anything
like this in the kernel mailing lists (although there is a mention of
some fixes for MAP_SHARED in the 2.6.11 release notes).

So, has anyone else seen this? It might also be a good idea to keep an
eye open for reports of strangely corrupted data.

Peter.

(PS: in case you're in a position to try this, the simple test I've been
using is just to run fits2ndf to create an NDF, with the input and output
files in a remote NFS-mounted directory, then run GAIA locally and
remotely on the new NDF. The local display shows the file as expected,
while the remote display shows an image full of zeros, or occasionally
dumps core.)
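
If you don't have fits2ndf and GAIA to hand, something like the following
Python sketch might serve as a rough stand-in for the test. It is not HDS
code, and the function names are made up for illustration. It assumes that
Python's mmap with ACCESS_WRITE gives a shared (MAP_SHARED) mapping on
Unix, and that writing through such a mapping exercises the same path as
HDS does; I have not confirmed that it reproduces the problem.

```python
import mmap
import os
import sys


def write_via_shared_map(path, size=1 << 20):
    """Fill `path` with a non-zero pattern through a shared mapping.

    Hypothetical helper: the pattern contains no zero bytes, so any zero
    seen later in the file means data was lost on the way to disk.
    """
    pattern = bytes(range(1, 256))
    data = (pattern * (size // len(pattern) + 1))[:size]
    with open(path, "w+b") as f:
        f.truncate(size)  # the file must have its full length before mapping
        # ACCESS_WRITE is a write-through (shared) mapping on Unix.
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_WRITE) as m:
            m[:] = data
            m.flush()  # ask the kernel to write the dirty pages back
    return data


def count_zero_bytes(path):
    """Read the file with ordinary I/O and count the zero bytes in it."""
    with open(path, "rb") as f:
        return f.read().count(0)


if __name__ == "__main__" and len(sys.argv) == 3:
    mode, path = sys.argv[1], sys.argv[2]
    if mode == "write":
        write_via_shared_map(path)
    # In "check" mode (or after writing) just report what is in the file.
    print(os.path.abspath(path), "zero bytes:", count_zero_bytes(path))
```

The idea would be to run it with "write" and a path in the NFS-mounted
directory on the client, then with "check" and the same file on the
server. A non-zero count on the server while the client reports zero
would match the symptoms described above.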