Ah, it turns out I spoke too soon. I made a dumb mistake in the looping
code, and when I fixed it I discovered that the OSX fseek function is
broken (at least with default compiler flags) for files >= 2 Gb, no matter
how you try to use it (so you cannot just keep seeking incrementally; it
fails at 2 Gb). So I've replaced fseek with fseeko, which is not broken.
I've also tested this on 64-bit Linux now.
And I should have mentioned in the first email that on 64-bit Linux I
think that v1 of the code should work for these large files. It is only
on 32-bit Linux and OSX that there are problems. (And I think the
problems could be solved on both with clever use of platform-dependent
compiler flags, but then you have to start worrying about whether Python
has also been compiled with these flags. I think it's not worth going
there.)
Wayne
On Fri, 23 Jan 2009, Wayne Boucher wrote:
> Hello,
>
> As Andrew discovered yesterday, there is a problem with importing large
> (>= 2 Gb) files in Analysis. I have now investigated further and it was
> indeed a 32-bit problem. So some numbers which should have been positive
> were coming out negative.
>
> It turns out there were two problems. First of all, one of the types in
> the C code should have been "long long" rather than just "long". (The
> latter on many operating systems, including the default on OSX, is 4
> bytes, whereas the former is 8 bytes, and you need 8 bytes to cope
> with these large files.)
>
> The second problem was that the system function we use to skip around the
> data file on disk (fseek) uses long, not long long, for the offset. I've
> gotten around this by adding a function which will skip at most 2^30 bytes
> (= 1 Gb) in one go.
>
> As it happens, in v2 the first change had already been made. And I've
> just added the second change to the update server. So v2 users should be
> able to use >= 2 Gb files now.
>
> In v1 neither change had been made so I've done that in our internal code
> but I haven't put the changes on the update server for two reasons. One,
> our client for uploading the code is broken in v1 because it uses ftp and
> our server no longer allows that. And two, some other code has changed in
> the relevant files and although I think the changes are consistent I'd
> rather play safe.
>
> If any v1 users want to load >= 2 Gb files then let me know and I'll sort
> out the two issues above.
>
> As Andrew discovered, there is a work-around, namely to split your data
> files up (in his case, by re-processing, and that is the best way).
>
> As it happens, the CCPN data model means that in Analysis you can have
> multiple spectra inside one experiment. So that makes this work-around
> slightly less nasty.
>
> Wayne
>