Aleksandar Donev wrote:
> On Friday 22 June 2007 16:52, Kurt W Hirchert wrote:
>
>> The entire length of the record could be determined by
>> traversing the sequence of partial records that comprised it, but the
>> length was not stored in one place.
> Exactly---was this not included in the standard functionality because it may
> be inefficient? It seems like a useful and natural thing to allow especially
> if a lot of implementations do indeed directly store the length.
1. To say that such functionality "may be inefficient" is a severe
understatement. I would not be surprised by slowdowns of two orders of
magnitude under some circumstances.
a. One possible implementation is, in effect, to read the entire
record to determine its length and then backspace to allow the content
of the record to be read. On sequential hardware (like tapes), this
changing of direction can have severe performance implications.
b. Another would be to copy the entire record into RAM so the two
passes can be made there. This eliminates changing directions on
the external hardware, but the cost of acquiring storage to hold a
record whose size is not known at the beginning of the read can have
other kinds of negative performance effects.
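The two alternatives can be illustrated with a small Python sketch. The
segmented record layout here (a 2-byte payload length plus a 1-byte
"another segment follows" flag per segment) is invented purely for
illustration, not any particular compiler's format; the point is that the
total length is only discoverable by walking the whole chain:

```python
import io
import struct

def write_segmented(buf, payload, seg_size=4):
    # Split the payload into segments of at most seg_size bytes, each
    # prefixed by its own length and a continuation flag.
    for i in range(0, len(payload), seg_size):
        chunk = payload[i:i + seg_size]
        more = 1 if i + seg_size < len(payload) else 0
        buf.write(struct.pack("<HB", len(chunk), more))
        buf.write(chunk)

def record_length_two_pass(buf):
    # Alternative (a): a first pass walks the segment chain summing the
    # lengths, then we seek back so the content can be read again.
    start = buf.tell()
    total = 0
    while True:
        n, more = struct.unpack("<HB", buf.read(3))
        buf.seek(n, io.SEEK_CUR)   # skip this segment's payload
        total += n
        if not more:
            break
    buf.seek(start)                # "backspace" to the record start
    return total

buf = io.BytesIO()
write_segmented(buf, b"hello world")
buf.seek(0)
print(record_length_two_pass(buf))  # prints 11
```

In memory the seek-back is cheap; on sequential hardware such as tape,
the equivalent rewind in alternative (a) is where the cost appears.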
[The effects of these implementation choices can show up in odd places.
I remember a discussion on c.l.f some time ago (perhaps a couple of
years ago?) about g77 being much slower writing unformatted files on NFS
file systems than other available f77 compilers, but not on local file
systems. It turned out that g77 writes unformatted records to an output
file "on the fly" as the iolist is processed, and then seeks back to the
beginning of the record to fill in its length (analogous to alternative
a above), while the other compilers being compared collected an image of
the record in memory and then wrote the length-content-length sequence
strictly sequentially. The performance difference between writing with random
jumps or totally sequentially proved to be much greater when accessing
the file through NFS than when accessing a local file.]
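At the byte level the two write strategies in that anecdote produce
identical files; only the access pattern differs. A minimal Python sketch,
assuming the common 4-byte leading/trailing length layout (details vary by
compiler, so treat this as a schematic, not a spec):

```python
import io
import struct

def write_seek_back(f, chunks):
    # g77-style: stream the iolist out as it is processed, leaving a
    # hole for the leading length, then jump back to fill it in.
    head = f.tell()
    f.write(b"\x00\x00\x00\x00")      # placeholder for leading length
    n = 0
    for chunk in chunks:
        f.write(chunk)
        n += len(chunk)
    f.write(struct.pack("<i", n))      # trailing length
    end = f.tell()
    f.seek(head)
    f.write(struct.pack("<i", n))      # fill in leading length
    f.seek(end)                        # non-sequential access pattern

def write_buffered(f, chunks):
    # Alternative: assemble the record in memory, then write
    # length-content-length strictly sequentially.
    payload = b"".join(chunks)
    n = struct.pack("<i", len(payload))
    f.write(n + payload + n)

a, b = io.BytesIO(), io.BytesIO()
write_seek_back(a, [b"abc", b"defg"])
write_buffered(b, [b"abc", b"defg"])
print(a.getvalue() == b.getvalue())    # prints True
```

The NFS slowdown came from the seek in the first strategy, which turns
each record into a small random-access pattern rather than one
sequential stream.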
2. Today, it is the norm to use unformatted file formats where the
length is present explicitly. At the time much of this was originally
standardized, file formats where the length could only be determined by
reading the entire record were the norm.
3. It would be relatively trivial to return the actual length of a
record _after_ reading it. Pursuing this approach brings up three issues:
a. Trying to read "too much" from a record needs to be less of an
error. (In particular, it can't make the data read from the "good" part
of the record be undefined.)
b. We need agreement on what units to use in measuring the size of
the record. (Perhaps the processor dependent units used in measuring
the size of direct-access file records?)
c. Note that the size read may not be exactly the size written. For
example, some implementations write the length of the record in _words_, so
if one writes characters, they may have to be padded to make the record
an even number of words. (That's one good reason for writing your own
length into the content of the record instead of depending on the
implementation's version of the length embedded in the file format.)
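A toy illustration of that padding effect, with an invented word-based
length field (again, no real compiler's format is being reproduced):

```python
import struct

WORD = 4

def record_with_word_length(payload):
    # Hypothetical format: pad the payload to a whole number of 4-byte
    # words and store the record length in words, not bytes.
    pad = (-len(payload)) % WORD
    padded = payload + b" " * pad
    words = len(padded) // WORD
    return struct.pack("<i", words) + padded

rec = record_with_word_length(b"hello")            # 5 chars -> 2 words
stored_bytes = struct.unpack("<i", rec[:4])[0] * WORD
print(stored_bytes)                                # prints 8, not 5

def record_with_own_length(payload):
    # Defensive alternative suggested above: embed your own byte count
    # in the record content, independent of the format's bookkeeping.
    return record_with_word_length(struct.pack("<i", len(payload)) + payload)
```

With the second helper, a reader recovers the exact 5-byte count from the
record content itself, regardless of how the format rounds its own length.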
>
> BTW, can one really (legally) read a shorter string than what was written with
> unformatted IO (as Bill suggested)?
I believe so, but I'm too lazy to look up the rules to verify it. One
certainly can read less of an array than one writes. I would expect
this case to be equivalent, but if it is not, one could
always read and write
(c(i,i),i=1,length)
instead of
c(1:length)
to convert to that case.
[Since the equivalence between CHARACTER(N) and
CHARACTER(1),DIMENSION(N) is a little bit stronger for default character
kind than for any other character kind, I suppose there is an outside
chance that reading a shorter string is legal for default kind but not
other kinds. Even if true, the implied DO-loop approach above should
still be legal for those other kinds.]
>
> Thanks,
> Aleks
>
-Kurt