"Gary Scott" <[log in to unmask]> wrote:
...
> Sorry to interrupt. I would like to clarify that "newline" and "line
> feed" are not the same thing (I haven't been following this item, so I
> hope this isn't too far out of scope). Character 0A hex is "line
> feed". It does not necessarily imply that, following the advance to a
> new line, the "current position" is at the first character of a
> record. In ASCII, that would happen only after both a CR
> and an LF (I'm not sure there is an equivalent to newline in
> ASCII). In EBCDIC, 15 hex is newline and 25 hex is line feed.
> Therefore, the standard should not refer to ACHAR(10) as "newline" but
> instead as "line feed". I've never understood why systems don't use 1E
> hex (record separator) to separate text records as intended (other than
> to save space in text files).
What you are describing was close to the original intent of
the ASCII standard. But neither CR nor LF, nor a combination
of the two, was intended to be a record mark. These are both
"format effectors" whose purpose is to produce a specified
appearance on a display or hardcopy device. For the purposes
of determining the file's structure, they were both intended to
be merely data.
The intended characters for structuring files were called
information separators. These were US, RS, GS, and FS (ASCII
positions 31, 30, 29 and 28 respectively). Now, it has since
been determined that using marks to terminate rather than
separate is a better idea, but even so, the appropriate marks
for the purpose should have been the ones that ASCII designated.
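The separator scheme described above can be sketched as follows (a Python illustration of my own, not anything from the standard itself; the function name and the terminator convention are assumptions):

```python
# Sketch: structuring text with the ASCII information separators,
# as the original standard intended. US (0x1F) marks units,
# RS (0x1E) records, GS (0x1D) groups, FS (0x1C) files.
# Here each mark is used as a terminator, one after every item.

US, RS = "\x1f", "\x1e"

def parse_records(data: str) -> list[list[str]]:
    """Split a block of text into records, each a list of units,
    treating RS and US as terminators rather than separators."""
    records = []
    for record in data.split(RS):
        if record:  # a trailing RS leaves an empty final piece; drop it
            records.append([u for u in record.split(US) if u])
    return records

# Two records of two units each, every item followed by its mark:
data = "alpha" + US + "beta" + US + RS + "gamma" + US + "delta" + US + RS
print(parse_records(data))  # [['alpha', 'beta'], ['gamma', 'delta']]
```

Note that no byte of the actual data is overloaded as structure: LF inside a unit would be ordinary data, exactly the "format effectors are merely data" position above.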
Well, this was never popular (though in retrospect it was a good
idea). It required that the RS be replaced with CR and LF when
displaying the file (already the assumption was present that a
record and a line were the same thing). Further, one hardware
company (DEC if I recall) had printers that could be set to do
both CR and LF when only LF was transmitted to them. So, the
UNIX people disregarded the ASCII standard and implemented
their text files using a single LF for the record mark. This eliminated
any need to filter data before printing (still assuming a line and
a record were the same). I may be unfairly singling out UNIX
in this respect. There were others that did the same. But
it was the UNIX community that acquired the clout for the
next step.
Well, after UNIX became popular (at least among University
types) the ASCII standard was revised to explicitly permit
the behavior of UNIX text files - and they adopted the rename
of LF as NL. Actually they permitted (and still permit) both
uses and meanings of the character - just not in the same
implementation. In the standard, the use of the code as
NL is still officially termed a format effector whose meaning
is the combination of CR and LF. (It was also deprecated,
and still was as recently as the 1986 revision of the standard
doc - the most recent I have a copy of.)
Today we'd all be better off (IMO) if we adopted the original
ASCII standard - with the modification of using the US, RS,
GS, and FS characters as terminators rather than separators.
Nearly all display and hardcopy devices are driven through
filters (to permit data other than text) anyway. So, the
extra work required for converting RS to the CR/LF
operations would be in the device drivers where it belongs.
And, it would then be possible to use LF for its original
intent as well.
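A display-side filter of the kind proposed above might look like this (again a Python sketch under my own naming; the point is only where the conversion lives, not the exact code):

```python
# Sketch: a device-driver-level filter that expands each RS record
# terminator into the CR/LF pair a display or hardcopy device expects,
# so the device shows one record per line. LF occurring in the data
# would remain a plain format effector (advance a line, keep column)
# instead of doubling as the record mark.

RS = "\x1e"

def render_for_display(data: str) -> str:
    """Replace each RS with CR/LF for output to a line-oriented device."""
    return data.replace(RS, "\r\n")

print(repr(render_for_display("first record" + RS + "second record" + RS)))
# 'first record\r\nsecond record\r\n'
```

Since nearly all devices are already driven through filters, this conversion adds no pass over the data that isn't being made anyway.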
The real problem is not so much that UNIX chose not to
comply with the original ASCII standard, but that it set
a precedent for others also to disregard it - but differently.
I don't really miss having LF (how many times do I really
need to go to the same position on the next line?). It's
the lack of a portable standard that is irksome.
--
J. Giles