problem in reading text files

Ben Abbott bpabbott at mac.com
Fri Sep 5 13:23:34 CDT 2008


On Friday, September 05, 2008, at 12:18PM, "John W. Eaton" <jwe at bevo.che.wisc.edu> wrote:
>On  5-Sep-2008, Ben Abbott wrote:
>
>| If so, it may be relatively simple to add the support for various line endings by examining ls-mat-ascii.cc. Is there any reason why we wouldn't want to do that?
>
>It's a PITA for every application to have to know about CRLF, LF, CR,
>etc, and deal with that in every instance where a text file is opened
>and read (and line endings matter), or every application needs a
>wrapper around the I/O library for reading text files.  That level of
>detail is supposed to be handled by opening a file in text mode.  But
>then people will complain that the text file they transferred from a
>Unixy system to a Windows system in binary mode won't be read properly.
>
>It is particularly annoying that we still have this problem
>in 2008.  By now, everyone should have converted to using the One True
>Line Ending character, LF, ASCII 0x0A.
>
>FWIW, I think this used to work properly (at least for files with the
>same style line endings as expected by the system's I/O library)
>because Octave opened these files in text mode.  I suspect that we are
>now opening in binary mode so that seek/tell will work, but no one
>added the code to explicitly handle different line endings.
>
>BTW, when we fix this problem, should we try to keep track of what the
>first line ending character is in the file and only recognize that as
>the line ending character, or should we allow a random mix of LF,
>CRLF, and CR[*]?  How will we know, for example, that an initial lone CR
>in the file is really a line ending and not a CR character that is part of
>a character string?
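
To make the first option concrete, here is a minimal sketch of detecting the style of the first line ending in a buffer. The names eol_style and first_eol are hypothetical and are not taken from Octave's sources; the sketch assumes the text has already been read in binary mode into a std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical names, for illustration only (not part of Octave).
enum class eol_style { none, lf, crlf, cr };

// Report the style of the first line ending found in TEXT.
eol_style first_eol (const std::string& text)
{
  std::string::size_type pos = text.find_first_of ("\r\n");

  if (pos == std::string::npos)
    return eol_style::none;     // no line ending at all

  if (text[pos] == '\n')
    return eol_style::lf;       // Unix style

  // TEXT[POS] is a CR here; a following LF makes it CRLF.
  if (pos + 1 < text.size () && text[pos + 1] == '\n')
    return eol_style::crlf;

  return eol_style::cr;         // lone CR
}
```

Note that this does nothing to resolve the ambiguity raised above: if the first line contains a CR that is really part of a character string, the sketch reports eol_style::cr anyway.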
>
>[*] OK, when reading .m files, we are already doing this (always
>accepting any mixture of LF, CRLF or CR as the line ending character)
>and it seems to work most of the time.  But I don't think that's a
>great solution.  It's just a kluge that happens to appear to work most
>of the time.
>
>jwe

Since the present context is oct-ascii files, is mixing the line endings a problem?

I expect m-files are very tolerant of mixed line endings, but what about the oct-ascii files?
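
For reference, the tolerant approach (accept any mixture of LF, CRLF or CR) could look roughly like the sketch below for a stream opened in binary mode. This is only an illustration under my own assumptions: read_line_any_eol and split_lines are hypothetical names, and this is not the code in ls-mat-ascii.cc.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an Octave function): read one line from a
// stream opened in binary mode, treating LF, CRLF, or a lone CR as the
// line terminator.  Returns false only when no characters were left.
bool read_line_any_eol (std::istream& is, std::string& line)
{
  line.clear ();
  bool got_any = false;
  int c;

  while ((c = is.get ()) != std::char_traits<char>::eof ())
    {
      got_any = true;

      if (c == '\n')
        return true;

      if (c == '\r')
        {
          // Swallow the LF of a CRLF pair, if there is one.
          if (is.peek () == '\n')
            is.get ();
          return true;
        }

      line += static_cast<char> (c);
    }

  // End of input: report a final line that had no terminator.
  return got_any;
}

// Hypothetical convenience wrapper: split TEXT into lines.
std::vector<std::string> split_lines (const std::string& text)
{
  std::istringstream is (text);
  std::vector<std::string> lines;
  std::string line;

  while (read_line_any_eol (is, line))
    lines.push_back (line);

  return lines;
}
```

With this sketch, split_lines ("a\r\nb\rc\nd") yields the four lines a, b, c and d. The cost is the one jwe points out: a lone CR inside a quoted string is also treated as a line ending.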

Ben

