problem in reading text files
Yaping Zhou
yaping at us.ibm.com
Fri Sep 5 14:11:15 CDT 2008
Thank you all for looking at this.
The text files I sent before were from Perl. After I read your discussion
about line end characters, I used SciTE to change the line end characters
to LF and CR. I wasn't able to load any of them (see attachment). I was
using 3.0.1 on Windows XP.
Regards,
Yaping
(See attached file: NotOK.zip)
-------------------------------------------------------------------------
From: "John W. Eaton" <jwe at bevo.che.wisc.edu>
To: Ben Abbott <bpabbott at mac.com>
Cc: "Dmitri A. Sergatskov" <dasergatskov at gmail.com>, Octave <bug at octave.org>, Yaping Zhou/Austin/IBM at IBMUS
Date: 09/05/2008 11:18 AM
Subject: Re: problem in reading text files
On 5-Sep-2008, Ben Abbott wrote:
| If so, it may be relatively simple to add the support for various line
| endings by examining ls-mat-ascii.cc. Is there any reason why we wouldn't
| want to do that?
It's a PITA for every application to have to know about CRLF, LF, CR,
etc, and deal with that in every instance where a text file is opened
and read (and line endings matter), or every application needs a
wrapper around the I/O library for reading text files. That level of
detail is supposed to be handled by opening a file in text mode. But
then people will complain that the text file they transferred from a
Unixy system to a Windows system in binary mode won't be read properly.
It is particularly annoying that we still have this problem
in 2008. By now, everyone should have converted to using the One True
Line Ending character, LF, ASCII 0x0A.
FWIW, I think this used to work properly (at least for files with the
same style line endings as expected by the system's I/O library)
because Octave opened these files in text mode. I suspect that we are
now opening in binary mode so that seek/tell will work, but no one
added the code to explicitly handle different line endings.
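
To illustrate the trade-off described above, here is a minimal Python sketch (not Octave's actual code, which lives in C++). It assumes a stream opened in binary mode, where tell/seek offsets are exact byte positions but the CR of a CRLF pair stays in the data and has to be stripped by hand. The helper name `peek_line` and the sample header text are hypothetical, chosen only for the example.

```python
import io


def peek_line(stream):
    """Return the next line without its line ending, then rewind.

    Assumes `stream` is opened in binary mode, so tell()/seek() work on
    exact byte offsets. Binary readline() splits on LF only, so a CR left
    over from a CRLF pair is stripped explicitly here.
    """
    start = stream.tell()            # remember where the line begins
    raw = stream.readline()          # includes the trailing LF, if any
    line = raw.rstrip(b"\r\n")       # drop LF and any CR before it
    stream.seek(start)               # rewind so the caller can re-read
    return line


# Hypothetical sample data with CRLF line endings.
stream = io.BytesIO(b"# name: x\r\n1 2\r\n")
print(peek_line(stream))   # the header line, without CR or LF
print(stream.tell())       # position is back at the start of the line
```

Note that this only copes with LF and CRLF. A file that uses a lone CR as its line ending would come back from `readline()` as a single long line, which is the gap the message above describes.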
BTW, when we fix this problem, should we try to keep track of what the
first line ending character is in the file and only recognize that as
the line ending character, or should we allow a random mix of LF,
CRLF, and CR[*]? How will we know, for example, that an initial lone CR
in the file is really a line ending and not a CR character that is part of
a character string?
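
The first option raised above (remember the first line ending seen and recognize only that one) could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not a proposal for the actual implementation; the function names `detect_line_ending` and `split_lines_consistent` are hypothetical.

```python
from typing import List, Optional


def detect_line_ending(data: bytes) -> Optional[bytes]:
    """Return the first line ending found: b'\\r\\n', b'\\n', or b'\\r'."""
    for i, byte in enumerate(data):
        if byte == 0x0A:                      # LF
            return b"\n"
        if byte == 0x0D:                      # CR, maybe the start of CRLF
            if data[i + 1:i + 2] == b"\n":
                return b"\r\n"
            return b"\r"
    return None                               # no line ending at all


def split_lines_consistent(data: bytes) -> List[bytes]:
    """Split on the first line ending detected, and on nothing else."""
    ending = detect_line_ending(data)
    if ending is None:
        return [data] if data else []
    lines = data.split(ending)
    if lines and lines[-1] == b"":            # file ended with the ending
        lines.pop()
    return lines


print(split_lines_consistent(b"1 2\r\n3 4\r\n"))
# The ambiguity in question: a lone CR inside the first line is taken
# to be the file's line ending, so the later LFs are no longer recognized.
print(split_lines_consistent(b"# name: a\rb\n1 2\n"))
```

The second call shows the problem with this approach: nothing in the bytes themselves says whether that first CR was meant as a line ending or as data.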
[*] OK, when reading .m files, we are already doing this (always
accepting any mixture of LF, CRLF or CR as the line ending character)
and it seems to work most of the time. But I don't think that's a
great solution. It's just a kluge that happens to appear to work most
of the time.
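
For comparison, the "accept any mixture" behaviour described in this footnote can be sketched like this. It is a Python illustration only, not the code Octave uses for .m files, and the name `split_lines_any` is hypothetical.

```python
from typing import List


def split_lines_any(data: bytes) -> List[bytes]:
    """Split on LF, CRLF, or lone CR, in any mixture."""
    lines = []
    start = 0                    # index where the current line begins
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte == 0x0A:                         # LF ends the line
            lines.append(data[start:i])
            i += 1
            start = i
        elif byte == 0x0D:                       # CR ends the line...
            lines.append(data[start:i])
            # ...and swallows a directly following LF (CRLF).
            i += 2 if data[i + 1:i + 2] == b"\n" else 1
            start = i
        else:
            i += 1
    if start < n:                                # last line, no ending
        lines.append(data[start:])
    return lines


print(split_lines_any(b"a\nb\r\nc\rd"))
```

Every CR is treated as a line break here, so a CR that is really part of a character string gets split as well. That is the sense in which this only appears to work most of the time.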
jwe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NotOK.zip
Type: application/zip
Size: 25018 bytes
Desc: not available
Url : https://www-old.cae.wisc.edu/pipermail/bug-octave/attachments/20080905/eb6f4838/attachment-0001.zip