suspected csvread bug
Julian Briggs (SEIY)
jb615 at york.ac.uk
Fri Apr 11 11:17:47 CDT 2008
David Bateman wrote:
> Julian Briggs wrote:
>>
>> David Bateman wrote:
>>> Julian Briggs wrote:
>>>> Dear Maintainer(s) of Octave package io,
>>>>
>>>> I find csvread mishandles commas embedded in text data, such as
>>>> headings.
>>>> This occurs even when I skip the columns/rows containing such headings.
>>>> Presumably the problem is in dlmread.
>>>>
>>>> Here is a demonstration of the issue.
>>>> Reading file "csvread_demo2.csv" with content (saved as CSV from an Excel
>>>> spreadsheet):
>>>>
>>>> h11,h12,h13,h14
>>>> h21,1,2,3
>>>> "h31,c",4,5,6
>>>> h41,7,8,9
>>>> h51,10,11,12
>>>>
>>>> thus:
>>>>
>>>> path_sup = strcat( Templates, "csvread_demo2.csv" )
>>>> disp("\nMishandles embedded comma in matrix row 2, col 1")
>>>> disp("Reading with: csvread( path_sup, 1, 1)")
>>>> sup = csvread( path_sup, 1, 1);
>>>> disp("size:"), disp(size(sup))
>>>> disp("sup:"), disp(sup);
>>>>
>>>> emits:
>>>>
>>>> Mishandles embedded comma in matrix row 2, col 1
>>>> Reading with: csvread( path_sup, 1, 1)
>>>> size:
>>>> 4 4
>>>> sup:
>>>> 1 2 3 0
>>>> 0 4 5 6
>>>> 7 8 9 0
>>>> 10 11 12 0
>>>>
>>>>> Exit code: 0
>>>>>
>>>> In the above, csvread appears to have read "h31,c" as 2 elements.
>>>>
>>>>
>>>> My details: pkg list
>>>> Package Name | Version | Installation directory
>>>> --------------+---------+-----------------------
>>>> io *| 1.0.5 | C:\ProgramFiles\Octave\share\octave\packages\io-1.0.5
>>>> version
>>>> ans = 3.0.0
>>>> Running on Windows XP (I'd prefer Ubuntu Linux).
>>>>
>>>> I am using Octave in a university research project to apply (economics)
>>>> input-output analysis to carbon footprinting. I am keen to use Octave,
>>>> so a timely fix would be much appreciated.
>>>>
>>>> Comments, workarounds and fixes welcome.
>>>>
>>>> Thanks
>>>>
>>>> Julian
>>>>
>>> Hey, it appears that Matlab can't read this file at all. With
>>> Matlab 2007b I get
>>>
>>> x = csvread('test.csv')
>>> ??? Error using ==> textscan
>>> Mismatch between file and format string.
>>> Trouble reading number from file (row 1, field 1) ==> h11,h
>>>
>>> Error in ==> csvread at 52
>>> m=dlmread(filename, ',', r, c);
>>>
>>> With Octave 3.0 + octave-forge or Octave 3.1.x I get
>>>
>>> x = csvread("test.csv")
>>> x =
>>>
>>> 0 0 0 0 0
>>> 0 1 2 3 0
>>> 0 0 4 5 6
>>> 0 7 8 9 0
>>> 0 10 11 12 0
>>>
>>> Yes it is ignoring the quotes in reading the comma, though I don't think
>>> this is a reasonable file format to expect csvread to accept.
>>>
>>> D.
>>>
>>>
>>>
>> Dear David,
>>
>> Thanks for your prompt response.
>>
>> A more useful comparison for me would be to test whether Matlab can
>> correctly read the above test file, skipping the text header
>> rows/columns with:
>> csvread('test.csv', 1, 1);
>> (I do not have access to Matlab just now so cannot test this myself.)
>> Would you be willing to test this?
>>
>> (Also Matlab provides the functionality we need in xlsread:
>> http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
>> which (if I understand the docs correctly) can skip text header
>> rows/columns either detecting non-numeric rows/columns or by user
>> specified range.)
>>
>> I'm keen to persuade my colleagues that Octave is a viable alternative
>> to Matlab for our project and a resolution of this issue would help.
>>
>> Thanks
>>
>> Julian
>
> Matlab fails to read this case as well. See
>
> >> csvread('test.csv', 1,1)
> ??? Error using ==> textscan
> Mismatch between file and format string.
> Trouble reading number from file (row 2, field 2) ==> c",4,
>
> Error in ==> csvread at 52
> m=dlmread(filename, ',', r, c);
>
> How does having a feature that even Matlab does not support convince your
> colleagues that Octave is a viable alternative to Matlab? If you want to
> support both, then the fix is in your file format in any case.
>
> D.
>
Dear David,
Thanks for checking Matlab's handling of csvread('test.csv', 1, 1).
I see from web searches that reading csv numeric data with text header
rows/columns is a common requirement for researchers.
I've proposed that we replace commas in our headers with semi-colons, but
that workaround leaves us vulnerable to data corruption if a comma
creeps in at a later date.
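An alternative to editing the headers would be to blank out quoted cells
before splitting each line on commas. The following is only a rough sketch:
the function name csvread_quoted is hypothetical (it is not part of Octave or
the io package), it assumes an Octave version that provides strsplit with the
"CollapseDelimiters" option, and it assumes that a quoted cell never spans
more than one line.

```octave
% Hypothetical helper, not part of Octave or the io package.
% Reads the numeric block of a CSV file, skipping the first r0 rows and
% the first c0 columns, while ignoring commas inside double-quoted cells.
function m = csvread_quoted (fname, r0, c0)
  fid = fopen (fname, "r");
  if (fid < 0)
    error ("csvread_quoted: cannot open %s", fname);
  end
  m = [];
  lineno = 0;
  while (true)
    txt = fgetl (fid);
    if (~ischar (txt))
      break;                      % fgetl returns -1 at end of file
    end
    lineno = lineno + 1;
    txt = strtrim (txt);          % also drops a trailing carriage return
    if (lineno <= r0 || isempty (txt))
      continue;                   % skip header rows and blank lines
    end
    % Replace every quoted cell by an empty cell, so that commas inside
    % the quotes are no longer seen as separators.
    txt = regexprep (txt, '"[^"]*"', '');
    fields = strsplit (txt, ",", "CollapseDelimiters", false);
    % Non-numeric or empty cells become NaN.
    vals = str2double (fields(c0+1:end));
    m(end+1, 1:numel (vals)) = vals;
  end
  fclose (fid);
end
```

With the demo file above, I would expect csvread_quoted (path_sup, 1, 1) to
return the 4-by-3 matrix [1 2 3; 4 5 6; 7 8 9; 10 11 12]. Rows of unequal
length are padded with zeros, and any text cell left inside the selected
range shows up as NaN rather than shifting the columns.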
Anyway thanks very much for your development time and responses.
Regards
Julian