suspected csvread bug

Julian Briggs (SEIY) jb615 at york.ac.uk
Fri Apr 11 11:17:47 CDT 2008


David Bateman wrote:
> Julian Briggs wrote:
>>
>> David Bateman wrote:
>>> Julian Briggs wrote:
>>>> Dear Maintainer(s) of Octave package io,
>>>>
>>>> I find csvread mishandles commas embedded in text data, such as
>>>> headings.
>>>> This occurs even when I skip the columns/rows containing such headings.
>>>> Presumably the problem is in dlmread.
>>>>
>>>> Here is a demonstration of the issue.
>>>> Reading the file "csvread_demo2.csv", with content (saved as CSV from an
>>>> Excel spreadsheet):
>>>>
>>>> h11,h12,h13,h14
>>>> h21,1,2,3
>>>> "h31,c",4,5,6
>>>> h41,7,8,9
>>>> h51,10,11,12
>>>>
>>>> thus:
>>>>
>>>> path_sup     = strcat( Templates, "csvread_demo2.csv" )
>>>> disp("\nMishandles embedded comma in matrix row 2, col 1")
>>>> disp("Reading with: csvread( path_sup, 1, 1)")
>>>> sup = csvread( path_sup, 1, 1);
>>>> disp("size:"), disp(size(sup))
>>>> disp("sup:"), disp(sup);
>>>>
>>>> emits:
>>>>
>>>> Mishandles embedded comma in matrix row 2, col 1
>>>> Reading with: csvread( path_sup, 1, 1)
>>>> size:
>>>>    4   4
>>>> sup:
>>>>     1    2    3    0
>>>>     0    4    5    6
>>>>     7    8    9    0
>>>>    10   11   12    0
>>>>  
>>>>> Exit code: 0
>>>>>     
>>>> In the above, csvread appears to have read "h31,c" as two elements.
>>>>
>>>>
>>>> My details: pkg list
>>>> Package Name  | Version | Installation directory
>>>> --------------+---------+-----------------------
>>>>           io *|   1.0.5 | C:\ProgramFiles\Octave\share\octave\packages\io-1.0.5
>>>> version
>>>> ans = 3.0.0
>>>> Running on Windows XP (I'd prefer Ubuntu Linux).
>>>>
>>>> I am using Octave in a university research project to apply (economics)
>>>> input-output analysis to carbon footprinting. I am keen to use
>>>> Octave, so a timely fix would be much appreciated.
>>>>
>>>> Comments, workarounds and fixes welcome.
>>>>
>>>> Thanks
>>>>
>>>> Julian
>>>>   
>>> Hey, it appears that Matlab can't read this file at all. With
>>> Matlab 2007b I get
>>>
>>>  x = csvread('test.csv')
>>> ??? Error using ==> textscan
>>> Mismatch between file and format string.
>>> Trouble reading number from file (row 1, field 1) ==> h11,h
>>>
>>> Error in ==> csvread at 52
>>>     m=dlmread(filename, ',', r, c);
>>>
>>> With Octave 3.0 + octave-forge or Octave 3.1.x I get
>>>
>>>  x = csvread("test.csv")
>>> x =
>>>
>>>     0    0    0    0    0
>>>     0    1    2    3    0
>>>     0    0    4    5    6
>>>     0    7    8    9    0
>>>     0   10   11   12    0
>>>
>>> Yes, it ignores the quotes when reading the comma, though I don't think
>>> this is a reasonable file format to expect csvread to accept.
>>>
>>> D.
>>>
>>>
>>>
>> Dear David,
>>
>> Thanks for your prompt response.
>>
>> A more useful comparison for me would be to test whether Matlab can
>> correctly read the above test file, skipping the text header
>> rows/columns with:
>> csvread('test.csv', 1, 1);
>> (I do not have access to Matlab just now so cannot test this myself.)
>> Would you be willing to test this?
>>
>> (Also Matlab provides the functionality we need in xlsread:
>> http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
>> which (if I understand the docs correctly) can skip text header
>> rows/columns either detecting non-numeric rows/columns or by user
>> specified range.)
>>
>> I'm keen to persuade my colleagues that Octave is a viable alternative
>> to Matlab for our project and a resolution of this issue would help.
>>
>> Thanks
>>
>> Julian
> 
> Matlab fails to read this case as well. See
> 
>>>  csvread('test.csv', 1,1)
> ??? Error using ==> textscan
> Mismatch between file and format string.
> Trouble reading number from file (row 2, field 2) ==> c",4,
> 
> Error in ==> csvread at 52
>     m=dlmread(filename, ',', r, c);
> 
> How does having a feature that even Matlab doesn't support convince your
> colleagues that Octave is a viable alternative to Matlab? If you want to
> support both, the fix belongs in your file format in any case.
> 
> D.
> 

Dear David,

Thanks for checking Matlab's handling of csvread('test.csv', 1, 1).

I see from web searches that reading numeric CSV data with text header
rows/columns is a common requirement for researchers.
I've proposed that we replace commas in our headers with semicolons, but
that workaround leaves us vulnerable to data corruption if a comma
creeps in at a later date.
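
For comparison, here is a minimal sketch of quote-aware parsing that keeps "h31,c" as one field instead of rewriting the headers. It is written in Python only because the standard `csv` module already treats a comma inside double quotes as part of the field; it is not Octave code. The helper `read_numeric_csv` is hypothetical (not part of Octave or the io package), and its `r`/`c` arguments are assumed to be zero-based row and column offsets, as in csvread(file, r, c).

```python
import csv
import io


def read_numeric_csv(text, r=0, c=0):
    """Hypothetical helper: parse CSV text, honouring double-quoted fields,
    and return the numeric block starting at zero-based row r, column c."""
    # csv.reader keeps '"h31,c"' as the single field 'h31,c'.
    rows = list(csv.reader(io.StringIO(text)))
    block = []
    for row in rows[r:]:
        # Empty cells are treated as 0; non-numeric cells raise ValueError,
        # so r and c must skip every text header row/column.
        block.append([float(x) if x.strip() else 0.0 for x in row[c:]])
    return block


demo = '''h11,h12,h13,h14
h21,1,2,3
"h31,c",4,5,6
h41,7,8,9
h51,10,11,12
'''

for row in read_numeric_csv(demo, 1, 1):
    print(row)
```

Applied to the demo file contents with offsets (1, 1), this yields the 4x3 block [1 2 3; 4 5 6; 7 8 9; 10 11 12], with no shifted or zero-padded row where "h31,c" appears.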

Anyway thanks very much for your development time and responses.

Regards

Julian


More information about the Bug-octave mailing list