Statistics function incorrectly computing median

Ben Abbott bpabbott at mac.com
Sun Jan 6 00:42:12 CST 2008


On Jan 6, 2008, at 2:14 PM, Ben Abbott wrote:

>
> On Jan 6, 2008, at 1:30 PM, Miguel Garcia-Blanco wrote:
>
>> Octave 3.0.0 (i686-pc-msdosmsvc)
>>
>> The statistics function seems to be incorrectly computing the median:
>>> x = 0:1;
>>> x_stat = statistics( x );
>>> x_med = x_stat( 3 );
>>> x_med == median( x )
>> ans = 0
>>
>> -Miguel
>
> The bug appears to be buried here
>
>> discrete_inv([0.25; 0.5; 0.75],[0:1],[1 1]/3)
>> ans =
>>
>>   0
>>   0
>>   1
>
> The correct answer would be
>
>> ans =
>>   0
>>   0.5
>>   1
>
> From what I can tell, this problem only occurs when the population
> has two members.
>
> Ben

Regarding my last comment ... scratch that. The bug(?) is present for
any population of sequential integers with an even number of members.

The bug/feature(?) results from this loop in discrete_inv.m:

     for q = 1:n
       inv(k(q)) = v(sum (x(k(q)) > s) + 1);
     endfor

For the example given, this returns "0" for the median, because the
function returns a sample from the population, namely the one it finds
most representative of the median.
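
To make the indexing concrete, here is a minimal sketch of that lookup
for the example above. It assumes that s holds the cumulative sum of the
normalized probabilities; the definition of s is not shown in this
message, so treat that as an assumption rather than a quote from
discrete_inv.m.

```octave
v = 0:1;                    # sorted population values
p = [1 1] / 3;              # weights from the example above
s = cumsum (p / sum (p));   # assumed definition of s: [0.5 1]
x = 0.5;                    # probability level of the median
idx = sum (x > s) + 1;      # 0.5 > 0.5 is false, so idx is 1
printf ("%g\n", v(idx));    # prints 0, the lower middle sample
```

With a strict comparison, a probability level that lands exactly on a
cumulative step selects the sample below the step.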

If modified to

     for q = 1:n
       inv(k(q)) = v(sum (x(k(q)) >= s) + 1);
     endfor

the median becomes "1".

It would be possible to split the difference, for example:

     for q = 1:n
       inv(k(q)) = (v(sum (x(k(q)) > s) + 1) + v(sum (x(k(q)) >= s) + 1))/2;
     endfor

In that case the median becomes 0.5. I've checked this approach on a
few examples, with and without sequential integers, and with
populations of differing sizes.
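
For anyone who wants to repeat that kind of check, the three variants
can be compared side by side with a small standalone sketch. The helper
inv_variants is hypothetical (it is not part of Octave), it assumes s is
the cumulative sum of the normalized weights, and it clamps the ">="
index so that a probability level of 1 stays in range, which the loops
above do not need to do.

```octave
1;  # marks this file as a script so the function can be defined inline

# Hypothetical helper: evaluate the three loop variants for probability
# levels x, sorted population values v, and weights p.
function [lo, hi, mid] = inv_variants (x, v, p)
  s = cumsum (p(:) / sum (p));    # assumed definition of s
  lo = zeros (size (x));
  hi = zeros (size (x));
  for q = 1:numel (x)
    lo(q) = v(sum (x(q) > s) + 1);                     # original loop
    hi(q) = v(min (sum (x(q) >= s) + 1, numel (v)));   # ">=" variant, clamped
  endfor
  mid = (lo + hi) / 2;            # split the difference
endfunction

# Quartiles and median for the two-member example from this thread.
[lo, hi, mid] = inv_variants ([0.25 0.5 0.75], 0:1, [1 1] / 3);
printf ("%g %g %g\n", lo);    # prints 0 0 1
printf ("%g %g %g\n", hi);    # prints 0 1 1
printf ("%g %g %g\n", mid);   # prints 0 0.5 1
```

For this example the averaged variant agrees with median (0:1); the two
one-sided variants differ only where the probability level falls exactly
on a cumulative step.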

However, I'm not an expert in the calculation of quartiles and medians  
for finite populations, so please comment on this if you are.

Ben



