Statistics function incorrectly computing median
Ben Abbott
bpabbott at mac.com
Sun Jan 6 00:42:12 CST 2008
On Jan 6, 2008, at 2:14 PM, Ben Abbott wrote:
>
> On Jan 6, 2008, at 1:30 PM, Miguel Garcia-Blanco wrote:
>
>> Octave 3.0.0 (i686-pc-msdosmsvc)
>>
>> The statistics function seems to be incorrectly computing the median:
>>> x = 0:1;
>>> x_stat = statistics( x );
>>> x_med = x_stat( 3 );
>>> x_med == median( x )
>> ans = 0
>>
>> -Miguel
>
> The bug appears to be buried here
>
>> discrete_inv([0.25; 0.5; 0.75],[0:1],[1 1]/3)
>> ans =
>>
>> 0
>> 0
>> 1
>
> The correct answer would be
>
>> ans =
>> 0
>> 0.5
>> 1
>
> From what I can tell, this problem only occurs when the population
> has two members.
>
> Ben
Regarding my last comment ... scratch that. The bug(?) is present for
any population of sequential integers with an even number of members.
The bug/feature(?) results from this loop in discrete_inv.m:
    for q = 1:n
      inv(k(q)) = v(sum (x(k(q)) > s) + 1);
    endfor
For the example given, this returns "0" for the median. The function
does not interpolate: it returns the single sample it finds most
representative of the median, here the lower of the two middle values.
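
To make the indexing concrete, here is a minimal sketch of that one step for the two-member example. It assumes equal weights and that s holds the cumulative probabilities of the sorted values; the names x_q and idx are only for illustration and do not appear in discrete_inv.m.

```octave
v = [0; 1];                   % sorted sample values
s = cumsum ([1; 1] / 2);      % cumulative probabilities: [0.5; 1]
x_q = 0.5;                    % quantile requested for the median
idx = sum (x_q > s) + 1;      % 0.5 > 0.5 is false, so idx = 1
disp (v(idx))                 % the lower sample
```

Because the comparison is strict, a quantile that lands exactly on a cumulative probability selects the sample below the step rather than the one above it.
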
If the comparison is changed from ">" to ">=":
    for q = 1:n
      inv(k(q)) = v(sum (x(k(q)) >= s) + 1);
    endfor
the median becomes "1".
It would be possible to split the difference, for example:
    for q = 1:n
      inv(k(q)) = (v(sum (x(k(q)) > s) + 1) ...
                   + v(sum (x(k(q)) >= s) + 1)) / 2;
    endfor
In that case the median becomes 0.5. I've checked this approach on a
few examples, with and without sequential integers, and with
populations of differing sizes.
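
For anyone who wants to repeat that kind of check, the sketch below compares the averaged rule with median () at the 0.5 quantile. It is a standalone approximation, not the code in discrete_inv.m: it assumes equal weights, the populations are ones I picked for illustration, and the min (..., m) guard is my own addition to keep the ">=" index in range.

```octave
% Populations chosen for illustration: even and odd sizes,
% sequential and non-sequential values.
pops = {[0; 1], [1; 2; 3; 4], [1; 2; 3; 4; 5], [2; 3; 10; 40]};
mids = zeros (1, numel (pops));
meds = zeros (1, numel (pops));
for j = 1:numel (pops)
  v = sort (pops{j});
  m = numel (v);
  s = cumsum (ones (m, 1) / m);            % equal weights (assumption)
  lo = v(min (sum (0.5 >  s) + 1, m));     % strict rule, as in the current loop
  hi = v(min (sum (0.5 >= s) + 1, m));     % non-strict rule
  mids(j) = (lo + hi) / 2;                 % split the difference
  meds(j) = median (v);
  printf ("n = %d: lo = %g, hi = %g, mid = %g, median = %g\n", ...
          m, lo, hi, mids(j), meds(j));
endfor
```

This only exercises the median. The comparisons against s are exact floating-point tests, so populations whose cumulative probabilities are not exactly representable may behave differently from what the arithmetic on paper suggests.
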
However, I'm not an expert in the calculation of quartiles and medians
for finite populations, so please comment on this if you are.
Ben