Statistics function incorrectly computing median

Ben Abbott bpabbott at mac.com
Sun Jan 20 13:22:58 CST 2008


On Jan 20, 2008, at 7:25 AM, Miguel Garcia-Blanco wrote:

>> It is clear that a change is merited. I agree it makes sense to
>> consider "discrete" and "continuous" distributions.
>>
>> There is no "statistics()" function in Matlab, but there is a
>> "quantile()". It uses a continuous representation. I'd like to
>> confirm our implementation is consistent with theirs, but otherwise
>> there is no compatibility issue.
>>
>> The current continuous method is consistent with both R and Maxima,
>> so we're good there.
>>
>> What we need to decide is:
>>
>> (1) What algorithm should be used in the discrete case?
>>
>> I'd prefer the current implementation, which mirrors R's 1st
>> method ... less work for me ;-)
>>
>
> While I have a preference for method 2 (because of the median), I
> suppose all three discrete methods are valid, so I'm not particularly
> fussed.

After much reading, I've concluded that the median has a widely
accepted definition. For the sample population [1, 2, 3, 4], the
proper answer is 2.5.

Thus, I think our implementation should be consistent with the widely
accepted definition.
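
For illustration, here is a minimal sketch of that definition in Python
(not Octave, and not the code in the attached files). Both function
names are hypothetical. The second function is an inverse-ECDF rule
with no averaging, which I assume is how R's method #1 behaves at
p = 0.5; it returns 2 for the same data, which may be the discrepancy
that started this thread.

```python
def median(values):
    """Conventional median: for an even count, average the two middle values."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return float(xs[mid])
    return (xs[mid - 1] + xs[mid]) / 2.0


def inverse_ecdf_median(values):
    """Smallest observation whose cumulative fraction reaches 0.5 (no averaging).

    Assumption: this is meant to mimic an inverse-ECDF rule such as
    R's method #1 evaluated at p = 0.5.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    rank = (n + 1) // 2          # ceil(n / 2) as a 1-based rank
    return float(xs[rank - 1])   # convert to a 0-based index


print(median([1, 2, 3, 4]))               # 2.5
print(inverse_ecdf_median([1, 2, 3, 4]))  # 2.0
```

For odd-length data the two rules agree; they differ only when the
count is even.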

The problem is with the definition of the quantiles. There isn't one  
widely accepted method. However, I've decided to favor R's method #2  
for discrete populations. Regarding continuous distributions, is R's  
#7 consistent with what Maxima gives?
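
To make the two candidates concrete, here is a rough Python sketch of
how I understand R's method #2 (inverse ECDF with averaging at the
jumps) and method #7 (linear interpolation between order statistics),
following the Hyndman and Fan formulation. The function name
quantile_sketch is hypothetical, and the exact test g == 0 is a
simplification; a real implementation would probably need a small
tolerance there. This is not the code in the attached quantile.m.

```python
import math


def quantile_sketch(values, p, method):
    """Sample quantile for probability p using method 2 or 7 (sketch only)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("quantile of an empty sample is undefined")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")

    if method == 2:
        pos = n * p                # position on a 1-based rank scale
        j = math.floor(pos)
        g = pos - j
        # Exactly on a jump of the ECDF: average the two neighbours.
        gamma = 0.5 if g == 0 else 1.0
    elif method == 7:
        pos = (n - 1) * p + 1      # ranks 1..n map onto p = 0..1
        j = math.floor(pos)
        gamma = pos - j            # linear interpolation weight
    else:
        raise ValueError("only methods 2 and 7 are sketched here")

    # Clamp the 1-based ranks j and j + 1 to [1, n], then index from 0.
    lower = xs[min(max(j, 1), n) - 1]
    upper = xs[min(max(j + 1, 1), n) - 1]
    return (1.0 - gamma) * lower + gamma * upper


x = [1, 2, 3, 4]
print([quantile_sketch(x, p, 2) for p in (0.25, 0.5, 0.75)])  # [1.5, 2.5, 3.5]
print([quantile_sketch(x, p, 7) for p in (0.25, 0.5, 0.75)])  # [1.75, 2.5, 3.25]
```

Both methods give 2.5 for the median of [1, 2, 3, 4]; they differ at
the quartiles.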

> Regardless of which method is chosen, I think it should be clearly
> indicated somewhere (perhaps in the function description), so that
> users don't continually file bug reports, simply because they were
> expecting the results of one of the other methods.

Once we've settled on the algorithm(s), I'll make the changes to the
functions' descriptions. The other functions in the statistics toolbox
will also need to be checked for consistency and impact.

> Of course, it's also possible to implement all three methods and let
> the user choose. But this requires more work ;)
>
>> However, changing to the second method should be simple. Please post
>> results for some other examples; x = [1:5], x = [1, 2, 5, 9], and
>> x = [1, 2, 5, 9, 11]; ... I'd do it myself, but am not so familiar
>> with R.
>>
>
> See the attached file: Examples.txt
>
>> (2) Are "empirical" samples to be handled as "continuous" or
>> "discrete"?
>>
>> ... I assume "continuous" is correct?
>>
>> Ben
>
> I'm not entirely sure what you mean. I think you might be confusing
> "sample" with "population". Samples consist of observations and,
> hence, are empirical by definition.

You are correct, I'm referring to populations, and have been a bit  
sloppy in how I've expressed myself.

Regarding your new examples, those are both good tests! ... it took a  
few hours of effort, but I'm now able to mirror the results for method  
#2 for x = [16, 11, 15, 12, 15, 8, 11, 12, 6, 10]. For the last test,  
I get [-0.572 -0.068 0.107] for the [1st quartile, median, 3rd  
quartile].

I am hopeful that the "discrete" result mirrors method #2 of R, and
the "continuous" mirrors method #7 (same as Maxima?) ... in all cases.
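
As an independent cross-check for the ten-element example, here is a
short Python sketch. The helper discrete_quartiles is hypothetical; it
averages the left and right inverse-ECDF values, which is my reading of
method #2. For the continuous case it leans on Python's own
statistics.quantiles with method="inclusive", which, to my
understanding, uses the same interpolation as R's method #7. The values
in the comments come from working through the definitions, not from
running R or Octave.

```python
import math
import statistics

x = [16, 11, 15, 12, 15, 8, 11, 12, 6, 10]


def discrete_quartiles(values):
    """Quartiles from the average of the left and right inverse-ECDF values."""
    xs = sorted(values)
    n = len(xs)
    result = []
    for p in (0.25, 0.5, 0.75):
        pos = n * p
        # 1-based ranks, clamped to [1, n], then converted to 0-based indices.
        left = min(max(math.ceil(pos), 1), n) - 1
        right = min(max(math.floor(pos) + 1, 1), n) - 1
        result.append((xs[left] + xs[right]) / 2.0)
    return result


# Linear interpolation between order statistics, endpoints included.
continuous_quartiles = statistics.quantiles(x, n=4, method="inclusive")

print(discrete_quartiles(x))   # [10.0, 11.5, 15.0]
print(continuous_quartiles)    # [10.25, 11.5, 14.25]
```

If the "discrete" and "continuous" results for this vector differ from
these, it would be worth checking which convention is in play before
assuming a bug.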

I've attached the modified version of discrete_inv.m as well as the
new quantile.m script. The latter is intended to be compatible with
the script of the same name in Matlab's statistics toolbox. However,
since I don't have access to Matlab's version, I can't be sure the
algorithms are equivalent.

Please do some testing and let me know what you find (some test  
scripts are included).

Ben

-------------- next part --------------
A non-text attachment was scrubbed...
Name: discrete_inv.m
Type: application/octet-stream
Size: 3596 bytes
Desc: not available
Url : https://www.cae.wisc.edu/pipermail/bug-octave/attachments/20080120/c58ade9f/attachment.obj 
-------------- next part --------------


-------------- next part --------------
A non-text attachment was scrubbed...
Name: quantile.m
Type: application/octet-stream
Size: 3469 bytes
Desc: not available
Url : https://www.cae.wisc.edu/pipermail/bug-octave/attachments/20080120/c58ade9f/attachment-0001.obj 
-------------- next part --------------




