Statistics function incorrectly computing median
Ben Abbott
bpabbott at mac.com
Sun Jan 20 13:22:58 CST 2008
On Jan 20, 2008, at 7:25 AM, Miguel Garcia-Blanco wrote:
>> It is clear that a change is merited. I agree it makes sense to consider
>> "discrete" and "continuous" distributions.
>>
>> There is no "statistics()" function in Matlab, but there is a
>> "quantile()". It uses a continuous representation. I'd like to confirm
>> that our implementation is consistent with theirs, but otherwise there is
>> no compatibility issue.
>>
>> The current continuous method is consistent with both R and Maxima, so
>> we're good there.
>>
>> What we need to decide is:
>>
>> (1) What algorithm should be used in the discrete case?
>>
>> I'd prefer the current implementation, which mirrors R's 1st method
>> ... less work for me ;-)
>>
>
> While I have a preference for method 2 (because of the median), I suppose
> all three discrete methods are valid, so I'm not particularly fussed.
After much reading, I've concluded that the median has a widely
accepted definition. For the sample population [1, 2, 3, 4], the
proper answer is 2.5.
Thus, I think it appropriate that our implementation be consistent
with the widely accepted definition.
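A minimal sketch of that conventional median, written in Python purely for illustration (it is not the Octave code under discussion): sort the values, take the middle order statistic when n is odd, and average the two middle order statistics when n is even.

```python
def median(values):
    """Conventional sample median: the middle order statistic for odd n,
    the mean of the two middle order statistics for even n."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

print(median([1, 2, 3, 4]))     # 2.5
print(median([1, 2, 3, 4, 5]))  # 3
```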
The problem is with the definition of the quantiles. There isn't one
widely accepted method. However, I've decided to favor R's method #2
for discrete populations. Regarding continuous distributions, is R's
#7 consistent with what Maxima gives?
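For reference, the two candidate definitions can be sketched as below. This is a Python illustration of the rules R documents for quantile() types 2 and 7, not the attached Octave code; the function names are hypothetical, and R's small floating-point fuzz on n*p is omitted, so the probabilities used should be exactly representable (0.25, 0.5, 0.75).

```python
import math

def quantile_type2(values, p):
    """Discrete quantile after R's type 2: inverse of the empirical CDF,
    averaging the two neighbouring order statistics when n*p is an integer."""
    xs = sorted(values)
    n = len(xs)
    pos = n * p
    j = math.floor(pos)

    def order_stat(k):
        # 1-based order statistic, clamped to [x_(1), x_(n)]
        return xs[min(max(k, 1), n) - 1]

    if pos == j:
        return (order_stat(j) + order_stat(j + 1)) / 2
    return order_stat(j + 1)

def quantile_type7(values, p):
    """Continuous quantile after R's type 7 (R's default): linear
    interpolation between order statistics at 0-based position (n-1)*p."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

x = [1, 2, 3, 4]
print([quantile_type2(x, p) for p in (0.25, 0.5, 0.75)])  # [1.5, 2.5, 3.5]
print([quantile_type7(x, p) for p in (0.25, 0.5, 0.75)])  # [1.75, 2.5, 3.25]
```

Both rules agree on the median (2.5) for this population; they differ only at the outer quartiles.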
> Regardless of which method is chosen, I think it should be clearly indicated
> somewhere (perhaps in the function description), so that users don't
> continually file bug reports simply because they were expecting the results
> of one of the other methods.
Once we've settled on the algorithm(s), I'll update the functions'
descriptions. The other functions in the statistics toolbox will also
need to be checked for consistency and for the impact of the change.
> Of course, it's also possible to implement all three methods and let the
> user choose. But this requires more work ;)
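Letting the user choose may not be much extra work if the three discrete rules share one code path: they differ only in the position used on the empirical CDF and in what happens when that position lands exactly on an integer. A Python sketch, assuming R's types 1-3 are the "three discrete methods" meant here; the `method` argument is a hypothetical option for illustration, not an existing Octave or Matlab parameter.

```python
import math

def discrete_quantile(values, p, method=2):
    """Discrete sample quantile following R's quantile() types 1-3.

    method=1: inverse of the empirical CDF
    method=2: as 1, but average at discontinuities (gives the usual median)
    method=3: nearest even order statistic
    `method` is a hypothetical option name used only for this sketch.
    """
    if method not in (1, 2, 3):
        raise ValueError("method must be 1, 2 or 3")
    xs = sorted(values)
    n = len(xs)
    pos = n * p - 0.5 if method == 3 else n * p
    j = math.floor(pos)
    on_step = pos == j              # position falls exactly on an integer

    if method == 1:
        gamma = 0.0 if on_step else 1.0
    elif method == 2:
        gamma = 0.5 if on_step else 1.0
    else:
        gamma = 0.0 if (on_step and j % 2 == 0) else 1.0

    def order_stat(k):
        # 1-based order statistic, clamped to [x_(1), x_(n)] as R does
        return xs[min(max(k, 1), n) - 1]

    return (1 - gamma) * order_stat(j) + gamma * order_stat(j + 1)

for m in (1, 2, 3):
    print(m, discrete_quantile([1, 2, 3, 4], 0.5, method=m))
```

For [1, 2, 3, 4] this prints a median of 2.0 for methods 1 and 3 and 2.5 for method 2, which is why method 2 is the one that matches the conventional median.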
>
>> However, changing to the second method should be simple. Please post
>> results for some other examples: x = [1:5], x = [1, 2, 5, 9], and
>> x = [1, 2, 5, 9, 11]. I'd do it myself, but I'm not very familiar with R.
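These examples can also be cross-checked outside R. The Python sketch below is illustrative only: it applies a hand-written version of R's type 2 rule for the discrete case and, for the continuous case, the standard library's `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics in the same way as R's type 7. Neither is the Octave implementation being discussed, and the helper names are hypothetical.

```python
import math
import statistics

def quantile_type2(values, p):
    """Discrete quantile following R's type 2: inverse of the empirical CDF,
    averaging the two neighbouring order statistics when n*p is an integer."""
    xs = sorted(values)
    n = len(xs)
    pos = n * p
    j = math.floor(pos)

    def order_stat(k):
        # 1-based order statistic, clamped to [x_(1), x_(n)]
        return xs[min(max(k, 1), n) - 1]

    if pos == j:
        return (order_stat(j) + order_stat(j + 1)) / 2
    return order_stat(j + 1)

def fmt(qs):
    # compact display: 7.0 -> "7", 1.75 -> "1.75"
    return " ".join(f"{q:g}" for q in qs)

EXAMPLES = {
    "[1:5]": [1, 2, 3, 4, 5],
    "[1, 2, 5, 9]": [1, 2, 5, 9],
    "[1, 2, 5, 9, 11]": [1, 2, 5, 9, 11],
}

for label, data in EXAMPLES.items():
    discrete = [quantile_type2(data, p) for p in (0.25, 0.5, 0.75)]
    continuous = statistics.quantiles(data, n=4, method="inclusive")
    print(f"{label}: discrete {fmt(discrete)} | continuous {fmt(continuous)}")
```

The quartiles printed are 2 3 4 for both rules on [1:5], 1.5 3.5 7 (discrete) versus 1.75 3.5 6 (continuous) on [1, 2, 5, 9], and 2 5 9 for both on [1, 2, 5, 9, 11].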
>>
>
> See the attached file: Examples.txt
>
>> (2) Are "empirical" samples to be handled as "continuous" or "discrete"?
>>
>> ... I assume "continuous" is correct?
>>
>> Ben
>
> I'm not entirely sure what you mean. I think you might be confusing "sample"
> with "population". Samples consist of observations and, hence, are empirical
> by definition.
You are correct, I'm referring to populations, and have been a bit
sloppy in how I've expressed myself.
Regarding your new examples, those are both good tests! ... it took a
few hours of effort, but I'm now able to mirror the results for method
#2 for x = [16, 11, 15, 12, 15, 8, 11, 12, 6, 10]. For the last test,
I get [-0.572 -0.068 0.107] for the [1st quartile, median, 3rd
quartile].
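For anyone checking the method #2 figures for x = [16, 11, 15, 12, 15, 8, 11, 12, 6, 10] by hand, the arithmetic is short: sort the ten values, compute n*p, and either take the next order statistic or, when n*p is a whole number, average the two neighbours. A Python sketch of that trace (illustrative only; it reimplements R's type 2 rule rather than calling R or Octave, and `type2_trace` is a hypothetical name):

```python
import math

def type2_trace(values, p):
    """Return (n*p, quantile) for R's type 2 rule on `values`."""
    xs = sorted(values)
    n = len(xs)
    pos = n * p

    def order_stat(k):
        # 1-based order statistic, clamped to [x_(1), x_(n)]
        return xs[min(max(k, 1), n) - 1]

    j = math.floor(pos)
    if pos == j:                       # on a step of the ECDF: average
        return pos, (order_stat(j) + order_stat(j + 1)) / 2
    return pos, order_stat(j + 1)      # between steps: next order statistic

x = [16, 11, 15, 12, 15, 8, 11, 12, 6, 10]
print("sorted:", sorted(x))
for p in (0.25, 0.5, 0.75):
    pos, q = type2_trace(x, p)
    print(f"p = {p}: n*p = {pos:g} -> {q:g}")
```

The sorted data are [6, 8, 10, 11, 11, 12, 12, 15, 15, 16], so n*p = 2.5, 5 and 7.5 give quartiles of 10, (11 + 12)/2 = 11.5 and 15.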
I am hopeful that the "discrete" result mirrors method #2 of R, and
the "continuous" result mirrors method #7 (same as Maxima?) in all cases.
I've attached the modified version of discrete_inv.m as well as the
new quantile.m function. The latter is intended to be compatible with
the function of the same name in Matlab's statistics toolbox. However,
since I don't have access to Matlab's version, I can't be sure the
algorithms are equivalent.
Please do some testing and let me know what you find (some test
scripts are included).
Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: discrete_inv.m
Type: application/octet-stream
Size: 3596 bytes
Desc: not available
Url : https://www.cae.wisc.edu/pipermail/bug-octave/attachments/20080120/c58ade9f/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quantile.m
Type: application/octet-stream
Size: 3469 bytes
Desc: not available
Url : https://www.cae.wisc.edu/pipermail/bug-octave/attachments/20080120/c58ade9f/attachment-0001.obj
-------------- next part --------------
More information about the Bug-octave mailing list