Statistics function incorrectly computing median

Miguel Garcia-Blanco miguel.01 at ihug.com.au
Sat Jan 19 04:13:33 CST 2008


> I looked at the routines in R and Maxima and have concluded that they
> assume the probability of each sample is a constant; i.e. 1/(# of
> samples).
>
> The routine "discrete_inv (x, v, p)" permits the user the specify the
> probability of each sample.
>
> Thus, it took some time to determine a method that was consistent with
> constant probability and variable probability.
>
> Before, submitting a formal patch, I thought it wise to attach a
> script for testing.
>
> The equivalent commands in Octave, R, and Maxima are
>
> Octave: discrete_inv ([1/4 2/4 3/4], <v>, ones (size (<v>)) / numel
> (<v>))
>
> GNU R: summary ( c( <v> ) )
>
> Maxima: load( discriptive )$
>         quantile ( <v>, 1/4), numer;
>         quantile ( <v>, 2/4), numer;
>         quantile ( <v>, 3/4), numer;
>
> Where <v> is to be replaced by a sequence like; [1, 2, 3, 4].
>
> I've compared the result for a few examples. It appears to now be
> working properly. However, I've been wrong before, so ....
>
> Ben

(Disclaimer: I don't know much about quantiles, so what I'm about to say may 
not be correct.)

I don't know what the summary function actually does (I haven't seen the 
code), but I suspect it makes a call to the quantile function. R has 9 
different methods for calculating sample quantiles. By default, the extreme 
values and quartiles are returned, calculated using method 7. Here's an 
example:

> # Example to compare different methods
> x <- c( 1, 2, 3, 4 )
> res <- matrix( as.numeric(NA), 9, 5 )
> for( iType in 1:9 ) res[iType, ] <- y <- quantile( x, type = iType )
> dimnames( res ) <- list( 1:9, names( y ) )
> round( res, 2 )
  0%  25% 50%  75% 100%
1  1 1.00 2.0 3.00    4
2  1 1.50 2.5 3.50    4
3  1 1.00 2.0 3.00    4
4  1 1.00 2.0 3.00    4
5  1 1.50 2.5 3.50    4
6  1 1.25 2.5 3.75    4
7  1 1.75 2.5 3.25    4
8  1 1.42 2.5 3.58    4
9  1 1.44 2.5 3.56    4

The appropriateness of a particular sample quantile probably depends on 
purpose and whether the parent distribution is continuous or discrete. 
(Discrete: methods 1, 2, and 3. Continuous: methods 4 through 9.)

Since discrete_inv.m is for a discrete rv, I think it should give results 
that match with one of R's discrete methods. (My preference is for 2, simply 
because it gives the median such that Pr(X <= m) = Pr(X >= m) = 0.5).

In light of this, perhaps statistics.m needs a continuous/discrete flag?

I've attached the help file for R's quantile function for people who don't 
have R.

-Miguel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://www.cae.wisc.edu/pipermail/bug-octave/attachments/20080119/321576a7/attachment-0002.html 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://www.cae.wisc.edu/pipermail/bug-octave/attachments/20080119/321576a7/attachment-0003.html 


More information about the Bug-octave mailing list