Statistics function incorrectly computing median

Ben Abbott bpabbott at mac.com
Tue Jan 22 06:39:40 CST 2008


On Jan 22, 2008, at 5:50 AM, Miguel Garcia-Blanco wrote:

>> ... So what I plan to do is to leave discrete_???.m as it is, and have
>> empirical_???.m work as R's method #7. I'll likely have the latter
>> optionally support other algorithms as well (Matlab's for sure).
>>
>> From there, other Matlab-compatible functions can be added
>> (quantile, prctile, etc.).
>>
>> I'd change discrete_inv.m to work as R's method #2, but I'm not sure it
>> will give the proper result for hygeinv.m ... thoughts?
>
> I don't think there is any need to change empirical_inv. As far as I can
> tell, it's working just as it should (*). Likewise, I think discrete_inv
> can probably stay unchanged.
>
> I think the best thing to do would be to write quantile/prctile functions
> implementing method 7 (R's default), and have statistics.m call quantile
> instead of empirical_inv. (I think it would be nice to implement the other
> methods too, so that the user has a choice, but there's no need to do them
> all at once.)
>
> The point I made about the importance of the discrete/continuous nature of
> the parent distribution is nonsense, so ignore it. (I misunderstood the
> help file for R's quantile function. The methods themselves are
> discrete/continuous, not the parent distributions. Sorry about the
> confusion.)
>
> (*) I have noticed that empirical_inv returns -Inf when X = 0, whereas R
> returns min(DATA). I can see why R's behaviour could be useful, but
> returning -Inf is not inconsistent with the definition of the EDF, so I
> don't think this is necessarily a major problem.
>
> -Miguel

Because the empirical_inv/pdf/cdf/rnd group of functions needs to stay
mutually consistent, I'm leaning your way.

However, I'm still unhappy that discrete_inv does not return a median
that is consistent with the commonly accepted definition, so I'll try
to resolve that as well.
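
To illustrate the median concern, here is a minimal Python sketch (not
Octave code, and not the actual discrete_inv.m implementation) comparing
a plain inverse of the empirical CDF, as in R's method #1, with the
averaging variant, R's method #2. The function names are hypothetical,
and clamping p = 0 to min(data) follows R's behaviour rather than
returning -Inf. The integer check on n*p is exact only for simple cases
such as p = 0.5; no tolerance is applied.

```python
import math

def quantile_inverse_edf(data, p):
    """Inverse of the empirical CDF, no averaging (as in R's method #1)."""
    x = sorted(data)
    n = len(x)
    j = math.ceil(n * p)        # 1-based index of the order statistic
    j = min(max(j, 1), n)       # assumption: p = 0 returns min(data), like R
    return x[j - 1]

def quantile_averaged(data, p):
    """Same as above, but averages at discontinuities (as in R's method #2)."""
    x = sorted(data)
    n = len(x)
    np_ = n * p
    j = math.floor(np_)
    if np_ == j and 0 < j < n:
        # n*p lands exactly on a step of the EDF: average the two neighbours
        return (x[j - 1] + x[j]) / 2
    j = min(max(math.ceil(np_), 1), n)
    return x[j - 1]

print(quantile_inverse_edf([1, 2, 3, 4], 0.5))  # 2
print(quantile_averaged([1, 2, 3, 4], 0.5))     # 2.5
```

For an even number of samples, only the averaging variant returns the
conventional median (the mean of the two middle values).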

How does the following look as a short plan?

(1) Add prctile.m
(2) Add quantile.m
(3) Change statistics.m to call quantile.m
(4) Modify discrete_inv.m to use R's method #2
(5) Verify consistency of discrete_pdf/cdf/rnd.
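
As a rough sketch of what steps (1) and (2) compute, here is R's method
#7 (linear interpolation between order statistics) written in Python
rather than Octave. The names quantile7 and prctile7 are hypothetical,
not the planned function signatures, and a Matlab-compatible prctile may
use a different interpolation rule.

```python
import math

def quantile7(data, p):
    """Quantile by linear interpolation between order statistics (R's method #7)."""
    x = sorted(data)
    n = len(x)
    h = (n - 1) * p             # fractional 0-based position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)     # stay in range when p = 1
    return x[lo] + (h - lo) * (x[hi] - x[lo])

def prctile7(data, pct):
    """Percentile wrapper: rescales 0..100 to 0..1 and reuses quantile7."""
    return quantile7(data, pct / 100.0)

print(quantile7([1, 2, 3, 4], 0.5))   # 2.5
print(quantile7([1, 2, 3, 4], 0.25))  # 1.75
print(prctile7([1, 2, 3, 4], 50))     # 2.5
```

Having the percentile function delegate to the quantile function keeps
the interpolation logic in one place.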

I have some code for (1) that works for all 9 of R's methods but still
needs to be cleaned up. I'll try to post it later today.

Ben




