Statistics function incorrectly computing median

Ben Abbott bpabbott at mac.com
Sun Jan 6 08:58:15 CST 2008


On Jan 6, 2008, at 10:40 PM, Vercelli wrote:

>
> ----- Original Message ----- From: "Ben Abbott" <bpabbott at mac.com>
> To: "Vercelli" <ororo at email.it>
> Sent: Sunday, January 06, 2008 11:31 AM
> Subject: Re: Statistics function incorrectly computing median
>
>
>>
>> On Jan 6, 2008, at 6:21 PM, Vercelli wrote:
>>
>>> I think that's not a bug, just an inconsistency. The 2 functions
>>> (statistics and median) use two different definitions of median,
>>> which are both used in statistics books.
>>>
>>> Luca
>>
>>
>> If you don't mind, can you (or someone else?) answer this question for me?
>>
>> What would the 1st quartile, median and 3rd quartile be for a   
>> population of [0, 1, 2, 3, 4, 5]?
>>
>> My first impression would be 1.25, 2.5, and 3.75.
>>
>> However, the CDF for those values is [1/6, 2/6, 3/6, 4/6, 5/6,  
>> 6/6]. Which implies the answer is 0.5, 2.0, and 3.5.
>>
>> Which is it? ... do you imply that *both* are correct, or
>> something else?
>>
>> What about quantiles approaching zero or unity?
>>
>> Ben
>>
>>
>
> My first impression would be: [1,2,4] (as 'statistics' says)
> If you have a finite population, it's quite useless to consider values
> /different/ from the original ones. You should just consider the
> index of the elements.
> Anyway, if one allows also different values, I don't know which is  
> the right answer.
>
> Luca

My father who spent his career working with statistics agrees with you.

I did some checking online and found that there is no universal  
agreement. The answers waver between the members of the population  
that neighbor the 1st/2nd/3rd quartile percentages.
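
As a rough illustration of how two common conventions disagree on the
population from earlier in the thread, here is a minimal Python sketch.
The helper names nearest_rank and linear_interp are hypothetical, and
nothing here is taken from the source of statistics.m or median.m. It
only shows that one convention gives [1, 2, 4] and the other gives
1.25, 2.5, 3.75.

```python
import math

def nearest_rank(sorted_values, p):
    """Smallest member whose cumulative fraction is at least p."""
    n = len(sorted_values)
    k = max(1, math.ceil(p * n))  # 1-based rank into the sorted data
    return sorted_values[k - 1]

def linear_interp(sorted_values, p):
    """Interpolate linearly at the fractional 0-based position p * (n - 1)."""
    last = len(sorted_values) - 1
    pos = p * last
    lo = math.floor(pos)
    hi = min(lo + 1, last)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

population = [0, 1, 2, 3, 4, 5]
probs = [0.25, 0.5, 0.75]

by_rank = [nearest_rank(population, p) for p in probs]
by_interp = [linear_interp(population, p) for p in probs]

print(by_rank)    # [1, 2, 4]
print(by_interp)  # [1.25, 2.5, 3.75]
```

The first convention always returns a member of the population. The
second can return values between members.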

I'd personally favor an approach that minimizes the amount of  
expertise required of the user, but have been told that experts would  
not favor that approach.

In any event, the routine does appear to be functioning, meaning
that:

	(1) At least 25% of the samples are less than or equal (<=) to the
lower quartile.

	(2) At least half the samples are less than or equal to the median.

	(3) At least 75% of the samples are less than or equal to the upper
quartile.
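
The three conditions can be checked by hand for the [1, 2, 4] result
quoted above. The Python sketch below does so, reading each condition as
"at least that fraction of the samples". That reading is an assumption,
since a population of six cannot hit 25% or 75% exactly. The helper name
fraction_at_or_below is hypothetical.

```python
from fractions import Fraction

def fraction_at_or_below(values, q):
    """Exact fraction of the samples that are <= q."""
    count = sum(1 for v in values if v <= q)
    return Fraction(count, len(values))

population = [0, 1, 2, 3, 4, 5]
quartiles = [1, 2, 4]  # the values quoted in this thread for 'statistics'
targets = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

fractions_below = [fraction_at_or_below(population, q) for q in quartiles]
conditions_hold = all(f >= t for f, t in zip(fractions_below, targets))

print([str(f) for f in fractions_below])  # ['1/3', '1/2', '5/6']
print(conditions_hold)                    # True
```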

The second condition is true for median.m as well. Although median.m
and statistics.m give different answers, their difference is less than
the distance between the members adjacent to the 50th percentile.
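
For the six-member population discussed above, the comparison works out
as follows. This Python sketch assumes the averaged value 2.5 stands for
the median.m answer and the member value 2 for the statistics.m answer,
as reported in this thread. It does not call either routine.

```python
population = sorted([0, 1, 2, 3, 4, 5])
n = len(population)

# The two members adjacent to the 50th percentile of an even-sized sample.
lower_mid = population[n // 2 - 1]  # 2
upper_mid = population[n // 2]      # 3

averaged_median = (lower_mid + upper_mid) / 2  # mean of the two middle members
member_median = lower_mid                      # a member of the population

difference = abs(averaged_median - member_median)
gap = upper_mid - lower_mid

print(difference, gap)   # 0.5 1
print(difference < gap)  # True
```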

Since there is no Matlab version of statistics.m and Matlab's median.m  
produces the same result as Octave's, I don't think any action is  
needed.

Ben

