Statistics function incorrectly computing median
Ben Abbott
bpabbott at mac.com
Sun Jan 6 08:58:15 CST 2008
On Jan 6, 2008, at 10:40 PM, Vercelli wrote:
>
> ----- Original Message ----- From: "Ben Abbott" <bpabbott at mac.com>
> To: "Vercelli" <ororo at email.it>
> Sent: Sunday, January 06, 2008 11:31 AM
> Subject: Re: Statistics function incorrectly computing median
>
>
>>
>> On Jan 6, 2008, at 6:21 PM, Vercelli wrote:
>>
>>> I think that's not a bug, just an incoherence. The 2 functions
>>> (statistics and median) use two different definitions of median,
>>> which are both used in statistic books.
>>>
>>> Luca
>>
>>
>> If you don't mind, can you (or someone else) answer this question for me?
>>
>> What would the 1st quartile, median and 3rd quartile be for a
>> population of [0, 1, 2, 3, 4, 5]?
>>
>> My first impression would be 1.25, 2.5, and 3.75.
>>
>> However, the CDF for those values is [1/6, 2/6, 3/6, 4/6, 5/6,
>> 6/6]. Which implies the answer is 0.5, 2.0, and 3.5.
>>
>> Which is it? ... do you imply that *both* are correct, or
>> something else?
>>
>> What about quantiles approaching zero or unity?
>>
>> Ben
>>
>>
>
> My first impression would be: [1,2,4] (as 'statistics' says)
> If you have a finite population, it's quite useless to consider values
> /different/ from the original ones. You should just consider the
> index of the elements.
> Anyway, if one also allows different values, I don't know which is
> the right answer.
>
> Luca
My father, who spent his career working with statistics, agrees with you.

I did some checking online and found that there is no universal
agreement. The answers waver between the members of the population
that neighbor the 1st/2nd/3rd quartile percentages.
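
To make the disagreement concrete, here is a minimal Python sketch of two
common conventions applied to the population [0, 1, 2, 3, 4, 5] from earlier
in the thread. The function names are hypothetical, and neither function is
meant to reproduce the actual code of statistics.m or median.m. The first
picks a member of the population; the second interpolates between neighbors.

```python
import math

def quantile_nearest_member(samples, p):
    """Smallest member x such that at least a fraction p of samples are <= x."""
    ordered = sorted(samples)
    # 1-based rank of the first member whose cumulative share reaches p
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]

def quantile_interpolated(samples, p):
    """Linear interpolation between neighbors at position p * (n - 1)."""
    ordered = sorted(samples)
    position = p * (len(ordered) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])

population = [0, 1, 2, 3, 4, 5]
levels = (0.25, 0.5, 0.75)
member_quartiles = [quantile_nearest_member(population, p) for p in levels]
interpolated_quartiles = [quantile_interpolated(population, p) for p in levels]
print(member_quartiles)        # [1, 2, 4]
print(interpolated_quartiles)  # [1.25, 2.5, 3.75]
```

For this population the member-based rule gives [1, 2, 4], the values Luca
quotes, and the interpolating rule gives 1.25, 2.5 and 3.75, my first
impression above.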
I'd personally favor an approach that minimizes the amount of
expertise required of the user, but have been told that experts would
not favor that approach.
In any event, the routine does appear to be functioning, meaning that:
(1) At least 25% of the samples are less than or equal (<=) to the lower
quartile.
(2) At least half the samples are less than or equal to the median.
(3) At least 75% of the samples are less than or equal to the upper quartile.
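
The three conditions above can be checked with a short Python sketch. It
assumes the quartiles [1, 2, 4] quoted earlier in the thread for the
population [0, 1, 2, 3, 4, 5], and reads each condition as "at least" the
stated share. The helper name is hypothetical.

```python
def share_at_or_below(samples, value):
    """Fraction of samples that are less than or equal to value."""
    return sum(1 for x in samples if x <= value) / len(samples)

population = [0, 1, 2, 3, 4, 5]
# Quartile level -> value quoted in the thread for this population
reported = {0.25: 1, 0.5: 2, 0.75: 4}

# True where the share of samples at or below the value reaches the level
checks = {level: share_at_or_below(population, value) >= level
          for level, value in reported.items()}
print(checks)
```

All three checks come out true here: the shares are 2/6, 3/6 and 5/6.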
The second condition is true for median.m as well. Although median.m
and statistics.m give different answers, their difference is less than
the distance between the members adjacent to the 50th percentile.
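
As a rough illustration in Python (not the Octave code itself), assume one
convention averages the two middle members and the other returns the lower
of the two. Python's statistics.median and statistics.median_low stand in
for those two conventions here.

```python
import statistics

population = [0, 1, 2, 3, 4, 5]

averaged = statistics.median(population)    # mean of the two middle members
member = statistics.median_low(population)  # lower of the two middle members

# Distance between the two members on either side of the 50% point
ordered = sorted(population)
mid = len(ordered) // 2
gap = ordered[mid] - ordered[mid - 1]

difference = abs(averaged - member)
print(averaged, member, difference, gap)
```

For this population the two medians are 2.5 and 2, so they differ by 0.5,
which is less than the gap of 1 between the neighboring members 2 and 3.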
Since there is no Matlab version of statistics.m and Matlab's median.m
produces the same result as Octave's, I don't think any action is
needed.
Ben