[CHANGESET]: Statistics function incorrectly computing median

Ben Abbott bpabbott at mac.com
Thu Mar 6 06:28:11 CST 2008


On Mar 6, 2008, at 2:46 AM, Jaroslav Hajek wrote:

> On Thu, Mar 6, 2008 at 3:44 AM, Ben Abbott <bpabbott at mac.com> wrote:
>>
>> On Mar 5, 2008, at 4:50 PM, John W. Eaton wrote:
>>
>>> On 28-Feb-2008, Ben Abbott wrote:
>>>
>>> | changeset is attached.
>>>
>>> | +2008-02-28  Ben Abbott <bpabbott at mac.com>
>>> | +
>>> | +   * statistics/base/statistics.m: Modified to calculate median and
>>> | +     quantiles in a manner consistent with method #7 used by GNU's R.
>>> | +   * statistics/base/__quantile__.m: New function.
>>> | +   * statistics/base/quantile.m: New function. Matlab compatible.
>>> | +   * statistics/base/prctile.m: New function. Matlab compatible.
>>> | +   * miscellaneous/dimfunc.m: New function. Operate on a specific
>>> | +     dimension of an N-d array.
>>>
>>> The part of this patch that I'm not sure about is dimfunc.  Is that
>>> really necessary?  If I understand the way it works, it seems that it
>>> will be really slow to have nested loops and calling a function
>>> repeatedly instead of working on the full array.  Is there no way to
>>> avoid this using permute/ipermute to rearrange the data before/after
>>> processing?
>>>
>>> jwe
>>
>> Ok, I spent some time with permute, and did manage a cleaner
>> implementation. However, it still relies on a similar concept ...
>> meaning I couldn't find a method to work directly on the full array.
>>
>> The problem lies in two details regarding "func"
>>
>> (1) "func" is assumed to only operate on vectors.
>> (2) "func" is assumed to return a vector, whose length is not
>> generally known ahead of time.
>>
>> I could eliminate dimfunc.m, but that would only result in placing
>> the loop in __quantile__.m. In the future, if another script requires
>> such functionality, similar code will have to be duplicated.
>>
>> John or anyone else, any ideas or advice? Is there a better approach?
>>
>
> Maybe __quantile__ could be changed to operate on all columns of a
> matrix instead of a single vector (as many core functions do, e.g.
> sort, mean, std). I've only looked at the changeset, but it does not
> seem that hard a task, at least for methods 1 and >=4 it looked simple
> (but it was just a quick scan). It might, admittedly, obscure the code
> somewhat.
> The dimfunc can then be replaced by a sequence of permute,
> __quantile__, ipermute.
>
> Personally, I find vectorization rather entertaining :)
>
> regards

I considered that for a bit, but gave up after struggling with a
couple of the methods ... if I recall correctly, methods 2 & 3 were my
greatest concern (which is consistent with your comment).

In any event, it is possible that different approaches to 2 and 3 can  
work.
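
For reference, the permute / column-wise / ipermute sequence suggested
above might look roughly like the sketch below. It is only an
illustration under stated assumptions, not the code in the changeset:
the names colquantile7 and dimquantile7 are hypothetical, only method 7
(linear interpolation at position (n-1)*p + 1 in the sorted data) is
covered, and NaN handling and input checking are left out.

```octave
1;  % mark this as a script file so the functions below can be defined inline

% Hypothetical helper: method-7 quantiles of every column of a matrix.
% x is n-by-m, p is a vector of k probabilities in [0,1]; q is k-by-m.
function q = colquantile7 (x, p)
  x = sort (x, 1);                      % sort each column independently
  n = size (x, 1);
  h = (n - 1) * p(:) + 1;               % fractional positions, k-by-1
  lo = floor (h);
  hi = min (lo + 1, n);                 % clamp so that p == 1 stays in range
  w = repmat (h - lo, 1, size (x, 2));  % interpolation weights, k-by-m
  q = x(lo,:) + w .* (x(hi,:) - x(lo,:));
endfunction

% Hypothetical wrapper: apply colquantile7 along dimension dim of an
% N-d array without looping over the other dimensions.
function q = dimquantile7 (x, p, dim)
  nd = ndims (x);                       % assumes dim <= ndims (x)
  perm = [dim, 1:dim-1, dim+1:nd];      % bring dim to the front
  xp = permute (x, perm);
  sz = size (xp);
  xp = reshape (xp, sz(1), []);         % flatten the remaining dimensions
  qp = colquantile7 (xp, p);
  qp = reshape (qp, [numel(p), sz(2:end)]);
  q = ipermute (qp, perm);              % restore the original ordering
endfunction

x = [1 2; 3 4; 5 6];
disp (dimquantile7 (x, 0.5, 1))         % column medians: 3 4
disp (dimquantile7 (x, 0.5, 2))         % row medians: 1.5; 3.5; 5.5
```

The result has numel (p) entries along dim and the original sizes
elsewhere. Methods 2 and 3 are not attempted here.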

I'd appreciate your help ... vectorizing such diverse algorithms gives me
a headache :-(

Ben

