[CHANGESET]: Statistics function incorrectly computing median

Jaroslav Hajek highegg at gmail.com
Thu Mar 6 01:46:45 CST 2008


On Thu, Mar 6, 2008 at 3:44 AM, Ben Abbott <bpabbott at mac.com> wrote:
>
> On Mar 5, 2008, at 4:50 PM, John W. Eaton wrote:
>
> > On 28-Feb-2008, Ben Abbott wrote:
> >
> > | changeset is attached.
> >
> > | +2008-02-28  Ben Abbott <bpabbott at mac.com>
> > | +
> > | +   * statistics/base/statistics.m: Modified to calculate median and
> > | +     quantiles in a manner consistent with method #7 used by GNU's R.
> > | +   * statistics/base/__quantile__.m: New function.
> > | +   * statistics/base/quantile.m: New function. Matlab compatible.
> > | +   * statistics/base/prctile.m: New function. Matlab compatible.
> > | +   * miscellaneous/dimfunc.m: New function. Operate on a specific
> > | +     dimension of an N-d array.
> >
> > The part of this patch that I'm not sure about is dimfunc.  Is that
> > really necessary?  If I understand the way it works, it seems that it
> > will be really slow to have nested loops and calling a function
> > repeatedly instead of working on the full array.  Is there no way to
> > avoid this using permute/ipermute to rearrange the data before/after
> > processing?
> >
> > jwe
>
> Ok, I spent some time with permute, and did manage a cleaner
> implementation. However, it still relies on a similar concept ...
> meaning I couldn't find a method to directly work on the full array.
>
> The problem lies in two details regarding "func"
>
> (1) "func" is assumed to only operate on vectors.
> (2) "func" is assumed to return a vector, whose length is not
> generally known ahead of time.
>
> I could eliminate the dimfunc.m, but that would only result in placing
> the loop in __quantile__.m. In the future if another script requires
> such functionality, duplication of similar code will be needed.
>
> John or anyone else, any ideas or advice? Is there a better approach?
>

Maybe __quantile__ could be changed to operate on all columns of a
matrix instead of a single vector (as many core functions do, e.g.
sort, mean, std). I've only looked at the changeset, but it does not
seem that hard a task; at least for methods 1 and >=4 it looked simple
(though it was just a quick scan). It might, admittedly, obscure the code
somewhat.
dimfunc can then be replaced by a sequence of permute,
__quantile__, ipermute.
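
To make the idea concrete, here is a rough sketch of how the pieces could fit together. It is not the code from the changeset: colquantile and quantile_dim are hypothetical names, only the linear-interpolation rule (R's method 7) is implemented, and the input is assumed to be non-empty, free of NaNs, with all probabilities in [0, 1].

```octave
1;  % marks this file as a script so the functions below can be defined inline

% Hypothetical column-wise kernel (a stand-in for a vectorized __quantile__).
% Returns a numel(p)-by-columns(x) matrix of method-7 quantiles.
function q = colquantile (x, p)
  x = sort (x);                    % sorts every column independently
  [n, nc] = size (x);
  p = p(:);
  pos = 1 + (n - 1) * p;           % fractional row position for each p
  lo = floor (pos);
  hi = min (lo + 1, n);            % clamp so that p = 1 stays in range
  w = repmat (pos - lo, 1, nc);    % interpolation weights, one row per p
  q = x(lo,:) .* (1 - w) + x(hi,:) .* w;
endfunction

% Hypothetical wrapper: quantiles along dimension dim of an N-d array,
% using permute / reshape / ipermute instead of an explicit loop.
function q = quantile_dim (x, p, dim)
  nd = max (ndims (x), dim);
  perm = [dim, 1:dim-1, dim+1:nd]; % bring dim to the front
  x = permute (x, perm);
  sz = size (x);
  sz(end+1:nd) = 1;                % pad with trailing singleton dimensions
  x = reshape (x, sz(1), []);      % everything else becomes columns
  q = colquantile (x, p);
  q = reshape (q, [numel(p), sz(2:end)]);
  q = ipermute (q, perm);          % put the quantile axis back at dim
endfunction

x = reshape (1:24, 2, 3, 4);
m = quantile_dim (x, 0.5, 3);      % median along the third dimension
```

Here m is 2-by-3, and each entry is the average of the two middle values along the third dimension. The reshape step is what avoids the nested loops: after permute, every vector along dim is just a column, so the kernel only ever has to handle the matrix case.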

Personally, I find vectorization rather entertaining :)

regards

-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

