[CHANGESET]: Statistics function incorrectly computing median
Jaroslav Hajek
highegg at gmail.com
Thu Mar 6 06:30:36 CST 2008
On Thu, Mar 6, 2008 at 1:28 PM, Ben Abbott <bpabbott at mac.com> wrote:
>
>
> On Mar 6, 2008, at 2:46 AM, Jaroslav Hajek wrote:
>
> > On Thu, Mar 6, 2008 at 3:44 AM, Ben Abbott <bpabbott at mac.com> wrote:
> >>
> >> On Mar 5, 2008, at 4:50 PM, John W. Eaton wrote:
> >>
> >>> On 28-Feb-2008, Ben Abbott wrote:
> >>>
> >>> | changeset is attached.
> >>>
> >>> | +2008-02-28 Ben Abbott <bpabbott at mac.com>
> >>> | +
> >>> | + * statistics/base/statistics.m: Modified to calculate median and
> >>> | + quantiles in a manner consistent with method #7 used by GNU's R.
> >>> | + * statistics/base/__quantile__.m: New function.
> >>> | + * statistics/base/quantile.m: New function. Matlab compatible.
> >>> | + * statistics/base/prctile.m: New function. Matlab compatible.
> >>> | + * miscellaneous/dimfunc.m: New function. Operate on a specific
> >>> | + dimension of an N-d array.
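Method #7 in R's quantile() interpolates linearly between adjacent order statistics, placing quantile p at the 1-based position h = (n - 1)p + 1 of the sorted sample. The Python sketch below illustrates that rule for a single vector. It is not the code from the changeset, and the name `quantile_type7` is hypothetical.

```python
import math

def quantile_type7(values, p):
    """Sample quantile using R's type-7 rule: linear interpolation
    between order statistics. Hypothetical helper for illustration."""
    if not values:
        raise ValueError("need at least one value")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(values)
    n = len(x)
    h = (n - 1) * p          # 0-based fractional position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)  # clamp so that p == 1 stays in range
    return x[lo] + (h - lo) * (x[hi] - x[lo])

print(quantile_type7([3, 1, 4, 2], 0.5))   # median -> 2.5
print(quantile_type7([3, 1, 4, 2], 0.25))  # -> 1.75
```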
> >>>
> >>> The part of this patch that I'm not sure about is dimfunc. Is that
> >>> really necessary? If I understand the way it works, it seems that it
> >>> will be really slow to have nested loops and calling a function
> >>> repeatedly instead of working on the full array. Is there no way to
> >>> avoid this using permute/ipermute to rearrange the data before/after
> >>> processing?
> >>>
> >>> jwe
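The permute/ipermute idea can be illustrated with a Python sketch on a flat, column-major (Octave-style) array. First view the array as (pre, n, post) around the working dimension. Then move that dimension to the front so that each vector becomes a contiguous column, call the function once on all columns, and move the dimension back. The two helpers stand in for Octave's builtin permute/ipermute. Their names, the 0-based `dim`, and the `colfunc` convention (a function that takes a list of columns and returns a list of equal-length output columns) are assumptions made for this illustration.

```python
from math import prod

def _to_front(flat, pre, n, post):
    """Reorder a column-major (pre, n, post) array to (n, pre, post),
    as permute would, so each vector along n is contiguous."""
    out = [None] * len(flat)
    for k in range(post):
        for i in range(pre):
            for j in range(n):
                out[j + n * (i + pre * k)] = flat[i + pre * (j + n * k)]
    return out

def _from_front(flat, pre, m, post):
    """Inverse reorder: (m, pre, post) back to (pre, m, post), as ipermute would."""
    out = [None] * len(flat)
    for k in range(post):
        for i in range(pre):
            for j in range(m):
                out[i + pre * (j + m * k)] = flat[j + m * (i + pre * k)]
    return out

def apply_along_dim(flat, shape, dim, colfunc):
    """Apply colfunc to all vectors along 0-based dimension dim.

    flat:    elements in column-major order
    colfunc: takes a list of columns, returns a list of output columns
             that all share one (possibly new) length m
    Returns (flat_result, new_shape).
    """
    pre = prod(shape[:dim])
    n = shape[dim]
    post = prod(shape[dim + 1:])
    moved = _to_front(flat, pre, n, post)
    cols = [moved[c * n:(c + 1) * n] for c in range(pre * post)]
    out_cols = colfunc(cols)          # a single call covering every column
    m = len(out_cols[0])              # output length taken from the result
    moved_out = [v for col in out_cols for v in col]
    result = _from_front(moved_out, pre, m, post)
    new_shape = list(shape[:dim]) + [m] + list(shape[dim + 1:])
    return result, new_shape

def minmax(cols):
    # Example column function: two outputs per input column.
    return [[min(c), max(c)] for c in cols]

# 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored column-major.
A = [1, 4, 2, 5, 3, 6]
print(apply_along_dim(A, [2, 3], 1, minmax))  # ([1, 4, 3, 6], [2, 2])
```

In Octave, the reordering would be done by the builtin permute/ipermute rather than by explicit loops. The point of the sketch is the shape bookkeeping around a single column-wise call.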
> >>
> >> Ok, I spent some time with permute, and did manage a cleaner
> >> implementation. However, it still relies on similar concepts ...
> >> meaning I couldn't find a method to work directly on the full array.
> >>
> >> The problem lies in two details regarding "func":
> >>
> >> (1) "func" is assumed to only operate on vectors.
> >> (2) "func" is assumed to return a vector, whose length is not
> >> generally known ahead of time.
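To contrast with the permute idea, here is a minimal Python sketch of a loop-based approach under the two assumptions above, restricted to a 2-D matrix stored as a list of rows. `func` sees one vector at a time, and the output length is learned from the results rather than known in advance. This is an illustrative reconstruction, not the dimfunc.m from the changeset, and `apply_per_column` is a hypothetical name.

```python
def apply_per_column(matrix, func):
    """Call func once per column of a list-of-rows matrix.

    func maps a list (one column) to a list whose length is not known in
    advance; every call is expected to return the same length.
    """
    ncols = len(matrix[0])
    out_cols = []
    for j in range(ncols):
        column = [row[j] for row in matrix]
        out_cols.append(list(func(column)))  # one function call per vector
    m = len(out_cols[0])  # output length discovered from the first call
    if any(len(c) != m for c in out_cols):
        raise ValueError("func returned vectors of different lengths")
    # Reassemble the output columns into an m-by-ncols list of rows.
    return [[out_cols[j][i] for j in range(ncols)] for i in range(m)]

A = [[1, 5, 2],
     [4, 3, 6]]
print(apply_per_column(A, lambda c: [min(c), max(c)]))  # [[1, 3, 2], [4, 5, 6]]
```

The cost that concerns jwe is visible here: one interpreted function call per column, instead of one call on the whole array.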
> >>
> >> I could eliminate dimfunc.m, but that would only result in placing
> >> the loop in __quantile__.m. In the future, if another script requires
> >> such functionality, similar code would have to be duplicated.
> >>
> >> John or anyone else, any ideas or advice? Is there a better
> >> approach?
> >>
> >
> > Maybe __quantile__ could be changed to operate on all columns of a
> > matrix instead of a single vector (as many core functions do, e.g.
> > sort, mean, std). I've only looked at the changeset, but it does not
> > seem that hard a task; at least for methods 1 and >=4 it looked simple
> > (but it was just a quick scan). It might, admittedly, obscure the code
> > somewhat.
> > dimfunc could then be replaced by a sequence of permute,
> > __quantile__, ipermute.
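For method 7, the interpolation positions depend only on n and p, so they can be computed once and applied to every column of a column-sorted matrix. This is what makes a column-wise __quantile__ straightforward for that method. The Python sketch below shows the idea, assuming a list-of-rows matrix with no missing values. `quantile7_columns` is a hypothetical name, and the inner list comprehension stands in for what Octave would do with vectorized indexing.

```python
import math

def quantile7_columns(matrix, probs):
    """Type-7 quantiles of every column of a list-of-rows matrix.

    Returns a len(probs)-by-ncols list of rows. Assumes no NaNs.
    """
    n = len(matrix)
    ncols = len(matrix[0])
    # Sort each column independently (Octave: sort(x) on a matrix).
    cols = [sorted(row[j] for row in matrix) for j in range(ncols)]
    result = []
    for p in probs:
        # The positions depend only on n and p, so compute them once
        # and reuse them for every column.
        h = (n - 1) * p
        lo = math.floor(h)
        hi = min(lo + 1, n - 1)
        frac = h - lo
        result.append([c[lo] + frac * (c[hi] - c[lo]) for c in cols])
    return result

X = [[3, 10],
     [1, 40],
     [4, 20],
     [2, 30]]
print(quantile7_columns(X, [0.25, 0.5]))  # [[1.75, 17.5], [2.5, 25.0]]
```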
> >
> > Personally, I find vectorization rather entertaining :)
> >
> > regards
>
> I considered that for a bit, but gave up after struggling with a
> couple of the methods ... if I recall correctly, methods 2 & 3 were my
> greatest concern (which is consistent with your comment).
>
> In any event, it is possible that different approaches to 2 and 3 can
> work.
>
> I'd appreciate your help ... vectorizing such diverse algorithms gives me
> a headache :-(
>
> Ben
>
OK, I'd love to give it a shot.
--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
More information about the Bug-octave
mailing list