[CHANGESET]: Statistics function incorrectly computing median

Jaroslav Hajek highegg at gmail.com
Thu Mar 6 14:05:03 CST 2008


On Thu, Mar 6, 2008 at 6:02 PM, Ben Abbott <bpabbott at mac.com> wrote:
>
> On Thursday, March 06, 2008, at 09:25AM, "Jaroslav Hajek" <highegg at gmail.com> wrote:
>  >On Thu, Mar 6, 2008 at 1:28 PM, Ben Abbott <bpabbott at mac.com> wrote:
>  >>
>  >>
>  >>  On Mar 6, 2008, at 2:46 AM, Jaroslav Hajek wrote:
>  >>
>  >>  > On Thu, Mar 6, 2008 at 3:44 AM, Ben Abbott <bpabbott at mac.com> wrote:
>  >>  >>
>  >>  >> On Mar 5, 2008, at 4:50 PM, John W. Eaton wrote:
>  >>  >>
>  >>  >>> On 28-Feb-2008, Ben Abbott wrote:
>  >>  >>>
>  >>  >>> | changeset is attached.
>  >>  >>>
>  >>  >>> | +2008-02-28  Ben Abbott <bpabbott at mac.com>
>  >>  >>> | +
>  >>  >>> | +   * statistics/base/statistics.m: Modified to calculate median and
>  >>  >>> | +     quantiles in a manner consistent with method #7 used by GNU's R.
>  >>  >>> | +   * statistics/base/__quantile__.m: New function.
>  >>  >>> | +   * statistics/base/quantile.m: New function. Matlab compatible.
>  >>  >>> | +   * statistics/base/prctile.m: New function. Matlab compatible.
>  >>  >>> | +   * miscellaneous/dimfunc.m: New function. Operate on a specific
>  >>  >>> | +     dimension of an N-d array.
>  >>  >>>
>  >>  >>> The part of this patch that I'm not sure about is dimfunc.  Is that
>  >>  >>> really necessary?  If I understand the way it works, it seems that it
>  >>  >>> will be really slow to have nested loops calling a function
>  >>  >>> repeatedly instead of working on the full array.  Is there no way to
>  >>  >>> avoid this using permute/ipermute to rearrange the data before/after
>  >>  >>> processing?
>  >>  >>>
>  >>  >>> jwe
>  >>  >>
>  >>  >> Ok, I spent some time with permute, and did manage a cleaner
>  >>  >> implementation. However, it still relies on a similar concept ...
>  >>  >> meaning I couldn't find a method to work directly on the full array.
>  >>  >>
>  >>  >> The problem lies in two details regarding "func"
>  >>  >>
>  >>  >> (1) "func" is assumed to only operate on vectors.
>  >>  >> (2) "func" is assumed to return a vector, whose length is not
>  >>  >> generally known ahead of time.
>  >>  >>
>  >>  >> I could eliminate dimfunc.m, but that would only result in placing
>  >>  >> the loop in __quantile__.m. In the future, if another script requires
>  >>  >> such functionality, similar code will have to be duplicated.
>  >>  >>
>  >>  >> John or anyone else, any ideas or advice? Is there a better approach?
>  >>  >>
>  >>  >
>  >>  > Maybe __quantile__ could be changed to operate on all columns of a
>  >>  > matrix instead of a single vector (as many core functions do, e.g.
>  >>  > sort, mean, std). I've only looked at the changeset, but it does not
>  >>  > seem that hard a task, at least for methods 1 and >=4 it looked simple
>  >>  > (but it was just a quick scan). It might, admittedly, obscure the code
>  >>  > somewhat.
>  >>  > The dimfunc can then be replaced by a sequence of permute,
>  >>  > __quantile__, ipermute.
>  >>  >
>  >>  > Personally, I find vectorization rather entertaining :)
>  >>  >
>  >>  > regards
>  >>
>  >>  I considered that for a bit, but gave up after struggling with a
>  >>  couple of the methods ... if I recall correctly methods 2 & 3 were my
>  >>  greatest concern (which is consistent with your comment)
>  >>
>  >>  In any event, it is possible that different approaches to 2 and 3 can
>  >>  work.
>  >>
>  >>  I'd appreciate your help ... vectorizing such diverse algorithms gives me
>  >>  a headache :-(
>  >>
>  >>  Ben
>  >>
>  >
>  >The possible presence of NaNs makes the problem more of a challenge
>  >than it appeared, because m can already differ between columns. I still
>  >feel up to it, but it'll probably take longer.
>  >However, the inner q loops in __quantile__.m can certainly be removed
>  >without much effort (as David has just observed), so I'd suggest going
>  >with the single-vector-argument version for the time being, and I'll try
>  >to supply a matrix version operating on columns later.
>  >
>
>  So I'm not confused ... you'll be focusing on removing the inner q loops in __quantile__.m and for the time being, we'll keep dimfunc.m.
>
>  Did I get that correct?
>
>  Ben
>
No. Removing the inner loops is an easy task, but I'd like to make a
version of __quantile__ that can take a matrix as x and operate on all
of its columns jointly rather than one after another. Dimfunc will then
be unnecessary: we'll just permute the dimension of interest to the
leading position, call __quantile__, and permute back. Again, this will
possibly need more memory (as permute copies the entire array), but it
involves no interpreted loops.
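
To make the idea concrete, here is a rough sketch of the permute /
column-wise quantile / permute-back sequence. It is not the
__quantile__.m from the changeset: the names colquantile7 and
quantile_dim are made up for illustration, only R's method 7 is
covered, and the input is assumed to contain no NaNs, so every column
has the same length m. With NaNs, m would differ between columns and
the indexing below would no longer be enough.

```octave
1;  % mark this file as a script so the functions below can be defined inline

% Hypothetical helper: method 7 quantiles of all columns of a 2-D x at once.
% Assumes x has no NaNs, so all columns share the same length m.
function q = colquantile7 (x, p)
  x = sort (x, 1);                  % sort every column
  [m, n] = size (x);
  p = p(:);
  h = (m - 1) * p + 1;              % fractional row position for each p
  lo = floor (h);
  hi = min (lo + 1, m);             % clamp so that p = 1 stays in range
  w = repmat (h - lo, 1, n);        % interpolation weights, one row per p
  % Row indexing selects the same rows from all columns jointly.
  q = x(lo,:) + w .* (x(hi,:) - x(lo,:));
endfunction

% Hypothetical wrapper: quantiles along dimension dim of an N-d array.
function q = quantile_dim (x, p, dim)
  perm = [dim, 1:dim-1, dim+1:ndims(x)];  % bring dim to the leading position
  x = permute (x, perm);                  % this copies the whole array
  sz = size (x);
  x = reshape (x, sz(1), []);             % remaining dimensions become columns
  q = colquantile7 (x, p);
  q = reshape (q, [numel(p), sz(2:end)]);
  q = ipermute (q, perm);                 % restore the original dimension order
endfunction

x = [1 2; 3 4; 5 6; 7 8];
disp (quantile_dim (x, 0.5, 1))   % medians of the two columns: 4 5
```

The only work done per column is inside sort and the indexing
expressions, both of which are built in, so no loop runs in the
interpreter. repmat is used instead of relying on automatic
broadcasting so that the sketch does not depend on the Octave version.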


-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

