[CHANGESET]: Statistics function incorrectly computing median

Ben Abbott bpabbott at mac.com
Thu Mar 6 07:35:29 CST 2008


On Mar 6, 2008, at 2:46 AM, Jaroslav Hajek wrote:

> On Thu, Mar 6, 2008 at 3:44 AM, Ben Abbott <bpabbott at mac.com> wrote:
>>
>> On Mar 5, 2008, at 4:50 PM, John W. Eaton wrote:
>>
>>> On 28-Feb-2008, Ben Abbott wrote:
>>>
>>> | changeset is attached.
>>>
>>> | +2008-02-28  Ben Abbott <bpabbott at mac.com>
>>> | +
>>> | +   * statistics/base/statistics.m: Modified to calculate median  
>>> and
>>> | +     quantiles in a manner consistent with method #7 used by  
>>> GNU's R.
>>> | +   * statistics/base/__quantile__.m: New function.
>>> | +   * statistics/base/quantile.m: New function. Matlab compatible.
>>> | +   * statistics/base/prctile.m: New function. Matlab compatible.
>>> | +   * miscellaneous/dimfunc.m: New function. Operate on a specific
>>> | +     dimension of an N-d array.
>>>
>>> The part of this patch that I'm not sure about is dimfunc.  Is that
>>> really necessary?  If I understand the way it works, it seems that  
>>> it
>>> will be really slow to have nested loops and calling a function
>>> repeatedly instead of working on the full array.  Is there no way to
>>> avoid this using permute/ipermute to rearrange the data before/after
>>> processing?
>>>
>>> jwe
>>
>> Ok, I spent some time with permute, and did manage a cleaner
>> implementation. However, it still relies on a similar concepts ...
>> meaning I couldn't find an method to directly work on the full array.
>>
>> The problem lies in two details regarding "func"
>>
>> (1) "func" is assumed to only operate on vectors.
>> (2) "func" is assumed to return a vector, whose length is not
>> generally known ahead of time.
>>
>> I could eliminate the dimfunc.m, but that would only result in  
>> placing
>> the loop in __quantile__m. In the future if another script requires
>> such functionality, duplication of similar code will be needed.
>>
>> John or anyone else, any ideas for advice? Is there a better  
>> approach?
>>
>
> Maybe __quantile__ could be changed to operate on all columns of a
> matrix instead of a single vector (as many core functions do, e.g.
> sort, mean, std). I've only looked at the changeset, but it does not
> seem that hard a task, at least for methods 1 and >=4 it looked simple
> (but it was just a quickscan). It might, admittedly, obscure the code
> somewhat.
> The dimfunc can the be replaced by a sequence of permute,
> __quantile__, ipermute.
>
> Personally, I find vectorization rather entertaining :)

I'd missed John's mention of ipermute ... relative to the prior diff,  
the one below cleans dimfunc.m up a bit more (not that it will be  
needed if __quantile__ can be vectorized).

diff -r 45b65c13170e -r be591ccfa5f3 scripts/miscellaneous/dimfunc.m
--- a/scripts/miscellaneous/dimfunc.m	Wed Mar 05 21:29:32 2008 -0500
+++ b/scripts/miscellaneous/dimfunc.m	Thu Mar 06 08:32:23 2008 -0500
@@ -45,18 +45,17 @@ function y = dimfunc (func, x, dim, vara

    ## Store the original size and order of the dimensions
    orig_size = size (x);
-  orig_order = 1:numel(orig_size);
    num_vectors = prod(orig_size) / orig_size(dim);

    ## Modify dimension order and shape.
-  new_order = orig_order;
-  new_order(orig_order<=dim) -= 1;
+  new_order = 1:numel(orig_size);
+  new_order(new_order<=dim) -= 1;
    new_order(1) = dim;
    x = permute (x, new_order);
    x = reshape (x, [orig_size(dim), num_vectors]);
    x = permute (x, [2, 1]);

-  ## Operate on dim
+  ## Operate on dim.
    for n = 1:num_vectors
      tmp = func (x(n,:), varargin{:});
      if (n == 1)
@@ -70,10 +69,7 @@ function y = dimfunc (func, x, dim, vara

    ## Restore dimension order and shape.
    y = reshape (y, new_size(new_order));
-  new_order = orig_order;
-  new_order(orig_order<dim) += 1;
-  new_order(dim) = 1;
-  y = permute (y, new_order);
+  y = ipermute (y, new_order);

  endfunction

Ben



More information about the Bug-octave mailing list