[CHANGESET]: Statistics function incorrectly computing median
Ben Abbott
bpabbott at mac.com
Thu Mar 6 07:35:29 CST 2008
On Mar 6, 2008, at 2:46 AM, Jaroslav Hajek wrote:
> On Thu, Mar 6, 2008 at 3:44 AM, Ben Abbott <bpabbott at mac.com> wrote:
>>
>> On Mar 5, 2008, at 4:50 PM, John W. Eaton wrote:
>>
>>> On 28-Feb-2008, Ben Abbott wrote:
>>>
>>> | changeset is attached.
>>>
>>> | +2008-02-28 Ben Abbott <bpabbott at mac.com>
>>> | +
>>> | + * statistics/base/statistics.m: Modified to calculate median
>>> and
>>> | + quantiles in a manner consistent with method #7 used by
>>> GNU's R.
>>> | + * statistics/base/__quantile__.m: New function.
>>> | + * statistics/base/quantile.m: New function. Matlab compatible.
>>> | + * statistics/base/prctile.m: New function. Matlab compatible.
>>> | + * miscellaneous/dimfunc.m: New function. Operate on a specific
>>> | + dimension of an N-d array.
>>>
>>> The part of this patch that I'm not sure about is dimfunc. Is that
>>> really necessary? If I understand the way it works, it seems that
>>> it
>>> will be really slow to have nested loops and calling a function
>>> repeatedly instead of working on the full array. Is there no way to
>>> avoid this using permute/ipermute to rearrange the data before/after
>>> processing?
>>>
>>> jwe
>>
>> Ok, I spent some time with permute, and did manage a cleaner
>> implementation. However, it still relies on a similar concepts ...
>> meaning I couldn't find an method to directly work on the full array.
>>
>> The problem lies in two details regarding "func"
>>
>> (1) "func" is assumed to only operate on vectors.
>> (2) "func" is assumed to return a vector, whose length is not
>> generally known ahead of time.
>>
>> I could eliminate the dimfunc.m, but that would only result in
>> placing
>> the loop in __quantile__m. In the future if another script requires
>> such functionality, duplication of similar code will be needed.
>>
>> John or anyone else, any ideas for advice? Is there a better
>> approach?
>>
>
> Maybe __quantile__ could be changed to operate on all columns of a
> matrix instead of a single vector (as many core functions do, e.g.
> sort, mean, std). I've only looked at the changeset, but it does not
> seem that hard a task, at least for methods 1 and >=4 it looked simple
> (but it was just a quickscan). It might, admittedly, obscure the code
> somewhat.
> The dimfunc can the be replaced by a sequence of permute,
> __quantile__, ipermute.
>
> Personally, I find vectorization rather entertaining :)
I'd missed John's mention of ipermute ... relative to the prior diff,
the one below cleans dimfunc.m up a bit more (not that it will be
needed if __quantile__ can be vectorized).
diff -r 45b65c13170e -r be591ccfa5f3 scripts/miscellaneous/dimfunc.m
--- a/scripts/miscellaneous/dimfunc.m Wed Mar 05 21:29:32 2008 -0500
+++ b/scripts/miscellaneous/dimfunc.m Thu Mar 06 08:32:23 2008 -0500
@@ -45,18 +45,17 @@ function y = dimfunc (func, x, dim, vara
## Store the original size and order of the dimensions
orig_size = size (x);
- orig_order = 1:numel(orig_size);
num_vectors = prod(orig_size) / orig_size(dim);
## Modify dimension order and shape.
- new_order = orig_order;
- new_order(orig_order<=dim) -= 1;
+ new_order = 1:numel(orig_size);
+ new_order(new_order<=dim) -= 1;
new_order(1) = dim;
x = permute (x, new_order);
x = reshape (x, [orig_size(dim), num_vectors]);
x = permute (x, [2, 1]);
- ## Operate on dim
+ ## Operate on dim.
for n = 1:num_vectors
tmp = func (x(n,:), varargin{:});
if (n == 1)
@@ -70,10 +69,7 @@ function y = dimfunc (func, x, dim, vara
## Restore dimension order and shape.
y = reshape (y, new_size(new_order));
- new_order = orig_order;
- new_order(orig_order<dim) += 1;
- new_order(dim) = 1;
- y = permute (y, new_order);
+ y = ipermute (y, new_order);
endfunction
Ben
More information about the Bug-octave
mailing list