[CHANGESET]: Statistics function incorrectly computing median
Ben Abbott
bpabbott at mac.com
Thu Mar 6 11:02:44 CST 2008
On Thursday, March 06, 2008, at 09:25AM, "Jaroslav Hajek" <highegg at gmail.com> wrote:
>On Thu, Mar 6, 2008 at 1:28 PM, Ben Abbott <bpabbott at mac.com> wrote:
>>
>>
>> On Mar 6, 2008, at 2:46 AM, Jaroslav Hajek wrote:
>>
>> > On Thu, Mar 6, 2008 at 3:44 AM, Ben Abbott <bpabbott at mac.com> wrote:
>> >>
>> >> On Mar 5, 2008, at 4:50 PM, John W. Eaton wrote:
>> >>
>> >>> On 28-Feb-2008, Ben Abbott wrote:
>> >>>
>> >>> | changeset is attached.
>> >>>
>> >>> | +2008-02-28 Ben Abbott <bpabbott at mac.com>
>> >>> | +
>> >>> | + * statistics/base/statistics.m: Modified to calculate median and
>> >>> | + quantiles in a manner consistent with method #7 used by GNU's R.
>> >>> | + * statistics/base/__quantile__.m: New function.
>> >>> | + * statistics/base/quantile.m: New function. Matlab compatible.
>> >>> | + * statistics/base/prctile.m: New function. Matlab compatible.
>> >>> | + * miscellaneous/dimfunc.m: New function. Operate on a specific
>> >>> | + dimension of an N-d array.
>> >>>
>> >>> The part of this patch that I'm not sure about is dimfunc. Is that
>> >>> really necessary? If I understand the way it works, it seems that it
>> >>> will be really slow to have nested loops and calling a function
>> >>> repeatedly instead of working on the full array. Is there no way to
>> >>> avoid this using permute/ipermute to rearrange the data before/after
>> >>> processing?
>> >>>
>> >>> jwe
>> >>
>> >> Ok, I spent some time with permute, and did manage a cleaner
>> >> implementation. However, it still relies on a similar concept ...
>> >> meaning I couldn't find a method to work directly on the full array.
>> >>
>> >> The problem lies in two details regarding "func"
>> >>
>> >> (1) "func" is assumed to only operate on vectors.
>> >> (2) "func" is assumed to return a vector, whose length is not
>> >> generally known ahead of time.
>> >>
>> >> I could eliminate dimfunc.m, but that would only result in placing
>> >> the loop in __quantile__.m. In the future, if another script requires
>> >> such functionality, similar code will have to be duplicated.
>> >>
>> >> John or anyone else, any ideas or advice? Is there a better approach?
>> >>
>> >
>> > Maybe __quantile__ could be changed to operate on all columns of a
>> > matrix instead of a single vector (as many core functions do, e.g.
>> > sort, mean, std). I've only looked at the changeset, but it does not
>> > seem that hard a task, at least for methods 1 and >=4 it looked simple
>> > (but it was just a quick scan). It might, admittedly, obscure the code
>> > somewhat.
>> > The dimfunc can then be replaced by a sequence of permute,
>> > __quantile__, ipermute.
>> >
>> > Personally, I find vectorization rather entertaining :)
>> >
>> > regards
>>
>> I considered that for a bit, but gave up after struggling with a
>> couple of the methods ... if I recall correctly methods 2 & 3 were my
>> greatest concern (which is consistent with your comment)
>>
>> In any event, it is possible that different approaches to 2 and 3 can
>> work.
>>
>> I'd appreciate your help ... vectorizing such diverse algorithms gives me
>> a headache :-(
>>
>> Ben
>>
>
>The possible presence of NaNs makes the problem more of a challenge
>than it appeared, because m can already be different for different
>columns. I still feel up to it, though it'll probably take
>longer.
>However, the inner q loops in __quantile__.m can certainly be removed
>without much effort (as David has just observed), so I'd suggest going
>with the single-vector-argument version for the time being, and I'll
>try to supply a matrix version operating on columns later.
>
So I'm not confused ... you'll be focusing on removing the inner q loops in __quantile__.m and for the time being, we'll keep dimfunc.m.
Did I get that correct?
Ben
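
For readers following the thread, the two points under discussion (R's method #7, and NaNs leaving a different number of valid samples in each column) can be sketched as below. This is a minimal illustration in plain Python, not the Octave code from the changeset. The names quantile7 and quantile7_columns are hypothetical and do not appear in the patch, and the sketch assumes a 2-D input given as a list of rows.

```python
import math

def quantile7(values, probs):
    """Quantiles of one vector using R's method 7 (linear interpolation
    between order statistics). NaNs are dropped before sorting."""
    xs = sorted(v for v in values if not math.isnan(v))
    n = len(xs)
    if n == 0:
        # No valid samples: every requested quantile is undefined.
        return [math.nan] * len(probs)
    out = []
    for p in probs:
        h = (n - 1) * p              # fractional zero-based position
        lo = math.floor(h)           # order statistic at or below h
        hi = min(lo + 1, n - 1)      # next one, clamped at the maximum
        out.append(xs[lo] + (h - lo) * (xs[hi] - xs[lo]))
    return out

def quantile7_columns(matrix, probs):
    """Apply quantile7 to each column of a list-of-rows matrix.
    Each column is handled on its own because dropping NaNs can leave
    a different sample count per column."""
    per_column = [quantile7(col, probs) for col in zip(*matrix)]
    # Transpose so that row i holds the quantile for probs[i].
    return [list(row) for row in zip(*per_column)]

data = [[1.0, 10.0],
        [2.0, math.nan],
        [3.0, 30.0]]
medians = quantile7_columns(data, [0.5])
```

Here the first column has three valid samples and the second only two, so the interpolation position differs between columns even for the same probability. That is the complication raised above for a fully vectorized version.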
More information about the Bug-octave
mailing list