slow access to consecutive matrix elements
Jaroslav Hajek
highegg at gmail.com
Tue Nov 11 07:35:56 CST 2008
On Tue, Nov 11, 2008 at 10:44 AM, Francesco Potortì <Potorti at isti.cnr.it> wrote:
>>On Sat, Oct 11, 2008 at 10:22 AM, Francesco Potorti`
>><Potorti at isti.cnr.it> wrote:
>>> This is a real life example that demonstrates how access to matrix
>>> elements is not optimised for the case of complete ranges in the first
>>> dimensions. I am sending it here and not to the bug list because this
>>> example may be useful to others, at least until this cases is optimised
>>> in the sources.
>>>
>>> # This is a big 5-dim matrix, about 100MB size
>>> octave> kk=rand(156,222,1,44,8);
>>>
>>> # I access a slice of it, which is sequential in memory
>>> octave> t=cputime; for ii=1:44, for jj=1:8
>>> mm=kk(:,:,:,ii,jj); endfor, endfor, cputime-t
>>> ans = 5.9124
>>>
>>> # Using a linear index is much faster
>>> octave> span=(1:156*222);
>>> t=cputime; for ii=1:44, for jj=1:8
>>> mm=kk(sub2ind(size(kk),1,1,1,ii,jj)+span-1); endfor, endfor, cputime-t
>>> ans = 2.6642
>>>
>>> # Removing the call to sub2ind reaches top speed
>>> octave> cp=[1,cumprod(size(kk)(1:end-1))]; span=(1:156*222);
>>> t=cputime; for ii=1:44, for jj=1:8
>>> mm=kk(sum(([1,1,1,ii,jj]-1).*cp)+span); endfor, endfor, cputime-t
>>> ans = 1.4001
>>>
>>> Still, I wish access were faster yet. Is there a reason why copying
>>> consecutive memory is so slow? I wish I could help with optimising
>>> this, even if I am certainly not the most qualified person to do it.
>>>
>>
>>I guess the indexing routines do deserve some attention w.r.t
>>performance. Reducing code duplication would also be nice. I have this
>>on my TODO list, but I don't think it's a good idea to do it before
>>3.2.x is forked, as such changes are, IMO, likely to introduce bugs.
>
> Follwoing up on this, I realised that there is room for further
> significant speedup:
>
> # Be careful to not convert ranges into matrices
> octave> cp=[1,cumprod(size(kk)(1:end-1))]; len=156*222;
> t=cputime; for ii=1:44, for jj=1:8
> base=sum(([1,1,1,ii,jj]-1).*cp); mm=kk(base+1:base+len); endfor, endfor, cputime-t
> ans = 0.26402
>
> The fact is, I was discounting the fact that a range remains a range
> even after linear transformation, while this does not appear to be the
> case:
>
> octave3.1> typeinfo(1:4)
> ans = range
> octave3.1> typeinfo(4+(1:4))
> ans = matrix
> octave3.1> typeinfo(4*(1:4))
> ans = matrix
>
> From the depth of my ignorance about Octave's internals, I dare say that
> it should not be too difficult to keep ranges as ranges even after sum
> or product with a scalar. Maybe even after sum with a range with the
> same number of elements. Am I wrong?
>
A good suggestion.
Btw, the dense array indexing was recently almost completely
rewritten. Please use a development snapshot for any further
benchmarks, as benchmarking 3.0.x is probably meaningless for further
development.
> --
> Francesco Potortì (ricercatore) Voice: +39 050 315 3058 (op.2111)
> ISTI - Area della ricerca CNR Fax: +39 050 315 2040
> via G. Moruzzi 1, I-56124 Pisa Email: Potorti at isti.cnr.it
> (entrance 20, 1st floor, room C71) Web: http://fly.isti.cnr.it/
>
--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
More information about the Bug-octave
mailing list