More efficient MEX or MEX-like interface

David Bateman David.Bateman at motorola.com
Tue Sep 30 08:08:49 CDT 2008


Fredrik Lingvall wrote:
> David Bateman wrote:
>   
>> Fredrik Lingvall wrote:
>>     
>>> Does this mean that BLAS/LAPACK expects complex data stored in the same
>>> way as complex data is currently stored in Octave?
>>>   
>>>       
>> Yes, Octave calls lapack/blas functions like [cz]GEMM, for complex
>> arguments. Greping the matlab libraries with
>>
>> nm -C --dynamic  libmwnumerics.so | grep [gG][eE][mM][mM]
>>
>> gives
>>
>>                 U dgemm_
>>                 U sgemm_
>>                 U zgemm_
>>
>> So it appears that the double precision complex matrix multiple is
>> linked in but the single precision version isn't. I'd think this means
>> that in fact ZGEMM is used by the SuiteSparse code (which doesn't have
>> single precision), and that in fact matlab internally uses four calls
>> to DGEMM on the real and imaginary parts of the matrix to simulate
>> ZGEMM as that means that they never explicitly have to form the
>> complex matrix. Given the speed differences (or lack thereof) for such
>> operations between Octave and Matlab I believe that pretty much
>> confirms what they do.
>>     
> How did you test this?
>
> I did a quick test (Pentium-M laptop):
>
> 1) Octave 3.0.2
>
> octave:1> n=2000; B = randn(n,n) + i*randn(n,n);
> octave:2> n=2000; A = randn(n,n) + i*randn(n,n);
> octave:3> tic, C=A*B;toc
> Elapsed time is 37.4009 seconds.
> octave:4> tic, C=A*B;toc
> Elapsed time is 37.3448 seconds.
> octave:5>
>
> 2) Matlab 7.3
>
>   
>>> n=2000; A = randn(n,n) + i*randn(n,n);
>>> n=2000; B = randn(n,n) + i*randn(n,n);
>>> tic, C=A*B;toc
>>>       
> Elapsed time is 36.725418 seconds.
>   
>>> tic, C=A*B;toc
>>>       
> Elapsed time is 36.786914 seconds.
>
> Both Matlab and Octave use Goto BLAS v. 1.19
>
> It looks like there is no speed penalty  for storing real and imag
> parts  separated (at least not for the size of the problem that I tested).
>
> /F
>
>
>
>   
This is exactly my point... If matlab had to reform a C99 complex matrix 
and then call zGEMM rather than use four calls to dGEMM and do the 
additions then it would be slower as the creation and copying of the 
data to a C99 complex array comes at a cost. In fact xGEMM does the operator

  C = alpha * A * B + beta * C

and so you might for example call dGEMM once passing the two imaginary 
parts as A, B and have beta of zero, then have a second call with that 
result as C, beta of -1 and A and B being the imaginary parts, thus 
giving you the real part of the complex matrix multiply with two calls 
to dGEMM on the real and imaginary parts of the matrix. The same goes 
for the calculation of the imaginary part of the matrix multiply. The 
underlying code of zGEMM has to do something similar in any case, it 
just does the four multiplications element by element instead, so there 
is no surprise it is much the same speed as what matlab does.

The absence of cGEMM from the symbols of the numeric is a pretty good 
indication that the above is exactly what mathworks does as I see no 
reason to handle matrix multiplies different between double and single 
precision values.


D.

-- 
David Bateman                                David.Bateman at motorola.com
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph) 
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob) 
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax) 

The information contained in this communication has been classified as: 

[x] General Business Information 
[ ] Motorola Internal Use Only 
[ ] Motorola Confidential Proprietary



More information about the Octave-maintainers mailing list