-ffast-math option when compiling Octave in FreeBSD ports?

Jaroslav Hajek highegg at gmail.com
Sun Dec 7 09:58:36 CST 2008


On Sun, Dec 7, 2008 at 9:12 AM, Tatsuro MATSUOKA <tmacchant at yahoo.co.jp> wrote:
> Hello
>
> In an Octave thread in Japan, there was a post asking about the meaning of the -ffast-math option in FreeBSD ports.
>
> I would be glad if someone could give me some information about it.
>
> Regards
>
> Tatsuro
>

I'm not exactly an expert, but I'll try to explain:
-ffast-math in GCC enables certain optimizations that can dramatically
boost performance, but may slightly violate the expected semantics of
a computation.

To get an idea what is allowed under -ffast-math, try this simple
function with g++:

void dscal (double *x, int n, double a)
{
  for (int i = 0; i < n; i++)
    x[i] /= a;
}

Compiled to assembler using "-O3 -fomit-frame-pointer"
(I intentionally omit -funroll-loops so that the assembler stays readable),
I get (g++ 4.3, old Intel Celeron):
	movl	8(%esp), %ecx
	movl	4(%esp), %edx
	fldl	12(%esp)
	testl	%ecx, %ecx
	jle	.L8
	xorl	%eax, %eax
	.p2align 4,,7
.L4:
	fldl	(%edx,%eax,8)
	fdiv	%st(1), %st
	fstpl	(%edx,%eax,8)
	addl	$1, %eax
	cmpl	%ecx, %eax
	jne	.L4
.L8:
	fstp	%st(0)
	ret

whereas with "-O3 -fomit-frame-pointer -ffast-math" I get:
	movl	8(%esp), %ecx
	movl	4(%esp), %edx
	fldl	12(%esp)
	testl	%ecx, %ecx
	jle	.L8
	fld1
	xorl	%eax, %eax
	fdivp	%st, %st(1)
	.p2align 4,,7
.L4:
	fldl	(%edx,%eax,8)
	fmul	%st(1), %st
	fstpl	(%edx,%eax,8)
	addl	$1, %eax
	cmpl	%ecx, %eax
	jne	.L4
.L8:
	fstp	%st(0)
	ret


If you can read assembler at a basic level (as I can), you will see that
in the second case the compiler essentially transformed the function
like this:
void dscal (double *x, int n, double a)
{
  double ainv = 1.0/a;
  for (int i = 0; i < n; i++)
    x[i] *= ainv;
}
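
To see what this transformation does to the results, one can write both
variants by hand and compare them on the same data. The following is
only a sketch: dscal_div, dscal_mul and count_mismatches are names made
up for this illustration, and it assumes a build without -ffast-math
(otherwise the compiler may turn the division loop into the
multiplication loop itself, and the two will agree trivially).

```cpp
#include <cstddef>
#include <vector>

// Original form: one division per element.
void dscal_div (double *x, int n, double a)
{
  for (int i = 0; i < n; i++)
    x[i] /= a;
}

// Hand-written form of the transformed loop: one division up front,
// then one multiplication per element.
void dscal_mul (double *x, int n, double a)
{
  double ainv = 1.0 / a;
  for (int i = 0; i < n; i++)
    x[i] *= ainv;
}

// Hypothetical helper: apply both variants to copies of the same data
// and count the elements where the results are not bitwise-equal values.
int count_mismatches (const std::vector<double>& data, double a)
{
  std::vector<double> u = data, v = data;
  dscal_div (u.data (), static_cast<int> (u.size ()), a);
  dscal_mul (v.data (), static_cast<int> (v.size ()), a);

  int mismatches = 0;
  for (std::size_t i = 0; i < u.size (); i++)
    if (u[i] != v[i])
      mismatches++;
  return mismatches;
}
```

For example, with a = 49 and the single element 49.0, the division
variant gives exactly 1 while the multiplication variant gives a value
just below 1 (assuming plain IEEE double arithmetic, e.g. SSE2 on
x86-64), so count_mismatches reports 1. With a = 2 the reciprocal 0.5
is exact and the two variants agree.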

The transformed function is much faster, because division is much
slower than multiplication, and the transformed loop can also be better
vectorized using SSE instructions and loop unrolling.
However, it may produce slightly different results, because, for instance, while
x / x is exactly 1 for any finite nonzero x, x * (1/x) is not always
exactly 1 (in FP math).
Another thing is that with -ffast-math, the compiler is allowed to assume
that NaNs and Infs do not occur in expressions, and thus, for
instance, to replace "x-x" by 0 (which does not hold for x=NaN or x=Inf).
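
The x-x case can be checked directly. This is a minimal sketch, with
self_difference and self_difference_is_nan as hypothetical helper names,
and it is meant to be compiled without -ffast-math. With that option
(more precisely with -ffinite-math-only, which it implies) the compiler
is permitted to fold x - x to 0 and to optimize the NaN test itself away,
so the outcome is then not something to rely on.

```cpp
#include <cmath>
#include <limits>

// Computes x - x. Under IEEE 754 rules this is 0 for finite x,
// but NaN when x is NaN or +/-Inf.
double self_difference (double x)
{
  return x - x;
}

// Returns true if x - x evaluated to NaN.
bool self_difference_is_nan (double x)
{
  return std::isnan (self_difference (x));
}
```

Calling self_difference_is_nan with
std::numeric_limits<double>::quiet_NaN () or
std::numeric_limits<double>::infinity () should give true in a strict
IEEE build, and false for any finite argument.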

HTH,

-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

