Top gcc Questions

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow, which greatly slows down the performance. (In contrast, Intel C++ Compiler, executable icc, will eliminate the library call for pow(a,6).)

What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and options "-O3 -lm -funroll-loops -msse4", it uses 5 mulsd instructions:

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13

while if I write (a*a*a)*(a*a*a), it will produce

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm13, %xmm13

which reduces the number of multiply instructions to 3. icc has similar behavior.

Why do compilers not recognize this optimization trick?

Answered By: Lambdageek ( 1069)

Because Floating Point Math is not Associative. The way you group the operands in floating point multiplication has an effect on the numerical accuracy of the answer.

As a result, most compilers are very conservative about reordering floating point calculations unless they can be sure that the answer will stay the same, or unless you tell them you don't care about numerical accuracy, for example with GCC's -ffast-math option.

Here is the extract from the program in question. The matrix img[][] has the size SIZE×SIZE, and is initialized at:

img[j][i] = 2 * j + i

Then, you make a matrix res[][], and each field in here is made to be the average of the 9 fields around it in the img matrix. The border is left at 0 for simplicity.

for(i=1;i<SIZE-1;i++) 
    for(j=1;j<SIZE-1;j++) {
        res[j][i]=0;
        for(k=-1;k<2;k++) 
            for(l=-1;l<2;l++) 
                res[j][i] += img[j+l][i+k];
        res[j][i] /= 9;
    }

That's all there is to the program. For completeness' sake, here is what comes before. No code comes after. As you can see, it's just initialization.

#define SIZE 8192
float img[SIZE][SIZE]; // input image
float res[SIZE][SIZE]; //result of mean filter
int i,j,k,l;
for(i=0;i<SIZE;i++) 
    for(j=0;j<SIZE;j++) 
        img[j][i] = (2*j+i)%8196;

Basically, this program is slow when SIZE is a multiple of 2048, e.g. the execution times:

SIZE = 8191: 3.44 secs
SIZE = 8192: 7.20 secs
SIZE = 8193: 3.18 secs

The compiler is GCC. From what I know, this is because of memory management, but I don't really know too much about that subject, which is why I'm asking here.

Also how to fix this would be nice, but if someone could explain these execution times I'd already be happy enough.

I already know of malloc/free, but the problem is not amount of memory used, it's merely execution time, so I don't know how that would help.

Answered By: Mysticial ( 600)

The difference is caused by the same super-alignment issue that comes up in several related questions.

But that's only because there's one other problem with the code.

Starting from the original loop:

for(i=1;i<SIZE-1;i++) 
    for(j=1;j<SIZE-1;j++) {
        res[j][i]=0;
        for(k=-1;k<2;k++) 
            for(l=-1;l<2;l++) 
                res[j][i] += img[j+l][i+k];
        res[j][i] /= 9;
    }

First notice that the two inner loops are trivial. They can be unrolled as follows:

for(i=1;i<SIZE-1;i++) {
    for(j=1;j<SIZE-1;j++) {
        res[j][i]=0;
        res[j][i] += img[j-1][i-1];
        res[j][i] += img[j  ][i-1];
        res[j][i] += img[j+1][i-1];
        res[j][i] += img[j-1][i  ];
        res[j][i] += img[j  ][i  ];
        res[j][i] += img[j+1][i  ];
        res[j][i] += img[j-1][i+1];
        res[j][i] += img[j  ][i+1];
        res[j][i] += img[j+1][i+1];
        res[j][i] /= 9;
    }
}

So that leaves the two outer-loops that we're interested in.

Now we can see the problem is the same as in this question: Why does the order of the loops affect performance when iterating over a 2D array?

You are iterating the matrix column-wise instead of row-wise.


To solve this problem, you should interchange the two loops.

for(j=1;j<SIZE-1;j++) {
    for(i=1;i<SIZE-1;i++) {
        res[j][i]=0;
        res[j][i] += img[j-1][i-1];
        res[j][i] += img[j  ][i-1];
        res[j][i] += img[j+1][i-1];
        res[j][i] += img[j-1][i  ];
        res[j][i] += img[j  ][i  ];
        res[j][i] += img[j+1][i  ];
        res[j][i] += img[j-1][i+1];
        res[j][i] += img[j  ][i+1];
        res[j][i] += img[j+1][i+1];
        res[j][i] /= 9;
    }
}

This eliminates all the non-sequential access completely so you no longer get random slow-downs on large powers-of-two.


Core i7 920 @ 3.5 GHz

Original code:

8191: 1.499 seconds
8192: 2.122 seconds
8193: 1.582 seconds

Interchanged Outer-Loops:

8191: 0.376 seconds
8192: 0.357 seconds
8193: 0.351 seconds

214
Jamie Schembri

Most questions regarding this problem are due to missing Xcode; I have Xcode 4.2 installed.

Install attempt:

rvm install 1.9.3
Installing Ruby from source to: /Users/jamie/.rvm/rubies/ruby-1.9.3-p0, this may take a while depending on your cpu(s)...

ruby-1.9.3-p0 - #fetching 
ruby-1.9.3-p0 - #extracted to /Users/jamie/.rvm/src/ruby-1.9.3-p0 (already extracted)
Fetching yaml-0.1.4.tar.gz to /Users/jamie/.rvm/archives
Extracting yaml-0.1.4.tar.gz to /Users/jamie/.rvm/src
Configuring yaml in /Users/jamie/.rvm/src/yaml-0.1.4.
Compiling yaml in /Users/jamie/.rvm/src/yaml-0.1.4.
Installing yaml to /Users/jamie/.rvm/usr
ruby-1.9.3-p0 - #configuring 
ERROR: Error running ' ./configure --prefix=/Users/jamie/.rvm/rubies/ruby-1.9.3-p0 --enable-shared --disable-install-doc --with-libyaml-dir=/Users/jamie/.rvm/usr ', please read /Users/jamie/.rvm/log/ruby-1.9.3-p0/configure.log
ERROR: There has been an error while running configure. Halting the installation.

configure.log:

[2011-11-07 04:32:17]  ./configure --prefix=/Users/jamie/.rvm/rubies/ruby-1.9.3-p0 --enable-shared --disable-install-doc --with-libyaml-dir=/Users/jamie/.rvm/usr 
configure: WARNING: unrecognized options: --with-libyaml-dir
checking build system type... x86_64-apple-darwin11.2.0
checking host system type... x86_64-apple-darwin11.2.0
checking target system type... x86_64-apple-darwin11.2.0
checking whether the C compiler works... no
configure: error: in `/Users/jamie/.rvm/src/ruby-1.9.3-p0':
configure: error: C compiler cannot create executables
See `config.log' for more details

GCC is available:

gcc -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.1~1/src/configure --disable-checking --enable-werror --prefix=/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.1~1/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)

ls /usr/bin | grep gcc         
gcc
i686-apple-darwin11-llvm-gcc-4.2
llvm-gcc
llvm-gcc-4.2

Based on config.log (posted at bottom due to size) I tried symlinking gcc-4.2 to gcc and then installing:

rvm install 1.9.3                       
ERROR: The autodetected CC(/usr/bin/gcc-4.2) is LLVM based, it is not yet fully supported by ruby and gems, please read `rvm requirements`, and set CC=/path/to/gcc .

So I could probably just grab gcc elsewhere, but I'm mostly concerned as to why this is happening. Shouldn't installing Xcode be enough?

config.log:

This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by configure, which was
generated by GNU Autoconf 2.68.  Invocation command line was

  $ ./configure --prefix=/Users/jamie/.rvm/rubies/ruby-1.9.3-p0 --enable-shared --disable-install-doc --with-libyaml-dir=/Users/jamie/.rvm/usr

## --------- ##
## Platform. ##
## --------- ##

hostname = Wilson.local
uname -m = x86_64
uname -r = 11.2.0
uname -s = Darwin
uname -v = Darwin Kernel Version 11.2.0: Tue Aug  9 20:54:00 PDT 2011; root:xnu-1699.24.8~1/RELEASE_X86_64

/usr/bin/uname -p = i386
/bin/uname -X     = unknown

/bin/arch              = unknown
/usr/bin/arch -k       = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo      = Mach kernel version:
     Darwin Kernel Version 11.2.0: Tue Aug  9 20:54:00 PDT 2011; root:xnu-1699.24.8~1/RELEASE_X86_64
Kernel configured for up to 4 processors.
4 processors are physically available.
4 processors are logically available.
Processor type: i486 (Intel 80486)
Processors active: 0 1 2 3
Primary memory available: 8.00 gigabytes
Default processor set: 110 tasks, 546 threads, 4 processors
Load average: 1.28, Mach factor: 2.71
/bin/machine           = unknown
/usr/bin/oslevel       = unknown
/bin/universe          = unknown

PATH: /Users/jamie/.rvm/usr/bin
PATH: /usr/bin
PATH: /bin
PATH: /usr/sbin
PATH: /sbin
PATH: /usr/local/bin
PATH: /usr/X11/bin
PATH: /Users/jamie/bin
PATH: /Users/jamie/.rvm/bin
PATH: /Users/jamie/.rvm/bin


## ----------- ##
## Core tests. ##
## ----------- ##

configure:2764: checking build system type
configure:2778: result: x86_64-apple-darwin11.2.0
configure:2849: checking host system type
configure:2862: result: x86_64-apple-darwin11.2.0
configure:2882: checking target system type
configure:2895: result: x86_64-apple-darwin11.2.0
configure:3376: checking for C compiler version
configure:3385: gcc-4.2 --version >&5
./configure: line 3387: gcc-4.2: command not found
configure:3396: $? = 127
configure:3385: gcc-4.2 -v >&5
./configure: line 3387: gcc-4.2: command not found
configure:3396: $? = 127
configure:3385: gcc-4.2 -V >&5
./configure: line 3387: gcc-4.2: command not found
configure:3396: $? = 127
configure:3385: gcc-4.2 -qversion >&5
./configure: line 3387: gcc-4.2: command not found
configure:3396: $? = 127
configure:3416: checking whether the C compiler works
configure:3438: gcc-4.2    conftest.c  >&5
./configure: line 3440: gcc-4.2: command not found
configure:3442: $? = 127
configure:3480: result: no
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME ""
| #define PACKAGE_TARNAME ""
| #define PACKAGE_VERSION ""
| #define PACKAGE_STRING ""
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define CANONICALIZATION_FOR_MATHN 1
| /* end confdefs.h.  */
| 
| int
| main ()
| {
| 
|   ;
|   return 0;
| }
configure:3485: error: in `/Users/jamie/.rvm/src/ruby-1.9.3-p0':
configure:3487: error: C compiler cannot create executables
See `config.log' for more details

## ---------------- ##
## Cache variables. ##
## ---------------- ##

ac_cv_build=x86_64-apple-darwin11.2.0
ac_cv_env_CCC_set=
ac_cv_env_CCC_value=
ac_cv_env_CC_set=
ac_cv_env_CC_value=
ac_cv_env_CFLAGS_set=
ac_cv_env_CFLAGS_value=
ac_cv_env_CPPFLAGS_set=
ac_cv_env_CPPFLAGS_value=
ac_cv_env_CPP_set=
ac_cv_env_CPP_value=
ac_cv_env_CXXFLAGS_set=
ac_cv_env_CXXFLAGS_value=
ac_cv_env_CXX_set=
ac_cv_env_CXX_value=
ac_cv_env_LDFLAGS_set=
ac_cv_env_LDFLAGS_value=
ac_cv_env_LIBS_set=
ac_cv_env_LIBS_value=
ac_cv_env_build_alias_set=
ac_cv_env_build_alias_value=
ac_cv_env_host_alias_set=
ac_cv_env_host_alias_value=
ac_cv_env_target_alias_set=
ac_cv_env_target_alias_value=
ac_cv_host=x86_64-apple-darwin11.2.0
ac_cv_prog_CC=gcc-4.2
ac_cv_target=x86_64-apple-darwin11.2.0

## ----------------- ##
## Output variables. ##
## ----------------- ##

ALLOCA=''
AR=''
ARCHFILE=''
ARCH_FLAG=''
AS=''
ASFLAGS=''
BASERUBY='ruby'
BUILTIN_ENCOBJS=''
BUILTIN_TRANSOBJS=''
BUILTIN_TRANSSRCS=''
CAPITARGET=''
CC='gcc-4.2'
CCDLFLAGS=''
CFLAGS=''
CHDIR=''
COMMON_HEADERS=''
COMMON_LIBS=''
COMMON_MACROS=''
COUTFLAG=''
CP=''
CPP=''
CPPFLAGS=''
CPPOUTFILE=''
CXX='g++-4.2'
CXXFLAGS=''
DEFS=''
DLDFLAGS=''
DLDLIBS=''
DLEXT2=''
DLEXT=''
DLLWRAP=''
DOT=''
DOXYGEN=''
ECHO_C='\c'
ECHO_N=''
ECHO_T=''
EGREP=''
ENABLE_SHARED=''
EXECUTABLE_EXTS=''
EXEEXT=''
EXPORT_PREFIX=''
EXTOUT=''
EXTSTATIC=''
GCC=''
GNU_LD=''
GREP=''
INSTALLDOC=''
INSTALL_DATA=''
INSTALL_PROGRAM=''
INSTALL_SCRIPT=''
LDFLAGS=''
LDSHARED=''
LDSHAREDXX=''
LIBEXT=''
LIBOBJS=''
LIBPATHENV=''
LIBPATHFLAG=''
LIBRUBY=''
LIBRUBYARG=''
LIBRUBYARG_SHARED=''
LIBRUBYARG_STATIC=''
LIBRUBY_A=''
LIBRUBY_ALIASES=''
LIBRUBY_DLDFLAGS=''
LIBRUBY_LDSHARED=''
LIBRUBY_RELATIVE=''
LIBRUBY_SO=''
LIBS=''
LINK_SO=''
LN_S=''
LTLIBOBJS=''
MAINLIBS=''
MAJOR='1'
MAKEDIRS=''
MAKEFILES=''
MANTYPE=''
MINIOBJS=''
MINIRUBY=''
MINOR='9'
MKDIR_P=''
NM=''
NROFF=''
NULLCMD=''
OBJCOPY=''
OBJDUMP=''
OBJEXT=''
OUTFLAG=''
PACKAGE=''
PACKAGE_BUGREPORT=''
PACKAGE_NAME=''
PACKAGE_STRING=''
PACKAGE_TARNAME=''
PACKAGE_URL=''
PACKAGE_VERSION=''
PATH_SEPARATOR=':'
PKG_CONFIG=''
PREP=''
RANLIB=''
RDOCTARGET=''
RI_BASE_NAME=''
RM=''
RMALL=''
RMDIR=''
RMDIRS=''
RPATHFLAG=''
RUBYW_BASE_NAME='rubyw'
RUBYW_INSTALL_NAME=''
RUBY_BASE_NAME='ruby'
RUBY_INSTALL_NAME=''
RUBY_PROGRAM_VERSION='1.9.3'
RUBY_RELEASE_DATE='2011-10-30'
RUBY_SO_NAME=''
RUNRUBY=''
SET_MAKE=''
SHELL='/bin/sh'
SOLIBS=''
STATIC=''
STRIP=''
SYMBOL_PREFIX=''
TEENY='1'
TEST_RUNNABLE=''
THREAD_MODEL=''
TRY_LINK=''
UNIVERSAL_ARCHNAMES=''
UNIVERSAL_INTS=''
USE_RUBYGEMS=''
WERRORFLAG=''
WINDRES=''
XCFLAGS=''
XLDFLAGS=''
XRUBY=''
XRUBY_LIBDIR=''
XRUBY_RUBYHDRDIR=''
XRUBY_RUBYLIBDIR=''
ac_ct_CC=''
ac_ct_CXX=''
ac_ct_OBJCOPY=''
ac_ct_OBJDUMP=''
arch=''
bindir='${exec_prefix}/bin'
build='x86_64-apple-darwin11.2.0'
build_alias=''
build_cpu='x86_64'
build_os='darwin11.2.0'
build_vendor='apple'
cflags=' ${optflags} ${debugflags} ${warnflags}'
configure_args=''
cppflags=''
cxxflags=' ${optflags} ${debugflags} ${warnflags}'
datadir='${datarootdir}'
datarootdir='${prefix}/share'
debugflags=''
docdir='${datarootdir}/doc/${PACKAGE}'
dvidir='${docdir}'
exec=''
exec_prefix='NONE'
host='x86_64-apple-darwin11.2.0'
host_alias=''
host_cpu='x86_64'
host_os='darwin11.2.0'
host_vendor='apple'
htmldir='${docdir}'
includedir='${prefix}/include'
infodir='${datarootdir}/info'
libdir='${exec_prefix}/lib'
libexecdir='${exec_prefix}/libexec'
localedir='${datarootdir}/locale'
localstatedir='${prefix}/var'
mandir='${datarootdir}/man'
oldincludedir='/usr/include'
optflags=''
pdfdir='${docdir}'
prefix='/Users/jamie/.rvm/rubies/ruby-1.9.3-p0'
program_transform_name='s&^&&'
psdir='${docdir}'
ridir=''
ruby_pc=''
ruby_version=''
rubyhdrdir=''
rubylibprefix=''
rubyw_install_name=''
sbindir='${exec_prefix}/sbin'
setup=''
sharedstatedir='${prefix}/com'
sitearch=''
sitedir=''
sitehdrdir=''
sysconfdir='${prefix}/etc'
target='x86_64-apple-darwin11.2.0'
target_alias=''
target_cpu='x86_64'
target_os='darwin11.2.0'
target_vendor='apple'
try_header=''
vendordir=''
vendorhdrdir=''
warnflags=''

## ----------- ##
## confdefs.h. ##
## ----------- ##

/* confdefs.h */
#define PACKAGE_NAME ""
#define PACKAGE_TARNAME ""
#define PACKAGE_VERSION ""
#define PACKAGE_STRING ""
#define PACKAGE_BUGREPORT ""
#define PACKAGE_URL ""
#define CANONICALIZATION_FOR_MATHN 1

configure: exit 77

Answered By: Arkku ( 461)

This answer was edited multiple times and now contains several alternative solutions. Try the simple “Edit 3” solution first.

Ruby 1.9.3-p125 and later have official support for clang, so if you are installing such a version you should not need GCC. If you’re installing an older version of Ruby, read on.

To compile Ruby with GCC, you need a non-LLVM version of GCC, which is no longer included with Xcode 4.2. Install it yourself (or downgrade to Xcode 4.1 temporarily), then run CC=/usr/local/bin/gcc-4.2 rvm install 1.9.3 --enable-shared (substituting the path to your non-LLVM gcc).

Edit: https://github.com/kennethreitz/osx-gcc-installer/downloads may help for installing GCC. There is also some info available by running rvm requirements.

Edit 2: For an easier solution, you can try adding --with-gcc=clang to the arguments to configure for Ruby to use clang instead of GCC.

Edit 3: rvm install 1.9.3 --with-gcc=clang does that for you.

Note: With current versions of Xcode you need to install the command-line tools separately from the Xcode menu -> Preferences -> Downloads -> Components. This is a prerequisite for doing any compiling with Xcode on the command line, not just Ruby.

Note 2: If something doesn't work after following the steps, try doing a reboot or re-login to ensure that the environment gets set correctly.

Note 3: Ruby versions prior to 1.9.3-p125 may not always be fully compatible with clang, so test your software thoroughly if using the “Edit 3” solution in a production environment.

164
KPexEA

When my c++ app crashes I would like to generate a stacktrace.

I already asked this but I guess I needed to clarify my needs.

My app is being run by many different users and it also runs on Linux, Windows and Macintosh (all versions are compiled using gcc).

I would like my program to be able to generate a stack trace when it crashes, and the next time the user runs it, it will ask them if it is OK to send the stack trace to me so I can track down the problem. I can handle sending the info to me, but I don't know how to generate the trace string. Any ideas?

Answered By: tgamblin ( 137)

For Linux and I believe Mac OS X, if you're using gcc, or any compiler that uses glibc, you can use the backtrace() functions in execinfo.h to print a stacktrace and exit gracefully when you get a segmentation fault. Documentation can be found in the libc manual.

Here's an example program that installs a SIGSEGV handler and prints a stacktrace to stderr when it segfaults. The baz() function here causes the segfault that triggers the handler:

#include <stdio.h>
#include <execinfo.h>
#include <signal.h>
#include <stdlib.h>


void handler(int sig) {
  void *array[10];
  int size;  // backtrace() returns the frame count as an int

  // get void*'s for all entries on the stack
  size = backtrace(array, 10);

  // print out all the frames to stderr
  fprintf(stderr, "Error: signal %d:\n", sig);
  backtrace_symbols_fd(array, size, 2);
  exit(1);
}

void baz() {
  int *foo = (int*)-1; // make a bad pointer
  printf("%d\n", *foo);       // causes segfault
}

void bar() { baz(); }
void foo() { bar(); }


int main(int argc, char **argv) {
  signal(SIGSEGV, handler);   // install our handler
  foo(); // this will call foo, bar, and baz.  baz segfaults.
}

Compiling with -g -rdynamic gets you symbol info in your output, which glibc can use to make a nice stacktrace:

$ gcc -g -rdynamic ./test.c -o test

Executing this gets you this output:

$ ./test
Error: signal 11:
./test(handler+0x19)[0x400911]
/lib64/tls/libc.so.6[0x3a9b92e380]
./test(baz+0x14)[0x400962]
./test(bar+0xe)[0x400983]
./test(foo+0xe)[0x400993]
./test(main+0x28)[0x4009bd]
/lib64/tls/libc.so.6(__libc_start_main+0xdb)[0x3a9b91c4bb]
./test[0x40086a]

This shows the load module, offset, and function that each frame in the stack came from. Here you can see the signal handler on top of the stack, and the libc functions before main in addition to main, foo, bar, and baz.