@kaityo256
Created February 5, 2020 14:36
Intel Compiler vs. GCC
#include <cstdint>
#include <iostream>
#include <random>

// Dummy generator: always returns 0, so only the distribution does real work.
struct myrand {
  uint32_t operator()() {
    return 0;
  }
  uint32_t max() {
    return std::mt19937::max();
  }
  uint32_t min() {
    return 0;
  }
};

// Draws from the distribution on every odd i: 5e7 draws in total.
double run(void) {
  myrand mt;
  double r = 0.0;
  std::uniform_real_distribution<> ud(-1.0, 1.0);
  for (int j = 0; j < 10000; j++) {
    for (int i = 0; i < 10000; i++) {
      if (i % 2) r += ud(mt);
    }
  }
  return r;
}

int main() {
  std::cout << run() << std::endl;
}
@kaityo256
Author

Environment

  • CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
  • OS: SUSE Linux Enterprise Server 11 (x86_64)
  • g++ (GCC) 5.4.0
  • icpc (ICC) 18.0.5 20180823
$ g++ -O3 -march=native -Wall -Wextra -std=c++11  test.cpp -o gcc.out
$ icpc -O3 -xHOST -Wall -Wextra -std=c++11  test.cpp -o icpc.out
$ time ./gcc.out
-5e+07
./gcc.out  0.06s user 0.00s system 83% cpu 0.077 total

$ time ./icpc.out
-5e+07
./icpc.out  4.43s user 0.00s system 99% cpu 4.444 total

@equal-l2

equal-l2 commented Feb 6, 2020

Could you try this myrand definition?
(Per the C++11 spec, if G is a type that satisfies the UniformRandomBitGenerator requirements, G::min() and G::max() are static constexpr functions.)

struct myrand {
  uint32_t operator()() {
    return 0;
  }
  static constexpr uint32_t max(){
    return std::mt19937::max();
  }
  static constexpr uint32_t min(){
    return 0;
  }
};
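
The requirement mentioned above can be checked at compile time. The following is a minimal sketch, not part of the gist: the name `myrand_checked` and the `result_type` alias are assumptions added for illustration. If `min()` or `max()` were not usable in a constant expression, the `static_assert` lines would fail to compile.

```cpp
#include <cstdint>
#include <random>

// Hypothetical variant of the gist's myrand (the name is an assumption).
struct myrand_checked {
  using result_type = std::uint32_t;  // assumption: alias added here, not in the gist
  result_type operator()() { return 0; }
  static constexpr result_type min() { return 0; }
  static constexpr result_type max() { return std::mt19937::max(); }
};

// These compile only if min() and max() are usable in constant expressions.
static_assert(myrand_checked::min() < myrand_checked::max(),
              "min() must be strictly less than max()");
static_assert(myrand_checked::max() == 4294967295u,
              "std::mt19937::max() is 2^32 - 1");
```

The checks cost nothing at run time, so they say nothing about the timing difference discussed in this thread; they only confirm the interface.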

@kaityo256
Author

@equal-l2
Thank you for your suggestion. I have added static constexpr, but the results did not change.

$ time ./gcc.out
-5e+07
./gcc.out  0.06s user 0.00s system 94% cpu 0.064 total

$ time ./icpc.out
-5e+07
./icpc.out  4.43s user 0.00s system 99% cpu 4.442 total

In fact, each compiler generated identical assembly code with and without static constexpr.

@equal-l2

equal-l2 commented Feb 6, 2020

I compiled and ran the test program on macOS 10.15.3.
(CPU: Intel(R) Core(TM) i7-8569U CPU @ 2.80GHz)

$ g++-9 --version
g++-9 (Homebrew GCC 9.2.0_3) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ icpc --version
icpc (ICC) 19.1.0.166 20191121
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

$ g++-9 -O3 -march=native -Wall -Wextra -std=c++11 test.cpp -o gcc.out
$ icpc -O3 -xHOST -Wall -Wextra -std=c++11  test.cpp -o icpc.out
$ time ./gcc.out
-5e+07

real	0m0.057s
user	0m0.052s
sys	0m0.003s
$ time ./icpc.out
-5e+07

real	0m0.011s
user	0m0.008s
sys	0m0.002s

In my environment, the binary built with icpc is faster than the one built with g++.

@kaityo256
Author

kaityo256 commented Feb 6, 2020

Thank you @equal-l2.

I tried again on a different Linux machine.

  • CentOS Linux release 7.6.1810 (Core)
  • Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
$ g++ --version
g++ (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ icpc --version
icpc (ICC) 19.0.4.243 20190416
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

$ g++ -O3 -march=native -Wall -Wextra -std=c++11 test.cpp -o gcc.out
$ icpc -O3 -xHOST -Wall -Wextra -std=c++11  test.cpp -o icpc.out
$ time ./gcc.out
-5e+07
./gcc.out  0.05s user 0.00s system 99% cpu 0.054 total

$ time ./icpc.out
-5e+07
./icpc.out  2.72s user 0.00s system 99% cpu 2.727 total

Even with a newer version of the Intel compiler, I observed similar behavior.
It's weird...

@dc1394

dc1394 commented Feb 6, 2020

I measured the execution time of the code as modified by @equal-l2, and I also built it with Clang. The results are below.

Environment

  • CPU: Intel(R) Core(TM) i7-7820X @ 3.60GHz
  • OS: Linux Mint 19.3 Tricia (x86_64)
  • g++ (gcc) 9.2.1 20191102
  • icpc (ICC) 19.1.0.166
  • clang++ (Clang) 8.0.0-3~ubuntu18.04.2
$ g++ -O3 -march=native -Wall -Wextra -std=c++11  test.cpp -o gcc.out
$ time ./gcc.out
-5e+07

real	0m0.058s
user	0m0.058s
sys	0m0.000s

$ icpc -O3 -xHOST -Wall -Wextra -std=c++11  test.cpp -o icpc.out
$ time ./icpc.out
-5e+07

real	0m2.914s
user	0m2.914s
sys	0m0.000s

$ clang++-8 -O3 -march=native -Wall -Wextra -std=c++11  test.cpp -o clang.out
$ time ./clang.out
-5e+07

real	0m3.358s
user	0m3.354s
sys	0m0.004s

As you can see, in my environment the binary built with the Intel compiler is about 50 times slower than the one built with gcc, and the one built with Clang is about 58 times slower.
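
For reference, the factors quoted above follow from the `real` times in the listings. A small sketch of the arithmetic (the helper name `slowdown` is hypothetical):

```cpp
// Ratio of two wall-clock times in seconds (hypothetical helper).
constexpr double slowdown(double slower_s, double faster_s) {
  return slower_s / faster_s;
}

// icpc vs. g++: 2.914 s / 0.058 s, roughly 50
static_assert(slowdown(2.914, 0.058) > 50.0 && slowdown(2.914, 0.058) < 50.5,
              "icpc binary is about 50 times slower");
// clang++ vs. g++: 3.358 s / 0.058 s, roughly 58
static_assert(slowdown(3.358, 0.058) > 57.5 && slowdown(3.358, 0.058) < 58.0,
              "clang++ binary is about 58 times slower");
```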

@kaityo256
Author

Thanks @dc1394. That's interesting.

In that sense, we should say "GCC generates faster executables" instead of "the Intel compiler generates slower ones"...

@uTnOJkji5quPSNE5

I think this code doesn't measure the code-generation quality of those two compilers; rather, it compares the performance and tuning of the Mersenne Twister implementation...

@kaityo256
Author

Yep, you are right, @uTnOJkji5quPSNE5.

I should have said, "the Mersenne Twister implementation included in GCC is fast". Still, I'm not sure why.
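
One way to narrow down where the time goes is to time the same loop with two generators. In the gist, `myrand::operator()` always returns 0 and `std::mt19937` is used only for its `max()` value, so comparing a constant generator against a real `std::mt19937` may help separate the cost of the generator from the cost of `std::uniform_real_distribution`. The following is a minimal sketch under that assumption; `zero_gen`, `sum_draws` and `time_draws` are hypothetical names, and `zero_gen` adds a `result_type` alias and static constexpr `min()`/`max()` that the original `myrand` does not have.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical stand-in for the gist's myrand: always returns 0.
struct zero_gen {
  using result_type = std::uint32_t;  // assumption: not in the original myrand
  result_type operator()() { return 0; }
  static constexpr result_type min() { return 0; }
  static constexpr result_type max() { return 4294967295u; }
};

// Sum n draws from uniform_real_distribution(-1, 1) using generator gen.
template <class Gen>
double sum_draws(Gen &gen, int n) {
  assert(n >= 0);
  std::uniform_real_distribution<> ud(-1.0, 1.0);
  double r = 0.0;
  for (int i = 0; i < n; i++) r += ud(gen);
  return r;
}

// Wall-clock seconds spent in sum_draws. The sum is stored in *out so the
// caller can print it and the result of the loop stays observable.
template <class Gen>
double time_draws(Gen &gen, int n, double *out) {
  const auto t0 = std::chrono::steady_clock::now();
  *out = sum_draws(gen, n);
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling `time_draws` from `main` once with a `zero_gen` and once with a seeded `std::mt19937`, each with n = 50000000 (the number of draws in the gist's loop), and printing both times and sums gives a per-compiler comparison. With the constant generator, every draw is -1, which matches the -5e+07 printed in the listings above.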
