I downloaded it from www.threadbuildingblocks.org, which I believe is the official site. The download is called "tbb42_20131118oss_win". I took a look at the header file, and it defines two different atomics, one with and one without fetch_and_add. As far as I can tell, the latter should only be used for atomic<void>, but clearly it isn't. I can only assume that's a bug in TBB, though I'm not sure exactly under which circumstances it occurs.
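If the shared accumulator is a floating-point value, as a pi sum would be, there is usually no arithmetic fetch-and-add to fall back on; C++11 `std::atomic<double>` only gained `fetch_add` in C++20. A common workaround is a compare-and-swap retry loop. The sketch below uses standard `std::atomic` rather than TBB's own atomic type, so it illustrates the technique, not the TBB API, and `atomic_add` is a hypothetical helper name.

```cpp
#include <atomic>

// Hypothetical helper: atomically add `delta` to a double without fetch_add.
// We retry a compare-and-swap until no other thread has changed the value
// between our read and our write.
double atomic_add(std::atomic<double>& target, double delta) {
    double expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value into
    // `expected`, so the next attempt uses fresh data.
    while (!target.compare_exchange_weak(expected, expected + delta,
                                         std::memory_order_relaxed)) {
    }
    return expected;  // value before the addition, like fetch_add returns
}
```

Calling `atomic_add(total, x)` from several threads loses no updates, but every call still contends on the same memory location, so doing one atomic update per term in a tight loop tends to be slow even when it is correct.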
Anyway, these are my results so far:
Short Algorithm: calculation of pi (100000000 steps):
Using Serial: pi = 3.141592653590426, 905 ms
Using OpenMP: pi = 3.141592653589683, 453 ms
Using PPL: pi = 3.141592653589739, 1123 ms
Using TBB: pi = 3.141592653590731, 7504 ms
Long Algorithm: calculation of sum of primes (100000 steps):
Using Serial: Sum of Primes = 454396537, 2589 ms
Using OpenMP: Sum of Primes = 454396537, 1498 ms
Using PPL: Sum of Primes = 454396537, 1310 ms
Using TBB: Sum of Primes = 454396537, 1295 ms
The 'short algorithm' computes a single term of a series for pi, and the work is parallelized over a very large number of terms. The 'long algorithm' is a deliberately, extremely inefficient test of whether a number is prime. Overhead should be fairly low for this long algorithm, and indeed all the parallel packages score well. TBB and PPL are, I think, very similar in architecture, so it's no surprise they score the same. I'm a bit surprised that OpenMP is slower, though.
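For reference, here is a serial sketch of what the two kernels might look like. The original code isn't shown, so both are assumptions: the 'short' kernel is taken to be the midpoint-rule sum for the integral of 4/(1+x²) over [0, 1], which equals pi, and the 'long' kernel tries every divisor from 2 to n-1. All function names are hypothetical.

```cpp
#include <cstdint>

// Short kernel (assumed): midpoint-rule sum for the integral of 4/(1+x^2)
// over [0, 1]. Each term is tiny and independent of the others.
double pi_serial(long steps) {
    const double h = 1.0 / static_cast<double>(steps);
    double sum = 0.0;
    for (long i = 0; i < steps; ++i) {
        const double x = (static_cast<double>(i) + 0.5) * h;  // midpoint of slice i
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * h;
}

// Long kernel (assumed): a deliberately slow primality test that tries every
// divisor from 2 to n-1 instead of stopping at sqrt(n).
bool is_prime_slow(long n) {
    if (n < 2) return false;
    for (long d = 2; d < n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Sum of all primes below `limit`. Each iteration is independent, so the loop
// body is the unit of work a parallel runtime would distribute.
std::int64_t sum_of_primes_serial(long limit) {
    std::int64_t total = 0;
    for (long n = 2; n < limit; ++n) {
        if (is_prime_slow(n)) total += n;
    }
    return total;
}
```

With a limit of 100000, `sum_of_primes_serial` returns 454396537, matching the Sum of Primes figure above. Each `is_prime_slow` call does a lot of work on its own, so scheduling overhead is small next to it, whereas each pi term is only a handful of floating-point operations.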
OpenMP wins on overhead, though the poorer results of the others could be down to insufficient skill on my part. Certainly, TBB's terrible performance is due to not having a usable atomic fetch_and_add. PPL should also score better with a newer version; I only have Visual Studio 2010, which lacks a few PPL constructs that should simplify things. I also use the default scheduler for both, which is probably not ideal for such a short algorithm.
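One way to avoid atomics entirely is a reduction: each thread keeps a private partial sum, and the partials are combined once at the end. TBB offers this pattern through `tbb::parallel_reduce` and `tbb::combinable`, and PPL through `concurrency::combinable`; the sketch below uses OpenMP's `reduction` clause only because it is the shortest to write. It assumes a midpoint-rule pi kernel (the original kernel isn't shown), and `pi_openmp_reduction` is a hypothetical name.

```cpp
// Each thread accumulates into its own private copy of `sum`; OpenMP adds the
// copies together when the loop ends, so there is no per-term atomic update.
// Compile with -fopenmp (GCC/Clang) or /openmp (MSVC); without that flag the
// pragma is ignored and the loop simply runs serially.
double pi_openmp_reduction(long steps) {
    const double h = 1.0 / static_cast<double>(steps);
    double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < steps; ++i) {  // signed index, as OpenMP 2.0 requires
        const double x = (static_cast<double>(i) + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * h;
}
```

Because the partial sums are added in a different order depending on the thread count, the last few digits can differ from the serial result, much like the slightly different pi values in the results above.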
In the end, all three choices are probably workable. Which means I have to look at them in more detail to make a choice.