Avoiding duplicate code

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3069
Joined: Sat May 09, 2009 11:55 pm UTC

Avoiding duplicate code

Postby Qaanol » Thu Jul 23, 2015 4:02 pm UTC

I’m having trouble figuring out the best way to do perform a specified subset of available actions efficiently. Essentially, I have a few different calculations that may or may not need to be performed, in a loop over the same data set. Here is a naive example that avoids duplication at the expense of degraded performance:

Code: Select all

int v[k];
for (int i = 0; i < n; ++i) {
  getData(i, v, k);
  if (needResultA) { calculateA(v, k); }
  if (needResultB) { calculateB(v, k); }
}

Of course putting the conditional check inside the loop is slow, but if I pull it out then I end up with something like this:

Code: Select all

int v[k];
if (needResultA && needResultB) {
  for (int i = 0; i < n; ++i) {
    getData(i, v, k);
    calculateA(v, k);
    calculateB(v, k);
  }
} else if (needResultA) {
  for (int i = 0; i < n; ++i) {
    getData(i, v, k);
    calculateA(v, k);
  }
} else if (needResultB) {
  for (int i = 0; i < n; ++i) {
    getData(i, v, k);
    calculateB(v, k);
  }
}


That, obviously, has a lot of repetition—and if there were more than 2 calculations, any arbitrary combination of which might be needed, it would require stupid amounts of code duplication. Making separate functions for each of the 2|calculations|−1 possibilities would appear to have the same problem.

So what’s the right way to accomplish this?
wee free kings

mousewiz
Posts: 107
Joined: Wed Oct 26, 2011 6:50 pm UTC

Re: Avoiding duplicate code

Postby mousewiz » Thu Jul 23, 2015 5:02 pm UTC

I think if you need to get rid of a simple condition check for performance reasons, you need to either dynamically write the loop and then compile/execute it, or write something to statically generate all possible loops (or perhaps, just all possible sequences of calculations, and then dynamically pick from said sequences, but same ugliness either way).

Another way to move *those* condition checks out of the loop would be to create a list of calculations that need executing before the loop, then loop through those inside your loop, but I'd assume that's more expensive than what you've got because then you're stuck both incrementing in your inner loop and checking for its termination.

I guess you could try setting up a data structure like:

Code: Select all

Calculation {
    Calculation next;
    function a(v,k); //Override this
    function b(v,k){
        a(v,k)
        next.a(v,k);
    }
}

Then you can create a dummy calculation that also overrides b to do nothing for your termination. But even then I'd assume the overhead of calling an overridden function would blow away the overhead of a simple condition check.

I also assume that your condition checks are as simple as your pseudo code makes them look. If the checks themselves are actually expensive, then precalculating them seems like the way to go.

schapel
Posts: 244
Joined: Fri Jun 13, 2014 1:33 am UTC

Re: Avoiding duplicate code

Postby schapel » Thu Jul 23, 2015 5:06 pm UTC

Qaanol wrote:Of course putting the conditional check inside the loop is slow...

Why do you say that?

Remember the rules of optimization:
1. Don't do it.
2. (For experts only!) Don't do it yet.

User avatar
ucim
Posts: 6888
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: Avoiding duplicate code

Postby ucim » Thu Jul 23, 2015 5:11 pm UTC

schapel wrote:Remember the rules of optimization:
1. Don't do it.
2. (For experts only!) Don't do it yet.

I have painfully come to appreciate these rules. Following them is still hard, but I'm learning.

If it's easier, ask what it is you are optimizing for - are you trying to make life easy for you, or for the computer?

I'd assume the conditional has low overhead, and put it in the loop, avoiding duplicate code at the expense of the conditional. That ultimately makes it easier for me.

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Heartfelt thanks from addams and from me - you really made a difference.

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3069
Joined: Sat May 09, 2009 11:55 pm UTC

Re: Avoiding duplicate code

Postby Qaanol » Thu Jul 23, 2015 5:22 pm UTC

I am trying to make the program faster for the end user by avoiding branches. This is the heart of a very tight loop in a simulation, and as such it will run billions of times. This part of the program is written in straight C code, and the conditions are boolean properties of Objective-C classes, which will be assigned to const C variables prior to the loop.

Anyway, the saying is to avoid premature optimization, and to forget about small efficiencies 97% of the time, not to avoid optimization entirely. This is that 3% where small efficiencies matter. I will implement both the naive and prechecked versions to compare their actual performance, but I’m still interested in hearing if there is a better solution.

So for the sake of the discussion, let’s assume that in some other project it would be a non-trivial calculation to determine which subset of computations are needed, and therefore extremely important to lift it out of the loop. Assigning the results of that calculation to consts and hoping the branches are fast is one option. Automatic code generation is another. Is anything else likely to work well?
wee free kings

User avatar
Flumble
Yes Man
Posts: 2261
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: Avoiding duplicate code

Postby Flumble » Thu Jul 23, 2015 6:56 pm UTC

Does the language allow you to bind a function (pointer) to a variable? If so, you can move the conditional outside the loop and have a single loop.
However, it does require 2n-n extra combiner functions, so it starts getting out of hand if there's more than 2 conditions.

[edit] Of course, if calculateA and calculateB are bodies of code or inlined functions in reality, a function call won't beat a (couple of) conditional(s).
Last edited by Flumble on Thu Jul 23, 2015 7:19 pm UTC, edited 1 time in total.

mousewiz
Posts: 107
Joined: Wed Oct 26, 2011 6:50 pm UTC

Re: Avoiding duplicate code

Postby mousewiz » Thu Jul 23, 2015 7:18 pm UTC

Oh, one other possibly relevant thing: what's the ratio of needed:total calculations going to be like for the most part? If you're typically going to be executing, say, 5 out of 100 possible calculations, then I could imagine it being worthwhile to put function pointers in a list outside the loop. A few more instructions per executed calculation might still be worth fewer total instructions.

Also, by the time you're low enough level to be worrying about some jump instructions, I have to wonder if it's worthwhile to get into writing assembly, making sure your other functions are inlined, etc.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Avoiding duplicate code

Postby EvanED » Thu Jul 23, 2015 7:53 pm UTC

I suspect function pointers will lead to a slower solution than original version.

Branch predictors are pretty good, and unless CalculateA or CalculateB is quite long (in which case a mispredict would be a small percentage of the loop's runtime) I'd expect it to get most predictors correct. In contrast, an indirect jump, unless it gets resolved by the compiler, (1) also needs a prediction (2) adds the function prologue and epilogue, and (3) inhibits inlining and subsequent optimization if the compiler would have done this. (3) can be significantly detrimental to performance; a large minority of the benefit of optimization goes away if you disable inlining.

Are you in C or C++? If you're in C you won't find a better solution. With C++ you could turn the body of the loop into a function template that takes a functor. Conceptually similar to the function pointer case, but much easier for the compiler to optimize.

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3069
Joined: Sat May 09, 2009 11:55 pm UTC

Re: Avoiding duplicate code

Postby Qaanol » Thu Jul 23, 2015 8:39 pm UTC

I am in C, and for my project the calculateA() functions are extremely short, to the point that I am manually inlining them (ie. copy-and-paste). For example, one calculation simply adds each element of the fetched data to its corresponding entry in an array of running totals.

EvanED: are you saying that, since the branches are testing the same const variables each time through the loop, the processor is quite likely to predict correctly almost every time?

It’s probably worth mentioning that my program is multithreaded (using NSOperation and NSOperationQueue), so there will be half a dozen or so simulators running concurrently, each needing to use these functions with different preliminary adjustments to the data and different combinations of calculations to perform.
wee free kings

User avatar
PeteP
What the peck?
Posts: 1451
Joined: Tue Aug 23, 2011 4:51 pm UTC

Re: Avoiding duplicate code

Postby PeteP » Thu Jul 23, 2015 9:00 pm UTC

Do the calculateA(v, k) functions every change what you get via later getData calls later in the loop? If not and n/the amount of data isn't too high you could first fetch all the data and then you only need one loop per function. (Or wasn't that the reason there had to be a A+B loop beside the separate A and B loops?) Though more data operations in exchange for less conditions might be a bad trade.

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Avoiding duplicate code

Postby Yakk » Thu Jul 23, 2015 9:12 pm UTC

Code: Select all

int v[k];
for (int i = 0; i < n; ++i) {
  getData(i, v, k);
  if (needResultA) { calculateA(v, k); }
  if (needResultB) { calculateB(v, k); }
}

Invert the test and the loop:

Code: Select all

#define LOOP( cond_A, cond B ) \
for (int i = 0; i < n; ++i) { \
  getData(i, v, k); \
  if (cond_A) { calculateA(v, k); } \
  if (cond_B) { calculateB(v, k); } \
}

int v[k];
if(needResultA) {
  if (needResultB)
    LOOP(1,1);
  else
    LOOP(1,0);
} else if (needResultB) {
  LOOP(0,1);
} else {
  LOOP(0,0);
}

#undef LOOP

This should force the loop and condition to be inverted, even on a really dumb compiler. Dead code elimination, ie `if(0) statement;` is a really, really basic optimization.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Avoiding duplicate code

Postby EvanED » Fri Jul 24, 2015 2:26 am UTC

Qaanol wrote:EvanED: are you saying that, since the branches are testing the same const variables each time through the loop, the processor is quite likely to predict correctly almost every time?
If your actual computations are small, I would think so.

It’s probably worth mentioning that my program is multithreaded (using NSOperation and NSOperationQueue), so there will be half a dozen or so simulators running concurrently, each needing to use these functions with different preliminary adjustments to the data and different combinations of calculations to perform.
I suspect that's probably OK. I don't know whether the branch history is stored on a per-core basis or shared, but I strongly suspect it's per-core. If true, you wouldn't see interference between things running concurrently (unless things running in SMT interfere and you are using that). There would be a probably-brief period after a context switch where it could get some wrong predictions, but it would probably be comparatively short (especially considering, by my understanding, the TLB has to be dumped at a context switch if you're on x86).

Anyway, I'm not necessarily saying that you wouldn't see a small speedup with your lifting of the conditionals to outside the loop; that was the "the current solution isn't as bad as you think" half of my argument that dispatching on a function pointer is very likely to be rather worse.

Yakk wrote:This should force the loop and condition to be inverted, even on a really dumb compiler. Dead code elimination, ie `if(0) statement;` is a really, really basic optimization.
Unfortunately it's really ugly and makes dealing with compiler error messages and debugging absolutely a huge pain in the ass. I'd rather manually duplicate the code if it was just the three cases, personally.

User avatar
ucim
Posts: 6888
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: Avoiding duplicate code

Postby ucim » Fri Jul 24, 2015 3:49 am UTC

Can you use the preprocessor to insert the (inline) code? Then, you only have to write it once, and it can be inserted by the preprocessor wherever needed. You get the benefit of multiple copies, and the maintainability of one copy.

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Heartfelt thanks from addams and from me - you really made a difference.

User avatar
WanderingLinguist
Posts: 237
Joined: Tue May 22, 2012 5:14 pm UTC
Location: Seoul
Contact:

Re: Avoiding duplicate code

Postby WanderingLinguist » Fri Jul 24, 2015 5:30 am UTC

If you're talking about Objective-C, can I assume you're targeting either iOS or OS X? In that case, Objective-C++ is an option, and it's well supported (Apple uses it in sample code in cases just like this). You don't have to make the whole project Objective-C++, you could do it just for this component. Then you can use templates... which I think would be a pretty effective solution to the problem.

Tub
Posts: 475
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: Avoiding duplicate code

Postby Tub » Fri Jul 24, 2015 9:55 pm UTC

Qaanol wrote:Anyway, the saying is to avoid premature optimization, and to forget about small efficiencies 97% of the time, not to avoid optimization entirely. This is that 3% where small efficiencies matter. I will implement both the naive and prechecked versions to compare their actual performance, but I’m still interested in hearing if there is a better solution.

Always measure first.

You can do a C++ template (or lots of copy/paste, or maybe even macros) to get 2^n different functions with just the right functionality, but then you'll increase your total code size and increase cache pressure. Since you're running different versions of the function on different cores at the same time, the reduced cache locality may be more expensive than the branches in your original version.


However, if your code is really as simplistic AND data-parallel as you make it sound, I suggest taking a look at openCL and moving it onto the GPU. Don't worry about micro-optimizations until you've exhausted the macro-optimizations.

schapel
Posts: 244
Joined: Fri Jun 13, 2014 1:33 am UTC

Re: Avoiding duplicate code

Postby schapel » Sat Jul 25, 2015 1:59 am UTC

Qaanol wrote:Anyway, the saying is to avoid premature optimization, and to forget about small efficiencies 97% of the time, not to avoid optimization entirely.

You're right, there is a time and a place for optimization, but nearly everyone does it way too early and poorly. That's the point of the saying I posted.
If you're considering using the optimization you suggested, you should at least measure how much faster the faster version is.
Qaanol wrote:I am in C, and for my project the calculateA() functions are extremely short, to the point that I am manually inlining them (ie. copy-and-paste).

Did you measure how much faster that makes the code than using inline functions?
Tub wrote:Always measure first.

Yes!

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3069
Joined: Sat May 09, 2009 11:55 pm UTC

Re: Avoiding duplicate code

Postby Qaanol » Thu Aug 06, 2015 4:37 pm UTC

Tub wrote:However, if your code is really as simplistic AND data-parallel as you make it sound, I suggest taking a look at openCL and moving it onto the GPU. Don't worry about micro-optimizations until you've exhausted the macro-optimizations.

Excellent idea! I have never done any GPU programming before, but I think you are entirely correct that my current project is well-suited for it. Thanks!

Follow-up question: one of the preliminaries for my simulation, before any of these calculations are done, is to generate a large number of random values (mostly from a Gaussian). Does anyone know a good GPU-friendly algorithm for doing so? In particular, is there a high-quality, high-speed, GPU-optimized library available for use to generate pseudorandom values?
wee free kings

User avatar
Sizik
Posts: 1260
Joined: Wed Aug 27, 2008 3:48 am UTC

Re: Avoiding duplicate code

Postby Sizik » Thu Aug 06, 2015 8:31 pm UTC

Qaanol wrote:
Tub wrote:However, if your code is really as simplistic AND data-parallel as you make it sound, I suggest taking a look at openCL and moving it onto the GPU. Don't worry about micro-optimizations until you've exhausted the macro-optimizations.

Excellent idea! I have never done any GPU programming before, but I think you are entirely correct that my current project is well-suited for it. Thanks!

Follow-up question: one of the preliminaries for my simulation, before any of these calculations are done, is to generate a large number of random values (mostly from a Gaussian). Does anyone know a good GPU-friendly algorithm for doing so? In particular, is there a high-quality, high-speed, GPU-optimized library available for use to generate pseudorandom values?


Here's a nearly-constant time algorithm that should work on almost any GPU.
she/they
gmalivuk wrote:
King Author wrote:If space (rather, distance) is an illusion, it'd be possible for one meta-me to experience both body's sensory inputs.
Yes. And if wishes were horses, wishing wells would fill up very quickly with drowned horses.

User avatar
Flumble
Yes Man
Posts: 2261
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: Avoiding duplicate code

Postby Flumble » Thu Aug 06, 2015 8:44 pm UTC

Qaanol wrote:Follow-up question: one of the preliminaries for my simulation, before any of these calculations are done, is to generate a large number of random values (mostly from a Gaussian). Does anyone know a good GPU-friendly algorithm for doing so? In particular, is there a high-quality, high-speed, GPU-optimized library available for use to generate pseudorandom values?

Searching "opencl prng" gives a bunch of results, including (but not limited to) Mersenne Twister-like algorithms and the suggestion to simply generate numbers on the CPU.

My advice: only if your code requires loads of PRNG'ing compared to the rest of the calculations and the program needs heavy optimization despite being parallelized, you should look into the opencl libraries. Otherwise, just generate values on the CPU and push them to the GPU. It most likely makes the code easier to implement and more maintainable.

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3069
Joined: Sat May 09, 2009 11:55 pm UTC

Re: Avoiding duplicate code

Postby Qaanol » Fri Aug 07, 2015 3:17 pm UTC

Flumble wrote:Searching "opencl prng" gives a bunch of results, including (but not limited to) Mersenne Twister-like algorithms and the suggestion to simply generate numbers on the CPU.

My advice: only if your code requires loads of PRNG'ing compared to the rest of the calculations and the program needs heavy optimization despite being parallelized, you should look into the opencl libraries. Otherwise, just generate values on the CPU and push them to the GPU. It most likely makes the code easier to implement and more maintainable.

That makes sense. It is probably easiest to generate uniform variates on the CPU, then use the GPU to Box-Muller them into normal variates for the subsequent parallel calculations.
wee free kings

Derek
Posts: 2181
Joined: Wed Aug 18, 2010 4:15 am UTC

Re: Avoiding duplicate code

Postby Derek » Fri Aug 07, 2015 6:00 pm UTC

I don't think you're going to get faster than copy-pasting (or using a macro) to write two loops. The trade off will be code size. What you're asking for can only be done with a function pointer, but as mentioned that's almost certainly going to be slower than a conditional in practice. If you really do need the speed, then I second doing the extra research to figure out how to move the calculation to the GPU.

Sizik wrote:Here's a nearly-constant time algorithm that should work on almost any GPU.

Ah yes, the reviews for that are a classic.


Return to “Coding”

Who is online

Users browsing this forum: No registered users and 5 guests