## Reverse engineering cyclical function from observed outputs

A place to discuss the science of computers and programs, from algorithms to computability.

Formal proofs preferred.

Moderators: phlip, Moderators General, Prelates

super_aardvark
Posts: 54
Joined: Tue Jan 22, 2008 9:26 am UTC

### Reverse engineering cyclical function from observed outputs

I've been struggling with this problem for a couple of days, and figured maybe some of the fine folks around here might be able to help. I have to admit I'm a bit of a tourist in these forums -- if there's a more appropriate place for this, please let me know.

There's a function f(x,y) where x and y are integers in the range [0,1000]. The function is a black box to me, and I'm trying to reverse-engineer it. The outputs are in the range [40, 90], and appear as integers, but I suspect there may be rounding going on behind the scenes. After looking at some large portions of the data, it's clear that the function is cyclical in some way, but there are perturbations I haven't been able to eliminate (hence my suspicion about rounding). For example, here's a screenshot of a spreadsheet showing some of the results (x values along the top, y values along the side): http://imgur.com/cp1Qj3n (the highlighting is simply to help reveal the pattern, and because I'm most interested in identifying the inputs which yield outputs at the top end of the range).

Any ideas on how to go about this? Finding the outputs experimentally is expensive, but not prohibitively so, for this function f. I could record all million-plus data points in a few hours. However, there are several other similar functions for which experimental observations are prohibitively expensive (dozens of seconds per observation for function g, and going up from there). So, I'm hoping that if I'm able to figure out how function f works, I'll have an easier time figuring out the others based on much more limited observation. My ultimate goal is to find inputs that (nearly) maximize all five functions at once.

One interesting thing about f -- for any x in the domain (up to 999), f(x, 1000) == f(x+1, 0). I'm not too sure how to use that, though. Also, the left and right edges do not match up the way the top and bottom edges do. I've tried taking two long vertical swathes that look identical at the start (places where I can visualize the pattern repeating) and comparing them, but haven't found any that match up exactly. For example, starting from (62,109) and working down (increasing y values), and also from (78,140), the outputs match up except for being off by 1 every 100 or so inputs.

Some more context about the domain, if it helps. The numbers I'm interested in represent an attribute of hexes in a large grid (a map). Each hex has an (x,y) coordinate which I can see, and the value of the attribute can be revealed with a little bit of time (very small for the attribute represented by f, but it takes longer for the other attributes). The x-axis runs west-to-east, and the y-axis runs northwest-to-southeast. I'm only assuming that the x-y coordinates are the only inputs to this function, but it seems like a reasonable assumption.

Any suggestions or ideas on how to proceed would be much appreciated!
Last edited by super_aardvark on Fri Apr 15, 2016 6:27 pm UTC, edited 1 time in total.

jaap
Posts: 2094
Joined: Fri Jul 06, 2007 7:06 am UTC
Contact:

### Re: Reverse engineering cyclical function from observed outputs

The values seem to range from 40 to 90, not 50 to 90.

It looks to me like it increases by 21 or 22 as x increases, except that if that brings its value over 90 then 51 is subtracted to bring it back into the [40,90] range.
Similarly in the y direction, it increases by 26 or 27 each time.
So it is probably some function like (round(21.5*x + 26.5*y))%51 + 40.
Or possibly (round(21.5*x + 26.5*y + c))%51 + 40 for some translation constant c.

super_aardvark
Posts: 54
Joined: Tue Jan 22, 2008 9:26 am UTC

### Re: Reverse engineering cyclical function from observed outputs

Yes, 40-90, thanks.

Any ideas on how to get those scalars right? Trial and error is proving tedious... these could be arbitrary rational numbers chosen by an RNG. I guess that means any value I guessed could be off by an arbitrary amount, which would put the rounding errors in an arbitrary place on the grid.... which basically makes the problem intractable, right? The quality of my guess will always be limited by the number of points at which I can compare it to experimental data.

jaap
Posts: 2094
Joined: Fri Jul 06, 2007 7:06 am UTC
Contact:

### Re: Reverse engineering cyclical function from observed outputs

super_aardvark wrote:Yes, 40-90, thanks.

Any ideas on how to get those scalars right? Trial and error is proving tedious... these could be arbitrary rational numbers chosen by an RNG. I guess that means any value I guessed could be off by an arbitrary amount, which would put the rounding errors in an arbitrary place on the grid.... which basically makes the problem intractable, right? The quality of my guess will always be limited by the number of points at which I can compare it to experimental data.

Yes, it's tricky.
From the section of values you've given, with a lot of trial and error in a spreadsheet I got the formula: [21.691*(x-71)+26.645*(y-38)+39.048]%51+40
where [] is the round function. Obviously the -71 and -38 are just to translate the coordinates to make it easier.

If the constant 39.048 is changed to 39.047, it is only wrong at one cell - (83,46) gives 42 instead of 43.
If the constant 39.048 is changed to 39.050, it is again only wrong at one cell - (71,48) gives 40 instead of 90.
So it seems as if you could be wrong in any cell, anywhere on the grid. However, I think it is possible to do a more structured search. I have not tried it, and it is a bit hard to explain, so I hope this makes sense.

We know the formula is something like this: [a*(x-71)+b*(y-38)+c]%51+40

If you plug in the coordinates of the four corners, we get
top-left: [c] = 39
top-right: [20a+c] = 9*51+14 = 473
bottom-left: [20b+c] = 11*51+11 = 572
bottom-right: [20a+20b+c] = 19*51+37 = 1006
Unfortunately you will need to have sampled enough cells along the edges of the grid to know how often the values go 'around the clock' along these edges to give you the numbers 9,11,19 in this case.

From these values we get:
38.5<=c<39.5
472.5<=20a+c<473.5 => 433<20a<435 => 21.65<a<21.75
571.5<=20b+c<572.5 => 532<=20b<534 => 26.6<b<26.7
1005.5<=20a+20b+c<1006.5 => 966<=20(a+b)<968 => 48.3<a+b<48.4

This gives you a range of parameter values for which the the formula gives the correct values for the corner cells.
You could then choose values in the middle of that range, e.g. a=21.7, b=26.65, c=39.0 for a decent first approximation.

The formula can still be wrong for many of the unsampled cells. However, it is more likely to be right for cells where a*(x-71)+b*(y-38)+c is close to an integer, and more likely wrong if it is further away from an integer. So (have the computer) calculate the values of a*(x-71)+b*(y-38)+c for all the cells, sort them according to how close to integer they are, and then do a binary search. The search is to find the cell which is closest to integer, but for which the formula gives the wrong answer. Each cell you sample during this search will give you additional inequalities for the parameters, but especially the last sample of the search gives you the tightest inequalities. Then you can update your parameters to have a better formula.

I am not sure how often you would need to do this search and update your parameters, but I think you can probably stop once the cell that rounds down the most and the cell that rounds up the most are both correct.

super_aardvark
Posts: 54
Joined: Tue Jan 22, 2008 9:26 am UTC

### Re: Reverse engineering cyclical function from observed outputs

Thanks, that was helpful.

Return to “Computer Science”

### Who is online

Users browsing this forum: No registered users and 7 guests