In a previous post, I introduced an extremely fast prediction algorithm that appears to have an average run time complexity of significantly less than O(log(n)), where n is the number of data points in the underlying data set. In this post, I’m going to show how this algorithm can also be used as a fast and generalized method for numerical function approximation, regardless of the function’s type and complexity.

Unlike interpolation methods, this algorithm does not attempt to approximate the underlying function using continuous functions. Instead, the algorithm is rooted in assumptions derived from information theory that allow it to work with the natural domain and range of the underlying function, facilitating extremely precise approximations of a wide variety of smooth, and discontinuous functions, all under one umbrella algorithm.

In this post, I’m going to explain how you can use the algorithm to approximate any function you’d like, and I’ll also present some data that demonstrates its performance and precision. For a full explanation as to how and why this algorithm works, see the post below entitled, “Predicting and Imitating Complex Data Using Information Theory”. In a follow up post, I’ll show how we can combine this algorithm with my image partition algorithm to construct a fast, unsupervised image recognition algorithm.

**How to Use the Algorithm**

The only thing you need in order to use the algorithm is a dataset of Matlab / Octave vectors containing the X, Y, and Z values of your function in three separate row vectors.

I’ll give some more examples below that make use of noisy data to demonstrate the versatility of the algorithm, but let’s begin with the simple linear function z = x + y + 1, evaluated along the line y = x, over the interval [-3,3].

First, we generate the x,y, and z vectors using the following code:

pkg load image

pkg load communications

clear x y z

x = -3 : .012 : 3;

y = -3 : .012 : 3;

z = x .+ y .+ 1;

figure, scatter3(x,y,z)

view(70,15)

The x and y vectors contain the domain values of the function we’d like to approximate, and the z vector contains the corresponding range values of the function. In this case, we’re approximating the function z = x + y + 1, but you could of course choose any function you’d like, whether or not it has a closed form expression, so long as the data is represented as three separate column vectors.

Once we’ve generated the data for the function we’d like to approximate, we can instruct the algorithm to process the data, generating a set of data trees that it will ultimately use to approximate the function.

This step is done using the following code:

data_array = shape_to_data(x,y,z);

[category_tree delta_tree anchor_tree] = generate_data_tree(data_array);

This is by far the most time-consuming part of the algorithm, but it is nonetheless polynomial in time as a function of the number of data points. Moreover, it’s something we’ll only need to do once, since this is the part of the algorithm that actually constructs our approximating function.

At a very high level, what the algorithm does is construct a data tree that can be used to approximate the underlying function at certain key locations within the domain of the function that are selected by the algorithm using assumptions rooted in information theory. New points in the domain that are subsequently fed to the algorithm are then quickly mapped into the data tree, resulting in very fast and very accurate function approximations.

In the previous post, I showed that this method is so accurate that it can reproduce complex three-dimensional objects it has analyzed when subsequently given random inputs, since the algorithm can quickly discern between new data points that belong to the data set / object, and data points that do not.

Continuing with our example above, let’s begin by approximating the underlying function at the point (x,y) = [2.5 2.5], and compare it to the result given by the original function.

This is done using the following code:

new_data_item{1} = [2.5 2.5 0];

[category_index predicted_z final_delta] = predict_best_fit_tree(anchor_tree, delta_tree, new_data_item, 1, 1);

In short, the algorithm takes the new data point (x,y) = [2.5 2.5], and attempts to place it into the data tree constructed from the domain and range of the underlying function.

In this case, we have,

predicted_z = 5.9920;

The original function gives a value of z = 2.5 + 2.5 + 1 = 6, for an error of approximately 0.008.

The last two arguments to the function are toggles that determine how the prediction algorithm behaves. The first toggle tells the algorithm to ignore the input z-value, but I nonetheless always use a z-value of 0 for consistency. The second toggle must be -1, 0, or 1, and determines how much time is spent searching the data tree for a good fit for the input. In this case, I’ve set the toggle to 1, which instructs the algorithm to search the entire data tree for the absolute best fit possible, in order to maximize the precision of our approximation. This of course increases the run time of the algorithm, but since we’re trying to approximate a function, precision takes precedence, especially given that the algorithm is still very fast.

If instead we set the toggle to -1, then we have,

predicted_z = 6.0400.

If we set the toggle to 0, then we have,

predicted_z = 5.9920.

As I mentioned above, this algorithm can also be used to imitate complex three-dimensional shapes, which requires a much higher volume of random inputs, in which case it might be advantageous to quickly process a large number of new data points approximately, rather than spend the incremental time required for greater precision. If you’d like to do that, simply set the toggle to -1 for the fastest placement and lowest precision, or to 0, for intermediate speed and intermediate precision.

Now let’s test the accuracy of the prediction over a large number of points that were not part of the original domain. The original domain had an increment of .012, so we’ll double the number of points by using an increment of .006, to ensure that we’re testing the algorithm at points in the domain that it didn’t evaluate previously, using the following code:

clear prediction_vector error_vector x y z

search_toggle = 1;

cnt = 1;

x = -3 : .006 : 3;

y = i = -3 : .006 : 3;

[temp num_data_points] = size(x);

tic;

for i = 1 : num_data_points

new_data_item{1} = [x(i) y(i) 0];

z(i) = x(i) + y(i) + 1;

[category_index predicted_z final_delta] = predict_best_fit_tree(anchor_tree, delta_tree, new_data_item, 1, search_toggle);

prediction_vector(i) = predicted_z;

error_vector(i) = z(i) – predicted_z;

endfor

toc

figure, scatter3(x,y,z)

view(40,15)

figure, scatter3(x,y,prediction_vector)

view(40,15)

Superficially, the two plots look very similar, so let’s plot the errors and have a look at some related statistics:

figure, plot(1:num_data_points, error_vector)

std(abs(error_vector(:)))

mean(abs(error_vector(:)))

mean(abs(error_vector(:)))/mean(abs(z(:)))

In this case, the errors have an average absolute value of 0.012659, and are generally tightly bounded, though they are clearly not continuously distributed, since again, the algorithm does not use any type of smooth approximation, but instead makes use of a discrete data tree.

Now let’s apply the algorithm to a discontinuous, noisy sine wave function generated by the following code:

clear x y z

cnt = 1;

for i = 0 : .0125 : 3*pi

x(cnt) = i + .25*rand();

y(cnt) = i + .25*rand();

z(cnt) = sin(x(cnt));

cnt = cnt + 1;

endfor

figure, scatter3(x,y,z)

view(70,15)

data_array = shape_to_data(x,y,z);

[category_tree delta_tree anchor_tree] = generate_data_tree(data_array);

This is a sine wave with a noisy domain around the line y = x, over the interval [0,3*pi]. Since this will create a patchy domain and range, we’ll need to actually monitor the instances where the algorithm fails to map an input to the data tree. We do this by including a conditional that tests for whether the algorithm returns a predicted_z of infinity, which is how the algorithm signals that it thinks that an input is outside of the original domain of the function.

This can be done using the following code:

clear prediction_vector error_vector x y z

search_toggle = 1;

cnt = 1;

fail_count = 0;

x = 0 : .00625 : 3*pi;

y = 0 : .00625 : 3*pi;

[temp num_data_points] = size(x);

tic;

for i = 1 : num_data_points

x(cnt) = x(cnt) + .25*rand();

y(cnt) = y(cnt) + .25*rand();

z(cnt) = sin(x(cnt));

new_data_item{1} = [x(cnt) y(cnt) 0];

[category_index predicted_z final_delta] = predict_best_fit_tree(anchor_tree, delta_tree, new_data_item, 1, search_toggle);

if(predicted_z != Inf)

prediction_vector(cnt) = predicted_z;

error_vector(cnt) = z(cnt) – predicted_z;

cnt = cnt + 1;

else

fail_count = fail_count + 1;

endif

endfor

toc

We can generate the output and related statistics using the following code:

fail_count

x = x(1 : cnt – 1);

y = y(1 : cnt – 1);

z = z(1 : cnt – 1);

figure, scatter3(x,y,z)

view(40,15)

figure, scatter3(x,y,prediction_vector)

view(40,15)

figure, plot(1 : cnt – 1, error_vector)

std(abs(error_vector(:)))

mean(abs(error_vector(:)))

mean(abs(error_vector(:)))/mean(abs(z(:)))

In this case, it turns out the algorithm failed to place the new data 24 times, which is about 1.6% of the total number of data points. But, this can vary upon each run since the domain and range include some random noise. The average absolute error is about 0.027, with oscillating peaks that appear to follow the underlying function.

Finally, let’s apply the same technique to a polynomial with a noisy domain, also around the line y = x, using the following code:

clear x y z

cnt = 1;

for i = 1 : .009 : 8

x(cnt) = i + .25*rand();

y(cnt) = i + .25*rand();

z(cnt) = x(cnt)^7 + y(cnt)^7 + 10;

cnt = cnt + 1;

endfor

figure, scatter3(x,y,z)

view(40,15)

data_array = shape_to_data(x,y,z);

[category_tree delta_tree anchor_tree] = generate_data_tree(data_array);

We can generate the approximation using the following code:

search_toggle = 1;

clear prediction_vector error_vector x y z

cnt = 1;

fail_count = 0;

x = 1 : .007 : 8;

y = 1 : .007 : 8;

[temp num_data_points] = size(x);

tic;

for i = 1 : num_data_points

x(cnt) = x(cnt) + .25*rand();

y(cnt) = y(cnt) + .25*rand();

z(cnt) = x(cnt)^7 + y(cnt)^7 + 10;

new_data_item{1} = [x(cnt) y(cnt) 0];

[category_index predicted_z final_delta] = predict_best_fit_tree(anchor_tree, delta_tree, new_data_item, 1, search_toggle);

if(predicted_z != Inf)

prediction_vector(cnt) = predicted_z;

error_vector(cnt) = z(cnt) – predicted_z;

cnt = cnt + 1;

else

fail_count = fail_count + 1;

endif

endfor

toc

And, finally, we can generate the graphs and error statistics using the following code:

fail_count

x = x(1 : cnt – 1);

y = y(1 : cnt – 1);

z = z(1 : cnt – 1);

figure, scatter3(x,y,z)

view(70,20)

figure, scatter3(x,y,prediction_vector)

view(70,20)

figure, plot(1 : cnt – 1, error_vector)

std(abs(error_vector(:)))

mean(abs(error_vector(:)))

mean(abs(error_vector(:)))/mean(abs(z(:)))

As you can see, in all three cases, the approximations are true to the original functions, both in terms of the domain and range values, with very small errors compared to the range values, despite the fact that the functions are clearly very different in nature.

*Octave Scripts*: