Updated Touch Screen Algorithm (Including Dataset)

Following up on my previous note, I’ve adjusted the setup of the camera-based touch screen method, not the algorithm itself, to use an external webcam pointed at the screen. The motivation is obvious: the camera now focuses on the user’s hand, though the previous note demonstrated that you get decent precision without looking at the user’s hand at all. With this setup, you get sensitivity to 1 centimeter of motion, with 95% accuracy, which I’ve tested using the on-screen ruler below:

A screenshot of my desktop, with the ruler at the top right.

The dataset was generated by placing my index finger at two points on the screen, one centimeter apart, adjusting my posture for each photograph. I did this at both the top left and the top right of the screen, giving four classes in total, with ten images per class, for a total of forty images in the dataset. The prediction algorithm is simple nearest neighbor, though fully vectorized, with vectorized pre-processing of each image, which allows for real-time processing on an iMac. The immediate goal is to translate this into a language native to Apple machines, which should allow for even faster processing and direct access to devices.
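The nearest-neighbor step can be sketched as follows. This is an illustrative Python version rather than the Octave implementation described above, and the images and labels here are toy stand-ins, not the actual dataset.

```python
# Illustrative sketch of nearest-neighbor classification over flattened
# images, in Python (the post's implementation is in Octave). The 2x2
# "images" and class labels below are toy stand-ins for the real dataset.

def flatten(image):
    """Flatten a 2-D image (a list of rows) into a single feature vector."""
    return [pixel for row in image for pixel in row]

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_neighbor(train_vectors, labels, query):
    """Return the label of the training vector closest to the query."""
    distances = [squared_distance(t, query) for t in train_vectors]
    return labels[distances.index(min(distances))]

# Toy dataset: one example image per class, two classes.
train_images = [[[0, 0], [0, 1]], [[9, 9], [9, 8]]]
train_vectors = [flatten(img) for img in train_images]
train_labels = ["point_A", "point_B"]

query = flatten([[1, 0], [0, 0]])
print(nearest_neighbor(train_vectors, train_labels, query))  # point_A
```

A production version would compute all distances in one vectorized operation rather than a Python loop, which is what the Octave implementation does.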


If two classes of images are distinguishable to the human eye, then, as a general matter, they will be distinguishable to the algorithm as well. This implies that closer shots, taken from, for example, the four corners of a monitor, should produce even greater precision (i.e., sensitivity beyond 1 cm), and greater accuracy at those precisions. Note that you can easily produce a single combined vector by listing the images captured by the four cameras in a fixed order. As a result, there’s no need for complex analysis to merge the four cameras into a single composite image.

You don’t do that: you keep the four images separate, flatten each of the four resultant image matrices, and concatenate them in a fixed order into a single row vector, with four times the number of columns you’d get from one flattened image matrix. For context, in Octave, M(:) flattens a matrix M into a column vector, so its transpose, M(:)', gives the corresponding row vector. In this case, for each image capture (i.e., all four cameras firing), you would just have a row vector given by v = [M1(:)' M2(:)' M3(:)' M4(:)'].
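The concatenation above can be sketched in Python (again illustrative rather than the Octave implementation; the toy 2x2 captures are stand-ins). Octave's flattening is column-major while this version is row-major, which makes no difference as long as the same order is used for every capture.

```python
# Sketch of combining four camera images into one feature vector,
# mirroring the Octave concatenation described above. The 2x2 "captures"
# are toy stand-ins for real images from four corner-mounted cameras.

def flatten(image):
    """Flatten a 2-D image (a list of rows) into a flat list of pixels."""
    return [pixel for row in image for pixel in row]

def combined_vector(cameras):
    """Concatenate the flattened image from each camera, in a fixed order."""
    v = []
    for image in cameras:
        v.extend(flatten(image))
    return v

# Toy 2x2 captures from four hypothetical corner cameras.
captures = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
    [[13, 14], [15, 16]],
]

v = combined_vector(captures)
print(len(v))  # 16: four times the length of one flattened 2x2 image
```

The combined vector then feeds into the same nearest-neighbor comparison as a single-camera vector, just with four times the columns.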

Below you can find the dataset and the command-line code:

Touch Screen Dataset

