Below is an algorithm that performs real-time video classification, at an average rate of approximately 3 frames per second, running on an iMac.
Each video consists of 10 frames of HD images, roughly 700 KB per frame. The individual unprocessed frames are assumed to be available in memory, simulating reading from a buffer.
This algorithm requires no prior training, and learns on the fly as new frames become available.
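To make the setup concrete, here is a rough sketch (not the author's actual scripts) of what handling frames already resident in memory could look like. The buffer size and frame dimensions follow the description above; `extract_features` and its block-averaging scheme are purely illustrative stand-ins for whatever the real code computes per frame:

```python
import numpy as np

def extract_features(frame):
    """Hypothetical feature extractor: a coarse 8x8 block-averaged
    grayscale image, standing in for the real per-frame features."""
    h, w = frame.shape[:2]
    gray = frame.mean(axis=2)                # average the color channels
    gray = gray[: h - h % 8, : w - w % 8]    # trim so the 8x8 grid divides evenly
    return gray.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3)).ravel()

# Simulate an in-memory buffer of 10 HD frames, as described above.
frame_buffer = [np.random.rand(720, 1280, 3) for _ in range(10)]

# One feature vector per video: here, the average of the per-frame features.
video_features = np.mean([extract_features(f) for f in frame_buffer], axis=0)
```

This reduces each 10-frame video to a single fixed-length vector, which is all a distance-based classifier needs.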
The particular task solved by this algorithm is classifying the gestures in the videos:
I raise either my left hand or my right hand in each video.
The accuracy in this case is 95.455%: the algorithm correctly classified 42 of the 44 videos.
Though the algorithm is general-purpose, this particular instance performs gesture classification in real time, enabling human-machine interaction without the machine having any prior knowledge of the gestures that will be used.
That is, this algorithm can autonomously distinguish between gestures in real time, at least when the motions are sufficiently distinct, as they are in this case.
This is the same classification task that I presented here:
The only difference is that in this case, I used the real-time prediction methods I’ve been working on.
The image files from the video frames are available here:
Though the code contains a training and testing loop, this is just an artifact of the dataset, which I previously used in a supervised model. That is, predictions are based only upon the data that has already been "observed," not the entire dataset.
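Since the scripts themselves aren't shown inline, here is a minimal hedged sketch of what "predicting from only the data observed so far" can look like: an online nearest-neighbor classifier that predicts each new example's class from previously seen examples, then adds the new example to its memory. The class name, labels, and toy data are all illustrative, not taken from the actual code:

```python
import numpy as np

class OnlineNearestNeighbor:
    """Predicts using only previously observed examples, then learns
    the new example -- no prior training phase."""
    def __init__(self):
        self.features = []  # observed feature vectors
        self.labels = []    # corresponding class labels

    def predict(self, x):
        if not self.features:
            return None  # nothing observed yet
        dists = [np.linalg.norm(x - f) for f in self.features]
        return self.labels[int(np.argmin(dists))]

    def observe(self, x, label):
        self.features.append(x)
        self.labels.append(label)

# Toy run: two well-separated "gesture" clusters, fed one example at a time.
rng = np.random.default_rng(0)
model = OnlineNearestNeighbor()
correct = 0
for i in range(20):
    label = "left" if i % 2 == 0 else "right"
    center = 0.0 if label == "left" else 10.0
    x = center + rng.normal(0, 0.5, size=8)
    if model.predict(x) == label:
        correct += 1
    model.observe(x, label)  # learn on the fly, after predicting
```

Note that prediction happens strictly before the new example is stored, which is what distinguishes this from an ordinary train-then-test pipeline.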
Note that you’ll have to adjust the file path and number of files in the attached scripts for your machine.
The timestamps printed to the command line represent the amount of time elapsed per video classification, not the amount of time elapsed per frame. Simply divide the time per video by 10 to obtain the average time per frame.
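For example, converting per-video timestamps into per-frame times (10 frames per video, per the setup above) is a simple division; the timing values below are illustrative, not taken from the actual run:

```python
# Per-video classification times in seconds (illustrative values only).
video_times = [3.1, 3.4, 2.9]

frames_per_video = 10
per_frame = [t / frames_per_video for t in video_times]  # seconds per frame
fps = [1.0 / t for t in per_frame]                       # implied frames per second
```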