In addition to vectorizing my algorithms, I’ve been trying to understand why vectorization matters on a consumer device. The obvious answer is that vectorization makes use of a machine’s capacity to compute in parallel, rather than serially.
However, I realized that I screwed up the code in my first note on the subject, producing what looked like a constant run time.
This is simply wrong, so I took the article down.
What I now believe is that there is parallel capacity, even in consumer devices, but that it’s so small that you end up with a linear run time, not a constant run time, for vectorized operators. Specifically, it seems that a vectorized operator can’t process an entire pair of vectors in a single step, but it does process several elements simultaneously, possibly because it is implemented at a lower level as a loop with a drastically smaller number of iterations.
I’m now conducting a more careful analysis of the run times of basic operations in Octave / Matlab, and I’m also going to have a look at the documentation. The bottom line is that vectorized operators are expressed as array operators, and are plainly implemented more efficiently than applying the same operator one element at a time in an explicit loop.
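As a rough sketch of that kind of comparison, the script below times element-wise addition done with an explicit loop against the same addition done with the array operator, for a few vector sizes. The sizes and variable names are illustrative assumptions, not the measurements from the note that was taken down, and the timings will vary by machine.

```octave
% Compare an explicit loop against the array operator for element-wise addition.
% The sizes below are arbitrary illustrative choices (an assumption).
sizes = [1e4, 1e5, 1e6];
loop_time = zeros(size(sizes));
vec_time = zeros(size(sizes));

for k = 1:numel(sizes)
  n = sizes(k);
  a = rand(n, 1);
  b = rand(n, 1);

  % Iterative version: one scalar addition per pass through the loop.
  c_loop = zeros(n, 1);
  tic;
  for i = 1:n
    c_loop(i) = a(i) + b(i);
  end
  loop_time(k) = toc;

  % Vectorized version: a single array operator applied to the whole vectors.
  tic;
  c_vec = a + b;
  vec_time(k) = toc;

  % Both versions should produce identical results.
  assert(isequal(c_loop, c_vec));

  fprintf('n = %d: loop %.6f s, vectorized %.6f s\n', n, loop_time(k), vec_time(k));
end
```

If the account above is right, both timings should grow roughly in proportion to n, with the vectorized timing smaller by a large constant factor rather than staying flat.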