In my first formal paper on Machine Learning [1], I introduced a dataset of random walks, which now consists of 600 paths, each comprising 1,000 observations. See Vectorized Deep Learning [2] and Analyzing Dataset Consistency [3] for an updated and truly formal treatment of the methods introduced in [1]. Every path in the dataset is either upward trending or downward trending: the probability of the path increasing in y-value from a given point is either greater than .5 or less than .5, respectively, producing two distinct classes of paths. The chart below shows the entire original dataset (which had a dimension of 10,000), plotted in the plane. I showed in [1] that my software can predict with good accuracy whether a given path will be upward trending or downward trending, given an initial segment of the path. This could itself be useful, but it dawned on me that the underlying prediction methods imply a potentially new method of pricing assets.
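To make the construction concrete, here is a minimal sketch of such a two-class random walk dataset in Python (NumPy). The step probabilities of .55 and .45 and the `make_walks` name are illustrative assumptions, not the exact parameters used in [1].

```python
import numpy as np

def make_walks(n_paths=600, n_obs=1000, seed=0):
    """Generate a two-class random-walk dataset: each path steps up with
    probability p, where p > .5 for the upward-trending class and
    p < .5 for the downward-trending class (values here are assumed)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_paths)       # 1 = upward trending
    p_up = np.where(labels == 1, 0.55, 0.45)        # assumed step probabilities
    # Each observation is a +1/-1 step; the path is the running sum.
    steps = (rng.random((n_paths, n_obs)) < p_up[:, None]).astype(float) * 2 - 1
    return np.cumsum(steps, axis=1), labels

paths, labels = make_walks()
```

With 1,000 steps, the drift of roughly ±100 dominates the noise, so the two classes separate visibly when plotted.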

Specifically, the prediction algorithms introduced in [1], [2], and [3] build clusters associated with a given row of the dataset. In the case of the random walk dataset, this causes every path to be associated with a set of sufficiently similar paths. Treating the y-value of a path as the price of some asset, we can then view the cluster of paths associated with a given initial path segment as the possible paths that will be taken as the given initial segment unfolds. Less formally, if you are presented with, e.g., the first 300 observations of a given path, which leaves 700 subsequent observations (since the total dimension is 1,000), the cluster generated from the initial 300 observations can be interpreted as the set of possible values for the remaining 700 observations. This is expressed in the chart below, which shows the cluster associated with the initial segment of a given path.
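As a rough sketch of the retrieval step, the following uses simple k-nearest-neighbor matching on the initial segment as a stand-in for the actual clustering algorithms of [1]–[3]; the function name and the choice of Euclidean distance are assumptions for illustration.

```python
import numpy as np

def cluster_for_segment(segment, dataset, k=25):
    """Return the k full paths whose initial segments are closest (in
    Euclidean distance) to the given initial segment -- a simple
    nearest-neighbor stand-in for the clustering step."""
    m = len(segment)
    dists = np.linalg.norm(dataset[:, :m] - segment, axis=1)
    return dataset[np.argsort(dists)[:k]]
```

The returned rows are full-length paths, so the columns beyond the segment length serve as the set of possible continuations.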

Discounted cashflows allow you to price an asset based upon its cashflows, and assuming you have the right interest rates, it's basically impossible to argue that this will produce an incorrect price. However, market prices plainly deviate from theoretical prices, and as a consequence, the objective price of an asset is not the market price, but rather its disposition value at a given moment in time. Moreover, not all assets have cashflows beyond disposition (e.g., a painting), and as a consequence, we cannot price such an asset using cashflows. However, you can price such an asset using its set of price paths over time. Specifically, given some information about present conditions and the asset in question, we can retrieve a cluster of possible price paths for that asset (based upon prior observations). The simple average over that set of paths at any moment in time gives you the price that minimizes expected error, which is arguably the correct expected disposition value of the asset. This is very simple, but very difficult to argue with, and could therefore produce accurate prices very quickly for otherwise difficult-to-price assets. For example, imagine pricing a structured note using this method. Ordinarily, you would have to first model the underlying assets, then account for the waterfall, and only then would you produce a price. In contrast, this method allows you to use Machine Learning to pull a cluster of similar assets and simply take the average at the time of planned disposition.
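The pricing rule itself reduces to one line. The sketch below assumes a cluster represented as a NumPy array of price paths (rows are paths, columns are time steps); the function name and discount-factor parameter are illustrative.

```python
import numpy as np

def expected_disposition_value(cluster, t, disc=1.0):
    """Price at planned disposition time t: the discounted simple average
    of the clustered price paths at that moment. The mean is the estimate
    that minimizes expected squared error over the cluster's outcomes."""
    return disc * float(np.mean(cluster[:, t]))
```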

The real magic is, of course, in the production of the cluster: specifically, ensuring that the paths produced are generated from sufficiently similar assets. My software, Black Tree AutoML, produces accurate and meaningful clusters as a general matter, plainly suggesting that it can be used to price assets, in addition to everything else it does. Note that because this method uses a distribution to predict a single outcome, you will have to repeat the process over time to achieve the average, which can be done using a portfolio of trades rather than a single trade.

Applying this to options, consider a Bermuda call option that is exercisable on exactly one date, and fix a strike price. Now pull the cluster of possible price paths for the underlying security. We then look at the set of possible prices on the exercise date: take each price from the price paths on that date, subtract the strike price, and if the result is negative, set it to zero (i.e., the option is worth zero if the market price is below the strike price). Take the average over that set of values, discount it to the present, and that's the option premium from the perspective of the option buyer. For the option writer, you instead set any price below the strike price to the premium itself, and then solve for the premium using the same average. This creates an asymmetry, and therefore a bargaining range, which could at times be empty.
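Both sides of this calculation can be sketched as follows. The buyer side is a direct average; for the writer side, I read "set any price below the strike to the premium and solve" as a fixed-point equation in the premium P, which has a closed-form solution since P appears linearly. That reading, and the assumption that at least one clustered price exceeds the strike, are mine, not spelled out above.

```python
import numpy as np

def buyer_premium(cluster, t, K, disc=1.0):
    """Buyer-side premium: the discounted average payoff max(S_t - K, 0)
    over the cluster of price paths at the exercise date t."""
    payoff = np.maximum(cluster[:, t] - K, 0.0)
    return disc * float(payoff.mean())

def writer_premium(cluster, t, K, disc=1.0):
    """Writer-side premium: below-strike outcomes are set to the premium P
    itself, so P = disc * mean(payoffs with P substituted). Solving the
    linear fixed point: P = disc * sum(S - K over S >= K) / (n - disc*m),
    where m is the number of below-strike outcomes."""
    S = cluster[:, t]
    above = S >= K
    if not above.any():
        return 0.0  # assumed convention: no in-the-money outcomes
    n, m = len(S), int((~above).sum())
    return disc * float((S[above] - K).sum()) / (n - disc * m)
```

In a toy cluster with exercise-date prices of 90 and 110 and a strike of 100, the buyer's premium is 5 while the writer's is 10, illustrating the asymmetry (and here, an empty bargaining range).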

This is (to my knowledge) a completely new method of pricing options that's obviously correct. However, it's at odds with intuition, since volatility doesn't matter at all; instead, only the average over the curve at a moment in time is relevant to the premium. This is because the method uses a distribution of outcomes, effectively assuming the same trade is effectuated many times over different possible paths. As a practical matter, this can be simulated by buying or selling different yet similar securities, thereby effectuating many trades, and if they're sufficient in number, you should get close to the average. If there's any difference between the premiums charged in the markets and your theoretical premiums, you'll make money.

However, volatility is relevant to uncertainty, in the literal sense that multiplicity of outcome creates measurable uncertainty. See Information, Knowledge, and Uncertainty [4], generally. As a consequence, when deciding between two options that have the same expected returns on their premiums (as calculated above), you will select the one with lower volatility. Therefore, volatility should reduce the premium. I think the reduction is going to be a function of the number of outcomes necessary to achieve a given expected tolerance around the average. For example, the observed frequencies of a coin toss should converge to the uniform distribution faster than those of a six-sided die. Similarly, an asset class with a wide range of possibilities (i.e., high volatility) should require a larger number of trades than an asset class with a narrow range of possibilities (i.e., low volatility), if the goal is to achieve the average. As a general matter, if the outcome space is real-valued, then the standard deviation should provide information about the rate of convergence, and if instead the outcome space is discrete (e.g., a coin toss), then the entropy should provide information about the rate of convergence. In both cases, the slower the rate of convergence, the greater the discount to the premium, for the simple reason that you have a wider range of outcomes, which again produces measurable uncertainty, and therefore requires a bigger portfolio.
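For the real-valued case, the standard-deviation claim can be made precise with the standard error of the mean: the average of n independent trades has standard error sigma / sqrt(n), so the number of trades needed for a given tolerance grows with the square of volatility. This is a textbook fact offered as a sketch of the convergence argument, not a result from the text above.

```python
import math

def trades_needed(sigma, tol):
    """Number of i.i.d. trades n so that the standard error of the
    portfolio average falls below tol:
        sigma / sqrt(n) <= tol  =>  n >= (sigma / tol) ** 2."""
    return math.ceil((sigma / tol) ** 2)
```

Doubling volatility quadruples the required portfolio size, which is the sense in which higher volatility demands a bigger portfolio and hence a larger discount.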

This could be implemented as a practical matter by taking a Monte Carlo sample over the cluster of paths. So, e.g., if the cluster contains 100 paths, you would randomly sample 10 of the 100 paths some large number of times, producing a set of averages indicative of your outcome space with 10 trades. You could then increase the number of trades until the distribution looks good from a risk perspective. If that requires too many trades, then you don't execute on the strategy. It turns out that it's really easy to test whether two variables are drawn from the same distribution using normalization, since they'll have the same average. This is important, because what you want in this case is a set of assets that are all drawn from sufficiently similar distributions, allowing you to simulate repeated trials of a single random variable.
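The Monte Carlo step might look like the following sketch, which samples subsets of the clustered exercise-date prices without replacement (matching "10 out of the 100 paths"); the function name and simulation count are assumptions.

```python
import numpy as np

def portfolio_averages(prices, n_trades, n_sims=5000, seed=0):
    """Repeatedly sample n_trades of the clustered prices (without
    replacement) and record the portfolio average, giving the outcome
    distribution for a portfolio of that size."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_sims)
    for i in range(n_sims):
        idx = rng.choice(len(prices), size=n_trades, replace=False)
        out[i] = prices[idx].mean()
    return out
```

Increasing `n_trades` tightens the distribution of averages around the cluster mean, which is the "looks good from a risk perspective" check made quantitative.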

Note that in this view, increased volatility should reduce liquidity, which is at least anecdotally consistent with how markets really work. Specifically, in the case of options, as volatility increases, uncertainty increases, which will cause a seller to increase the spread over the premium (as calculated above), for the simple reason that the seller is assuming greater risk in order to achieve the minimum number of trades required to bring the uncertainty to a manageable level. That is, in order to get sufficiently close to the average, the actual dollar value of the portfolio has to be larger as volatility increases. At a certain point, as volatility increases, both sellers and buyers will lose interest in the trade, because there's only so much demand for a particular trading strategy. The net conclusion is that this way of thinking is perfectly consistent with the idea that increased volatility reduces liquidity, since both sellers and buyers will need to put up more capital to manage the risk of the trade.

To actually calculate the total premium, I think the only relevant factor, in addition to the average considered above, is the cost of financing the portfolio of trades associated with a given trade. For example, assume a market maker M has two products, A and B, and because of volatility, 30% of M's funds are allocated to trades related to product A, and 70% are allocated to trades related to product B. That is, B is a higher volatility product, and so more trades are required to produce a portfolio that is within some tolerance of the average discussed above. The fair premium for both A and B would simply be the premium calculated above, plus a pro rata portion of the applicable costs of financing the relevant portfolio. For example, if someone put on a trade in product A equal to 1% of the total outstanding principal for product A, they would be charged the premium (as calculated above), plus 1% of the cost of funds associated with the portfolio for product A. This can of course be discounted to a present value, producing a total premium that should be acceptable to the seller (subject to a transaction fee, obviously). I'm fairly confident this is not how people think about financing trades and calculating prices, but it's nonetheless plainly correct, provided you get the Machine Learning step right.
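The total-premium arithmetic above can be sketched in a few lines; the function and parameter names are illustrative, and the transaction fee mentioned above is deliberately left out.

```python
def total_premium(base_premium, trade_principal, product_principal,
                  financing_cost, disc=1.0):
    """Total premium = base cluster-average premium plus the trader's
    pro rata share of the cost of financing the product's portfolio,
    discounted to a present value."""
    share = trade_principal / product_principal   # e.g., 1% of principal
    return base_premium + disc * share * financing_cost
```

For instance, a trade equal to 1% of product A's outstanding principal, against a financing cost of 200 for A's portfolio and a base premium of 5, yields a total premium of 7 before any transaction fee.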