In a previous article, we looked into how the Kelly criterion could be solved numerically in the case of multiple simultaneous and independent bets. This time, we will look into applying the same principles for bets on events that generate multiple and mutually exclusive outcomes.

## Setting the scene

You have recently designed a groundbreaking model that gives probabilities for all possible outcomes of an English Premier league game. There are 3 possible final results of a league game: *Home win*, *Draw*, or *Away win*. The results are mutually exclusive as only one of them can happen at a time. You check the betting odds at your favourite bookmaker and conclude that there is edge in betting on a Home win and the Draw. Being a big believer of Kelly’s approach to bet sizing you would like to use it, but how should you? Should you use the Kelly formula and apply it to each outcome of interest? Let’s investigate.

## The mathematical view

We will keep it very general using Kelly’s original notations and considering an event with \( N \) possible outcomes. Each outcome occurs with a probability \( p_i \), is linked to a profit-to-loss ratio \( b_i \), and \( f_i \) is the wager put against it. The expectation \( E \) of the logarithm of our wealth is then written:

This only works if the term inside the log function is strictly positive of course. Under this condition, we can express the partial derivative of our expectation w.r.t. a wager \( f_i \):

So basically we now have a formula to calculate our gradient, what’s next? Well, we are going to modify our wager estimates iteratively going along the gradient so that we can aim for a larger expectation at every step. This method is called *Gradient Ascent*.

## Ascending the gradient

If our problem did not have any constraints on the wager values, this problem would all be a walk in the park to solve (assuming the hyperplan of the function is relatively smooth). But the fact that the sum of all our wagers has to be less than 1 - we can not bet more than 100% of the account - brings all the difficulty in this case. So here is our coarse plan of action: At each update of the wagers estimate, we check their sum. If the sum is less than 1, we use these in the next iteration. If they are greater than 1, we scale them all by the same factor to get the sum back to 1.
Let’s see have a look at a piece of code in Scala 3 that implements these ideas. Why *Scala 3* you will ask? Probably because it has not been used in this blog before and because it is quite expressive for mathematical purposes (like most functional languages):

```
def clip(x: Double) : Double = max( min(x, 1.0), 0.0 )
def next_estimate(opt_fs: Option[List[Double]], ps: List[Double], bs: List[Double]) : Option[List[Double]] =
opt_fs.flatMap(fs => {
val value = expectationLogWealth(fs, ps, bs)
val grad = gradient(fs, ps, bs)
val fs_candidate = fs.zip(grad)
.map{ case (f,g) => clip(f + 1e-4 * g) }
val total = fs_candidate.sum
val value_candidate = if total >= 1.0 then
expectationLogWealth(fs_candidate.map{_ / total}, ps, bs)
else
expectationLogWealth(fs_candidate, ps, bs)
if value_candidate < value then
None
else
Some(fs_candidate)
})
def loop(ps: List[Double], bs: List[Double]) : List[List[Double]] =
val initial_fs : List[Double] = List.fill(ps.length)(0.0)
val opt_initial_fs : Option[List[Double]] = Some(initial_fs)
val res : List[Option[List[Double]]] = List.iterate(opt_initial_fs,5000){next_estimate(_,ps,bs)}
.takeWhile{!_.isEmpty}
res.map{_.get}
val fs = loop(ps, bs)
```

The iterative loop can only stop modifying the wager estimates for 2 reasons: the new proposed set of wagers does not bring any improvement on the expectation value or the maximum number of iterations - 5000 in this code - has been reached. The `expectationLogWealth`

and `gradient`

functions are simply implementing the 2 formulas we have talked about previously. Let’s run through a few examples to get a sense of the actual performance of this method.

## The model in action

At the time of writing, the premier league season has already started so why not taking some real odds to run these tests. I have randomly picked a game as our source of odds and put down 3 different sets of probabilities that we will assume have come from our prediction models. Let’s run each through our solver and compare the expectations of the wages recommended by this model with those coming from the Kelly formula individually applied to each outcome.

The model beats straightforward Kelly on case II and case III only. The theme seems to be that when there is a lot of edge embedded in the bets (high overall expectation), the model seems to be performing better than a straightforward Kelly. However, when the overall edge level is moderate, it is doing a worse job. Why would that be? My guess is that in the case of high expectations the value added by the optimisation routine outweigh the error caused by its flawed logic.

## Conclusion

This model is clearly useful to have if you are currently using straightforward Kelly in this type of settings. It is very quick to implement so there are no real reasons for not having it. In the real world, you would want to also consider slippage cost as a strategy requiring a bet on every single outcome could become quite prohibitive from this standpoint.

If you like this post, follow me on Twitter and get notified on the next posts.