Random Number Distributions (JS)

Last week, I talked about a method for choosing random values within a set, one of the many uses for random numbers in a game. This time, I’ll go deeper into the notion of probability distributions, and how you can make use of them in game development and other kinds of programming. A word of warning: this post will have code (usually Javascript) and might use a lot of math!

Simply Random

The uniform distribution is the one you already know. It’s behind all the simple RNG functions we saw before, like C’s rand() or Javascript’s Math.random(), and it’s what you get when you roll a single die: every number has an equal chance of coming up. In math terms, if there are n possibilities, then the chance of getting any specific one is 1/n. Couldn’t be simpler, really. For most game uses, this is your main tool, and the “weighted set” from the other post is a handy add-on.

The Bell Curve

I mentioned the bell curve and the normal distribution last time, but here I’ll go into a little bit more detail. The bell curve, of course, is often an early introduction to the world of statistics and terms like “standard deviation”, and (as we saw in the other post) it’s what you start getting when you roll more and more dice at a time.

Obviously, that’s one way to get a normal distribution: roll a bunch of dice and add them together:

var dieRoll = function() { return Math.floor(Math.random() * 6) + 1; }
var bellCurveRoll = 0;
for (var i = 0; i < 10; i++) {
    bellCurveRoll += dieRoll();
}

This gives us something close (though not perfect): random numbers will range from 10 to 60, with 35 being the most common. If we need something better (i.e., more accurate), we’ll need to delve into probability theory. Don’t worry, it’s only a short dive.

Side Quest

The mean might be familiar to you, since it’s not much more than another name for “average”. For a normal distribution, the mean is the center point of the curve, the part where it’s at its highest.

Less familiar is the standard deviation, although you may remember that from high school or early college. It’s important in a lot of stats-based fields, because it measures how “clustered” a group of data points is.

Now, for a set of data, you can calculate the mean and standard deviation. That’s pretty much how things like grading curves work: you take the data and make a distribution that fits. For RNGs, however, we have to work backwards. Instead of calculating the mean and standard deviation for the data, we choose them at the start, and our RNG uses them to provide the proper distribution of values. For the normal distribution, this means that about 68% of the values will be within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. (By the way, the standard deviation is usually identified in math by the Greek letter sigma: σ. “3-sigma” confidence, then, is 99.7% likelihood that something is true. “Six sigma” encompasses everything within 6 standard deviations of the mean, or about 99.9999998%; it’s not quite “one in a billion”, but it’s pretty close.)

Back to Normal

So, knowing all this boring math, you can start getting your random numbers. Here’s one way of doing it in Javascript:

function normalDistribution (mu, sigma) {
    var u1 = Math.random();
    var u2 = Math.random();

    var z0 = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(Math.PI*2 * u2);
    var z1 = Math.sqrt(-2.0 * Math.log(u1)) * Math.sin(Math.PI*2 * u2);

    return z0 * sigma + mu;
}

This uses a method called the Box-Muller transform to generate a random number on a bell curve. (The astute reader will notice that the function actually generates two random numbers, but throws one of them away. If you like, you can make a better function that stores the second value and returns it as the next random number.) The two parameters are our mean (mu, because it is often written as the Greek letter μ) and the standard deviation (sigma). The Wikipedia link above explains the theory behind this method, as well as giving links to other ways, some of which might be faster.

Exponential Randomness

Normal distributions have their uses, and they’re about as far as most people get in studying randomness. But we won’t stop there. We’ll move on.

Next up is the exponential distribution. This one isn’t that hard to make in code:

function expoDistribution (lambda) {
    return -Math.log(1.0 - Math.random()) / lambda;
}

But you might wonder if it’s really useful. Well, as it turns out, it does come in handy sometimes. Basically, the exponential distribution can be used anywhere you need a “time between events” type of randomness, like random events firing in a strategy game.

The lambda parameter is what’s called a “rate parameter”, and it’s intimately related to the mean; in fact, it’s the reciprocal: the exponential distribution’s mean is 1 / lambda. But what does it mean? Let’s take an example: say you’re making a game where special power-ups appear, on average, twice every minute. Using the above function with lambda as 2, the result would be the time (in minutes) until the next power-up. (The mean would then be 1/2, or 30 seconds, which makes intuitive sense.)

Pareto, Principle, and Power

If you’ve ever heard of the “Pareto Principle” or the “80-20 rule”, then you’ve got a head start on learning about the Pareto distribution. The general idea is thus: there are a few with a lot, and a lot with a little. “The top 20% control 80% of the wealth” is the most common way to state this particular idea, but a random distribution with these properties can be surprisingly useful in games. Here’s the code:

function paretoDistribution (minimum, alpha) {
    var u = 1.0 - Math.random();
    return minimum / Math.pow(u, 1.0 / alpha);
}

We have two parameters we can control this time. minimum is the lowest possible value that can be returned, while alpha controls the “shape” of the distribution. (Generally, higher values of alpha have a faster drop-off, meaning that lower values have higher probability. The “80-20” distribution has an alpha of log(5) / log(4), or about 1.161.)

The Pareto distribution isn’t just used for wealth, though. Anywhere you have a “long tail”, it might be what you’re looking for. Random city sizes (or star sizes, for an interstellar game) follow a Pareto distribution, and it might be a good way to model “loot drops” in an RPG; less powerful objects are far more common than the Ultimate Sword of Smiting, after all.

In Closing

These aren’t all that’s available, but the four distributions I’ve shown are probably the most useful for programmers, especially game developers. As before, I didn’t originally come up with any of this; it’s all been known since before I was born. Some code is converted from Python’s extensive random module, specifically the functions random.expovariate() and random.paretovariate(). The code for the normal distribution is my Javascript conversion of the Box-Muller transform example at the Wikipedia link above. (By the way, if you want me to post examples for a different language, just ask!)

A lot of people already know all this material, and there are plenty of other sites detailing them better than I can, but I hope that this is enough to get you interested.