tl;dr: For a uniform distribution over {0, 1, 2, …, M - 1}, use M * RANDOM / 32768 instead of RANDOM % M, unless 32768 % M == 0 (that is, M = 2^m with m in {0, 1, 2, …, 15}).

1   Property of RANDOM

From the Bash manual:

RANDOM Each time this parameter is referenced, a random integer between 0 and 32767 is generated.

The manual does not state that RANDOM is uniformly distributed, but the following figure suggests it is uniform enough for general use.

http://farm8.staticflickr.com/7412/9274619094_377e32b9a4_z.jpg

for ((i=0; i<3276800; i++)); do echo $RANDOM; done | ./BashRANDOM.py --stdin

Each of the 32768 outcomes has approximately the same probability, 1 / 32768.
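For a rough check that needs no plotting script, the draws can be tallied directly in Bash. This is only a sketch of my own, not the script behind the figure: the sample size and the choice of 8 equal-width bins (4096 values each) are arbitrary assumptions.

```shell
#!/usr/bin/env bash
# Rough uniformity check: tally SAMPLES draws of RANDOM into 8 equal-width
# bins (4096 values each) and print the count per bin.
# SAMPLES and the bin count are arbitrary choices for illustration.
SAMPLES=80000
bins=(0 0 0 0 0 0 0 0)
for ((i = 0; i < SAMPLES; i++)); do
    (( bins[RANDOM / 4096] += 1 ))   # RANDOM / 4096 is in 0..7
done
for b in "${!bins[@]}"; do
    echo "bin $b: ${bins[b]}"
done
```

If RANDOM is close to uniform, every bin should end up near SAMPLES / 8.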

2   RANDOM % M is non-uniform if 32768 % M != 0

2.1   Misuse of RANDOM

From time to time, I see people write something like RANDOM % 10, and it does not do what the coder thinks it does, because the result is not uniform on {0, 1, 2, .., 9}. The probability of each value isn’t 1 / 10, but 0.1000061 for {0, 1, 2, .., 7} and 0.0999756 for {8, 9}.

When it’s misused this way, it creates a random variable with two different probabilities for two sets of values. Strictly speaking, it is not uniform, although the difference may be negligible when M is small.

The exact probabilities are calculated as follows (the division is integer division):

C = 32768 % M
D = 32768 / M

# 0, 1, 2, ..., C - 1
P1 = (D + 1) / 32768

# C, C + 1, C + 2, ..., M - 1
P2 = D / 32768
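The formulas above can be checked by brute force in Bash itself. The sketch below is my own illustration: instead of sampling, it enumerates all 32768 possible values of RANDOM and counts how many map to each outcome. The function name mod_counts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exhaustively count how often each outcome of r % M
# occurs over all 32768 possible values of RANDOM (0..32767).
mod_counts() {
    local M=$1 r
    local -a count=()
    for ((r = 0; r < M; r++)); do
        count[r]=0
    done
    for ((r = 0; r < 32768; r++)); do
        (( count[r % M] += 1 ))
    done
    # Print one "value count" pair per line.
    for ((r = 0; r < M; r++)); do
        echo "$r ${count[r]}"
    done
}

M=10
C=$((32768 % M))   # number of outcomes that get one extra hit
D=$((32768 / M))   # integer division: hits that every outcome gets
echo "C=$C D=$D"
mod_counts "$M"
```

For M = 10 this reports C=8 and D=3276, with outcomes 0 to 7 counted 3277 times and outcomes 8 and 9 counted 3276 times, matching P1 = (D + 1) / 32768 and P2 = D / 32768.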

I have even seen people suggest it when answering questions. I don’t know how many of them realize that it’s wrong, because people usually want uniform random numbers even if they don’t say so in their questions.

It only clicked for me when I first tried to generate a random position between 1 and the width of the terminal. I had always known that RANDOM ranges from 0 to 32767, but that was the moment I finally realized that I was creating a non-uniform random variable by writing RANDOM % WIDTH.

2.2   Significant cases: larger M

The gap between the two actual probabilities is always 1 / 32768, but the ideal probability 1 / M shrinks as M grows, so the gap becomes larger relative to 1 / M.

The following figures are for M = 200, 2000, and 20000:

http://farm6.staticflickr.com/5465/9271831955_c24b2180a8_z.jpg
http://farm6.staticflickr.com/5321/9271831891_8fcd6b66d4_z.jpg
http://farm3.staticflickr.com/2890/9274619262_3d91e20f6f_z.jpg

When M is small, P1 and P2 are still close to the ideal P = 1 / M, but as M grows, they move further from it in relative terms:

M = 200, BINS = 200, #Samples = 20,000,000, stdin = False
Probability of ideal uniform            = 5.000000e-03
Probability of each value <  168        = 5.004883e-03
Probability of each value >= 168        = 4.974365e-03

M = 2000, BINS = 200, #Samples = 20,000,000, stdin = False
Probability of ideal uniform            = 5.000000e-04
Probability of each value <  768        = 5.187988e-04
Probability of each value >= 768        = 4.882812e-04

M = 20000, BINS = 200, #Samples = 20,000,000, stdin = False
Probability of ideal uniform            = 5.000000e-05
Probability of each value <  12768      = 6.103516e-05
Probability of each value >= 12768      = 3.051758e-05
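The split points and probabilities listed above follow directly from the formulas in section 2.1. As a sketch, assuming awk is available for the floating-point formatting (Bash arithmetic is integer-only), they can be recomputed like this; the function name split_probs is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the split point C and the two probabilities
# P1 = (D + 1) / 32768 and P2 = D / 32768 for RANDOM % M.
split_probs() {
    local M=$1
    local C=$((32768 % M))
    local D=$((32768 / M))   # integer division
    # awk does the floating-point division and formatting.
    awk -v m="$M" -v c="$C" -v d="$D" 'BEGIN {
        printf "M=%d C=%d ideal=%.6e P1=%.6e P2=%.6e\n",
               m, c, 1 / m, (d + 1) / 32768, d / 32768
    }'
}

for M in 200 2000 20000; do
    split_probs "$M"
done
```

For M = 20000, P1 is exactly twice P2, because values below 12768 can be produced by two values of RANDOM and the rest by only one.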

2.3   Why?

Consider a simpler case: suppose RANDOM is a uniform random variable over {0, 1, .., 7} and M = 7:

RANDOM     = 0 1 2 3 4 5 6 7
RANDOM % 7 = 0 1 2 3 4 5 6 0

Outcome 0 can be produced by two values of RANDOM (0 and 7), while every other outcome can be produced by only one. That is why RANDOM % 7 creates a non-uniform random variable.

Now, if M = 4:

RANDOM     = 0 1 2 3 4 5 6 7
RANDOM % 4 = 0 1 2 3 0 1 2 3

Each outcome can be produced by exactly two values of RANDOM, so the result is still a uniform random variable.

If (# of RANDOM values) % M == 0, the result is uniform; otherwise, it is not.
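The toy case can be reproduced with a few lines of Bash. This is a sketch for illustration only: toy_counts is a hypothetical helper that takes the number of source values N and the modulus M, and it assumes N >= M.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the toy example: a source with N equally likely
# values 0..N-1, reduced with % M. Prints how many source values map to
# each outcome 0..M-1, space-separated.
toy_counts() {
    local N=$1 M=$2 r
    local -a count=()
    for ((r = 0; r < M; r++)); do
        count[r]=0
    done
    for ((r = 0; r < N; r++)); do
        (( count[r % M] += 1 ))
    done
    echo "${count[*]}"
}

toy_counts 8 7   # prints: 2 1 1 1 1 1 1
toy_counts 8 4   # prints: 2 2 2 2
```

With 8 % 7 != 0 the counts are unequal, while 8 % 4 == 0 gives every outcome the same count.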

3   Solution: M * RANDOM / 32768

The solution is to rescale. RANDOM / 32768 behaves like a floating-point number in [0, 1), but Bash has no floating-point type, so the division yields zero unless you multiply by M first. M * RANDOM / 32768 is close to the following JavaScript:

parseInt(M * Math.random(), 10)

The difference is that Math.random() has much higher precision and can generate random numbers for M >> 32768 without problems in most cases, but that is not the topic here.

The following figure shows that M * RANDOM / 32768 resolves the problem and keeps the binned distribution uniform:

http://farm4.staticflickr.com/3686/9271831747_2c3d905489_z.jpg

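As the figure shows, the binned probability is close to the ideal 1 / M. When 32768 % M != 0, individual values still receive either D or D + 1 of the 32768 possible inputs, but rescaling spreads the extra ones evenly across the range instead of concentrating them on the lowest values.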
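A minimal sketch of the rescaling as a reusable Bash function, together with an exhaustive comparison against the modulo form. Both function names (rand_below and lower_half_hits) are invented here. The comparison counts how many of the 32768 possible RANDOM values land in the lower half of the outcomes; it assumes M is even.

```shell
#!/usr/bin/env bash
# Hypothetical helper: draw one integer in 0..M-1 by rescaling RANDOM.
# Multiply first; RANDOM / 32768 alone is always 0 in integer arithmetic.
rand_below() {
    echo $(( $1 * RANDOM / 32768 ))
}

# Hypothetical helper: over all 32768 possible RANDOM values, count how many
# land in the lower half (0..M/2-1) of the outcomes for each method.
# Assumes M is even; a balanced method then gives exactly 16384.
lower_half_hits() {
    local M=$1 r mod=0 scaled=0
    for ((r = 0; r < 32768; r++)); do
        if (( r % M < M / 2 )); then (( mod += 1 )); fi
        if (( M * r / 32768 < M / 2 )); then (( scaled += 1 )); fi
    done
    echo "modulo=$mod scaled=$scaled"
}

lower_half_hits 20000   # prints: modulo=20000 scaled=16384
rand_below 20000        # one draw in 0..19999
```

For M = 20000, the modulo form sends 20000 of the 32768 inputs to the lower half of the range, while the rescaled form splits them evenly.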

4   Note and Code

The calculations and figures in this post are generated by a Python script, not by Bash, unless stated otherwise. The Python script simulates Bash’s random number generation with the random module. The script can also accept numbers through standard input, but that is not used because it is slow.

The following graph uses the same parameters as the one in Property of RANDOM, but with Python’s random number generator. Because the two are very close, I use Python’s random number generator to calculate all the numbers and figures.

http://farm3.staticflickr.com/2834/9274619162_56f4fa366c_z.jpg

5   Conclusion

When M is small and the quality of randomness isn’t that important, it’s still safe to use RANDOM % M. However, if M is fairly large, M * RANDOM / 32768 is the better method.

The exception is when M is one of {1, 2, 4, 8, 16, …, 32768}, as in RANDOM % 8; then the choice does not matter. For instance, M = 16384:

http://farm8.staticflickr.com/7389/9274619340_418c180000_z.jpg

If uncertain, use M * RANDOM / 32768; it’s only two operations. However, if randomness is a key part of your script, it might be better to write it in a more suitable programming language.
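As a usage sketch of the two-operation form, here is the terminal-position case mentioned in section 2.1. The function name random_column is hypothetical, and the fallback width of 80 when COLUMNS is unset is my assumption.

```shell
#!/usr/bin/env bash
# Assumption: fall back to 80 columns when COLUMNS is not set,
# which is common in non-interactive scripts.
WIDTH=${COLUMNS:-80}

# Hypothetical helper: random column in 1..WIDTH using the rescaling form.
# The rescaled value is in 0..WIDTH-1, so add 1 to make it 1-based.
random_column() {
    echo $(( $1 * RANDOM / 32768 + 1 ))
}

col=$(random_column "$WIDTH")
echo "column $col of $WIDTH"
```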

And don’t give out an answer like RANDOM % M without thinking, even if the askers don’t care or don’t know anything about it.