Pseudorandom Number Generation
The numpy.random module supplements Python’s built-in random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. For example, you can get a 4 × 4 array of samples from the standard normal distribution using numpy.random.normal:
In [1]: samples = np.random.normal(size=(4, 4))
In [2]: samples
Output: array([[ 0.14340521, -0.39313063,  0.23171811, -0.42243503],
               [-0.11106257, -0.09632203, -0.75303053,  0.0169455 ],
               [ 0.34445876,  1.04247109,  1.36548241, -0.78550323],
               [ 0.32757408,  0.13460323, -1.03003595,  0.00847262]])
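The size argument works the same way for the other distribution functions in numpy.random. As a minimal sketch, with distributions and parameter values chosen only for illustration:

```python
import numpy as np

# loc and scale set the mean and standard deviation of the normal distribution
shifted = np.random.normal(loc=10.0, scale=2.0, size=(4, 4))

# Uniform samples on the half-open interval [0, 1)
uniform = np.random.uniform(low=0.0, high=1.0, size=(2, 3))

# Number of successes in 10 trials, each with success probability 0.5
counts = np.random.binomial(n=10, p=0.5, size=5)

print(shifted.shape, uniform.shape, counts.shape)
```

Each call returns an array with the requested shape, so no Python-level loop is needed.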
Python’s built-in random module, by contrast, only samples one value at a time. As you can see from this benchmark, numpy.random is well over an order of magnitude faster for generating very large samples:
In [3]: from random import normalvariate
In [4]: N = 1000000
In [5]: %timeit samples = [normalvariate(0, 1) for _ in range(N)]
Output: 1.77 s ± 126 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %timeit np.random.normal(size=N)
Output: 61.7 ms ± 1.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
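The %timeit magic is only available in IPython and Jupyter. Outside those environments, a rough equivalent can be sketched with the standard library's timeit module. The helper names python_samples and numpy_samples are my own, and N is reduced here so that the script finishes quickly. Timings will vary from machine to machine:

```python
import timeit
from random import normalvariate

import numpy as np

N = 100_000  # smaller than in the benchmark above, to keep the run short


def python_samples():
    # One call to normalvariate per sample, collected in a Python list
    return [normalvariate(0, 1) for _ in range(N)]


def numpy_samples():
    # A single call that fills a whole array
    return np.random.normal(size=N)


# Take the best of three single runs for each approach
t_python = min(timeit.repeat(python_samples, number=1, repeat=3))
t_numpy = min(timeit.repeat(numpy_samples, number=1, repeat=3))

print(f"random.normalvariate: {t_python:.4f} s")
print(f"np.random.normal:     {t_numpy:.4f} s")
```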
We say that these are pseudorandom numbers because they are generated by a deterministic algorithm whose output is fully determined by the seed of the random number generator. You can set NumPy’s random number generation seed using np.random.seed:
In [7]: np.random.seed(1234)
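To see what determinism means in practice, here is a small sketch that sets the same seed twice and compares the draws. The variable names first and second are just for illustration:

```python
import numpy as np

np.random.seed(1234)
first = np.random.normal(size=3)

np.random.seed(1234)  # reset the global generator to the same seed
second = np.random.normal(size=3)

# The same seed produces the same sequence of samples
print(np.array_equal(first, second))  # True
```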
The data generation functions in numpy.random share a single global random state, which is what np.random.seed sets. To avoid global state, you can use numpy.random.RandomState to create a random number generator isolated from others:
In [8]: rng = np.random.RandomState(1234)
In [9]: rng.randn(10)
Output: array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
                0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
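The isolation can be checked directly. In this sketch, two RandomState objects (rng_a and rng_b, names chosen for illustration) are created with the same seed, and the global state is reseeded and used between their draws:

```python
import numpy as np

rng_a = np.random.RandomState(1234)
rng_b = np.random.RandomState(1234)

a = rng_a.randn(5)

# Reseed and draw from the global generator in between
np.random.seed(0)
np.random.normal(size=100)

b = rng_b.randn(5)

# rng_b is unaffected by the global generator, so it matches rng_a
print(np.array_equal(a, b))  # True
```

Because each RandomState carries its own state, code that receives one as an argument produces the same results regardless of what other code does with np.random.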
I’ll give some examples of leveraging these functions’ ability to generate large arrays of samples all at once in the next section.