Python for Data Analysis Part-7

Pseudorandom Number Generation

The numpy.random module supplements Python's built-in random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. For example, with NumPy imported as np, you can get a 4 × 4 array of samples from the standard normal distribution using np.random.normal:

In [1]: samples = np.random.normal(size=(4, 4))
In [2]: samples
Output: array([[ 0.14340521, -0.39313063,  0.23171811, -0.42243503],
       [-0.11106257, -0.09632203, -0.75303053,  0.0169455 ],
       [ 0.34445876,  1.04247109,  1.36548241, -0.78550323],
       [ 0.32757408,  0.13460323, -1.03003595,  0.00847262]])
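The same function also covers non-standard normal distributions. The sketch below assumes the loc (mean) and scale (standard deviation) keyword arguments of np.random.normal; the values 10.0 and 2.0 and the seed are arbitrary choices for illustration only.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the sketch is repeatable

# 4 x 4 samples from a normal distribution with mean 10 and std. dev. 2
shifted = np.random.normal(loc=10.0, scale=2.0, size=(4, 4))
print(shifted.shape)

# With many samples, the sample mean and std. dev. approach loc and scale
big = np.random.normal(loc=10.0, scale=2.0, size=100_000)
print(big.mean(), big.std())
```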

Python’s built-in random module, by contrast, only samples one value at a time. As you can see from this benchmark, numpy.random is well over an order of magnitude faster for generating very large samples:

In [3]: from random import normalvariate
In [4]: N = 1000000
In [5]: %timeit samples = [normalvariate(0, 1) for _ in range(N)]
Output: 1.77 s ± 126 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %timeit np.random.normal(size=N)
Output: 61.7 ms ± 1.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
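%timeit is an IPython magic, so the benchmark above does not run in a plain Python script. A rough equivalent using the standard library's timeit module might look like the following. The helper names python_loop and numpy_call are made up for this sketch, and N is reduced to keep the run short. Timings depend on the machine, so none are shown here.

```python
import timeit
from random import normalvariate

import numpy as np

N = 100_000  # smaller than the 1,000,000 used above, to keep the run short


def python_loop():
    # One call to normalvariate per sample
    return [normalvariate(0, 1) for _ in range(N)]


def numpy_call():
    # One vectorized call for all N samples
    return np.random.normal(size=N)


# Take the best of three single runs for each approach
loop_time = min(timeit.repeat(python_loop, number=1, repeat=3))
numpy_time = min(timeit.repeat(numpy_call, number=1, repeat=3))

print(f"pure Python:  {loop_time:.4f} s")
print(f"numpy.random: {numpy_time:.4f} s")
print(f"speedup:      {loop_time / numpy_time:.1f}x")
```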

We say that these are pseudorandom numbers because they are generated by an algorithm with deterministic behavior based on the seed of the random number generator. You can change NumPy’s random number generation seed using np.random.seed:

In [7]: np.random.seed(1234)
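A small sketch of what "deterministic" means in practice: reseeding with the same value replays the same sequence of numbers. The variable names first and second are only for illustration.

```python
import numpy as np

# Seed the global generator, then draw three samples
np.random.seed(1234)
first = np.random.normal(size=3)

# Reseed with the same value and draw again
np.random.seed(1234)
second = np.random.normal(size=3)

# Both draws started from the same generator state, so they match exactly
print(np.array_equal(first, second))  # True
```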

The data generation functions in numpy.random use a global random seed. To avoid global state, you can use numpy.random.RandomState to create a random number generator isolated from others:

In [8]: rng = np.random.RandomState(1234)
In [9]: rng.randn(10)
Output: array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
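To illustrate the isolation, the sketch below draws from two RandomState objects with the same seed while the global generator is used in between; the two results should still agree. It also shows np.random.default_rng, which newer NumPy releases (1.17 and later, assumed here) provide as another way to get an isolated generator. That function is not used in the text above. It uses a different algorithm, so its numbers generally differ from RandomState's for the same seed.

```python
import numpy as np

rng_a = np.random.RandomState(1234)
rng_b = np.random.RandomState(1234)

a = rng_a.randn(5)

# Using the global generator in between does not disturb rng_b
np.random.seed(0)
np.random.randn(100)

b = rng_b.randn(5)
print(np.array_equal(a, b))  # True: same seed, independent state

# Assumes NumPy >= 1.17: the newer Generator interface
gen = np.random.default_rng(1234)
c = gen.standard_normal(5)
print(c.shape)
```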

I’ll give some examples of leveraging these functions’ ability to generate large arrays of samples all at once in the next section.

Function     Description
seed         Seed the random number generator
permutation  Return a random permutation of a sequence, or return a permuted range
shuffle      Randomly permute a sequence in-place
rand         Draw samples from a uniform distribution over [0, 1)
randint      Draw random integers from a given low-to-high range
randn        Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface)
binomial     Draw samples from a binomial distribution
normal       Draw samples from a normal (Gaussian) distribution
beta         Draw samples from a beta distribution
chisquare    Draw samples from a chi-square distribution
gamma        Draw samples from a gamma distribution
uniform      Draw samples from a uniform distribution over [low, high), which defaults to [0, 1)
Partial list of numpy.random functions
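A few of the functions listed above, called on an isolated RandomState. This is only a sketch: the seed, sizes, and ranges are arbitrary, and no outputs are shown.

```python
import numpy as np

rng = np.random.RandomState(1234)  # arbitrary seed

# permutation returns a new, shuffled copy of range(5)
perm = rng.permutation(5)

# shuffle reorders an existing array in place and returns None
deck = np.arange(5)
rng.shuffle(deck)

# randint excludes the upper bound, so this simulates ten six-sided dice
dice = rng.randint(1, 7, size=10)

# rand always samples from [0, 1); uniform accepts explicit bounds
unit = rng.rand(3)
scaled = rng.uniform(-1.0, 1.0, size=3)

# Number of successes in 10 fair coin flips, repeated 4 times
flips = rng.binomial(n=10, p=0.5, size=4)

print(perm, deck, dice, unit, scaled, flips, sep="\n")
```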
