You are currently viewing Python for Data Analysis Part-8

Python for Data Analysis Part-8

Random Walks

The simulation of random walks provides an illustrative application of utilizing array operations. Let’s first consider a simple random walk starting at 0 with steps of 1 and –1 occurring with equal probability.
Here is a pure Python way to implement a single random walk with 1,000 steps using the built-in random module:

In [1]: import random 
          position = 0
          walk = [position]
          steps = 1000
          for i in range(steps): 
              step = 1 if random.randint(0, 1) else -1 
              position += step
              walk.append(position)
In [2]: plt.plot(walk[:100])
Output: [<matplotlib.lines.Line2D at 0x1fc267f9520>]
A simple random walk

You might make the observation that walk is simply the cumulative sum of the random steps and could be evaluated as an array expression. Thus, I use the np.random module to draw 1,000 coin flips at once, set these to 1 and –1, and compute the cumulative sum:

In [3]: nsteps = 1000
In [4]: draws = np.random.randint(0, 2, size=nsteps)
In [5]: steps = np.where(draws > 0, 1, -1)
In [6]: walk = steps.cumsum()

From this we can begin to extract statistics like the minimum and maximum value along the walk’s trajectory:

In [7]: walk.min()
Output: -9
In [8]: walk.max()
Output: 60

A more complicated statistic is the first crossing time, the step at which the random walk reaches a particular value. Here we might want to know how long it took the random walk to get at least 10 steps away from the origin 0 in either direction. np.abs(walk) >= 10 gives us a boolean array indicating where the walk has reached or exceeded 10, but we want the index of the first 10 or –10. Turns out, we can compute this using argmax, which returns the first index of the maximum value in the boolean array (True is the maximum value):

In [9]: (np.abs(walk) >= 10).argmax()
Output: 297

Note that using argmax here is not always efficient because it always makes a full scan of the array. In this special case, once a True is observed we know it to be the maximum value.

Simulating Many Random Walks at Once

If your goal was to simulate many random walks, say 5,000 of them, you can generate all of the random walks with minor modifications to the preceding code. If passed a 2-tuple, the numpy.random functions will generate a two-dimensional array of draws, and we can compute the cumulative sum across the rows to compute all 5,000 random walks in one shot:

In [10]: nwalks = 5000
In [11]: nsteps = 1000
In [12]: draws = np.random.randint(0, 2, size=(nwalks, nsteps)) # 0 or 1
In [13]: steps = np.where(draws > 0, 1, -1)
In [14]: walks = steps.cumsum(1)
In [15]: walks
Output: array([[  1,   2,   3, ...,  46,  47,  46],
       [  1,   0,   1, ...,  40,  41,  42],
       [  1,   2,   3, ..., -26, -27, -28],
       ...,
       [  1,   0,   1, ...,  64,  65,  66],
       [  1,   2,   1, ...,   2,   1,   0],
       [ -1,  -2,  -3, ...,  32,  33,  34]], dtype=int32)

Now, we can compute the maximum and minimum values obtained over all of the walks:

In [11]: walks.max()
Output: 122
In [12]: walks.min()
Output: -128

Out of these walks, let’s compute the minimum crossing time to 30 or –30. This is slightly tricky because not all 5,000 of them reach 30. We can check this using the any method:

In [13]: hits30 = (np.abs(walks) >= 30).any(1)
In [14]: hits30
Output: array([ True,  True,  True, ...,  True, False,  True])
In [14]: hits30.sum() 
Output: 3368

We can use this boolean array to select out the rows of walks that actually cross the absolute 30 level and call argmax across axis 1 to get the crossing times:


In [15]: crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)
In [16]: crossing_times.mean()
Output: 509.99762470308787

Feel free to experiment with other distributions for the steps other than equal-sized coin flips. You need only use a different random number generation function, like normal to generate normally distributed steps with some mean and standard deviation:

In [18]: steps = np.random.normal(loc=0, scale=0.25,
          size=(nwalks, nsteps))

Leave a Reply