The Simplest Stochastic Process — A Random Walk

Kyle P Rasku
Oct 17, 2022


Image: Neal Herbert, Public domain, via Wikimedia Commons

The random walk is a process that behaves in some surprising and interesting ways. It always begins with some entity, e, occupying a point, p, in space and time. This point can be on the d-dimensional integer lattice, with some set of coordinates representing its location. Then, at each time unit, t, the entity, e, moves from its current location to one of its 2d nearest neighbors, each with equal probability 1/(2d), independently of its past movements. When d = 1 or 2, the entity is certain to return to its starting place, and to reach every possible position, as the random walk continues. However, when d = 3 or more, at any time, t, the number of possible steps that increase the distance of the entity, e, from the starting location is much larger than the number of possible steps that would decrease it. The walk is then transient: there is a positive probability that it never returns at all, and with probability 1 it visits its starting location only finitely many times before drifting away for good (Pólya's recurrence theorem).
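A quick simulation makes the contrast between the recurrent (d = 1, 2) and transient (d = 3) cases visible. This sketch is not from the article; the function name and parameters are illustrative, and with a finite number of steps the estimates only approximate the limiting return probabilities.

```python
import random

def lattice_walk_returns(d, n_steps=5_000, n_walks=1_000, seed=0):
    """Estimate the fraction of d-dimensional lattice walks that
    return to their starting point within n_steps."""
    rng = random.Random(seed)
    returned = 0
    for _ in range(n_walks):
        pos = [0] * d
        for _ in range(n_steps):
            axis = rng.randrange(d)           # pick one of the d axes
            pos[axis] += rng.choice((-1, 1))  # step +/-1 along it: 1/(2d) per neighbor
            if not any(pos):                  # back at the origin?
                returned += 1
                break
    return returned / n_walks

for d in (1, 2, 3):
    print(f"d={d}: ~{lattice_walk_returns(d):.2f} of walks return")
```

In dimensions 1 and 2 the estimated return fraction approaches 1 as the horizon grows; in dimension 3 it stalls well below 1.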

The Fair Random Walk and the CLT

The central limit theorem (CLT) establishes that when many independent, identically distributed random variables are summed, their normalized sum tends toward a normal distribution, even when the original variables are not normally distributed. We can play with this process by using some Python code to take an arbitrarily large number of “walks” (equivalent to drawing a large number of samples) of n “steps” where:

  1. each of the n steps is a discrete, randomly chosen coordinate adjustment drawn from the default ‘choices’ list, defined in the random walk definition,
  2. each possible adjustment has a 25% chance of being selected at random, and
  3. we assume that each “step” takes place at a discrete time point, numbered 0 through n
Python code defining the Random Walk process (Image by Author)
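The code above appears as an image; the following is a minimal sketch consistent with the three rules just listed. The names CHOICES and random_walk are illustrative, not necessarily those in the original notebook.

```python
import random

# Default 'choices': the four unit coordinate adjustments in 2D.
CHOICES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def random_walk(n, choices=CHOICES, seed=None):
    """Take n steps from the origin; each adjustment in `choices`
    is selected with equal (25%) probability at each time point."""
    rng = random.Random(seed)
    x = y = 0
    path = [(x, y)]  # positions at time points 0 through n
    for _ in range(n):
        dx, dy = rng.choice(choices)
        x, y = x + dx, y + dy
        path.append((x, y))
    return path
```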

This process creates a sampling distribution of sample means, where:

  1. The samples are the walks, each having n steps
  2. The sample means are the mean distances traveled from the origin during each walk of n steps, and
  3. The number of times we sample is the total number of walks (generally, a very large number)

When we follow the rules of the fair random walk, the sampling distribution of the sample means follows the CLT. The population does not need to be normally distributed, but as long as the sample size (n) is greater than about 30, the sampling distribution of the sample means will approximate a normal distribution.

Running the Random Walk process 10,000 times, with 100 steps for each walk (Image by Author)
Graph of the 10,000 x 100 Fair Random Walk’s Deviations from Start in 2D Space (Image by Author)
Distribution of the 10,000 x 100 Fair Random Walk’s Deviations from Start in 2D Space, approximating Gaussian and illustrating the Central Limit Theorem (Image by Author)
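The run and plots above are also images. Here is a sketch of the same experiment, reusing the random_walk function from the previous snippet and assuming matplotlib is available:

```python
import math
import matplotlib.pyplot as plt

N_WALKS, N_STEPS = 10_000, 100

# Final deviation from the start for each of the 10,000 walks.
finals = [random_walk(N_STEPS)[-1] for _ in range(N_WALKS)]
xs = [p[0] for p in finals]
ys = [p[1] for p in finals]
distances = [math.hypot(x, y) for x, y in finals]

# Each x (and y) deviation is a sum of 100 i.i.d. steps, so by the
# CLT their distribution across walks is approximately Gaussian.
plt.hist(xs, bins=50, edgecolor="black")
plt.xlabel("x deviation from start after 100 steps")
plt.ylabel("Number of walks")
plt.title("10,000 x 100 fair random walk")
plt.show()
```

The x and y deviations are each a sum of i.i.d. steps, which is why their histograms approximate a Gaussian; the straight-line distances themselves follow a related, slightly right-skewed distribution.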

Measures of Central Tendency

Measures of Central Tendency for x coordinates, y coordinates and distances of 10,000 x 100 Fair Random Walk (Image by Author)

Mean coordinates hover near zero for both x and y, and standard deviations of coordinate locations hover near 7, with the min at about -26 from the origin (0,0) and the max at about 26. The mean distance traversed is 11.26 steps from the origin +/- 6 steps when 100 total steps are taken per walk. Note the gap between the min/max values and the 25th/75th percentiles for all statistics, consistent with the thin tails of the Gaussian curve.
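A table like the one above can be reproduced with pandas, continuing from the previous sketch’s xs, ys, and distances lists (the column names are illustrative):

```python
import pandas as pd

# Summary statistics for the coordinate deviations and distances.
stats = pd.DataFrame({"x": xs, "y": ys, "distance": distances})
print(stats.describe().round(2))
```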

Introducing Unfairness

What does it take to demonstrate a deterioration of the CLT within a random process like the random walk? Let’s experiment by introducing some unfairness into the walk in 2D space, favoring the positive y direction by .1 and disfavoring the negative y direction by the same amount.
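One way to implement this weighting, sketched here with illustrative names rather than the author’s code (which appears in the image below), is to pass weights to random.choices so the +y probability rises from .25 to .35 while the -y probability falls to .15:

```python
import random

# Weighted 'choices': +y favored by .1, -y disfavored by .1.
WEIGHTED_CHOICES = [(0, 1), (0, -1), (1, 0), (-1, 0)]
WEIGHTS = [0.35, 0.15, 0.25, 0.25]

def weighted_walk(n, seed=None):
    """Take n steps where +y is chosen with probability .35
    and -y with probability .15 (the x directions stay at .25)."""
    rng = random.Random(seed)
    x = y = 0
    for dx, dy in rng.choices(WEIGHTED_CHOICES, weights=WEIGHTS, k=n):
        x, y = x + dx, y + dy
    return (x, y)
```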

A Weighted Random Walk, where the positive y direction is favored over the negative y direction; repetitions are still 10,000 x 100 (Image by Author)
While it is clear from the graph on the left that variance has increased with the addition of unfairness, the normality of the sampling distribution remains robust! (Image by Author)
While differences between x min vs. max remain similar, by introducing the y coordinate weighting we can see a significant increase in y max and a moderate decrease in y min. The mean for y no longer hovers around zero, but is now 5. Distance variation has increased only slightly. (Image by Author)

Here we have the same number of walks (10,000) with the same number of steps (100) — but when the positive y direction is favored over the negative y direction by .2, the variance increases by about 7. When the experiment is repeated with the positive y direction favored over the negative by 1, variance approximately doubles over what it was for the fair random walk.
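A hypothetical check of this comparison, reusing the random_walk and weighted_walk sketches above:

```python
import math
import statistics

# Compare the variance of final distances for fair vs. weighted walks.
fair = [math.hypot(*random_walk(100)[-1]) for _ in range(10_000)]
biased = [math.hypot(*weighted_walk(100)) for _ in range(10_000)]
print("fair variance:  ", round(statistics.variance(fair), 2))
print("biased variance:", round(statistics.variance(biased), 2))
```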

Note the change in the y-axis range in the plot of the fair walk vs. the unfair walk. The range of the fair walk is about equal in every direction, while the range of the unfair walk, predictably, skews positively along the y-axis.

Adding Wormholes

Not only do small changes in bias impact the measures and skew of many real-world probabilities and outcomes, but randomness plays an additional part in modifying trajectories that might otherwise seem more predictable.

In the fair random walk, the mean and variance were fairly constant. While we couldn’t predict exactly where any individual walk would travel, overall we could easily calculate the expected range of many walks with high probability. It doesn’t matter whether the walk is fair or biased: if we sample enough walks, their deviations will be normally distributed, and this quality is robust even when we add weighting, or cut down on the number of walks or the number of steps per walk.

But what happens when we add “wormholes” to a fair random walk? Will the CLT hold? For how long, and under what circumstances?

Let’s make a list of coordinates that, when traveled to, “teleport” the walker to another random location on the walk. We can begin with one, then add another, and then another, and see how these “random acts” change our analysis of a random walk in 2 dimensions. We can make the impact more marked by adding some distance to the newly generated mapping points as well.

A Random Walk with Wormholes; what’s the process for ensuring denormalization? (Image by Author)
The distribution is no longer normal, and the variance has doubled in every direction (Image by Author)
While means and medians retain their centrality, there is a lot more chaos and variation (Image by Author)

The random walk process still tends toward the normal distribution, but once the wormholes’ influence passes a certain threshold, normalization is prevented.

How do we characterize the process that leads to chaos?

  1. Interestingly, the fewer wormholes there are, the more drastically the distribution strays from normal. Generating n/2 wormholes makes teleportation the “new normal”, so there needs to be a maximum of about 25 wormholes for every 100 steps for the distribution to begin to denormalize.
  2. The wormholes must lie within range sqrt(n) of the origin, an approximation of the mean distance traveled; otherwise they are very unlikely to be hit, and the walk will function like the fair random walk.
  3. The “teleportation” mechanism must also be able to move the walker more than the mean distance away from their current coordinates.

If we meet these criteria, the wormhole experiment will produce an outcome that successfully denormalizes the distribution of the distances, as the sketch following the caption below illustrates.

Code for a Fair Random Walk with n/5 wormholes with range sqrt(n) able to “teleport” greater than mean distance away (Image by Author)
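That code is shown as an image; the sketch below implements the three criteria, with illustrative choices for wormhole placement, exit distances, and naming:

```python
import math
import random

def wormhole_walk(n, seed=None):
    """Fair 2D walk of n steps with n/5 wormholes placed within
    range sqrt(n) of the origin, each teleporting the walker more
    than the mean distance away from the entrance."""
    rng = random.Random(seed)
    r = int(math.sqrt(n))  # ~ mean distance traveled after n steps

    # Map each wormhole entrance to an exit more than r away from it.
    wormholes = {}
    for _ in range(n // 5):
        ex, ey = rng.randint(-r, r), rng.randint(-r, r)
        jump_x = rng.choice((-1, 1)) * rng.randint(r + 1, 2 * r)
        jump_y = rng.choice((-1, 1)) * rng.randint(r + 1, 2 * r)
        wormholes[(ex, ey)] = (ex + jump_x, ey + jump_y)

    x = y = 0
    for _ in range(n):
        dx, dy = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        x, y = x + dx, y + dy
        if (x, y) in wormholes:  # stepping on an entrance teleports
            x, y = wormholes[(x, y)]
    return (x, y)

# Example: final deviations for 10,000 wormhole walks of 100 steps.
finals = [wormhole_walk(100) for _ in range(10_000)]
```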

Explore More

Explore more about random walks for yourself by downloading this Jupyter Notebook at https://github.com/krashr-ds/framingham-ms/blob/main/Random%20Walks.ipynb


Written by Kyle P Rasku

Nerd 📚 and Nurse 🩺 - Health Data Scientist, Research Enthusiast, Biostats & Quant Methods Instructor
