# How does the coupling of Markov chains work

## The coupling of Markov chains and the random walk on the torus

Transcript

Lecture by Verena Monschang.

This seminar lecture primarily deals with the coupling of Markov chains. For a better understanding we first consider, as preparation, the random walk on the $n$-cycle, and then use the random walk on the torus as the central example, for which we derive an upper bound on the mixing time. Chapter 5 of [LPW09] is the authoritative basis for this write-up.

## 1 Coupling of Markov chains

We have already encountered the coupling of two probability distributions $\mu$ and $\nu$: it was defined as a pair of random variables $(X, Y)$, defined on the same probability space, such that $\mu$ is the marginal distribution of $X$ and $\nu$ is the marginal distribution of $Y$. We now go one step further and couple Markov chains; as an introduction we first consider a simple example.

**Example 1.1.** A simple random walk on the path with vertex set $\{0, 1, \dots, n\}$ is a Markov chain that at each step moves up or down by one with equal probability. If the walk attempts to move past one of the endpoints $0$ or $n$, it stays put. It is clear at a glance that for all $x, y \in \{0, 1, \dots, n\}$ with $x \le y$ we always have $P^t(x, n) \le P^t(y, n)$. In plain terms, this means that the probability of being at the top value after $t$ steps does not decrease if a higher starting state is chosen. We want to prove this rigorously. Let $\Delta_1, \Delta_2, \dots$ be a sequence of i.i.d. $\{+1, -1\}$-valued random variables with expectation $0$, so that each $\Delta_i$ takes the values $+1$ and $-1$ with equal probability. Let $(X_t)$ with starting value $x$ and $(Y_t)$ with starting value $y$ be two random walks on $\{0, 1, \dots, n\}$, and let $P^t(x, \cdot)$ denote the distribution of $X_t$ and $P^t(y, \cdot)$ the distribution of $Y_t$; these are the distributions we want to couple. In step $t$ we move both chains up by one value (where possible) if $\Delta_t = +1$, and down by one value if $\Delta_t = -1$. In this way the two chains move in lockstep, which means that once the chains have met, they move together from then on. A meeting is only possible at the endpoints, i.e. at $0$ or $n$. It is then clear that $X_t \le Y_t$ holds whenever $x \le y$. In particular, $X_t = n$ cannot hold without $Y_t = n$ also holding. With this observation we finally obtain the claim:

$$P^t(x, n) = \mathbf{P}\{X_t = n\} \le \mathbf{P}\{Y_t = n\} = P^t(y, n).$$

Using this example we now motivate the definition of a coupling of two Markov chains.

**Definition 1.2.** A process $(X_t, Y_t)_{t \ge 0}$ is called a *coupling* of two Markov chains with transition matrix $P$ if $(X_t)$ and $(Y_t)$ are both Markov chains with transition matrix $P$, defined on the same probability space. Note that the two chains may have different starting distributions.

At this point it should also be noted that any coupling of Markov chains with transition matrix $P$ can be modified so that the two chains move together from the first time they occupy the same position. Formulated a little more precisely:

$$X_s = Y_s \implies X_t = Y_t \quad \text{for all } t \ge s. \tag{1.1}$$

To construct a coupling satisfying this, we let the chains run according to the original coupling until they meet; from then on, we let them run together.

**Notation.** When $(X_t)$ and $(Y_t)$ are coupled Markov chains with $X_0 = x$ and $Y_0 = y$, we often write $\mathbf{P}_{x,y}$ for the probability measure on the space on which $(X_t)$ and $(Y_t)$ are both defined.
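The lockstep coupling of Example 1.1 can be illustrated with a short Monte-Carlo sketch. This is only an illustration: the function `coupled_walks` and all parameter values are made up here, not taken from [LPW09].

```python
# Monte-Carlo sketch of the monotone coupling in Example 1.1.
import random

def coupled_walks(n, x, y, t, rng):
    """Run both walks for t steps using the SAME +/-1 increments."""
    X, Y = x, y
    for _ in range(t):
        delta = rng.choice((-1, 1))
        # Moves past 0 or n are suppressed: the walk stays put.
        X = min(max(X + delta, 0), n)
        Y = min(max(Y + delta, 0), n)
    return X, Y

rng = random.Random(0)
n, x, y, t = 10, 3, 7, 25          # arbitrary illustrative parameters
hits_x = hits_y = 0
for _ in range(20000):
    X, Y = coupled_walks(n, x, y, t, rng)
    assert X <= Y                   # the ordering x <= y is preserved
    hits_x += (X == n)
    hits_y += (Y == n)
# Pathwise X = n forces Y = n, so the estimate of P^t(x, n)
# can never exceed that of P^t(y, n).
print(hits_x / 20000, hits_y / 20000)
```

Because $X_t = n$ implies $Y_t = n$ on every single run, the inequality between the two empirical frequencies holds surely, not just on average, which is exactly the pathwise argument of the example.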

## 2 Bounding the total variation distance

Let $(X_t)$ and $(Y_t)$ continue to be Markov chains with irreducible transition matrix $P$ on the state space $\Omega$, with associated stationary distribution $\pi$. We now consider the total variation distance between the distributions $P^t(x, \cdot)$ and $P^t(y, \cdot)$ and, in the following theorem, find an upper bound for this distance, thereby providing the key tool of this chapter.

**Theorem 2.1.** Let $\{(X_t, Y_t)\}$ be a coupling of the two Markov chains $(X_t)$ and $(Y_t)$ with $X_0 = x$ and $Y_0 = y$ which satisfies (1.1). Define $\tau_{\mathrm{couple}}$ as the first time at which the two Markov chains meet:

$$\tau_{\mathrm{couple}} := \min\{t : X_t = Y_t\}. \tag{2.1}$$

Then:

$$\|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} \le \mathbf{P}_{x,y}\{\tau_{\mathrm{couple}} > t\}. \tag{2.2}$$

*Proof.* Let $t \in \mathbb{N}_0$ be fixed but arbitrary. Then $(X_t, Y_t)$ is a coupling of $P^t(x, \cdot)$ and $P^t(y, \cdot)$, because $P^t(x, z) = \mathbf{P}_{x,y}\{X_t = z\}$ and $P^t(y, z) = \mathbf{P}_{x,y}\{Y_t = z\}$. By Proposition 4.7 of [LPW09],

$$\|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} = \inf\big\{\mathbf{P}\{X \ne Y\} : (X, Y) \text{ is a coupling of } P^t(x, \cdot) \text{ and } P^t(y, \cdot)\big\}.$$

It follows immediately that

$$\|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} \le \mathbf{P}_{x,y}\{X_t \ne Y_t\}, \tag{2.3}$$

where by (1.1) we have $\mathbf{P}_{x,y}\{X_t \ne Y_t\} = \mathbf{P}_{x,y}\{\tau_{\mathrm{couple}} > t\}$. Altogether we obtain (2.2), which was to be shown. $\square$

With the help of this theorem we can now bound from above the maximal distance between the distribution $P^t(x, \cdot)$ and the associated stationary distribution $\pi$:

**Corollary 2.2.** Assume that for every pair of states $x, y \in \Omega$ there is a coupling $(X_t, Y_t)$ with $X_0 = x$ and $Y_0 = y$. Then:

$$d(t) \le \max_{x, y \in \Omega} \mathbf{P}_{x,y}\{\tau_{\mathrm{couple}} > t\}.$$
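Inequality (2.2) can be checked numerically on the path walk of Example 1.1, using one particular valid coupling: let the chains move independently until they first meet, and together afterwards. The sketch below tracks the coupled pair exactly as a distribution over pairs of states; all helper names and parameter values are ad hoc for this illustration.

```python
# Exact numerical check of Theorem 2.1 for the path walk of Example 1.1,
# with the "independent until they meet, then stick" coupling.

n = 5                                   # states 0..n
S = range(n + 1)

def P(j, k):
    """One step of the walk: +/-1 with prob 1/2 each, clamped at 0 and n."""
    up, down = min(j + 1, n), max(j - 1, 0)
    p = 0.0
    if k == up:   p += 0.5
    if k == down: p += 0.5
    return p

def step_dist(mu):
    """Push a distribution one step through P."""
    return [sum(mu[j] * P(j, k) for j in S) for k in S]

x, y, t = 0, n, 6
mu = [1.0 * (j == x) for j in S]        # law of X_t
nu = [1.0 * (j == y) for j in S]        # law of Y_t
# pair[(a, b)] = P{X_s = a, Y_s = b, chains not yet met};
# mass that meets is absorbed (the chains then move together).
pair = {(x, y): 1.0}
for _ in range(t):
    mu, nu = step_dist(mu), step_dist(nu)
    new = {}
    for (a, b), p in pair.items():
        for a2 in S:
            for b2 in S:
                q = p * P(a, a2) * P(b, b2)   # independent moves
                if q and a2 != b2:
                    new[(a2, b2)] = new.get((a2, b2), 0.0) + q
    pair = new

tv = 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))
p_not_coupled = sum(pair.values())      # = P{tau_couple > t}
assert tv <= p_not_coupled + 1e-12      # inequality (2.2)
print(tv, p_not_coupled)
```

Any coupling satisfying (1.1) yields a valid upper bound; the independent-until-meeting coupling is simply the easiest one to compute exactly.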

*Proof.* Follows immediately by combining Lemma 4.11 of [LPW09] with Theorem 2.1. $\square$

## 3 The random walk on the torus

Before we turn to the random walk on the torus, one of the main topics of this lecture, we do a little preliminary work by looking at the random walk on the cycle.

**Example 3.1** (The random walk on the cycle). We consider the state space $\Omega = \mathbb{Z}_n = \{0, 1, \dots, n-1\}$. The random walk on the $n$-cycle is the Markov chain with transition matrix

$$P(j, k) = \begin{cases} \tfrac{1}{2} & \text{if } k \equiv j + 1 \pmod n, \\ \tfrac{1}{2} & \text{if } k \equiv j - 1 \pmod n, \\ 0 & \text{otherwise.} \end{cases}$$

The graph underlying this random walk has vertex set $\{0, 1, \dots, n-1\}$ and an edge between $j$ and $k$ if and only if $j \equiv k \pm 1 \pmod n$. As one easily sees, this random walk is periodic for even $n$. To get around this problem we consider the corresponding lazy random walk. The lazy random walk on the $n$-cycle is the Markov chain with transition matrix

$$P(j, k) = \begin{cases} \tfrac{1}{4} & \text{if } k \equiv j + 1 \pmod n, \\ \tfrac{1}{4} & \text{if } k \equiv j - 1 \pmod n, \\ \tfrac{1}{2} & \text{if } k \equiv j \pmod n, \\ 0 & \text{otherwise.} \end{cases}$$

**Theorem 3.2.** For the lazy random walk on the $n$-cycle, $t_{\mathrm{mix}} \le n^2$.

*Proof.* We construct a coupling $(X_t, Y_t)$ of two Markov chains $(X_t)$ and $(Y_t)$ which both perform the lazy random walk on $\mathbb{Z}_n$. We let $(X_t)$ start at position $x$ and $(Y_t)$ at position $y$. In this coupling we move the two chains separately from each other, so that they cannot jump over each other when they are in adjacent positions. To do this, at each move we toss a fair coin to decide which chain to move: we move the chain $(X_t)$ one step if the coin shows heads, and the chain $(Y_t)$ if it shows tails. We assume that the coin tosses are independent of each other. After deciding in this way which chain to move, we toss the coin once more to decide in which direction the selected chain should move. Once the chains occupy the same position of the $n$-cycle at the same time, they move together from then on.

We now define $D_t$ as the clockwise distance between $X_t$ and $Y_t$. Then $(D_t)$ is a random walk on the path with vertex set $\{0, 1, \dots, n\}$, which moves up or down with equal probability at all interior vertices. The states $0$ and $n$ are each absorbing, which means that the walk, once it has reached one of these two states, remains there forever. This raises the question of how long it takes until this random walk reaches one of the states $0$ or $n$. An answer is given by Proposition 2.1 of [LPW09], where it is shown that $\mathbf{E}_{x,y}(\tau) = k(n - k)$, where $k$ is the clockwise distance between the starting positions $x$ and $y$ and $\tau = \min\{t \ge 0 : D_t \in \{0, n\}\}$. Now it is obvious that $\tau = \min\{t \ge 0 : X_t = Y_t\} =: \tau_{\mathrm{couple}}$. With the help of Corollary 2.2 and Markov's inequality we get:

$$d(t) \le \max_{x, y \in \mathbb{Z}_n} \mathbf{P}_{x,y}\{\tau > t\} \le \max_{x, y \in \mathbb{Z}_n} \frac{\mathbf{E}_{x,y}(\tau)}{t} \le \frac{n^2}{4t}.$$

The right-hand side of this inequality equals exactly $1/4$ when $t = n^2$, and thus $t_{\mathrm{mix}} \le n^2$. $\square$

With the results collected in the previous theorem, we can now turn to the random walk on the torus.

**Example 3.3** (The random walk on the torus). We first consider the graph of the $d$-dimensional torus; its vertex set is the Cartesian product $\mathbb{Z}_n^d = \underbrace{\mathbb{Z}_n \times \dots \times \mathbb{Z}_n}_{d \text{ times}}$. Two points $x = (x^1, \dots, x^d)$ and $y = (y^1, \dots, y^d)$ of the vertex set are adjacent if for some $j \in \{1, \dots, d\}$ we have $x^i = y^i$ for all $i \ne j$ and $x^j \equiv y^j \pm 1 \pmod n$, i.e. if the two points differ by the value $1$ in exactly one coordinate. As one easily sees, the simple random walk on the graph $\mathbb{Z}_n^d$ is periodic if $n$ is even. We avoid this problem by considering the lazy random walk on $\mathbb{Z}_n^d$.
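As a sketch, one step of the lazy random walk on $\mathbb{Z}_n^d$ just defined might look as follows; the function name and parameter values are made up for illustration. With probability $1/2$ the walk holds, otherwise it changes a uniformly chosen coordinate by $\pm 1$ modulo $n$, which gives each of the $2d$ neighbours probability $1/(4d) \cdot 2 = 1/(4d) \cdot 2$... more precisely, probability $\tfrac{1}{2} \cdot \tfrac{1}{d} \cdot \tfrac{1}{2} = \tfrac{1}{4d}$ each.

```python
# A single step of the lazy random walk on the torus Z_n^d (Example 3.3).
import random

def lazy_torus_step(state, n, rng):
    """With prob 1/2 stay put; otherwise change one uniformly chosen
    coordinate by +/-1 modulo n (each neighbour has prob 1/(4d) per... 
    i.e. 1/2 * 1/d * 1/2)."""
    state = list(state)
    if rng.random() < 0.5:
        return tuple(state)              # lazy half-step: hold
    i = rng.randrange(len(state))        # pick a coordinate uniformly
    state[i] = (state[i] + rng.choice((-1, 1))) % n
    return tuple(state)

rng = random.Random(0)
walk = [(0, 0, 0)]                       # d = 3, n = 4, start at origin
for _ in range(10):
    walk.append(lazy_torus_step(walk[-1], n=4, rng=rng))
print(walk)
```

Each sampled transition either holds or moves to one of the $2d$ neighbours, matching the adjacency rule of Example 3.3.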

**Theorem 3.4.** For the lazy random walk on the $d$-dimensional torus $\mathbb{Z}_n^d$,

$$t_{\mathrm{mix}}(\varepsilon) \le c(d)\, n^2 \log_2(\varepsilon^{-1}),$$

where $c(d)$ is a constant depending on the dimension $d$.

*Proof.* We first construct a coupling for every pair $(x, y)$ of starting positions and bound the expected value of the coupling time $\tau = \tau_{\mathrm{couple}}$, in order to be able to apply Corollary 2.2. Let $(X_t) = (X_t^1, \dots, X_t^d)$ and $(Y_t) = (Y_t^1, \dots, Y_t^d)$ be lazy random walks on $\mathbb{Z}_n^d$ with $X_0 = x$ and $Y_0 = y$. To couple them, we select one of the $d$ coordinates uniformly at random. If the two walks agree in the selected coordinate, we move both walks simultaneously in this coordinate by $+1$ or $-1$ with probability $1/4$ each, or let them stay in place with probability $1/2$. If their positions differ in the selected coordinate, we choose one of the two chains to move by a fair coin toss and leave the other fixed for that step. With another fair coin toss we then decide the direction, $+1$ or $-1$, in which the selected chain moves.

Let $\tau_i$ be the time required until the two chains agree in coordinate $i$, and let $D_t^i$ be the clockwise distance between $X_t^i$ and $Y_t^i$, considered at the times at which coordinate $i$ is chosen. Then $D_t^i$, for each $i \in \{1, \dots, d\}$, behaves exactly like the distance $D_t$ from the previous Theorem 3.2 in the coupling of the lazy random walk on the $n$-cycle $\mathbb{Z}_n$. The expected number of moves in coordinate $i$ until the two chains agree there is consequently at most $n^2/4$, if we at first ignore the waiting times between selections of coordinate $i$. If $Z_k^i$ is the waiting time between the $(k-1)$-th and the $k$-th move in coordinate $i$, then $Z_k^i$ is geometrically distributed with parameter $1/d$, since coordinate $i$ is selected with probability $1/d$ at each move; of course the $Z_k^i$ are identically distributed for all $i$ and $k$, with expected value $d$. With this we can write the time until the two chains agree in coordinate $i$ as $\tau_i = \sum_{k=1}^{\tau(i)} Z_k^i$, where $\tau(i)$ is distributed like the time $\tau$ from Theorem 3.2 that $D_t$ needs to reach $0$ or $n$. With Wald's identity and the bound from the previous Theorem 3.2 we get:

$$\mathbf{E}_{x,y}(\tau_i) = \mathbf{E}_{x,y}\Big(\sum_{k=1}^{\tau(i)} Z_k^i\Big) = \mathbf{E}_{x,y}(\tau(i)) \cdot \mathbf{E}(Z_1^i) = d \cdot \mathbf{E}_{x,y}(\tau(i)) \le \frac{d\, n^2}{4}.$$
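The single-coordinate bound can be checked empirically. The sketch below simulates the coordinatewise coupling described above and records, for each coordinate $i$, the first time the two chains agree there; by Wald's identity the expected value should be $d \cdot k_i(n - k_i)$, where $k_i$ is the initial clockwise distance in coordinate $i$. All names and parameter values are ad hoc for this sketch.

```python
# Simulation of the coordinatewise coupling on the torus Z_n^d.
import random

def coupling_times(x, y, n, rng):
    """Run the coupling until the chains agree in every coordinate;
    return the first time t_i at which they agree in coordinate i."""
    X, Y = list(x), list(y)
    d = len(x)
    first = [0 if X[i] == Y[i] else None for i in range(d)]
    t = 0
    while None in first:
        t += 1
        i = rng.randrange(d)                  # coordinate chosen u.a.r.
        s = rng.choice((-1, 1))               # direction of the move
        if X[i] == Y[i]:
            if rng.random() < 0.5:            # lazy simultaneous move
                X[i] = (X[i] + s) % n
                Y[i] = (Y[i] + s) % n
        elif rng.random() < 0.5:              # fair coin picks the chain
            X[i] = (X[i] + s) % n
        else:
            Y[i] = (Y[i] + s) % n
        if first[i] is None and X[i] == Y[i]:
            first[i] = t                      # coordinate i has coupled
    return first

rng = random.Random(0)
n, d = 6, 2
x, y = (0, 0), (3, 2)                         # clockwise distances k = 3, 2
runs = [coupling_times(x, y, n, rng) for _ in range(4000)]
mean_tau = [sum(r[i] for r in runs) / len(runs) for i in range(d)]
# Wald's identity predicts E(tau_i) = d * k_i * (n - k_i): 18 and 16 here.
print(mean_tau)
```

Note that once a coordinate agrees it stays agreed, since moves in matched coordinates are simultaneous; the coupling time is therefore the maximum of the per-coordinate times, exactly as used below.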

We are also interested in the expected coupling time of the random walk on $\mathbb{Z}_n^d$. It is given by $\tau_{\mathrm{couple}} = \max_{1 \le i \le d} \tau_i$, since in the end the two coupled chains have to agree in every coordinate. So we consider

$$\mathbf{E}_{x,y}(\tau_{\mathrm{couple}}) = \mathbf{E}_{x,y}\big(\max_{1 \le i \le d} \tau_i\big) \le \mathbf{E}_{x,y}\Big(\sum_{i=1}^d \tau_i\Big) = \sum_{i=1}^d \mathbf{E}_{x,y}(\tau_i) \le \frac{d^2 n^2}{4}$$

and thus obtain a bound independent of the starting positions. With the help of Markov's inequality we can now show that

$$\mathbf{P}_{x,y}\{\tau_{\mathrm{couple}} > t\} \le \frac{\mathbf{E}_{x,y}(\tau_{\mathrm{couple}})}{t} \le \frac{d^2 n^2}{4t}.$$

The right-hand side of this inequality equals $1/4$ when $t = d^2 n^2$, and thus $t_{\mathrm{mix}} \le d^2 n^2$. According to (4.36) of [LPW09], however, $t_{\mathrm{mix}}(\varepsilon) \le \lceil \log_2(\varepsilon^{-1}) \rceil \, t_{\mathrm{mix}}$, so that we immediately get $t_{\mathrm{mix}}(\varepsilon) \le d^2 n^2 \lceil \log_2(\varepsilon^{-1}) \rceil$. $\square$

## Bibliography

[LPW09] D. A. Levin, Y. Peres, and E. L. Wilmer. *Markov Chains and Mixing Times*. American Mathematical Society, Providence, RI, 2009. With a chapter by James G. Propp and David B. Wilson.