### How many dovetail shuffles suffice?

December 17, 2009

How many shuffles are necessary to randomize a deck of cards? Mathematically, card-shuffling can be viewed as a random walk on a finite group and, thus, it can be modeled by a Markov chain. We measure the degree of “randomness” of a deck of $n$ cards after $k$ shuffles using the Kullback-Leibler distance between the probability distribution of the Markov chain after $k$ shuffles and the uniform probability distribution. Lastly, we discuss an intriguing phenomenon that resembles a phase transition.

__________

Introduction

Imagine that we have a deck of $n$ cards. We assign number $1$ to the card at the top of the deck, number $2$ to the next card, and so forth, until we assign number $n$ to the card at the bottom of the deck. This ordered arrangement of cards can be represented by an $n$-tuple $(1,2,\dots,n-1,n)$. The simplest possible shuffle would be to take the card at the top of the deck and insert it at some random position in the deck, say, at the bottom. The new ordered arrangement of cards would then be represented by the $n$-tuple $(2,3,\dots,n,1)$. This shuffling technique is called the top-in-at-random shuffle [1]. In this paper, we will consider the dovetail shuffle instead [2].

In order to model the shuffling process, we first need to introduce some mathematical terminology. We define $[n] := \{1, 2, \dots, n\}$. Let $\mathcal{S}_n$ denote the symmetric group on the set $[n]$, i.e., $\mathcal{S}_n$ is the group that consists of all bijections $\pi : [n] \to [n]$ with function composition as the group operation. The elements $\pi \in \mathcal{S}_n$ are called permutations. Given $\pi, \sigma \in \mathcal{S}_n$, the composition $\pi \circ \sigma$ is the bijection obtained by applying first $\sigma$ and then $\pi$. Every ordered arrangement, or configuration, of a deck of $n$ cards can be associated with an element $\pi \in \mathcal{S}_n$. A sequence of $k$ shuffles can then be thought of as a composition of $k$ bijections $\pi^{(0)}, \pi^{(1)}, \dots, \pi^{(k-1)} \in \mathcal{S}_n$, as follows

$\pi^{(k-1)} \circ \dots \circ \pi^{(1)} \circ \pi^{(0)} = \sigma^{(k)} \in \mathcal{S}_n$.

Hence, the card to which we assigned number $i \in [n]$, and that was at position $i$ initially, will be at position $\sigma^{(k)}(i)$ after $k$ shuffles. One can think of the shuffling process as a random walk on $\mathcal{S}_n$. Thus, we can define a Markov chain on $\mathcal{S}_n$. The shuffling method that one uses defines the transition probabilities of the Markov chain. It would be an interesting challenge to start with a desired transition probability matrix and try to design shuffling methods that yield a Markov chain as close as possible to the desired one.

Let us look again at the top-in-at-random shuffling example given above. If we take the top card and insert it at the bottom of the deck, then we know exactly which permutation $\pi \in \mathcal{S}_n$ we applied. However, the goal of card-shuffling is to “randomize” a deck, and if we know which permutation was applied with each shuffle, then we know exactly the new ordering of the deck. This is desirable only if our purpose is cheating. Therefore, in order to randomize a deck of cards we must apply a sufficient number of permutations chosen at random from $\mathcal{S}_n$, so that there is enough uncertainty about the new ordering of the deck. In this post we will study how many shuffles suffice to attain a desired level of uncertainty.

__________

The Dovetail Shuffle

The dovetail shuffle, or riffle shuffle, is a common card-shuffling method that consists of cutting a deck of cards in two packets, and then interleaving the two packets together. A variation is the $a$-shuffle, which consists of cutting the deck in $a$ packets and then interleaving them all together. In this post, we will consider dovetail $2$-shuffles only.
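Bayer & Diaconis [2] analyze the dovetail shuffle under the Gilbert-Shannon-Reeds (GSR) model: cut the deck at a position drawn from a Binomial$(n, 1/2)$ distribution, then interleave the two packets by repeatedly dropping the next card from a packet with probability proportional to its current size. The following is a minimal Python simulation of one such shuffle (the function name is mine):

```python
import random

def gsr_shuffle(deck, rng=random):
    """One dovetail shuffle under the Gilbert-Shannon-Reeds model:
    cut at a Binomial(n, 1/2) position, then interleave, dropping the
    next card from a packet with probability proportional to its size."""
    n = len(deck)
    cut = sum(rng.random() < 0.5 for _ in range(n))  # Binomial(n, 1/2) cut
    left, right = list(deck[:cut]), list(deck[cut:])
    out = []
    while left or right:
        a, b = len(left), len(right)
        # choose a packet with probability proportional to its size
        packet = left if rng.random() < a / (a + b) else right
        out.append(packet.pop(0))
    return out
```

Iterating this function composes shuffles, so it can be used to verify experimentally the rising-sequence behavior discussed below.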

A rising sequence is defined by Bayer & Diaconis in [2] as a “maximal subset of an arrangement of cards, consisting of successive face values displayed in order”. Initially, the ordered arrangement of the deck is $\pi(i) = i$ for all $i \in [n]$ and, hence, there is exactly one rising sequence. Each dovetail shuffle tends to double the number of rising sequences, until a saturation effect takes place due to the finite size of the deck. This is an “experimental” result that anyone can verify by playing with an actual deck of cards. Do note that:

• the minimum number of rising sequences in a deck of $n$ cards is $1$. A deck with a single rising sequence is one in which the ordering of the cards is $\pi(i) = i$ for all $i \in [n]$.
• the maximum number of rising sequences in a deck of $n$ cards is $n$. A deck with $n$ rising sequences is one in which the ordering of the cards is $\pi(i) = n + 1 - i$ for all $i \in [n]$.
• after $k$ dovetail shuffles, there are at most $\min\{2^k,n\}$ rising sequences.
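These facts are easy to check programmatically. A minimal sketch in Python (the helper name is mine), based on the observation that a new rising sequence starts exactly when card $v+1$ lies above card $v$ in the deck:

```python
def rising_sequences(deck):
    """Count rising sequences in a deck given as a list of card values
    from top to bottom (a permutation of 1..n)."""
    pos = {card: i for i, card in enumerate(deck)}
    # A new rising sequence starts at card v+1 whenever it lies above card v.
    return 1 + sum(pos[v + 1] < pos[v] for v in range(1, len(deck)))

print(rising_sequences([1, 2, 3, 4, 5]))  # 1 (sorted deck)
print(rising_sequences([5, 4, 3, 2, 1]))  # 5 (reversed deck)
print(rising_sequences([3, 1, 4, 2, 5]))  # 2 (sequences 1,2 and 3,4,5)
```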

We introduce function $r : \mathcal{S}_n \to [n]$, where $r(\pi)$ is defined to be the number of rising sequences in $\pi$. From Bayer & Diaconis’ paper [2], we know that the probability mass function on $\mathcal{S}_n$ after $k$ dovetail shuffles, $p^k : \mathcal{S}_n \to [0,1]$, is defined by

$p^k (\pi) = \displaystyle\binom{2^k + n - r(\pi)}{n} 2^{-n k}$.
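For small decks this formula can be evaluated directly with exact integer arithmetic; a sketch in Python (the function name is mine):

```python
from math import comb

def shuffle_pmf(n, k, r):
    """Bayer-Diaconis probability of any fixed arrangement with r rising
    sequences after k dovetail shuffles of an n-card deck."""
    return comb(2**k + n - r, n) / 2**(n * k)

# Sanity check for n = 3, k = 1: weighting each rising-sequence count by
# the number of permutations attaining it (1, 4, 1) gives total mass 1.
total = 1 * shuffle_pmf(3, 1, 1) + 4 * shuffle_pmf(3, 1, 2) + 1 * shuffle_pmf(3, 1, 3)
print(total)  # 1.0
```

Note that arrangements with more than $\min\{2^k, n\}$ rising sequences receive probability zero, since the binomial coefficient vanishes.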

Since we know the probability mass function on $\mathcal{S}_n$ after $k$ shuffles, we can study shuffling from an information-theoretic viewpoint. In particular, we are interested in how the entropy of the probability mass function $p^k$ evolves as the deck is shuffled.

__________

Entropy Generation

The Shannon entropy [3] of the probability mass function $p^k$ is

$H(p^k) = - \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2 p^k (\pi) = - \displaystyle\sum_{i=1}^n \sum_{\pi \in \mathcal{S}_n : r(\pi) = i} p^k (\pi) \log_2 p^k (\pi)$

After $k$ shuffles there are at most $\min\{2^k,n\}$ rising sequences in the deck. Therefore, we only need to sum $\min\{2^k,n\}$ terms

$H(p^k) = - \displaystyle\sum_{i=1}^{\min\{2^k,n\}} \sum_{\pi \in \mathcal{S}_n : r(\pi) = i} p^k (\pi) \log_2 p^k (\pi)$.

As mentioned in [2], the cardinality of the set $\{\pi \in \mathcal{S}_n : r(\pi) = i\}$ is the Eulerian number $\left \langle {n\atop i} \right \rangle$. Let $p_i^k$ denote the probability $p^k (\sigma)$ for any $\sigma \in \{\pi \in \mathcal{S}_n : r(\pi) = i\}$, as follows

$p_i^k = \displaystyle\binom{2^k + n - i}{n} 2^{-n k}$

for $i \in \{1,2,\dots, \min\{2^k,n\}\}$.  The entropy of the probability mass function $p^k$ is thus given by

$H(p^k) = - \displaystyle\sum_{i=1}^{\min\{2^k,n\}} \left \langle {n\atop i} \right \rangle p_i^k \log_2 p_i^k$.

Hence, we can compute $H(p^k)$ by summing $\min\{2^k,n\}$ terms (instead of summing $|\mathcal{S}_n| = n!$ terms), which makes the computation tractable for modestly large values of $n$, assuming that the Eulerian numbers can be computed efficiently, of course. A non-recursive formula for the Eulerian number $\left \langle {n\atop i} \right \rangle$ is

$\displaystyle \left \langle {n\atop i} \right \rangle = \displaystyle\sum_{j = 0}^i \binom{n+1}{j} (-1)^j (i-j)^n$.
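This formula transcribes directly into code, and it is exact since Python integers are unbounded (the function name is mine):

```python
from math import comb

def eulerian(n, i):
    """Eulerian number <n, i>: the count of permutations of [n]
    with exactly i rising sequences."""
    return sum((-1)**j * comb(n + 1, j) * (i - j)**n for j in range(i + 1))

# For each n, the Eulerian numbers sum to n!.
print([eulerian(4, i) for i in range(1, 5)])  # [1, 11, 11, 1]
```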

Using the expression for $p_i^k$ above to compute the entropy $H(p^k)$ is numerically dangerous: the binomial coefficients can become extremely large, which may lead to overflow and loss of precision in floating-point arithmetic. Therefore, it is wise to break up the binomial coefficients into products

$p_i^k =2^{- n k} \displaystyle\prod_{m=1}^n \left(1 + \frac{2^k - i}{m}\right)$

and to write logarithms of products as sums of logarithms

$\log_2 p_i^k = -n k + \displaystyle\sum_{m=1}^n \log_2 \left(1 + \frac{2^k - i}{m}\right)$.

However, do note that $2^{-n k}$ might underflow, as we might want to consider decks with, say, $n = 312$ cards. A safer alternative is to “combine” $2^{-n k}$ with the Eulerian numbers

$2^{-n k} \left \langle {n\atop i} \right \rangle = \displaystyle\sum_{j = 0}^i \binom{n+1}{j} (-1)^j \left(\frac{i-j}{2^k}\right)^n$.
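Putting the pieces together, the entropy can be computed in a numerically safer way by working with $\log_2 p_i^k$ directly. Below is a sketch in Python (the function name is mine; it keeps the Eulerian numbers as exact integers, which is adequate for moderate $n$ but would need the scaling above for very large decks):

```python
from math import comb, log2

def shuffle_entropy(n, k):
    """Shannon entropy H(p^k), in bits, of an n-card deck after k
    dovetail shuffles, summed over rising-sequence counts."""
    H = 0.0
    for i in range(1, min(2**k, n) + 1):
        # log2(p_i^k) as a sum of logarithms, avoiding huge binomials
        log_p = -n * k + sum(log2(1 + (2**k - i) / m) for m in range(1, n + 1))
        # Eulerian number <n, i>, exact thanks to Python's big integers
        A = sum((-1)**j * comb(n + 1, j) * (i - j)**n for j in range(i + 1))
        H -= A * 2.0**log_p * log_p
    return H

print(shuffle_entropy(3, 1))  # ≈ 2.0 bits
```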

Let the uniform probability mass function on $\mathcal{S}_n$, $u : \mathcal{S}_n \to [0,1]$, be defined by $u(\pi) = \frac{1}{n!}$ for all $\pi \in \mathcal{S}_n$. The entropy of the uniform probability mass function $u$ is

$H(u) = - \displaystyle\sum_{\pi \in \mathcal{S}_n} u (\pi) \log_2 u(\pi) = \displaystyle\sum_{\pi \in \mathcal{S}_n} \frac{1}{n!} \log_2 (n!) = \log_2 (n!)$

and, since the uniform distribution maximizes entropy, we conclude that $0 \leq H(p^k) \leq \log_2 (n!)$. We denote $H_{max} = \log_2 (n!)$. In [1], Aldous & Diaconis use an algebraic argument to show that $p^k \rightarrow u$ as $k \rightarrow \infty$, i.e., the probability mass function converges to the uniform one as the number of shuffles tends to infinity. Therefore, we conclude that $H(p^k) \rightarrow H(u) = H_{max}$ as $k \rightarrow \infty$ and, hence, for every $\varepsilon > 0$ there exists a $k^* \in \mathbb{N}_0$ such that $k \geq k^*$ implies $(1 - \varepsilon) H_{max} \leq H(p^k) \leq H_{max}$. Given a value of $\varepsilon > 0$, one can then find how many shuffles suffice to attain an entropy higher than $(1 - \varepsilon) H_{max}$.

The question “how many shuffles suffice to randomize a deck of $n$ cards?” is ill-posed. One must specify a measure of “randomness” (in this post, we use the Shannon entropy to measure randomness) and a value of $\varepsilon$ to measure how much randomness is “enough” for us to say that a deck is mixed.

As mentioned before, each dovetail shuffle tends to double the number of rising sequences, until saturation occurs. Hence, after $k$ dovetail shuffles, where $k \in \mathbb{N}_0$ is “small”, there should be $r(\pi) \approx 2^k$ rising sequences. We can make the approximation

$p^k (\pi) = \displaystyle\binom{2^k + n - r(\pi)}{n} 2^{-n k} \approx \binom{n}{n} 2^{-n k} = 2^{-n k}$

which yields

$H(p^k) = - \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2 p^k (\pi) \approx - \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2 \left(2^{-n k}\right)$

and, thus

$H(p^k) \approx \underbrace{\left(\sum_{\pi \in \mathcal{S}_n} p^k (\pi)\right)}_{=1} n k = n k$

for “small” $k$. We conclude that dovetail shuffling generates entropy at a rate of approximately $n$ bits per shuffle, until the deck becomes saturated. Using the approximation $H(p^k) \approx n k$, we have that maximum entropy would be attained after approximately $k^{\dagger} = H_{max} / n = \log_2(n!) / n$ shuffles.
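The estimate $k^{\dagger}$ is a two-line computation (the function name follows the text):

```python
from math import factorial, log2

def k_dagger(n):
    """Shuffle count at which growth of n bits per shuffle
    would reach the maximum entropy log2(n!)."""
    return log2(factorial(n)) / n

print(k_dagger(52))
```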

In figures 1 and 2 we plot (in blue) the normalized entropy $H(p^k) / H_{max}$ as a function of the number of dovetail shuffles, $k$, for decks of sizes $n = 52$ and $n = 104$, respectively. We also plot (in red) the linear approximation $H(p^k) \approx n k$ and (in magenta) the maximum normalized entropy.

[ Figure 1: Normalized entropy as a function of the number of shuffles for a deck with $n = 52$ cards ]

[ Figure 2: Normalized entropy as a function of the number of shuffles for a deck with $n = 104$ cards ]

Note that, as predicted theoretically, the entropy increases in an almost linear fashion during the first few shuffles. In fact, the approximation $H(p^k) \approx n k$ is surprisingly good until $k = 3$ for $n = 52$, and until $k = 4$ for $n = 104$. For $n = 52$, we have $k^{\dagger} \approx 4.33$, which is a rough estimate of the number of shuffles after which the deck becomes saturated.

__________

Approach to Uniformity

As shown by Aldous & Diaconis in [1], the stationary probability mass function of the Markov chain is the uniform one. Therefore, the entropy generated by dovetail shuffling converges to the maximum entropy. We measure how “close” to the maximum entropy we are after $k$ shuffles using the Kullback-Leibler distance [3] between the probability mass functions $p^k$ and $u$, as follows

$\begin{array}{rl} D (p^k \| u) &= \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2\left[\frac{p^k (\pi)}{u (\pi)}\right]\\ &= \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \left(\log_2 p^k (\pi) - \log_2 u (\pi)\right)\\ &= \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2 p^k (\pi) - \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2 u (\pi)\\ &= \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2 p^k (\pi) + \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2 (n!)\\ &= - \underbrace{\left[- \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2 p^k (\pi)\right]}_{= H(p^k)} + \underbrace{\left(\displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi)\right)}_{=1} \log_2 (n!)\end{array}$

and, hence, we obtain

$D (p^k \| u) = \log_2 (n!) - H(p^k)$.

Since $0 \leq H(p^k) \leq \log_2 (n!)$, we have that

$0 \leq D (p^k \| u) \leq \log_2 (n!)$.

We can rewrite the equation above in the form

$D (p^k \| u) = H(u) - H(p^k)$,

or in the form

$D (p^k \| u) = \displaystyle\max_q H(q) - H(p^k)$,

because among all probability mass functions on $\mathcal{S}_n$, the one that maximizes the entropy is the uniform distribution $u$. Since the logarithm is a strictly concave function, we can use Jensen’s inequality [3] to obtain

$\begin{array}{rl} - H(p^k) &= \displaystyle\sum_{\pi \in \mathcal{S}_n} p^k (\pi) \log_2 p^k (\pi)\\ &\leq \log_2 \left[\displaystyle\sum_{\pi \in \mathcal{S}_n} \left(p^k (\pi)\right)^2\right] = \log_2 \left(\|p^k\|_2^2\right)\end{array}$

and, therefore, we have the following upper bound on the Kullback-Leibler distance

$\begin{array}{rl} D (p^k \| u) &= \log_2 (n!) - H(p^k)\\ &\leq \log_2 (n!) + \log_2 \left(\|p^k\|_2^2\right)\\ &= \log_2 \left(n! \|p^k\|_2^2\right)\\ &= 2 \log_2 \left(\sqrt{n!} \|p^k\|_2\right)\end{array}$.

Since $\|u\|_2 = 1 / \sqrt{n!}$, the inequality above can be rewritten as follows

$D (p^k \| u) \leq 2 \log_2 \left(\frac{\|p^k\|_2}{\|u\|_2}\right)$.

Imagine that our initial probability mass function is a unit mass at permutation $\sigma$, i.e., $p^0(\pi) = \delta_{\pi\sigma}$, where $\delta_{ij}$ is the Kronecker delta. If this is the case, then there is zero uncertainty and, hence, the entropy is $H(p^0) = 0$, while the $2$-norm is maximum, $\|p^0\|_2 = 1$. At the other extreme, the uniform probability mass function has maximum entropy $H_{max}$, but minimum $2$-norm, $\|u\|_2 = 1 / \sqrt{n!}$. Hence, we have that

$1 / \sqrt{n!} = \|u\|_2 \leq \|p^k\|_2 \leq \|\delta_{\pi\sigma}\|_2 = 1$.

Since $p^k \rightarrow u$ as $k \rightarrow \infty$, we conclude that $\|p^k\|_2 \rightarrow \|u\|_2$ and, from the inequality $D (p^k \| u) \leq 2 \log_2 \left(\|p^k\|_2 / \|u\|_2\right)$, that $D (p^k \| u) \rightarrow 0$. For $k$ sufficiently large, the decay of $\|p^k\|_2$ is governed by the second-largest (in absolute value) eigenvalue of the transition probability matrix of the Markov chain. Hence, for large $k$ there exist constants $c \in \mathbb{R}^{+}$ and $\mu \in (0,1)$ such that

$\|u\|_2 \leq \|p^k\|_2 \leq \|u\|_2 \left(1 + \frac{c}{2} \mu^k\right)$

and, thus, $1 \leq \|p^k\|_2 / \|u\|_2 \leq \left(1 + (c / 2) \mu^k\right)$. Applying logarithms, we obtain

$0 \leq \log_2\left(\frac{\|p^k\|_2}{\|u\|_2}\right) \leq \log_2 \left(1 + \frac{c}{2} \mu^k\right) \leq \frac{c}{2} \mu^k$

where we used the inequality $\log_2(1+x) \leq x$. From the inequality above and the inequality on the relative entropy, we conclude that $D (p^k \| u) \leq c \mu^k$ for sufficiently large $k$. Hence, the Kullback-Leibler distance converges to zero exponentially.
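These conclusions can be checked by brute force on a tiny deck, where summing over all $n!$ arrangements is feasible. A sketch in Python (the function names are mine):

```python
from math import comb, factorial, log2
from itertools import permutations

def rising(p):
    """Number of rising sequences of an arrangement p of (1, ..., n)."""
    pos = {c: i for i, c in enumerate(p)}
    return 1 + sum(pos[v + 1] < pos[v] for v in range(1, len(p)))

def kl_to_uniform(n, k):
    """D(p^k || u) in bits, by direct summation over all n! arrangements."""
    D, u = 0.0, 1 / factorial(n)
    for p in permutations(range(1, n + 1)):
        pk = comb(2**k + n - rising(p), n) / 2**(n * k)
        if pk > 0:  # arrangements with too many rising sequences have mass 0
            D += pk * log2(pk / u)
    return D

# The distance shrinks with every shuffle and rapidly approaches zero.
print([kl_to_uniform(4, k) for k in range(1, 5)])
```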

__________

Concluding Remarks

We have concluded that during the first $k$ shuffles, where $2^k < n$, the dovetail shuffling process generates entropy at a rate of approximately $n$ bits per shuffle. However, when $2^k > n$, then $\min\{2^k,n\}$ becomes $n$ and the sum in the formula for the entropy “saturates” in the sense that the number of terms being summed no longer increases with $k$. For $k > \log_2(n)$, the deck is saturated and entropy approaches the maximum entropy asymptotically, as $D (p^k \| u)$ decays exponentially. This interesting phenomenon resembles a phase transition and was pointed out by Trefethen & Trefethen in [4], while in [5] Stark et al. present detailed analytical results.

__________

References

[1] David Aldous, Persi Diaconis, Shuffling cards and stopping times, The American Mathematical Monthly, Vol. 93, No. 5, pp. 333–348, Mathematical Association of America, 1986.

[2] Dave Bayer, Persi Diaconis, Trailing the Dovetail Shuffle to its Lair, The Annals of Applied Probability, Vol. 2, No. 2, pp. 294–313, May 1992.

[3] Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, John Wiley & Sons 2006.

[4] L. N. Trefethen, L. M. Trefethen, How many shuffles to randomize a deck of cards?, Proceedings of the Royal Society of London, Vol. 456, No. 2002, pp. 2561–2568, 2000.

[5] Dudley Stark, A. Ganesh, Neil O’Connell, Information Loss in Card Shuffling, HP Laboratories Bristol, September 1999.

__________
