How many shuffles are necessary to randomize a deck of cards? Mathematically, card-shuffling can be viewed as a random walk on a finite group and, thus, it can be modeled by a Markov chain. We measure the degree of “randomness” of a deck of cards after $k$ shuffles using the Kullback-Leibler distance between the probability distribution of the Markov chain after $k$ shuffles and the uniform probability distribution. Lastly, we discuss an intriguing phenomenon that resembles a phase transition.
Imagine that we have a deck of $n$ cards. We assign number $1$ to the card at the top of the deck, number $2$ to the next card, and so forth, until we assign number $n$ to the card at the bottom of the deck. This ordered arrangement of cards can be represented by the $n$-tuple $(1, 2, \ldots, n)$. The simplest possible shuffle would be to take the card at the top of the deck and insert it at some random position in the deck, say, at the bottom. The new ordered arrangement of cards would then be represented by the $n$-tuple $(2, 3, \ldots, n, 1)$. This shuffling technique is called the top-in-at-random shuffle [1]. In this post, we will consider the dovetail shuffle instead [2].
In order to model the shuffling process, we need to introduce some mathematical terminology first. We define $[n] := \{1, 2, \ldots, n\}$. Let $S_n$ denote the symmetric group on the set $[n]$, i.e., $S_n$ is the group that consists of all bijections $\pi : [n] \to [n]$ with function composition as the group operation. The elements of $S_n$ are called permutations. Given $\sigma, \tau \in S_n$, the product $\tau \sigma := \tau \circ \sigma$ is the bijection obtained by applying $\sigma$ first and then $\tau$. Every ordered arrangement, or configuration, of a deck of $n$ cards can be associated with an element of $S_n$. A sequence of $k$ shuffles can be thought of as a composition of bijections $\sigma_1, \sigma_2, \ldots, \sigma_k \in S_n$, as follows

$$\sigma = \sigma_k \circ \sigma_{k-1} \circ \cdots \circ \sigma_2 \circ \sigma_1.$$
Hence, the card to which we assigned number $i$, and that was at position $i$ initially, will be at position $\sigma(i)$ after $k$ shuffles. One can think of the shuffling process as a random walk on $S_n$. Thus, we can define a Markov chain on $S_n$. The shuffling method that one uses defines the transition probabilities of the Markov chain. It would be an interesting challenge to start with a desired transition probability matrix and try to design shuffling methods that yield a Markov chain as close as possible to the desired one.
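To make the composition convention concrete, here is a small Python sketch (our own notation, with positions $0$-indexed for programming convenience):

```python
def compose(tau, sigma):
    """Return tau o sigma: apply sigma first, then tau.

    A permutation of n elements is a tuple p of length n,
    where p[i] is the image of i (0-indexed).
    """
    return tuple(tau[sigma[i]] for i in range(len(sigma)))

# sigma cyclically shifts positions 0 -> 1 -> 2 -> 3 -> 0;
# tau swaps positions 0 and 1.
sigma = (1, 2, 3, 0)
tau = (1, 0, 2, 3)

print(compose(tau, sigma))  # -> (0, 2, 3, 1)
print(compose(sigma, tau))  # -> (2, 1, 3, 0)
```

Note that the two results differ: composition of permutations is not commutative, which is why the order of the shuffles matters.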
Let us look again at the top-in-at-random shuffling example given above. If we take the top card and insert it at the bottom of the deck, then we know exactly which permutation we applied. However, the goal of card-shuffling is to “randomize” a deck, and if we know which permutation was applied with each shuffle, then we know exactly the new ordering of the deck. This is desirable only if our purpose is cheating. Therefore, in order to randomize a deck of cards we must apply a sufficient number of permutations chosen at random from $S_n$, so that there is enough uncertainty about the new ordering of the deck of cards. In this post we will study how many shuffles suffice to attain a desired level of uncertainty.
The Dovetail Shuffle
The dovetail shuffle, or riffle shuffle, is a common card-shuffling method that consists of cutting a deck of cards into two packets, and then interleaving the two packets together. A variation is the $a$-shuffle, which consists of cutting the deck into $a$ packets and then interleaving them all together. In this post, we will consider dovetail $2$-shuffles only.
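To make the dovetail shuffle concrete, here is a small Python sketch of a single $2$-shuffle under the Gilbert-Shannon-Reeds (GSR) model, the standard probabilistic model of the riffle shuffle: cut the deck at a Binomial$(n, 1/2)$ position, then repeatedly take the next card from one of the two packets with probability proportional to the packet's current size. The function name and structure are ours, for illustration only.

```python
import random

def riffle_shuffle(deck, rng=random):
    """One dovetail 2-shuffle under the Gilbert-Shannon-Reeds model."""
    n = len(deck)
    # Cut position ~ Binomial(n, 1/2): each card goes to the top packet
    # with probability 1/2.
    cut = sum(rng.random() < 0.5 for _ in range(n))
    left, right = deck[:cut], deck[cut:]
    i = j = 0
    out = []
    while i < len(left) or j < len(right):
        rem_left = len(left) - i
        rem_right = len(right) - j
        # Take the next card from a packet with probability
        # proportional to that packet's remaining size.
        if rng.random() < rem_left / (rem_left + rem_right):
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out

deck = list(range(1, 11))
print(riffle_shuffle(deck))  # a random interleaving of two packets
```

This sequential “drop proportionally to packet size” rule makes every interleaving of the two packets equally likely given the cut, which is exactly the GSR distribution.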
A rising sequence is defined by Bayer & Diaconis in [2] as a “maximal subset of an arrangement of cards, consisting of successive face values displayed in order”. Initially, the ordered arrangement of the deck is $\pi(i) = i$ for all $i \in [n]$ and, hence, there is exactly one rising sequence. Each dovetail shuffle tends to double the number of rising sequences, until a saturation effect takes place due to the finiteness of the size of the deck. This is an “experimental” result that anyone can verify by playing with an actual deck of cards. Do note that:
- the minimum number of rising sequences in a deck of cards is $1$. A deck with a single rising sequence is one in which the ordering of the cards is $\pi(i) = i$ for all $i \in [n]$.
- the maximum number of rising sequences in a deck of $n$ cards is $n$. A deck with $n$ rising sequences is one in which the ordering of the cards is reversed, i.e., $\pi(i) = n - i + 1$ for all $i \in [n]$.
- after $k$ dovetail shuffles, there are at most $2^k$ rising sequences.
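These observations are easy to check programmatically. The following Python sketch (ours, for illustration) counts rising sequences by noting that the values $v$ and $v+1$ belong to the same rising sequence exactly when $v+1$ lies below $v$ in the deck:

```python
def rising_sequences(deck):
    """Count the rising sequences of a deck, given as the list of card
    labels 1..n read from top to bottom.

    Values v and v+1 are in the same rising sequence iff v+1 appears
    below v, so the count is 1 plus the number of "breaks".
    """
    n = len(deck)
    position = {card: idx for idx, card in enumerate(deck)}
    return 1 + sum(1 for v in range(1, n) if position[v + 1] < position[v])

print(rising_sequences([1, 2, 3, 4, 5, 6]))  # -> 1 (sorted deck)
print(rising_sequences([4, 1, 5, 2, 6, 3]))  # -> 2 (one riffle of a sorted deck)
print(rising_sequences([6, 5, 4, 3, 2, 1]))  # -> 6 (reversed deck)
```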
We introduce the function $R : S_n \to [n]$, where $R(\pi)$ is defined to be the number of rising sequences in $\pi$. From Bayer & Diaconis' paper [2], we know that the probability mass function on $S_n$ after $k$ dovetail shuffles, $Q_k$, is defined by

$$Q_k(\pi) = \frac{1}{2^{nk}} \binom{2^k + n - R(\pi)}{n}.$$
Since we know the probability mass function on $S_n$ after $k$ shuffles, we can study shuffling from an information-theoretic viewpoint. In particular, we are interested in how the entropy of the probability mass function $Q_k$ evolves as the deck is shuffled.
The Shannon entropy [3] of the probability mass function $Q_k$ is

$$H(Q_k) = -\sum_{\pi \in S_n} Q_k(\pi) \log_2 Q_k(\pi).$$
After $k$ shuffles there are at most $\min\{2^k, n\}$ rising sequences in the deck. Therefore, if we group the permutations by their number of rising sequences, we only need to sum $\min\{2^k, n\}$ terms.
As mentioned in [2], the cardinality of the set $\{\pi \in S_n : R(\pi) = r\}$ is the Eulerian number $\left\langle {n \atop r-1} \right\rangle$. Let $q_k(r)$ denote the probability $Q_k(\pi)$ for any $\pi \in S_n$ with $R(\pi) = r$, as follows

$$q_k(r) = \frac{1}{2^{nk}} \binom{2^k + n - r}{n},$$
for $r \in [n]$. The entropy of the probability mass function $Q_k$ is thus given by

$$H(Q_k) = -\sum_{r=1}^{\min\{2^k, n\}} \left\langle {n \atop r-1} \right\rangle q_k(r) \log_2 q_k(r).$$
Hence, we can compute $H(Q_k)$ by summing $\min\{2^k, n\}$ terms (instead of summing $n!$ terms), which makes the computation tractable for modestly large values of $n$, assuming that the Eulerian numbers can be computed efficiently, of course. A non-recursive formula for the Eulerian number is

$$\left\langle {n \atop m} \right\rangle = \sum_{j=0}^{m} (-1)^j \binom{n+1}{j} (m + 1 - j)^n.$$
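As a sketch of how these pieces fit together, the snippet below (our own, using exact integer and rational arithmetic) implements the alternating-sum formula for the Eulerian numbers and checks that the probabilities $\left\langle {n \atop r-1} \right\rangle q_k(r)$ sum to $1$ for a small deck:

```python
from math import comb
from fractions import Fraction

def eulerian(n, m):
    """Eulerian number <n, m>: the number of permutations of [n] with
    exactly m descents, via the non-recursive alternating-sum formula."""
    return sum((-1)**j * comb(n + 1, j) * (m + 1 - j)**n
               for j in range(m + 1))

def q(n, k, r):
    """Bayer-Diaconis probability of any single permutation with r rising
    sequences, after k dovetail 2-shuffles of an n-card deck (exact)."""
    return Fraction(comb(2**k + n - r, n), 2**(n * k))

n, k = 8, 3
total = sum(eulerian(n, r - 1) * q(n, k, r) for r in range(1, n + 1))
print(total)  # -> 1: the probabilities over S_n sum to one
```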
Using the expression for $q_k(r)$ above to compute the entropy is potentially dangerous: the binomial coefficients may take values which are too big, and this might lead to numerical errors. Therefore, it would be wise to break up the binomial coefficients into products

$$\binom{2^k + n - r}{n} = \prod_{j=1}^{n} \frac{2^k - r + j}{j},$$
and to write logarithms of products as sums of logarithms

$$\log_2 q_k(r) = \sum_{j=1}^{n} \log_2 \left( 2^k - r + j \right) - \sum_{j=1}^{n} \log_2 j - nk.$$
However, do note that $q_k(r)$ might take very small values, as we might want to consider decks with a large number of cards. A safer way would be to “combine” $q_k(r)$ with the Eulerian numbers, evaluating each term of the entropy sum in log-space

$$\left\langle {n \atop r-1} \right\rangle q_k(r) = 2^{\log_2 \left\langle {n \atop r-1} \right\rangle + \log_2 q_k(r)},$$

where the exponent remains moderate: the large magnitude of the Eulerian number offsets the small magnitude of $q_k(r)$, since their product is a probability bounded by $1$.
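A minimal Python sketch of this log-space evaluation (our own implementation, not taken from the referenced papers): it computes the Eulerian numbers exactly as big integers, takes base-2 logarithms of the exact values, and assembles each term of the entropy sum as $2^{\log_2 \langle \cdot \rangle + \log_2 q_k(r)}$.

```python
from math import comb, log2

def eulerian(n, m):
    """Eulerian number <n, m> via the alternating-sum formula (exact)."""
    return sum((-1)**j * comb(n + 1, j) * (m + 1 - j)**n
               for j in range(m + 1))

def entropy_after_shuffles(n, k):
    """H(Q_k) in bits, summing over the number of rising sequences r.

    Each term is assembled in log-space: log2 of the (huge) Eulerian
    number plus log2 of the (tiny) probability q_k(r) stays moderate.
    Python's math.log2 accepts arbitrarily large integers, so the exact
    binomial coefficient never has to be converted to a float.
    """
    h = 0.0
    for r in range(1, min(2**k, n) + 1):
        # log2 q_k(r) = log2 C(2^k + n - r, n) - n k
        log2_q = log2(comb(2**k + n - r, n)) - n * k
        log2_term = log2(eulerian(n, r - 1)) + log2_q
        h -= 2.0**log2_term * log2_q   # contribution <n, r-1> q log2 q
    return h

for k in (1, 5, 10):
    print(k, entropy_after_shuffles(52, k))
```

For a 52-card deck the first shuffle yields very nearly $52$ bits of entropy, in line with the rate discussed below, and by $k = 10$ the entropy is close to $\log_2(52!) \approx 225.58$ bits.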
Let the uniform probability mass function on $S_n$, denoted by $U$, be defined by $U(\pi) = 1/n!$ for all $\pi \in S_n$. The entropy of the uniform probability mass function is

$$H(U) = -\sum_{\pi \in S_n} \frac{1}{n!} \log_2 \frac{1}{n!} = \log_2 (n!),$$
and, since the uniform distribution maximizes entropy, we conclude that $H(Q_k) \le \log_2(n!)$. We denote $H_{\max} := \log_2(n!)$. In [1] Aldous & Diaconis use an algebraic argument to show that $Q_k \to U$ as $k \to \infty$, i.e., the probability mass function $Q_k$ converges to the uniform one as the number of shuffles tends to infinity. Therefore, we can conclude that $H(Q_k) \to H_{\max}$ as $k \to \infty$ and, hence, for every $\epsilon > 0$ there exists a $k_0$ such that $k \ge k_0$ implies that $H(Q_k) > H_{\max} - \epsilon$. If one is given a value of $\epsilon$, then one can find how many shuffles suffice to attain an entropy higher than $H_{\max} - \epsilon$.
The question “how many shuffles suffice to randomize a deck of cards?” is ill-posed. One must specify a measure of “randomness” (in this post, we use the Shannon entropy to measure randomness) and a value of $\epsilon$ to quantify how much randomness is “enough” for us to say that a deck is mixed.
As mentioned before, each dovetail shuffle tends to double the number of rising sequences, until saturation occurs. Hence, after $k$ dovetail shuffles, where $k$ is “small”, there should be $2^k$ rising sequences. We can make the approximation

$$H(Q_k) \approx nk,$$
for “small” $k$. We conclude that dovetail shuffling generates entropy at a rate of approximately $n$ bits per shuffle, until the deck becomes saturated. Using the approximation $\log_2(n!) \approx n \log_2(n/e)$, we have that maximum entropy would be attained after approximately $\log_2(n/e)$ shuffles.
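As a sanity check on these back-of-the-envelope numbers, the snippet below (our own, for illustration) compares the exact maximum entropy $\log_2(n!)$ with the Stirling estimate $n \log_2(n/e)$ and prints the implied saturation point, for a standard 52-card deck and a double deck:

```python
from math import lgamma, log, log2, e

def max_entropy_bits(n):
    """H(U) = log2(n!) computed via the log-gamma function."""
    return lgamma(n + 1) / log(2)

for n in (52, 104):  # a standard deck and a double deck
    exact = max_entropy_bits(n)
    stirling = n * log2(n / e)
    # Entropy grows at ~n bits per shuffle, so saturation is expected
    # after roughly log2(n/e) shuffles.
    print(f"n={n}: log2(n!)={exact:.2f} bits, "
          f"Stirling estimate={stirling:.2f} bits, "
          f"saturation at k ~ {log2(n / e):.2f} shuffles")
```

For $n = 52$ this gives $\log_2(52!) \approx 225.58$ bits and a saturation point of roughly $4$ to $5$ shuffles; the Stirling estimate slightly undershoots the exact value, as it drops the $\tfrac{1}{2}\log_2(2\pi n)$ correction term.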
In figures 1 and 2 we plot (in blue) the normalized entropy $H(Q_k)/H_{\max}$ as a function of the number of dovetail shuffles, $k$, for two decks of different sizes. We plot also (in red) the linear approximation $nk/H_{\max}$, and we plot (in magenta) the maximum normalized entropy.
[ Figure 1: Normalized entropy as a function of the number of shuffles for the first deck size ]
[ Figure 2: Normalized entropy as a function of the number of shuffles for the second deck size ]
Note that, as predicted theoretically, the entropy increases in an almost linear fashion during the first few shuffles. In fact, the approximation $H(Q_k) \approx nk$ remains surprisingly good until $k$ approaches $\log_2(n/e)$, which is thus a rough estimate of the number of shuffles after which the deck becomes saturated.
Approach to Uniformity
As shown by Aldous & Diaconis in [1], the stationary probability mass function of the Markov chain is the uniform one. Therefore, the entropy generated by dovetail shuffling converges to the maximum entropy. We measure how “close” to the maximum entropy we are after $k$ shuffles using the Kullback-Leibler distance [3] between the probability mass functions $Q_k$ and $U$, as follows

$$D(Q_k \| U) = \sum_{\pi \in S_n} Q_k(\pi) \log_2 \frac{Q_k(\pi)}{U(\pi)},$$
and, hence, we obtain

$$D(Q_k \| U) = \sum_{\pi \in S_n} Q_k(\pi) \log_2 Q_k(\pi) + \log_2(n!) = \log_2(n!) - H(Q_k).$$
Since $H(Q_k) \le \log_2(n!)$, we have that

$$D(Q_k \| U) \ge 0.$$
We can rewrite the equation above in the form

$$H(Q_k) = \log_2(n!) - D(Q_k \| U),$$
or in the form

$$H(Q_k) = H(U) - D(Q_k \| U) \le H(U),$$
because among all probability mass functions on $S_n$, the one that maximizes the entropy is the uniform distribution $U$. Since the logarithm is a strictly concave function, we can use Jensen's inequality [3] to obtain

$$D(Q_k \| U) = \sum_{\pi \in S_n} Q_k(\pi) \log_2 \big( n! \, Q_k(\pi) \big) \le \log_2 \Big( n! \sum_{\pi \in S_n} Q_k(\pi)^2 \Big),$$
and, therefore, we have the following upper bound on the Kullback-Leibler distance

$$D(Q_k \| U) \le \log_2(n!) + \log_2 \sum_{\pi \in S_n} Q_k(\pi)^2.$$
Since $\sum_{\pi \in S_n} Q_k(\pi)^2 = \|Q_k\|_2^2$, the inequality above can be rewritten as follows

$$D(Q_k \| U) \le \log_2(n!) + 2 \log_2 \|Q_k\|_2.$$
Imagine that our initial probability mass function is a unit mass at permutation $\pi_0 \in S_n$, i.e., $Q_0(\pi) = \delta_{\pi \pi_0}$, where $\delta$ is the Kronecker delta. If this is the case, then there is zero uncertainty and, hence, the entropy is $H(Q_0) = 0$, while the $2$-norm is maximum, $\|Q_0\|_2 = 1$. At the other extreme, the uniform probability mass function has maximum entropy $\log_2(n!)$, but minimum $2$-norm, $\|U\|_2 = 1/\sqrt{n!}$. Hence, we have that

$$\frac{1}{\sqrt{n!}} \le \|Q_k\|_2 \le 1.$$
Since $Q_k \to U$ as $k \to \infty$, then we conclude that $\|Q_k\|_2 \to 1/\sqrt{n!}$ and, from the inequality $D(Q_k \| U) \le \log_2(n!) + 2 \log_2 \|Q_k\|_2$, we have that $D(Q_k \| U) \to 0$. For $k$ sufficiently large, the decay of $\|Q_k\|_2$ is governed by the second largest (in absolute value) eigenvalue $\lambda_2$ of the transition probability matrix of the Markov chain. Hence, for large $k$ there exist a constant $C > 0$ and an eigenvalue $\lambda_2$ with $|\lambda_2| < 1$, such that

$$\Big| \|Q_k\|_2 - \frac{1}{\sqrt{n!}} \Big| \le C |\lambda_2|^k,$$
and, thus, $\|Q_k\|_2 \le \frac{1}{\sqrt{n!}} + C |\lambda_2|^k$. Applying logarithms, we obtain

$$2 \log_2 \|Q_k\|_2 \le -\log_2(n!) + 2 \log_2 \big( 1 + C \sqrt{n!} \, |\lambda_2|^k \big) \le -\log_2(n!) + \frac{2 C \sqrt{n!}}{\ln 2} |\lambda_2|^k,$$
where we used the inequality $\ln(1 + x) \le x$. From the inequality above and the upper bound on the relative entropy, we conclude that

$$D(Q_k \| U) \le \frac{2 C \sqrt{n!}}{\ln 2} |\lambda_2|^k$$

for $k$ sufficiently large. Hence, the Kullback-Leibler distance converges to zero exponentially.
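For small decks the decay of $D(Q_k \| U)$ can be checked by brute force. The sketch below (our own illustration, feasible only for small $n$) enumerates all of $S_n$, computes $D(Q_k \| U) = \log_2(n!) - H(Q_k)$ for each $k$, and prints the results; the values shrink roughly geometrically once the deck is saturated, consistent with exponential decay.

```python
from itertools import permutations
from math import comb, log2, factorial

def rising(deck):
    """Number of rising sequences: 1 + the number of values v such that
    v + 1 sits above v in the deck (read top to bottom)."""
    pos = {card: i for i, card in enumerate(deck)}
    return 1 + sum(1 for v in range(1, len(deck)) if pos[v + 1] < pos[v])

def kl_to_uniform(n, k):
    """D(Q_k || U) = log2(n!) - H(Q_k), by enumerating all of S_n."""
    h = 0.0
    for perm in permutations(range(1, n + 1)):
        q = comb(2**k + n - rising(perm), n) / 2**(n * k)
        if q > 0.0:             # a zero-probability term contributes nothing
            h -= q * log2(q)
    return log2(factorial(n)) - h

n = 6
for k in range(1, 9):
    print(k, kl_to_uniform(n, k))
```

Note the guard on $q = 0$: for small $k$ some permutations have more than $2^k$ rising sequences and are unreachable, and the convention $0 \log_2 0 = 0$ applies.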
We have concluded that during the first few shuffles, while $2^k < n$, the dovetail shuffling process generates entropy at a rate of approximately $n$ bits per shuffle. However, when $2^k \ge n$, then $\min\{2^k, n\}$ becomes $n$ and the sum in the formula for the entropy “saturates” in the sense that the number of terms being summed no longer increases with $k$. For $k > \log_2 n$, the deck is saturated and the entropy approaches the maximum entropy asymptotically, as $D(Q_k \| U)$ decays exponentially. This interesting phenomenon resembles a phase transition and was pointed out by Trefethen & Trefethen in [4], while in [5] Stark et al. present detailed analytical results.
[1] David Aldous, Persi Diaconis, Shuffling cards and stopping times, The American Mathematical Monthly, Vol. 93, No. 5, pp. 333–348, Mathematical Association of America, 1986.
[2] Dave Bayer, Persi Diaconis, Trailing the Dovetail Shuffle to its Lair, The Annals of Applied Probability, Vol. 2, No. 2, pp. 294–313, May 1992.
[3] Thomas M. Cover, Joy A. Thomas, Elements of Information Theory, John Wiley & Sons, 2006.
[4] L. N. Trefethen, L. M. Trefethen, How many shuffles to randomize a deck of cards?, Proceedings of the Royal Society of London A, Vol. 456, No. 2002, pp. 2561–2568, 2000.
[5] Dudley Stark, A. Ganesh, Neil O'Connell, Information Loss in Card Shuffling, HP Laboratories Bristol, September 1999.