Longest Topswops decks for sizes 20 and 21

The Topswops game

Topswops is a game invented by John Conway. The procedure is simple: start with a deck of n cards, numbered 1 through n, shuffled. Look at the number of the card on top, then reverse the order of that many cards on the top of the deck. Continue until the top card is a 1. For example, say we start with the 5-card deck 3, 1, 4, 5, 2. Then the game goes as follows, finishing after 7 moves:

move 0: 3 1 4 5 2
move 1: 4 1 3 5 2
move 2: 5 3 1 4 2
move 3: 2 4 1 3 5
move 4: 4 2 1 3 5
move 5: 3 1 2 4 5
move 6: 2 1 3 4 5
move 7: 1 2 3 4 5

The number of steps in the game for a given deck is the length of the deck. The question, of course, is what is the maximum length of a deck of size n, which Knuth calls f(n). It’s known that there is a quadratic lower bound, and an exponential upper bound of the (n+1)th Fibonacci number (proof in the Wikipedia article). OEIS sequence A000375 gives the maximum length of the game for various deck sizes. It was most recently extended to sizes 18 and 19 in 2021. Topswops was also the subject of an AZsPCs programming contest in 2011, which produced longest-known decks for prime sizes up to 97.

In the discussion below, I’ll use “an n-l deck” to mean an n-card deck that takes l moves; the example above is a 5-7 deck.

Results

But let’s not bury the lede – the new results are:

f(20) = 249, with a unique length-249 deck: `2 8 11 6 19 17 13 7 18 3 9 20 15 1 12 4 10 14 5 16`

f(21) = 282, with five decks tied for maximum length:

4 10 20 7 21 19 15 14 8 16 13 5 2 9 12 18 17 1 11 6
17 7 13 10 14 6 19 15 21 1 9 2 16 18 20 11 8 5 4 12
11 20 18 16 9 14 10 13 2 1 21 15 8 17 19 7 3 5 4 12
17 8 13 10 14 6 19 15 21 1 9 2 16 18 20 11 3 5 4 12
10 13 8 17 11 20 18 16 2 9 1 21 15 6 19 7 3 5 4 12

At least some of these decks had been previously found through optimization searches, but were not kown for sure to be optimal or complete.

A Topswops search algorithm

Knuth gives several algorithms for finding a longest Topswops deck in Volume 4, section 7.2.1.2, solution to exercise 107. The last one, which he refers to as the “better” algorithm, is a forward search which fills in the value of cards only when it becomes necessary. We start with a deck of blank cards; then we choose a number for the topmost card and make topswops moves until another blank card is on top. By trying each value not already used for blank cards, we can recursively search all permutations while simultaneously making the topswop moves, so that we don’t have to repeat the same initial moves for similar decks. And Knuth has a clever trick for keeping track of where in the initial deck each card started: we label the cards in the initial deck with the negative of their position, so the starting deck is -1, -2, -3, etc. Then a blank card is anything negative, and when we choose a value for it we also separately keep track of which card we assigned in the initial deck.

Pruning the search

We can prune the search in a few ways, to avoid considering all n! decks. The first two pruning strategies are straightforward:

Never choose 1 as the top card until all other values have been used. Choosing 1 ends the game, so if there is any other choice available then obviously it will result in a longer deck.
A longest deck must be a derangement; that is, it must not have any cards in their “home” position, i.e. card m in the m-th position. If it did, we could do a backward move by flipping the top m cards (thus putting m on top of the deck), giving a deck whose length is 1 greater than the current deck.

The third pruning strategy gets more involved. Say we are searching for the best 10-card deck. At some point, maybe card 10 ends up in the last position. Now it can never move again, and effectively we continue the game with a 9-card deck. But if we already know f(9), the maximum length of a 9-card deck, then that is an upper bound for the number of moves remaining; and if that upper bound plus the number of moves so far is not more than the current best 10-card deck, then we can stop searching right away, without filling in any remaining blank cards.

Knuth says that once the largest m cards are in order at the end of the deck, we can use f(n-m) as a bound. Others have observed that we can do better: we can apply the same bound if the largest m cards are at the end of the deck in any order, which is a significant improvement for larger decks.

A backward search

Knuth’s “good” algorithm, which he attributes to Pepperdine, is similar except that it moves backwards. We start with a deck that has 1 on top and the rest of the cards blank. At any point the possible backward moves correspond to any known card which is in its home position, or any blank card, which we can assign the value of its current position; we continue until there are no blank cards and no more backward moves possible.

Although Knuth does not mention it, very similar pruning can be applied to the backward search. A block of cards at the end of the deck is fixed if, for each card: 1) its value is known and it is not in its home position, or 2) it is blank, but the value of its current position has alread been assigned to another card.

With this pruning, the backward algorithm seems to make fewer top-flipping on average than the forward search, and does not require recording the initial deck. However there is a choice to be made after almost every move, because there are usually multiple backward moves available, which requires a lot more saving and restoring of decks (through deeper recursion, or however you choose to implement it). In the forward search, we only make a choice when the top card is blank, so the recursion is only ever n-deep. In my implementation, the extra saving and restoring of the deck outweighs the savings from fewer top-flips, and the backward search was generally 30-50% slower than the forward search. But I can’t shake the feeling that there is some improvement that would make the backward search outperform the forward search.

More pruning

But we can do even better than pruning based on f(n-m). Consider the 10-card search again: with the logic above, we would say that once 10 is in the last position, there can be at most f(9)=30 moves remaining. But can we do even that well? The longest 9-card deck is unique: 6 1 5 9 7 2 8 3 4. We can append a 10 to that and then work backwards to see what 10-card deck (or decks) could lead to it, producing this series of decks:

1 5 9 7 2 8 3 4 10
4 3 8 2 7 9 5 1 6
4 10 8 2 7 9 5 1 6

After 2 moves, no card is in its home position, so no more backward moves are possible. Thus there is a unique 10-32 deck which produces the maximum-length 9-card deck. (Note that, in general, there could be multiple possible backward moves at one time, so we may have to do a real recursive search to find the longest backward chain of moves.) Having done this tiny bit of work, and once we know that f(10) > 32, we can tighten our bound by 1: for a maximum-length 10-card deck, once 10 is in the last position the remaining deck cannot achieve f(9), and there can be at most f(9)-1=29 moves remaining.

We can extend this method: during the size-9 search, we can do some extra work and look for 9-29 decks (one shorter than f(9)), and do the same backward search on those; if they also cannot achieve a new 10-card best, then we can reduce our bound by 1 again. But there’s a pitfall here: our search wants to take advantage of the second pruning technique above and only look for derangements. This works when looking for maximum-length decks, but not when we also want to look for shorter decks. For example, there is exactly one 9-29 deck that our search will not find: the result of starting with the optimal 9-30 deck and making one move, to get 2 7 9 5 1 6 8 3 4.

In general, we can generate all size-_n_ decks with length >= k by: 1) pruning the search only when the maximum possible length is less than k, and 2) taking each deck with length x > k and making x - k forward moves.

Then on each of these decks we can perform a backward search to find the longest size-(n+1) deck which could produce it. Assuming those backward searches don’t find anything which matches the already-known lower bound for f(n+1), then when doing the search for size n+1 we can use k as the bound for our pruning when there is one fixed card at the end of the deck. Since the execution time is proportional to at least n! (and in fact more, because larger decks take more moves), it’s worth spending a lot more effort at one size to save on the next.

We can find the smallest useful k by taking the best-known deck for size n+1 and running it forward until n+1 is in the last place. For example, when playing the optimal 10-38 deck, 10 ends up in last place after 18 moves. The remaining 9 cards take 20 moves to complete, thus the smallest useful k is 21; looking any deeper than that will not allow us to improve the pruning for size 10.

In principle we could take this further – we could do the size-(n-2) search, appending two cards to each found deck and doing the backward search, to tighten the bound used at size-_n_ when there are two fixed cards at the end of the deck. But by experimentation I found that the large majority of pruning occurs with a single fixed card, and that when there are more fixed cards we almost always prune anyway using the existing bounds, so I didn’t take it beyond this first step.

A few tricks

This search is still going to take a long time, so we can tune the code a bit, taking advantage of the fact that we don’t expect to be dealing with decks larger than about 21 cards:

Use a bit vector to track which cards have already been used. Then we can find the lowest unused card (on most CPUs) with a bit-scanning instruction like that used by __builtin_ctz(), and clear the lowest set bit with x &= (x-1).
If we select n as the value for the top card, then we know that in one move we will have n in the last position, and we can perform our pruning check right away, before playing out all the subsequent moves.
An extension to this is that it’s better to check for fixed cards while making topswop moves, rather than waiting until we have a blank card at the top: it both saves moves by pruning sooner, and increases pruning because we notice the fixed cards after fewer moves have been made. But a loop that counts any number of fixed cards at the end of the deck is complex and slow due to branch mispredictions; we don’t want to do that after every move. Instead I hard-coded a check for 2 or 3 in-order cards after each move. Although this gives slightly less pruning than detecting any number of fixed cards, it was significantly faster.
The simplest code to perform a top-swapping move is just a loop which swaps cards from the outsides to the middle. But this is terrible for the branch predictor because the loop count is totally unpredictable. Simply creating a big ugly switch statement with 21 copies of the same loop gave a 15% performance boost, and cut mispredictions nearly in half. (Which I don’t totally understand, because that creates a branch table, which should have an unpredictable indirect branch to index into it… maybe something about branch history.) We can get another 5-10% by using bespoke combinations byte moves and __builtin_bswap for each size. And putting __builtin_unreachable() as the default case helps, by telling the compiler it doesn’t need to check for sizes not specified in the switch statement.

There is a clever way to represent the deck as a doubly-linked list, described by Tom Rokicki, where the stored values are the XOR or sum of adjacent cards. This allows a section of the deck to be reversed by updating just two elements, but requires a linked-list traversal to figure out their positions. This can be a huge savings when operating on large decks, but I didn’t try this because for small decks my bswap code was so short that it seemed hard to beat.

Putting it together

f(n) was known up to n=19. n=20 seemed doable, and maybe n=21 at a stretch – likely to be around 500 times the effort of n=19. So my focus was on making the n=21 search feasible.

Doing the basic size-19 search took about 670 CPU-days and confirmed the published result that f(19)=221. Playing topswops with the best-known 20-card deck (length 249) puts 20 in the last position after 60 moves, leaving a 19-189 deck. So then I reran the size-19 search with k=189, finding all solutions of length 189 and above, which took about 4 times as long as the original search. This found that a size-19 deck with length >= 190 can only appear in a 20-card deck of length <= 242. So if I just wanted to find the best size-20 dL(eck, then when 20 is in the last position I could prune assuming that the remaining 19 cards cannot take more than 189 moves – which is 32 less than f(19)=221, a huge improvement.

But I’m aiming at n=21, so I want to run the size-20 search to some greater depth, and I want to choose that depth to minimize the total time for the size-20 and size-21 searches. I took a sampling of sub-trees across both searches, and measured the effect of various options on the execution time. I settled on running size-20 with k=240 (depth of 9), which (the deep size-19 search tells me) can still use 192 as the 19-card pruning threshold, which is 29 less than f(19). This then allows the size-21 search to use 249-9=240 as the pruning threshold, which reduces execution time by about 38%. Going deeper at size 20 required a much higher pruning threshold, greatly increasing the time for the size-20 search, while giving only modest reduction in the size-21 search.

Detailed results

The size-20 search took ~7200 CPU-days, only ~10x more than the size-19 search due to the stronger pruning, and found that f(20)=249. The size-21 search took ~181,000 CPU-days, and found that f(21)=282.

Here is a complete list of optimal decks for all sizes up to 21:

n=2, length 1:
- 2 1; final deck sorted
n=3, length 2:
- 2 3 1; final deck sorted
- 3 1 2; final deck sorted
n=4, length 4:
- 2 4 1 3; final deck sorted
- 3 1 4 2; final deck sorted
n=5, length 7:
- 3 1 4 5 2; final deck sorted
n=6, length 10:
- 3 6 5 1 4 2; final deck sorted
- 4 1 5 2 6 3; final deck sorted
- 4 5 6 2 1 3; final deck sorted
- 4 1 6 5 2 3; final deck sorted
- 5 6 4 1 3 2; final deck sorted
n=7, length 16:
- 3 1 4 6 7 5 2; final deck sorted
- 4 7 6 2 1 5 3; final deck sorted
n=8, length 22:
- 6 1 5 7 8 3 2 4; final deck sorted
n=9, length 30:
- 6 1 5 9 7 2 8 3 4; final deck sorted
n=10, length 38:
- 5 9 1 8 6 2 10 4 7 3; final deck sorted
n=11, length 51:
- 4 9 11 6 10 7 8 2 1 3 5; final deck sorted
n=12, length 65:
- 2 6 1 10 11 8 12 3 4 7 9 5; final deck 1 6 5 2 3 4 7 8 9 10 11 12
n=13, length 80:
- 2 9 4 5 11 12 10 1 8 13 3 6 7; final deck sorted
n=14, length 101:
- 2 4 9 3 11 1 8 13 6 5 10 14 12 7; final deck sorted
- 3 9 4 2 11 1 8 13 6 5 10 14 12 7; final deck sorted
- 3 13 4 9 2 1 8 11 6 5 10 14 12 7; final deck sorted
- 9 4 11 3 1 8 13 6 2 5 10 14 12 7; final deck sorted
n=15, length 113:
- 2 9 4 11 7 1 13 5 3 14 12 15 8 10 6; final deck 1 3 2 4 5 6 7 8 9 10 11 12 13 14 15
- 3 1 7 11 4 9 2 5 13 14 12 15 8 10 6; final deck 1 3 2 4 5 6 7 8 9 10 11 12 13 14 15
- 3 1 7 11 15 12 9 4 10 2 13 8 14 5 6; final deck 1 3 2 4 5 6 7 8 9 10 11 12 13 14 15
- 3 11 8 15 12 10 5 9 2 13 1 4 14 7 6; final deck 1 3 2 4 5 6 7 8 9 10 11 12 13 14 15
- 5 15 12 11 8 10 3 9 2 13 1 4 14 7 6; final deck 1 3 2 4 5 6 7 8 9 10 11 12 13 14 15
- 7 12 15 10 13 4 2 11 1 8 9 5 14 3 6; final deck 1 3 2 4 5 6 7 8 9 10 11 12 13 14 15
n=16, length 139:
- 9 12 6 7 2 14 8 1 11 13 5 4 15 16 10 3; final deck 1 3 2 4 5 6 7 8 9 10 11 12 13 14 15 16
n=17, length 159:
- 8 15 17 13 9 4 6 3 2 12 16 14 11 5 10 1 7; final deck 1 6 2 4 9 3 7 8 5 10 11 12 13 14 15 16 17
- 2 10 15 11 7 14 5 16 6 4 17 13 1 3 8 9 12; final deck sorted
n=18, length 191:
- 6 14 9 2 15 8 1 3 4 12 18 5 10 13 16 17 11 7; final deck sorted
n=19, length 221:
- 12 15 11 1 10 17 19 2 5 8 9 4 18 13 16 7 3 14 6; final deck 1 10 9 8 7 6 5 4 3 2 11 12 13 14 15 16 17 18 19
- 9 4 19 17 10 1 11 15 12 8 5 2 18 13 16 7 3 14 6; final deck 1 10 9 8 7 6 5 4 3 2 11 12 13 14 15 16 17 18 19
- 12 1 18 11 2 3 14 6 8 16 5 4 15 10 13 17 19 7 9; final deck 1 10 9 8 7 6 5 4 3 2 11 12 13 14 15 16 17 18 19
- 12 1 18 11 3 14 2 6 8 16 5 4 15 10 13 17 19 7 9; final deck 1 10 9 8 7 6 5 4 3 2 11 12 13 14 15 16 17 18 19
n=20, length 249:
- 2 8 11 6 19 17 13 7 18 3 9 20 15 1 12 4 10 14 5 16; final deck 1 10 9 8 7 6 5 4 3 2 11 12 13 14 15 16 17 18 19 20
n=21, length 282:
- 3 4 10 20 7 21 19 15 14 8 16 13 5 2 9 12 18 17 1 11 6; final deck 1 8 11 10 5 12 7 3 4 2 6 9 13 14 15 16 17 18 19 20 21
- 3 17 7 13 10 14 6 19 15 21 1 9 2 16 18 20 11 8 5 4 12; final deck 1 11 10 9 5 2 3 4 6 7 8 12 13 14 15 16 17 18 19 20 21
- 6 11 20 18 16 9 14 10 13 2 1 21 15 8 17 19 7 3 5 4 12; final deck 1 11 10 9 5 2 3 4 6 7 8 12 13 14 15 16 17 18 19 20 21
- 7 17 8 13 10 14 6 19 15 21 1 9 2 16 18 20 11 3 5 4 12; final deck 1 11 10 9 5 2 3 4 6 7 8 12 13 14 15 16 17 18 19 20 21
- 14 10 13 8 17 11 20 18 16 2 9 1 21 15 6 19 7 3 5 4 12; final deck 1 11 10 9 5 2 3 4 6 7 8 12 13 14 15 16 17 18 19 20 21