Computational Complexity: Cuckoo Cycles

Thursday, April 04, 2019

Cuckoo Cycles

Guest Blog by John Tromp (Thanks for the opportunity, Lance)

In the past few months, cycle finding has become one of the most widely run graph theory problems. An estimated quarter-to-half million GPUs are constantly looking for 42-cycles in random bipartite graphs on a billion or more nodes. Customized chips have been designed to perform this specific application an order of magnitude more efficiently, thanks to integrating as much as 1GB of SRAM on a single chip, where Intel/AMD CPUs top out at 64 MB.

The purpose of all this? To compete for block rewards in various cryptocurrencies using the Cuckoo Cycle Proof of Work.

It is named after the Cuckoo Hashtable, in which each data item can be stored at two possible locations, one in each of two arrays. A hash function maps the pair of item and choice of array to its location in that array. When a newly stored item finds both its locations already occupied, one is kicked out and replaced by the new item. This forces the kicked-out item to move to its alternate location, possibly kicking out yet another item. Problems arise if the corresponding Cuckoo graph, a bipartite graph with array locations as nodes and item location pairs as edges, has cycles. While n items, whose edges form a cycle, could barely be stored in the table, any n+1st item mapping within the same set of locations (a chord in the cycle) would not fit, as the pigeonhole principle tells us.
In the Proof-of-Work problem, we typically set n ≥ 29, and use edge indices (items) 0 .. N-1, where N = 2ⁿ. The endpoints of edge i are (siphash(i|0) % N, siphash(i|1) % N), with siphash being a popular keyed hash function. A solution is a cycle of length L in this Cuckoo graph, where typically L = 42.

While looking for cycles takes a lot of time and memory, solutions can be instantly verified. This sets Cuckoo Cycle apart from many other memory hard proofs of work that require computing a memory hard hash function, such as scrypt, both for proving and verifying. Some luck is also required to find a 42-cycle; Slides 18 and 19 of this Grincon.US presentation sketch a proof of the

Cuckoo Cycle Theorem: The expected number of L-cycles tends to 1/L

While there is a known linear time-memory trade-off, it suffers from a large constant factor penalty. A large bounty is offered on the project webpage for better trade-offs.

Cuckoo Cycle solvers spend nearly all cycles on edge trimming; identifying and removing edges that end in a leaf node (of degree 1), as such edges cannot be part of a cycle. Trimming rounds alternate between the two node partitions. This can be done using N bits for the edge bitmap, and N log₂(3) bits for the (truncated) node degrees. The random accesses to the latter, while not a problem for custom chips using enough SRAM, cause large memory latencies on commodity hardware. These can be mostly avoided by storing remaining edges explicitly and (bucket) sorting them to determine node degrees. Despite using an order of magnitude more memory than the edge bitmap, this is still over 4x faster.

The fraction f_i of remaining edges after i trimming rounds (in the limit as N goes to infinity) appears to obey the

Cuckoo Cycle Conjecture: f_i = a_i-1 * a_i, where a_-1 = a₀ = 1, and a_i+1 = 1 - e^-a_i

f_i could equivalently be defined as the fraction of edges whose first endpoint is the middle of a (not necessarily simple) path of length 2i.

So far I have only been able to prove the conjecture for i ≤ 3. For instance, for i = 1, the probability that an edge endpoint is not the endpoint of any other edge is (1-1/N)^N-1 ~ 1/e. Here's hoping someone finds an elegant proof...