Comments on Computational Complexity: Need EASY approaches to getting unif random from non-random sources

More generally, if m = 2^a_1 + ... + 2^a_k is the ...

2018-07-31T15:40:01.901-05:00

More generally, if m = 2^a_1 + ... + 2^a_k is the binary expansion of n, then you can partition {1,...,m} into parts of sizes 2^a_1, ..., 2^a_k, and extract a_i bits from the i'th part.
The expected number of bits you extract in this way (for a random point in {1,...,m}) is log m - O(1) (in fact, the best constant is log(7)-10/7).
If you toss a p-biased coin n times, then you see roughly np ones. The corresponding binomial coefficient m = binomial(n,np) satisfies log m = h(p) n - O(log n). More generally, if k ~ Bin(n,p), then E[log m] = h(p) n - o(n). Therefore this strategy reaches the entropy in expectation.
To get an asymptotic rate of h(p) with high probability, it suffices to partition the input into omega(1) parts of size omega(1).

How do you use the other 3 possibilities? Lets say...

2018-07-29T18:00:18.693-05:00

How do you use the other 3 possibilities? Lets say these are 1110000, 1101000, 1100100. If you get the first one, add the bit 0 to your string. If you get the second, add the bit 1 to your string. And if you get the third, do nothing. This still gives each bit an equal probability of being 0 and 1. This technique gives you .544 bits/coin flip. You can use a modification of this similar to arithmetic coding to get .628 bits/coin flip. And making the length of the blocks go to infinity should make the efficiency go to 1 bit/coin flip.

Hi Bill, Mine should give truly random bits even i...

2018-07-29T17:51:18.307-05:00

Hi Bill,
Mine should give truly random bits even if the original source is very biased (but independent). Whether it gives more bits than your easy protocol is another question. It can certainly be modified to do that using ideas from arithmetic coding, but I don't know whether there's an intermediate-level way of doing this without introducing a large amount of information theory for background.

Anyway, in my protocol, if you get 4 0's and 3 1's, you add 5 bits. You add all possible sequences of 5 bits with equal probability, so each of these added bits has an equal probability of being a 0 or a 1. Similarly for any other numbers of 0's and 1's.

Thanks- will look at your code and may email you l...

2018-07-28T16:48:18.230-05:00

Thanks- will look at your code and may email you later.

THANKS! We want to get truly random bits even if t...

2018-07-28T16:47:55.080-05:00

THANKS!
We want to get truly random bits even if the original source is very
biased (but independent). Does yours do that, or can it be modified
to do that? I'm thinking no and yes.

I don't know exactly what you're looking f...

2018-07-28T15:33:05.811-05:00

I don't know exactly what you're looking for, but here's one approach. Break your bitstream into blocks of (say) 7. Now, suppose you get 4 0's and 3 1's. There are binomial(7,4)=35 ways of doing this. Choose 3 of these ways to throw out; the other 32 give you 5 bits. Now, suppose you get 5 0's and 2 1's. There are binomial(7,2)=21 ways of doing this. Choose 5 of these to throw out and use the others to get 4 bits. If the probability of a head is roughly 1/2, you roughly get .517 bits/coin flip. This is only slightly better than your first method, but the reason is that you're using 35 possibilities to get 32 random bits, and throwing away the other 3 possibilities. You can use them.

I coded my algorithm in python. It is available as...

2018-07-26T17:10:29.958-05:00

I coded my algorithm in python. It is available as a Gist.
(Unfortunately I was unable to figure out how to include it as part of a comment.)

As an example, the following code runs the generator with 20 iid bits distributed Bernoulli(0.4), generated from seed 1:

>>> test(.4, 20, 1)
[1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0]

The output is the 12 bits shown above.

2018-07-26T16:07:59.821-05:00

This comment has been removed by the author.

Clarify please: I understand how you get the three...

2018-07-26T11:03:17.705-05:00

Clarify please: I understand how you get the three
streams
Output
Daughter1
Daughter2

but how do you put them together to get a uniform random
sequence?

This all sounds nice in theory, but how does it ap...

2018-07-26T05:14:47.780-05:00

This all sounds nice in theory, but how does it apply to real physical random sources, like the Dice-O-Matic? Maybe some of the papers looked complicated, because they just tackled the real physical situation head-on, instead of hiding behind convenient idealizations?

How would you post process the output of the Dice-O-Matic? How would you describe what you can achieve (assuming that perfectly unbiased output is not possible)? Can you prove that achieving perfectly unbiased output is not possible, no matter how you post process the output of the Dice-O-Matic?

There is a simple approach that is asymptotically ...

2018-07-25T11:17:39.679-05:00

There is a simple approach that is asymptotically optimal, though convergence is somewhat slow.

We will construct an object called bitstream, which accepts iid random bits. The bitstream will have two daughter bitstreams: daughter1 and daughter2.
For every pair of received inputs a,b, the bitstream acts as follows:

1. If a != b, output a, and send 1 to daughter2.
2. If a = b, send a to daughter1 and 0 to daughter2.

The resulting data structure might seem infinite, but it becomes finite once you create the daughter bitstreams only when they're first accessed.

Let us denote by f(q) the average number of bits output by a bitstream when the input consists of iid Ber(q). Then

2f(q) = 2q(1-q) + (1-2q(1-q))f(q^2/(q^2+(1-q)^2)) + f(2q(1-q)).

You can check that the entropy function h(q) satisfies the same recurrence (using the chain rule), and so f(q) = h(q). In other words, the asymptotic output rate of this method is optimal. Analyzing the speed of convergence would require more work.

The algorithm is quite easy to implement in python, so here are some empirical results:

1. When p = 1/2, feeding the bitstream 10^6 random bits resulted in 982635 output bits, or an average of 0.982635 output bits per input bit (compared to the entropy, which is 1).
2. When p = 0.4, the same experiment yielded an average of 0.955164 ob/ib, compared to h(p) ~ 0.971.
3. When p = 0.3, we get 0.865629, compared to 0.881.
4. When p = 0.2, we get 0.708546, compared to 0.722.
5. When p = 0.1, we get 0.461367, compared to 0.469.

In all of these results, we get upwards of 98% of the optimum.