Thursday, May 24, 2018

Kolmogorov Complexity and Causation

I got an interesting email question.
Suppose I give you a set of points S of the form (x,y). He suggested ideally they would be pairs of a real numbers. Supposing there is a causal relationship between x and y of some kind, we want to know know if it is more likely that the x value causes the y value or the y value causes the x value. One plausible way to decide what the answer should be is by answering the question is the length of the shortest program which maps the x values to their y values shorter than the length of the shortest program which maps the y values to their x values.
So, my intuition says that this is clearly undecidable. I'm actually having a hard time thinking of a proof, so do you happen to know of one or if this problem might actually be decidable?
On a related note, since I'm already writing you about this question, do you happen to know about the complexity of any related questions which involve circuit size instead of program length? 
Let's use notation from Kolmogorov complexity, letting C(x|y) be the size of the smallest program that takes y as input and outputs x. Now suppose it is decidable to determine whether C(x|y) > C(y|x). Then find an x of length n such that for all y of length n/3, C(x|y)>C(y|x). Such x exist: For any random x, C(x|y)>= 2n/3 and C(y|x) <= n/3.

Now I claim C(x)>=n/4 for the x you found. If not we have C(x|y)<=C(x)<n/4 but for some y, C(y|x)>=n/3 since there aren't enough shorter programs to cover all the y's.

Since there is no computable procedure to find x such that C(x)>=n/4, there can't be decidable procedure to determine whether C(x|y) > C(y|x).

But does this question relate to causality. Pick a random x from strings of length n and y at random from strings of length n/3. We have C(x|y) >  C(y|x) even though there is no causality.

Instead you could look at the information of y in x, how many bit of x does y help describe, defined by I(y|x) = C(x)-C(x|y). This measure correlation since I(y|x)=0 iff x and y are independent but symmetry of information gives I(y|x)=I(x|y) so no hope for causation.

In short, Kolmogorov complexity won't give you much on causation--you can't avoid the controlled experiments.

For your last question, there is a notion of Kolmogorov complexity that roughly corresponds to circuit size, KT(x|y) defined as the sum of the program size and running time minimized over all programs p that take y as an input and output x. I'm guessing it's hard to determine if KT(x|y) < KT(y|x) and you could probably show it under some assumption like secure psuedorandom generators. Also symmetry of information isn't believed to hold for KT complexity so maybe there is something there. Interesting questions.



  2. The paper "Information-geometric approach to inferring causal direction" ( contains a nice formalization of this idea.

  3. I think that you've (slightly) misunderstood the question. In my interpretation our input is S={(x1,y1),..,(xt,yt)}, where all the xi's and yi's are distinct, and we want a (shortest) program P for which P(xi)=yi for every i, and a (shortest) program Q for which Q(yi)=xi for every i. The question is whether P or Q is shorter. Of course, your argument shows that this is undecidable already for t=1.

  4. Assumptions are key to causal inference. Under certain assumptions, one can decide the causal direction using Kolmogorov complexity. In the two variable case, instead of the Kolmogorov complexity of a random variable (exchangeably a string), Peters & Schölkopf, 2010 postulate that Kolmogorov complexity of distributions of those variables may have an answer. The paper titled "Causal inference using algorithmic Markov condition" by Peters & Schölkopf takes an interesting outlook on the problem of causal inference using Kolmogorov complexity.

  5. Kolmogorov complexity is slightly asymmetric (caused by the incomputability of Kolmogorov complexity). If there are several rounds of information exchange, this asymmetry can accumulate and become linear in the number of rounds.

    Imagine 2 devices run a computation. At regular time intervals each device sends a bit to the other. We are given a list of communicated pairs of bits: (x1, y1), (x2, y2), ... We are asked to determine whether xi is a reply to yi or vice versa, thus the bits are either communicated in the order x1 -> y1 -> x2 -> y2 -> ... or y1 -> x1 -> y2 -x2 -> ... In this case the sum of online Kolmogorov complexities corresponding to the machines can differ by almost a factor 2, while the complexities grow linearly. See:

  6. still it can happen that a noisy signal has much higher complexity wrt no-noise propotype than vice versa...