Comments on Computational Complexity: Shaving Logs with Unit Cost

m should be the number of bits for largest number(...

2016-11-27T03:51:17.119-06:00

m should be the number of bits for largest number(absolute).

Yes, if it is a scientific code and not just a the...

2009-05-20T19:38:06.207-05:00

Yes, if it is a scientific code and not just a theoretical result.

Many times you run up against one algorithm which is O(n log n) and one that is O(n) with constants over one hundred times larger.

A colleague of mine has stressed for a long time that Log(particles in the universe) might as well be treated as a constant for real world applications.

However, yes it always has bugged me that people d...

2009-05-15T11:56:00.000-05:00

However, yes it always has bugged me that people drop log factors.Does it bother you that they drop constant factors?

Building a machine with register size that is log(...

2009-05-14T21:22:00.000-05:00

Building a machine with register size that is log(number of particles in the universe) wouldn't be too hard. These register algorithms are very useful in practice [1] shaving an order of magnitude off of run times.

However, yes it always has bugged me that people drop log factors. I guess if you had them describe their algorithm on both an 8bit machine, and on a log(number of particles in the universe) machine there wouldn't be much confusion.

[1] http://blogofbrew.blogspot.com/2009/04/faster-k-mer-packing.html

In answer to the very first comment, O(nk) using r...

2009-05-14T10:19:00.000-05:00

In answer to the very first comment, O(nk) using radix sort, and this is optimal for the worst case.

As the author of the paper in question, let me cla...

2009-05-14T08:30:00.000-05:00

As the author of the paper in question, let me clarify that the
"offensive" assumption can be avoided in this case, as in case of Four Russians and many related algorithms, using preprocessing/table lookup. The set difference operation can be split into operations on O(log n) sized sets, and in a preprocessing phase, the answers to these "small operations" can be computed for all possible arguments (all of this can be done in time less than the final runtime) and stored in a table in a sparse format (lists in this case). Now set difference over big sets just involves table lookup and list concatenation, and it all works out. Being standard, this implementation was omitted from the paper.

There are assumptions about table lookup here, but without these, the previous dynamic programming algorithms for the problem would run in O(n^3 log n) time. When I say the result is a "simple adaptation of a known technique," I do not mean it is just about changing models, but that a similar technique was used by a different researcher in a related problem. I'd have to be a much bigger moron than I actually am to write this paper otherwise.

Love and peace,
Swarat

Dear Complexity Blog, For the record, I did not m...

2009-05-13T23:16:00.000-05:00

Dear Complexity Blog,

For the record, I did not make the above comment and I do not believe everything in it. Let me request that future Ryans put a last initial or something.

I would be remiss if I did not take the opportunity to also say something now about log factors. Please, shave them. (Within reason.)

Cheers,
ryan williams

I don't see it as cheating to point out that anoth...

2009-05-13T20:03:00.000-05:00

I don't see it as cheating to point out that another model is more appropriate and to find tighter time bounds in that model.What I meant was, it's "cheating" if you obtain a logarithmic acceleration simply by switching to a model in which the acceleration essentially comes for free. Although what I mean by "cheating" is more just that such a result is lame.

It is true that it's fine to switch to a more appropriate model. But what does "more appropriate" even mean? I am happy to recognize any interesting-looking theorem as a work of mathematics, whether the model is appropriate or not. However, real computers are much more complicated than any idealized architecture which is suitable for a theorem.

A real computer architecture is vaguely equivalent, up to log factors, to a RAM machine. It's impossible to avoid overhead factors from considerations such as the address bus, the addressing wires in each memory chip, the distance from the memory chips to the CPU, etc. For a while these overhead factors are sort-of logarithmic. But obviously a very large computer would be more similar to a 3D cellular automaton, which is not polylog equivalent to a RAM machine.

You could even argue on the basis of heat exchange that large computers can't really have 3-dimensional connectivity; rather the effective connectivity has to be closer to 2-dimensional. Certainly that is a problem at the length scale of a CPU.

Arnold Schonhage of integer multiplication fame ha...

2009-05-13T19:29:00.000-05:00

Arnold Schonhage of integer multiplication fame has an interesting argument that the log-cost RAM model is not the right model because there is a super-linear time lower bound for storing n bits in the model. See his July 1988 J.ACM paper: "A Nonlinear Lower Bound for Random-Access Machines under Logarithmic Cost".

Re Greg's "It can be cheating to switch to a diffe...

2009-05-13T19:09:00.000-05:00

Re Greg's "It can be cheating to switch to a different model, to then save a logarithmic factor with an apples-vs-oranges comparison." I don't see it as cheating to point out that another model is more appropriate and to find tighter time bounds in that model. If (as in the supposed example that started this post) the savings is just a matter of plugging in a trivial bit-parallel set intersection algorithm, then it's not a very interesting result, and the authors' hyperbole about breaking through barriers is unjustified, but it's still not cheating. And in other problems, saving the log can be much less a matter of plugging in known techniques to known algorithms from different models and much more a matter of novel algorithm design.

As for the people who still think Turing machines are a better model than RAMs: this is as baffling to me as the log-shavings must be to the complexity theorists. There are a million and one models of computation that are polynomially equivalent to Turing machines or RAMs or whatever. Why must you stick to a model that's so difficult to program, and that gives you the wrong exponent in your polynomials due to the model-specific work needed to move the head around? If mathematical cleanliness is what you're after, why not use a cellular automaton?

Ryan (#4) wrote: It's right along with the assumpt...

2009-05-13T18:58:00.000-05:00

Ryan (#4) wrote: It's right along with the assumptions that doing basic integer operations, addition, subtraction, multiplication, and division, are also constant. Of course they aren't really constant! The 'mathematical modeling' is assuming they these operations are constant, and as such it allows you to get away with things like this.I suspect the right way to think about this is that there is some bit-level parallelism in the physical hardware, so yes, the set-difference operation can really be constant for inputs of size O(word length). And it's reasonable to model the word length as more than a constant, by 11011110's arguments. As for whether it's just a trick of them model: in principle you don't automatically get an improvement by changing the model ... it could well be that you need a smart algorithm in order to exploit this parallelism. Of course, that might not be the case in this paper, about which I know nothing.

Anon #16 wrote: On the other hand, in practice lots of big data sets don't all fit into RAM at once. So the assumption that of course the computer will have plenty of RAM to store everything and will have a processor architecture that can address all that RAM with a single word is itself dubious.I think 11011110's point still holds. Regardless of what sort of memory you're storing the input in (registers, cache, RAM, local disk, disks across the Internet...), you still need to be able to address it in a conveniently small way, since all the memory that anyone uses these days is random access. Maybe you store the address in two words instead of one, but that's still O(1) word operations.

A time complexity estimate is only well-founded to...

2009-05-13T18:06:00.000-05:00

A time complexity estimate is only well-founded to the extent to which different computational models are equivalent to each other. Clearly, a wide variety of models, from a Turing machine with one semi-infinite tape at one end to other models that are much more efficient than that, are polynomial-time equivalent. This inspires the Church-Turing thesis, although ironically a quantum Turing machine is probably not equivalent and leads to the class BQP.

A smaller class of models are equivalent at the fast end, up to logarithmic time factors. The most common model at the fast end is a RAM machine, which I think is the same as a pointer machine. Another model which is so equivalent but much less widely cited is a tree-tape Turing machine. I think that it's a clever definition which is simpler than that of a RAM machine. And this is just for serial computation. For parallel computation things may get even more complicated, although I suspect that you can still obtain equivalence up to log factors.

Resolving the logarithmic factors is then model-specific. It is not really cheating to describe the model precisely, and then say that you can save a log factor in that model. It can be cheating to switch to a different model, to then save a logarithmic factor with an apples-vs-oranges comparison.

If I recall correctly all of these papers turned o...

2009-05-13T17:49:00.000-05:00

If I recall correctly all of these papers turned out to be pretty useless since the same runtime can be achieved very simply using scans.Some of the ideas from the EREW undirected connectivity algorithms were quite elegant and inspired Trifonov's O(log n loglog n)space deterministic connectivity algorithm.

Let's concentrate our limited resources on determi...

2009-05-13T17:40:00.000-05:00

Let's concentrate our limited resources on determining the proper degree of the polynomial, not the exponentially less important task of removing logarithms.
This depends on the problem. For instance, removing the log from the complexity of FFT will certainly be considered very interesting.

In a certain sense this whole thread is reflective of the tendency in TCS to overhype and oversell results. If the authors just claim that we made an improvement over existing results in this model for whatever it is worth, then there is no room for arguments (as long the proof is deemed correct).
The problem arises when one uses language such as "we broke this barrier" -- as if they broke the sound or light barrier or something. Language such as this should be used with utmost care (if at all) in serious scientific communication (especially of mathematical nature), but seems to be commonplace in TCS today. It is probably the effect of the "conference culture" -- a la Koblitz again.

More concretely, I would consider the interesting ...

2009-05-13T17:30:00.000-05:00

More concretely, I would consider the interesting barrier (for the problem discussed in the post) to be Oh-tilde(n^3), not O(n^3), so the barrier was not broken.

Logarithms often come and go when you switch machi...

2009-05-13T17:23:00.000-05:00

Logarithms often come and go when you switch machine models. This often leads to people wasting time removing logarithms when the problem would be completely different in a more realistic model.

The PRAM model assumes that memory can be accessed in constant time but no other operations can. In practice prefix scan operations can be done as efficiently as memory access and algorithms using them as primitives are often simpler and more efficient. For example see Guy Blelloch's 1989 paper "Scans as Primitive Parallel Operations" available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=42122
.

For example, there was a long series of papers in the 80s in good conferences on finding connected components of undirected graphs in parallel. They gradually eliminated a log factor from the runtime and switched from CRCW (concurrent) PRAM to EREW (exclusive) PRAM. If I recall correctly all of these papers turned out to be pretty useless since the same runtime can be achieved very simply using scans.

I would argue that the problem was not the PRAM model per se, but our obsession with tightness. Striking a logarithm from runtime, or improving the constant in a constant factor approximation algorithm, is valuable but should not be considered a major improvement sufficient for a slot in a top conferences such as STOC/FOCS. I especially dislike replacing a simple and elegant algorithm with a much more complicated algorithm that improves a log factor.

Let's concentrate our limited resources on determining the proper degree of the polynomial, not the exponentially less important task of removing logarithms.

To remove any confusion, i am not Anon 16, my firs...

2009-05-13T17:18:00.000-05:00

To remove any confusion, i am not Anon 16, my first comment was 18.

My comments was not about theoretical consideratio...

2009-05-13T16:11:00.000-05:00

My comments was not about theoretical considerations. If you take the c++ standard library, the general purpose algorithm is introsort which is a hybrid between quicksort(best average performance) and heapsort (avoid n^2 worst case performance of quicksort). The standard c library implementations generally use quicksort (GNU implementation uses mergesort to avoid worst case scenario of quicksort).

Which algorithm are you talking about. If it is ge...

2009-05-13T15:50:00.000-05:00

Which algorithm are you talking about. If it is general purpose, then it should normally be comparison based.
Non-numerical/non-alphanumerical keys are the most common sorting operation. Sacrificing performance for those cases is the theoretical assumption of little relevance, not the other way around.

If one is a mathematical purist, one should only d...

2009-05-13T15:15:00.000-05:00

If one is a mathematical purist, one should only do theory using finite state machines, because all of our actual machines really do have finite state.This is nonsense. Mathematical purity has nothing to do with what exists or does not exist as machines. Any mathematical theorem is a statement about a consequence derived from a certain hypothesis. Which hypothesis you prefer is a matter of choice and has nothing much to do with mathematics. Mathematicians by training tend to prefer hypotheses which are simpler and elegant. In the context of complexity theory, a natural choice would be to consider computation over different structures -- finite fields, real numbers, complex numbers etc. -- and prove theorems in each case -- this is the crux of the Blum-Shub-Smale philosophy. What actually models real world computation is another matter of course -- but an extra-mathematical one.

Which algorithm are you talking about. If it is ge...

2009-05-13T15:09:00.000-05:00

Which algorithm are you talking about. If it is general purpose, then it should normally be comparison based.

I think what upsets me about this sort of thing is...

2009-05-13T14:21:00.000-05:00

I think what upsets me about this sort of thing is that I don't really believe the minor increase in realism of the model improves the real-world applications.
The fastest general puropse sort algorithm in use today is based on such a model, so you might have to revisit your assumptions about how realistic this model is.

Re "the use of the logarithm is ridiculous": it's ...

2009-05-13T13:41:00.000-05:00

Re "the use of the logarithm is ridiculous": it's justified by the fact that a machine that can't address all of its memory is ridiculous. So if you're going to do non-streaming algorithms, and assume that your problem fits into memory, you need a model with at least logarithmic word size.Thanks, I was forgetting about that, so it's at least better justified than I was thinking. On the other hand, in practice lots of big data sets don't all fit into RAM at once. So the assumption that of course the computer will have plenty of RAM to store everything and will have a processor architecture that can address all that RAM with a single word is itself dubious.

I think what upsets me about this sort of thing is that I don't really believe the minor increase in realism of the model improves the real-world applications. Instead, I think it's a function of the sort of researchers who do the work. Practically-minded researchers will come up with practical algorithms in just about any model. If the model has unrealistic aspects, they won't take advantage of them, because they always have in mind real computers (and the model is just a convenient formalization for describing their results, rather than their ultimate goal). By contrast, pure theorists can exploit just about any model to produce unrealistic results, since it's practically impossible to eliminate all loopholes.

Normally, this doesn't cause any problems. Some results are impractical but beautiful illustrations of fundamental principles, while others have immediate applications, and both types of papers are worthwhile. The problem occurs when someone mixes the two, for example by taking advantage of technical details in a model to claim a real-world improvement. That's often a sign of problems.

If you say a comparison takes O(log n) steps, shou...

2009-05-13T13:10:00.000-05:00

If you say a comparison takes O(log n) steps, shouldn't you also be honest about your input size, and say that the input size is O(n log n) instead of O(n)? Then the algorithm still runs in time O(m log m), where m is the input size.This is exactly why the word-RAM is such a good model.

"I remember complaining as a student that sorting ...

2009-05-13T13:01:00.000-05:00

"I remember complaining as a student that sorting actually takes O(n log2 n) time since one needs O(log n) time to do a comparison but my complaints fell on deaf ears. "

But if you say a comparison takes O(log n) steps, shouldn't you also be honest about your input size, and say that the input size is O(n log n) instead of O(n)? Then the algorithm still runs in time O(m log m), where m is the input size.