Tuesday, December 19, 2006

Show Us Your Research

Now that most of the FCRC deadlines have passed, I would again suggest that you post your papers on a public archive like the Electronic Colloquium on Computational Complexity or the Computing Research Repository. The world wants to know about your research.

Which one should you choose? You don't have to choose; you can freely submit to both ECCC and CoRR. But how do they compare? [Disclosure: I am on the ECCC Scientific Board.]

  • ECCC focuses on computational complexity, though it often contains papers from across theoretical computer science. CoRR broadly covers all of computer science (with tags for subfields) and is part of the arXiv project, which covers physics and math as well.
  • An article has to be approved by an ECCC board member to meet a minimum standard before it can appear. CoRR only checks for relatedness to the topic area.
  • Both plan to have papers posted forever. ArXiv is currently run by the Cornell Library, which gives stronger backing to this promise. However, every paper on the ECCC and CoRR should later appear in a conference proceedings and/or journal.
  • ECCC takes postscript submissions. CoRR prefers LaTeX submissions and processes them with hypertex.
  • Both systems allow revisions and all versions remain available.
  • ECCC has a (not-so-user-friendly) discussion system and email announcements of new papers. CoRR has RSS feeds for each subject class. Both systems plan to continually update their interfaces and features.


  1. An article has to be approved by an ECCC board member to meet a minimum standard before it can appear. CoRR only checks for relatedness to the topic area.

    And here is where I would decide to only submit to ECCC. The main failing of arXiv is that there is ZERO check against the run-of-the-mill crackpot. I can see where arXiv serves a purpose: to have a place where papers can be made publicly available. But then, isn't that what our personal webpages can be used for? If I'm browsing for the latest results in an area, I'd like to have at least a little bit of a guarantee that I won't be wasting my time.

  2. I agree with the previous Anon.
    arXiv is great, but somewhat useless (at least for Computational Complexity), since every few weeks new exciting proofs that P=NP and P\neq NP appear in it. These papers seem to be written without any care and in a haphazard manner, so the most basic filtering could spot them quickly. On the other hand, most complexity people do not put their work on arXiv (or am I wrong?). So ECCC is much better.

  3. I'm not sure I see the point of supporting two repositories. It doubles the work for author and reader. Why not just come out in favor of one or the other? (And I agree with previous posters that the filtering used for ECCC makes it the preferred choice. Also, it seems to be the de facto choice for complexity theory other than quantum stuff.)

  4. The main failing of arXiv is that there is ZERO check against the run-of-the-mill crackpot. I can see where arXiv serves a purpose: to have a place where papers can be made publicly available. But then, isn't that what our personal webpages can be used for?

    Well, one other purpose that arXiv serves is that it is an "announcement" of a paper. In fields where conferences aren't attended by a large portion of the community (like CS theory), the arXiv serves as an announcement of the paper. It also has the further advantage that you can post your result when it is ready, as opposed to this strange guarding of results until a conference that seems to occur in CS (which to me feels very unscientific, but understandable, of course, given the publish or perish world of academia).

    Finally, the convenience of the arXiv, at least in quantum computing, is incredible. The noise factor on the quant-ph arXiv is incredibly high, but it takes me exactly one minute every morning to scan through the listings and sort out the signal (if one exists!). This is balanced by the fact that 98 percent of quantum computing is on the arXiv. I'd also say that it isn't really the crackpots who are a problem on the arXiv (you can spot a crackpot in about half a second) but the sheer volume of low-quality submissions.

    It's interesting to me that the field that you would most expect to be open to online preprint archives, computer science, isn't so open to the idea. I think it is understandable, given the central role that conferences play in CS (as an announcement and noise filter). If you're an expert in your area of CS, this is fine, but I suspect that it actually presents a significant barrier to entry into the field, one that has been effectively demolished in fields that are almost entirely on the arXiv.

  5. arXiv may be good for areas like quantum information processing, where you can find many papers each day. It's easier to check the latest results, compared to going to many personal homepages. Plus, what if the result doesn't appear at his/her homepage, or he/she doesn't even have a homepage?

  6. Plus, what if the result doesn't appear at his/her homepage, or he/she doesn't even have a homepage?

    Then this is a failing of the author(s)/researcher. I don't understand why departments don't require faculty to publish copies on their department webpages--it's good for the faculty and the department.

  7. I prefer arXiv, mainly because in my opinion any filtering will also result in false positives. Also, you can cross-list to other repositories. The overhead of looking at crackpot papers is small.

  8. The backdrop of this discussion is that you shouldn't have to choose. It would advance research in computer science if ECCC contributed its papers to the arXiv. We have seen this situation in mathematics many times.

    For example, the "signal-to-noise" ratio in the number theory category of the arXiv, math.NT, was not all that great until the competing Algebraic Number Theory archive was folded into it. Within months of that reform, math.NT had more submissions, and more good submissions, than the total of the ANT archive and what math.NT had before.

  9. No one is questioning the usefulness of preprint archives; the only questions were with regard to the value of arXiv, and whether there is a need for two archives covering the same subject matter (or containing the same sets of papers).

  10. The situation in theoretical CS with ECCC is simply crazy. I believe in the following principles:

    1. All research papers should be archived permanently in the same collection. There are enormous economies of scale here (and creating a truly permanent archive is much harder than most people who've never tried think it would be).

    Putting papers on your own web page, or a departmental page, is nice but it does not permanently archive them. ECCC is better but still not good. For example, accepting postscript submissions is outright stupid. If you care about preserving papers forever in the most usable form, then you should collect all available information, including latex source. You never know when you will want to offer downloads in new formats, for example (either widespread new formats or niche ones like spoken text for blind users). Converting postscript may be possible, but it is almost never the best solution: too much information has been lost by the time the paper ends up in postscript. Any archive that encourages postscript submissions is run by people who are either incompetent or irresponsible.

    2. Filtering should be layered on top of that. If you want to create a nicely organized web site to publicize carefully filtered papers, that's great. It substantially improves the usefulness of any archive; most users care more about the filtering than about the permanence of the archive. However, there's no logical reason whatsoever why the people doing the filtering should also undertake a half-hearted attempt at doing the archiving themselves. You can seamlessly link to files at the arXiv while keeping whatever organization and interface you like. You could even copy files from the arXiv and just leave the permanent archiving to them while keeping your web site completely independent (if you care, although I can't see why you would).

    My impression is that most of these issues are complicated by interest in credit. There are a lot of small disciplinary archives whose creators take great (and often well-deserved) pride in serving their own research communities. Doing your own archiving sounds like a bigger contribution than simply filtering a larger archive, even though it's the intellectually trivial part. Most researchers won't know or really care whether you're doing a good job on the archiving side, so there's a strong incentive to keep doing it yourself rather than turning into an arXiv overlay. However, I think becoming an arXiv overlay is the only intellectually defensible route, given the current options.

    For now, we should put pressure on the ECCC board members to focus on what they are good at (filtering the papers and publicizing the good ones) and let the papers be archived by experts.

  11. 1. Many years of FOCS aren't available electronically through IEEE. This needs to be fixed.

    2. The community needs to write a letter to digg asking them to give us a TCS category (buried off the main page of course). The ability to recommend papers and blog about them is priceless.

  12. I'm biased, of course, since I'm the administrator of CoRR, but it seems
    to me that it's possible to have the best of both worlds quite easily if
    ECCC would automatically post all its papers on CoRR, tagging them as
    being approved by ECCC. You can keep the ECCC home page as is.
    The only difference is that the papers themselves would be stored on CoRR.
    The advantages are (at least to me) clear: ECCC can focus on what it
    does best, namely filtering, and leave archiving to CoRR. I believe
    that bigger is better in the archiving world --- as others have pointed
    out, there are economies of scale here. Larger archives are more likely
    to be preserved, and they have the resources to keep up faster with
    changes in technology. In addition, by having complexity papers on
    CoRR, they can be better linked to other papers in CS as well as papers
    in quantum complexity, which often appear in the physics section of the
    arxiv. At the same time, if you come in through the ECCC website, or
    just search for papers with the ECCC tag, you can have all the benefits
    of filtering.

    For what it's worth, while it's true that CoRR has papers claiming to
    prove P=NP, it's actually quite easy to ignore them. Crackpot papers
    are typically easy to spot. The hard part is to filter out the
    low-quality but essentially correct papers. In this regard, organizations
    like ECCC could really help.

    -- Joe