Friday, November 20, 2009

Citing Papers

A student asked me which version of a research paper to cite: the journal version (the last reviewed version) or the conference version (the first reviewed version). I generally cite papers in this order of precedence.
  1. The fully refereed journal version, even if it is "to appear".
  2. The reviewed, though not usually refereed, conference proceedings version, again even if it is "to appear".
  3. On an electronic archive, like arXiv or ECCC.
  4. As a departmental technical report.
  5. On a generic web page, like a personal page.
  6. As a "Manuscript", if I have seen the paper but it's not publicly available.
  7. As "Personal Communication" if the paper doesn't exist.
If the original paper is not in English I'll cite both the paper and an English translation if there is one.
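
As a minimal sketch of the precedence list above, assuming Python and a made-up `Venue` enumeration (the names are mine for illustration, not any standard), choosing the version to cite amounts to taking the smallest rank among the versions that exist:

```python
from enum import IntEnum


class Venue(IntEnum):
    """Kinds of version; a lower value means more preferred (hypothetical names)."""
    JOURNAL = 1
    CONFERENCE = 2
    ARCHIVE = 3                 # e.g. arXiv or ECCC
    TECH_REPORT = 4
    WEB_PAGE = 5
    MANUSCRIPT = 6
    PERSONAL_COMMUNICATION = 7


def version_to_cite(available):
    """Return the most preferred kind among the versions that are available."""
    if not available:
        raise ValueError("no version of the paper is available")
    return min(available)


# A paper with an archive version and a conference version: cite the conference one.
print(version_to_cite({Venue.ARCHIVE, Venue.CONFERENCE}).name)  # CONFERENCE
```

This only covers the default ordering; the exceptions discussed below (citing both versions for precedence, or a different source for a cleaner proof) are judgment calls the sketch does not capture.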

The journal version can distort precedence. Paper A that depends on paper B can have a much earlier journal publication date. If precedence is a real issue, say when I am trying to give an historical overview, then I will cite both the journal and conference versions.

What if you use a theorem that appears in a page-limited conference paper but whose proof appears only in the longer tech report? Then I cite the conference paper for the theorem and the tech report for the proof. Even if a proof exists in a paper, I'll often cite another paper or book if it has a cleaner or simpler proof.

What if you cite a paper for a theorem whose proof doesn't exist (the infamous "will appear in a later version of the paper")? If your paper critically needs that theorem, you really should give the proof for it yourself. At the very least later papers will cite your paper for the proof.

What if the conference or journal version is not online or is behind a paywall? I still cite the latest version, figuring that if someone wants to read the paper they can use a service like Google Scholar to find an accessible version.

I try to use the same rules for links to papers on this blog, because it's more important to give out the citation. If someone wants to download the paper, again it's usually easy enough to find.

In my .bib file (of bibtex entries), I replace the entries of papers as they get updated, keeping the same citation-key. That way, when I go back and rerun latex on an older paper, it gets the latest references. We have too many people in our field who don't bother updating references, pointing to a tech report when the conference or journal version has been published.
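
Here is a rough sketch of that update-in-place habit, assuming Python and a toy in-memory representation of a .bib file as a dictionary from citation-key to entry. The key, author, and title below are placeholders, not a real paper, and `update_entry` and `render_entry` are hypothetical helpers rather than part of any bibtex tool:

```python
def update_entry(bib, key, new_entry):
    """Return a copy of `bib` with the entry under `key` replaced.

    The citation-key stays the same, so every \\cite{key} in older
    papers picks up the newer version on the next run.
    """
    updated = dict(bib)
    updated[key] = new_entry
    return updated


def render_entry(key, entry):
    """Format one entry as bibtex text."""
    fields = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in entry["fields"].items()
    )
    return f"@{entry['type']}{{{key},\n{fields}\n}}"


# Placeholder data: a tech report that later appears in a journal.
bib = {
    "author09example": {
        "type": "techreport",
        "fields": {"author": "Example Author", "title": "An Example Theorem"},
    }
}
journal_version = {
    "type": "article",
    "fields": {
        "author": "Example Author",
        "title": "An Example Theorem",
        "journal": "Example Journal",
    },
}

bib = update_entry(bib, "author09example", journal_version)
print(render_entry("author09example", bib["author09example"]))
```

The point of the design is only that the key never changes; what goes into the entry is whatever the latest version of the paper calls for.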

Should you add hyperlinks in your bibliography to other papers? Nice if you do so and probably good if you are young and get into the habit now. But I haven't found the impetus to add links to papers in my now quite large .bib file.

In my ideal world, each research paper would have a web location which has human and machine readable descriptions of and pointers to all versions of that paper. We would just input that location into bibtex and it would automatically pull the information from the web and make the appropriate entry in your references. Then we would all cite correctly, or at least consistently.
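
A partial approximation of this already exists for papers that have a DOI: the doi.org resolver supports content negotiation, and some registration agencies will return a bibtex entry when asked for the `application/x-bibtex` media type. The sketch below assumes Python's standard library; the function names are mine, and not every DOI will honor the request:

```python
from urllib.request import Request, urlopen


def bibtex_request(doi):
    """Build (but do not send) a request asking the DOI resolver for bibtex."""
    return Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )


def fetch_bibtex(doi):
    """Send the request and return the response body as text.

    Needs network access, and only works when the DOI's registration
    agency supports this media type.
    """
    with urlopen(bibtex_request(doi)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_bibtex` with a real DOI returns whatever the registry supplies, which usually describes one version of the paper, not all of them, so it falls short of the single location I have in mind.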

13 comments:

  1. Shouldn't you cite the version you actually read?

  2. On the same subject: How do you manage your .bib file? Are papers classified by subject, or just by key, or unclassified? How do you manage to find, for example, all the papers you read on a particular subject? Do these issues matter to you?

  3. I pack into one citation
    (1) the journal version, since it is `official',
    (2) the conference version, since it may be better for WHEN it was done, and
    (3) a pointer to a free source if one exists, since that way the reader can access it.

  4. Same question as Gareth, with respect to your point 1: I reviewed a journal paper I would like to cite. It is accepted but not yet published. I could cite the previous conference paper (for the result I need, it doesn't matter), and I think this is the correct thing to do (I didn't officially read the journal paper).

  5. In response to Gareth and anonymous:

    I think you should cite the version that is most accurate for the citation and will be most useful to your readers.

    On a related note: you don't have to read an entire paper in order to cite it (though you may miss out on useful information!). For example, say you read a result with a complete proof in a conference paper, but want to cite a later, more complete version of that paper. If the same result and proof appear in the later paper, I think you can safely cite that paper without having to read the whole thing...

    --Anon 2

  6. Regarding the ideal world, well, there is DOI:
    http://en.wikipedia.org/wiki/Digital_object_identifier

  7. I get annoyed when people cite one of my old STOC papers rather than the journal version. The conference paper was full of gaps and major errors. There is a reason we worked so hard to make a journal version correct, but people don't even know there is a difference in the statement of the main theorem.

  8. Is a "conference publication" a real publication? Not just an "announcement/unproved claim"?

    Lance's nice "CS illness description" in SIGACT explains what is happening -- CS doesn't need journals. Nobody else (no other field) plays this game. Everyone else knows what to cite (the published paper, i.e. the journal paper).

  9. If you want to alert people that your main theorem has changed from conference to journal version, put a disclaimer on your webpage! I agree that journal version should be cited if available, but don't blame other people for not knowing that your conference version is incorrect. It's up to you to make that clear.

  10. What sort of field do you people have that your conference papers can have gaps and major errors? Maybe this speaks to whether STOC is useful to cite at all. Do your promotion and tenure committees know that STOC is this way?

  11. I'm embarrassed to ask: what's the difference between refereed and reviewed?

  12. Refereed means that the paper is judged to be "good" enough to get into some conference; reviewed means that paper is judged to be "correct" by a reviewer after (hopefully) reading it carefully.

  13. Regarding automatic population of metadata, have you tried services like Papers, Mendeley or CiteULike, all of which can populate a record from the DOI or other identifiers, e.g. an arXiv identifier? They are not perfect and require a little manual editing, but they can output bibtex files.
