Thursday, December 04, 2025

Finding Papers Before the Web

Inspired by Daniel Litt's X post and Bill's recent post on finding papers on the web, I thought I would tell the story of the before times.

In the 1980s, if you wanted to read a paper, you either had to find it in a journal or conference proceedings or have it mailed to you. You could reach out to an author, or find the paper in the lists of tech reports from various universities that SIGACT News would publish. Departments would keep a master copy of each paper. You would send a stamped self-addressed envelope to the department, which would copy the paper, put on a tech-report cover and send it back to you.

If you had a particularly exciting result, you would share it by physically mailing it out to your colleagues. I found out about the latest circuit results from Håstad and Razborov, as they sent papers to my advisor Michael Sipser, often hand-written and in Razborov's case in Russian. Neil Immerman sent a copy of his paper showing nondeterministic space is closed under complement to Sipser, but he was away for the summer. I found out about the result from a Berkeley talk announcement.

Email wasn't a common method of communication until the mid-1980s, and it wasn't until a few years later that people figured out how to send papers by putting the LaTeX or PostScript text directly in the email. This was before attachments and PDFs. Old mail systems put a ">" before "From" at the start of a line so it wouldn't be mistaken for a header, and LaTeX rendered ">From" as "¿From", which you'd often see in conference papers from around that time.
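
The ">From" quoting can be sketched in a few lines of Python. This is only an illustration, assuming the classic mbox-style convention of prefixing ">" to any body line that begins with "From "; the function name escape_from_lines is made up for this example and does not come from any particular mail system.

```python
def escape_from_lines(body: str) -> str:
    """Prefix '>' to every body line that starts with 'From ' (mbox-style quoting)."""
    escaped = []
    for line in body.split("\n"):
        # Assumption: only lines that begin with "From " (note the space) are quoted.
        if line.startswith("From "):
            line = ">" + line
        escaped.append(line)
    return "\n".join(escaped)


# A hypothetical fragment of LaTeX source pasted straight into an email body.
paper = "\\section{Proof}\nFrom Lemma 2 we get the bound.\nThe claim follows."
print(escape_from_lines(paper))
```

The second line of the fragment comes out as ">From Lemma 2 we get the bound." If the recipient ran that text through LaTeX without removing the ">", the default OT1 text fonts would typeset the ">" as "¿".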

In my first year as an assistant professor in 1989-90, there was a flurry of emailed papers marking (and causing) the quick progress we had in interactive proofs, described so well by László Babai's "E-mail and the Unexpected Power of Interaction". Babai had a warning about researchers disadvantaged because they weren't receiving these emails.

I got tired of emailing papers, so as soon as the web became a thing in 1993, I put all my papers online and have kept them there since. Now with sites like arXiv and ECCC, everyone has access to the latest and greatest in complexity.

Now how long before the next generation asks how we discovered papers before we had chatbots to find them for us?

6 comments:

  1. The next generation will ask how we discovered proofs before AI...

    1. Why would the chatbot generation even give a damn about proofs? This has to be true; chatty said so and provided convincing (hallucinated) evidence as well!

  2. The next stage is already here: there is so much information that it is hard to process and find relevant and important papers.

    Services like Gemini DeepResearch are extremely useful for building the initial map for your exploration.

    Will they find the nuanced needle in the haystack? That doesn't seem to be the case. LLMs struggle with edge cases and outliers, and new exciting research is not well represented in their training dataset until the results become well known, with many references in that dataset.

    Vector search using LLM embeddings won't help and will return too many results in a semantic search. Telling the gem from the garbage requires the systems to be able to assess the papers and the drafts on arXiv. You can rely on side signals like the reputation of the authors, but then you are mostly back to following the publications of famous researchers, not much different from following the publication feed for those researchers.

  3. I recall an ancient search system (when I was still in school) in libraries called "Dialog". Essentially you could create sets from simple text searches and then combine them using set operations. Once you narrowed it down to a couple you could read the abstracts, but still had to retrieve the physical paper or microfilm (there was such a thing) to read the rest.

    1. I have a microfiche and microfilm reader in my office, just in case.

  4. To submit a paper to STOC or FOCS or other conferences you had to send by mail (often FedEx on the due date) 10 (or more) copies of the paper to the chair. The chair would then MAIL copies of all the submissions to the people on the committee.

    More generally, just the mechanics of writing, submitting, correcting papers and making slides were really time consuming. Oddly enough we now spend a lot of time formatting and reformatting for different venues, and polishing past the point of diminishing returns. Still, I would much rather write a paper in 2025 than in 1990.
