Friday, November 12, 2004

Does Google Make Us Lazy?

The search wars are brewing. Microsoft has started a beta test of its new search engine. Yahoo keeps improving its search as well, and Google is not standing still: it has doubled the number of pages it indexes and released a new desktop search (which is rather useless to me because it doesn't index LaTeX files). In addition, search engines now do more than just web searching. For example, you can directly use dictionaries, maps, time, a calculator and more with the right shortcuts on Google, Yahoo and Microsoft. What does it mean, though, when we need a manual to use a search engine?

All this searching power leads to the mistaken belief that the best way to find anything is by a direct search. For example, many people look for a paper by Googling the name of the paper. Usually that does lead to a version of the paper to download. But Google searches the deep recesses of the internet and often returns old versions of papers, even technical reports, well after conference and journal versions have appeared. Google says "Don't be evil"; I say "Don't be lazy." If you have a reference to a paper in a particular conference or journal, search for that conference or journal and find the paper there if you have access. Or use a site like DBLP. Otherwise, look on the authors' web pages; good authors will keep versions of their papers up to date. If all this fails, you can fall back on Googling the name of the paper. Working off the newest version of the paper will save you far more than the extra fifteen seconds you'll spend searching for it.
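For the programmers reading, the search order above can be sketched as a simple fallback chain. This is only an illustration: the function `find_paper`, the source names, and the example URLs are all made up, and each "source" is a dictionary standing in for a real lookup.

```python
from typing import Callable, Optional

# A lookup takes a paper title and returns a URL, or None if it finds nothing.
Lookup = Callable[[str], Optional[str]]


def find_paper(title: str, sources: list[tuple[str, Lookup]]) -> Optional[tuple[str, str]]:
    """Try each source in order and return (source name, URL) for the first hit."""
    for name, lookup in sources:
        url = lookup(title)
        if url is not None:
            return name, url
    return None


# Hypothetical stand-ins for real lookups; each is just a dictionary here.
venue_site = {}.get
dblp = {}.get
author_page = {"Some Paper": "https://example.org/~author/some-paper.pdf"}.get
web_search = {"Some Paper": "https://example.org/old-tech-report.pdf"}.get

# Order matters: the general web search comes last, as a fallback.
SOURCES = [
    ("conference or journal site", venue_site),
    ("DBLP", dblp),
    ("author's web page", author_page),
    ("general web search", web_search),
]

result = find_paper("Some Paper", SOURCES)
```

In this toy setup the author's page answers before the general web search does, so the old technical report is never returned.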


  1. I agree with the "Don't be lazy" mindset. I have started to become somewhat peevish about authors who do a poor job finding relevant previous work, now that search engines are available. One problem is that people don't take the time to utilize the search engines correctly. You have to play a bit and use some trial and error. For example, when I wrote a survey article on power laws, to find relevant articles I couldn't just search on power laws. I had to use other terms, like "heavy tails" and even "lognormal distributions". Sometimes people in related areas have different terminology than we use; you have to try to find these terms, either by finding them in relevant papers and continuing to search backwards from there, or even by guessing what someone else might have called your idea in the past.

    This all may sound time-consuming, but it doesn't have to be. I find that a couple of hours can go a long way. At the very least, when I put in this effort, I feel more secure that I have not missed any relevant previous work, or that, if I have, I will at least have a good excuse.

  2. Michael's point about keywords is an excellent one. Knowing the "right" keywords is the most important part of a web-based lit search; without them you really don't get much out of the search.

    CiteSeer is a bit more useful in that regard, at least for forward referencing, but as many people have argued, it can't find what it doesn't index (like Google). I wonder how many people use the Science Citation Index?

  3. Internet search should be a lazy process. The problem is the copyright police. Researchers outside the jurisdiction of US/Aus/EU should do us all a favor and post electronic copies of copyrighted documents.

    Only researchers at wealthy institutions have a subscription to IEEE, ACM, MathSciNet, and Elsevier.

  4. I always start by searching for the authors in Google. Presumably, if the paper is not on one of the authors' home pages, then I should try to locate it at the journal or conference, but I almost never do that.

    I think that if the search fails for a paper written in the last decade, the person who is lazy is not (just) the searcher but the author. Part of our job is not just to write papers, but also to communicate the results and to make them available.

    Today, making the results available means putting the results on your homepage (and preferably also on some archive). Journals are great for their refereeing service, but they are very inconvenient as a source for actually reading papers (not all of them have easy & cheap electronic access, and even if your institution has such access, Murphy's law says that you'll need the paper when you're at home).


  5. Does anyone have ideas for finding related TRs? These are the worst, because they can be the closest to your research, but they are the hardest to find.