Today Yahoo is closing AltaVista, the best search engine before Google. The news caught me by surprise: AltaVista still existed? A number of commentators blame bad management for AltaVista losing its dominance to Google. But it was an algorithm that killed the search engine.
AltaVista made its claim to fame in the mid-90s by indexing a large number of web pages. AltaVista did very well for obscure search terms like "fortnow" but didn't do so well for more common searches. I used to test search engines by looking for "Holiday Inn", a popular hotel chain in the US. When you searched AltaVista for Holiday Inn, the first result was a Holiday Inn in Buffalo, New York. The Holiday Inn home page was nowhere to be found in the search results.
For searches like Holiday Inn, one had to use Yahoo, which back then was not a search engine but a directory tree of web sites. We needed our own directories as well. Ian Parberry maintained the TCS Virtual Rolodex, a list of home pages of theoretical computer scientists, most of whom had names common enough that AltaVista wouldn't find them.
A Stanford professor (I can't remember which one) came to give a talk at the University of Chicago around 1997, and he mentioned a research project at Stanford developing a new search engine known as Google. I ran my Holiday Inn test on Google and was shocked when the Holiday Inn home page showed up as the first result. Google passed every other test I could throw at it, and I've rarely used any other search engine since. Google made AltaVista, the Yahoo directory and the TCS Rolodex irrelevant. Google's PageRank algorithm simply took search to a new level, much as Steve Jobs didn't create the first smartphone but completely changed the game with the iPhone. AltaVista managed to survive for another 15+ years but never regained its market share.
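To give a feel for why link analysis passes the Holiday Inn test, here is a minimal sketch of the textbook PageRank idea: power iteration with a damping factor of 0.85, where pages with no outgoing links spread their score evenly. This is a simplified illustration, not Google's actual implementation, and the link graph is a made-up toy whose page names are hypothetical. A page scores highly when many pages, especially highly scored ones, link to it, so the chain's home page comes out ahead of the Buffalo location.

```python
def pagerank(links, damping=0.85, max_iter=200, tol=1e-12):
    """Simplified PageRank by power iteration over {page: [linked pages]}."""
    # Collect every page that appears as a source or a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Score held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its (damped) score equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy web: several pages link to the chain's home page.
links = {
    "holiday_inn_home": ["holiday_inn_buffalo"],
    "holiday_inn_buffalo": ["holiday_inn_home"],
    "travel_guide": ["holiday_inn_home"],
    "hotel_review": ["holiday_inn_home", "holiday_inn_buffalo"],
    "city_blog": ["holiday_inn_home"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:20s} {score:.3f}")
```

The scores always sum to 1, so they can be read as the long-run chance that a random surfer, who follows links and occasionally jumps to a random page, lands on each page.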
The AltaVista story teaches a lesson we still grapple with today. Collecting and storing big data is a huge technical challenge, but data by itself is of limited value without the algorithms to find the important parts among the muck.