Monday, March 07, 2011

Three Questions that I think Watson would have trouble with

Here are two questions that were on Jeopardy (the show's slogan: Watch "Jeopardy!", Alex Trebek's fun TV quiz game show!) that I do not think Watson would have gotten right. I have added a third that I also think Watson would not have gotten right. What do you think?

QUESTION ONE: The FINAL JEOPARDY category was Computer Science. Here is the question. I mean the answer.
John Tukey coined this compound word in 1959 saying it was as important as "Tubes, transistors, wires, tapes ..."
Person A wrote WHAT IS A MOTHERBOARD. Wrong. Person B wrote down WHAT IS. Wrong. (This might have worked if the question was in philosophy.) Person C wrote down WHAT IS WI-FI. Wrong. I got it right by reasoning, not from memory. I do not have the quotes of John Tukey memorized. Does Watson? I doubt it. I think he would have gotten it wrong. (The answer can be found HERE.)

QUESTION TWO: The FINAL JEOPARDY category was 1930's Films. Here is the... answer
In this classic film, one of the characters tries to quote the Pythagorean theorem, but gets it wrong.
Two of the contestants wrote Gone with the Wind. The third one wrote the correct answer which I will not reveal here in case you want to try it. (The contestant who got it right won the game.) I doubt Watson would know it--- too much to correlate. (The answer can be found HERE.)

QUESTION THREE: This was not on Jeopardy. I am asking it in the form of a question: What is unusual about the Jeopardy Slogan? (The answer can be found HERE.)


  1. The answer provided for question three is clever ... but it is not natural ... in the sense that the cleverness of the answer resides in English-language conventions that are essentially arbitrary.

    At least one alternative answer exists that *is* natural ... tomorrow I will post the specific example I have in mind ... perhaps there is more than one such answer.

    It is by no means clear that Watson has anything approaching the capability to recognize "naturality" ... it is interesting to speculate about how this capability might be provided.

  2. The triangle one, I knew it only from the Simpsons. Nice to learn (as has happened many times before) what they are referencing!

  3. If only Kurt Gödel were still around... he would give Watson a rough time :D

  4. You spell Jeopardy wrong on the page that has "Jeapardy"

  5. Last Anonymous- I fixed it.

    Also, to the person who emailed me about a mistake (allowing the comment would have given away the answer): I fixed that too. (It was on the John Tukey Question.)

  6. The Google ngram incidence for "Tukey's word" is mighty striking ... that was a great question!

    As for Question 3, the Oxford English Dictionary (OED) supplies the following definition:

    quiz: "b. A set of questions provided as an entertainment; spec. a series of questions asked of competing individuals or teams, and often divided into rounds."

    From a strictly logical point-of-view, what's odd about the Jeopardy Slogan is that it contradicts the OED definition, in the sense that Jeopardy is an "anti-quiz game show" in which contestants provide questions rather than answers (which was Jeopardy's "gimmick" when it first appeared).

    This strictly logical answer is of course entirely devoid of humor ... but heck, Lance's question didn't specify *amusing* answers.

    Oddly enough, this was a case where Watson's search algorithms might have yielded the humorous answer that Lance had in mind more easily than the logically natural answer.

    It's not easy to think of questions whose answers are both mathematically natural *and* funny ... one famous example that comes to mind is the Monty Hall Problem ... are there others?

  7. I bet Watson gets the first two. It has a large database of published material, and I assume it has a snapshot of the World Wide Web.

    In both cases, simple web searches turn up the trivia facts in question. For the first one, "John Tukey coin" turns it up. For the second one, "pythagorean theorem film" turns up a page of movies using the Pythagorean theorem, and only one of them is from the 30s.

    It's worth thinking about where Jeopardy questions come from. It's not like the question writers live in a monastery on a hill and send out their questions via puffs of smoke. They are engaged in humanity's conversations, and a lot of that conversation happens via the web. Watson has access to the question writers' source book.

  8. A snapshot of the World Wide Web- that sounds like it's too much memory for Watson.

    The Tukey Question-- if it's NOT in his memory then could he derive it?
    It's not in my memory, but I derived it.

    How well can Watson REASON?

  9. Isn't there an implied question with every round of Jeopardy?

    Given the answer "x", what is the correct question to ask?

    A question about a question, so to speak.

  10. "How well can Watson REASON?" is a curiously tough question to answer.

    People reason with ideas. What is the Watson equivalent of an idea?

  11. Got the first two. Can report that in 1979 I told the prime minister of Jamaica that Cuba was beating them in the production of software, and the look I got back was basically, "what's that?".

    For the third I took the slogan to be "This is Jeopardy!", unusual because it's an illegal answer form for itself. Ah!---I should have posted this in Dick Lipton's diagonal phrases item.

  12. I was guessing "binary digit", which is also a Tukey-ism, but even older.

    If you check out the Wikipedia entry for Tukey, it says "The term "software", which Paul Niquette claims he coined in 1953[citation needed], was first used in print by Tukey in a 1958 article in American Mathematical Monthly, and thus some attribute the term to him". (The Jeopardy! question was also 1958, by the way, at least judging from the Jeopardy! sites out there, which now make it look overwhelmingly as if Tukey defined the term "software").

    John Prager of IBM just gave a talk at Columbia explaining many of the details of how Watson worked.

    It does not have the entire web in memory. Its data is mostly derived from encyclopedias (including Wikipedia), gazetteers and dictionaries (including WordNet); John said they also expanded these sources very conservatively with a bit of web crawling. There's a component that measures the reliability of the source from which the answer was gleaned.