Thursday, January 22, 2015

There Should be an Algorithm

My high-school daughter Molly was reading her Kindle and said, "You know how you can choose a word and the Kindle will give you a definition? There should be an algorithm that chooses the right definition to display depending on the context." She was reading a book set in the 1960s that referred to a meter man. This was not, as the first definition of "meter" would indicate, a 39-inch-tall male. A meter man is the person who reads the electric or gas meter at your house. Today we would use a gender-neutral term like "meter reader" if technology hadn't made them obsolete.

Molly hit upon a very difficult natural language processing challenge known as word-sense disambiguation. The most successful approaches use supervised machine learning techniques. If anyone from Amazon is reading this post, the Kindle dictionary would make an excellent application of word-sense disambiguation. You don't need perfection; anything better than always choosing the first definition would be welcome. Small tweaks to the user interface that let the reader indicate the appropriate definition would yield more labelled data and, in turn, better algorithms.
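To make "anything better than the first definition" concrete, here is a minimal sketch of a gloss-overlap heuristic in the spirit of the simplified Lesk algorithm: pick the definition whose wording shares the most words with the surrounding sentence. This is not the supervised approach mentioned above, and it is certainly not how the Kindle works. The tiny `SENSES` dictionary, its glosses, the stopword list, and the function names are all made up for illustration.

```python
import re

# Hypothetical mini-dictionary: every gloss here is invented for this example.
SENSES = {
    "meter": [
        "a unit of length equal to about 39 inches",
        "a device that measures and records the amount of electric power, "
        "gas, or water used at a house",
        "the rhythmic pattern of stressed syllables in a line of poetry",
    ]
}

# A small, ad hoc stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "in", "at",
             "that", "is", "was", "has", "he", "she", "on", "for"}


def tokens(text):
    """Lowercase the text and return its set of alphabetic, non-stopword words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def pick_sense(word, context):
    """Return the index of the gloss that shares the most words with the context.

    Ties go to the earliest sense, so with no overlap at all this falls
    back to the first definition.
    """
    context_words = tokens(context) - {word}
    glosses = SENSES[word]
    overlaps = [len(tokens(gloss) & context_words) for gloss in glosses]
    return max(range(len(glosses)), key=lambda i: overlaps[i])


sentence = "The meter man came to the house to read the gas and electric dials."
print(SENSES["meter"][pick_sense("meter", sentence)])
```

For the meter man sentence, the second gloss wins because it shares "house", "gas", and "electric" with the context. A real system would need a full dictionary and far more robust matching, and the labelled data from readers would be what lets a supervised method do better than word counting.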

And to Molly: Keep saying "There should be an algorithm". Someday there might be. Someday you might be the one to discover it.

7 comments:

  1. This may sound very naive and stupid, but I assume that on Amazon's end the book comes with additional data, such as the genre, country and time period of the story. If this is indeed the case, can't the Kindle dictionary already leverage this to adjust the definition (assuming, once more, that each definition comes with similar annotations)?

  2. "if technology hadn't made them obsolete."

    Perhaps technology *should* have made them obsolete, but it has not. We still see meter readers here in NM taking a reading each month for our water supply. The gas and electric companies also had them up until just a couple of years ago, although I have not seen one recently.

  3. Amazon indeed has a feature in the new Kindle Voyage. It is called Word Wise. Do check it out! It is not perfect, but it is a great start.

  4. Already it's impossible to explain binary search to modern kids, since they've never searched for a word in a paper dictionary, and now you want to make it impossible to explain table look-up...

  5. Even better. Tell her to file a patent for the idea, and then nobody will be able to use it for the next 20 years.

  6. Might have her puzzle over how to treat the sentence: "Time flies like an arrow, fruit flies like a banana":

    https://en.wikipedia.org/wiki/Time_flies_like_an_arrow;_fruit_flies_like_a_banana

  7. One of the main areas for development, in my mind, is the development of a "computer science friendly" dictionary. The dictionary model is stuck in the Gutenberg era. A dictionary should be like a computer program, with categories of words and phrases, pointers to definitions (in the C sense), and pattern matching. A definition of a word should not just be a sequence of words - it should be a parse tree with pointers to other entries in the dictionary.

    This probabilistic NLP approach, which derives from 15th-century technology, needs to be updated. Start from scratch and build something elegant.

    Lance Q. Elevator
    Moscow is flourishing!
