## Friday, September 18, 2009

### You Will All Work for Google

Google has acquired reCAPTCHA, Luis von Ahn's project to use humans to help transcribe old documents. We consider Luis an honorary theorist, and congratulations to him on the successful commercialization of his research.

CAPTCHA helps distinguish humans from automated web robots to prevent, for example, the creation of unlimited free email accounts for spamming purposes. A CAPTCHA typically presents a distorted word that the user needs to enter. I hate that we need CAPTCHA, but I understand the need and reluctantly make all of you solve one when you leave a comment on this blog, which has dramatically decreased the amount of comment spam.

reCAPTCHA gives the user two words to type in: one to test the user, and the other taken from some corpus to help determine a word from a scanned image. When I purchased from Ticketmaster I had to deal with reCAPTCHA, which helped decipher words from old New York Times articles.
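To make the two-word scheme concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not reCAPTCHA's actual implementation: the function names, the case-insensitive matching, and the three-vote threshold are all hypothetical.

```python
from collections import Counter


def check_response(control_answer, control_guess, unknown_guess, votes):
    """Test the user on the control word; if they pass, record their
    reading of the unknown word as one vote. (Hypothetical helper.)"""
    passed = control_guess.strip().lower() == control_answer.strip().lower()
    if passed:
        # Only users who got the known word right get to vote on the unknown one.
        votes[unknown_guess.strip().lower()] += 1
    return passed


def accepted_reading(votes, min_votes=3):
    """Return the most common reading once enough users agree, else None.
    The threshold of 3 is an assumption chosen for illustration."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


votes = Counter()
check_response("morning", "Morning", "upon", votes)   # passes, votes for "upon"
check_response("morning", "mourning", "apon", votes)  # fails, vote is discarded
```

The user is never told which word is the control, so both have to be typed, and a scanned word is accepted only after several independent users agree on it.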

I'm less a fan of reCAPTCHA: I am forced to perform a task for someone else even if that task is admirable. Google will use reCAPTCHA to help digitize scanned images from their books project. Google makes revenue from this corpus, and I don't see why I should be forced to perform tasks for them even if it takes only a little bit of my time.

I assume Google will replace all the CAPTCHAs on their properties with reCAPTCHA, including Blogger, the software that powers this weblog. So if you want to leave an anonymous comment on this blog without signing into Google, you will have to give Google some of your time.

1. "forced to perform a task for someone else", "you will have to give Google some of your time."

You'd have to spend the time anyway. reCaptcha just makes that time a little less wasted.

2. It's also a way to pay Google for providing the service.

3. Any idea how much Luis made from the deal????

4. "You'd have to spend the time anyway. reCaptcha just makes that time a little less wasted."

You would *not* have to spend the time anyway. The second word is not used for verification, so could be omitted. The scheme is also very inefficient, since the scanned word has to be verified by multiple people before Google can use it.

It is similar to the idea of charging a penny for each post to push down spam. That is hard to implement, so instead they waste huge amounts of time (cumulatively). Not a fan.

5. Captcha doesn't have to be a "word". But my understanding is that reCaptcha has to be a pair of words, since otherwise it is very easy to find out which word is the one to be verified, and a program can attack this mechanism.

6. "The second word is not used for verification, so could be omitted."

I believe the key is that one of the words isn't used for verification, but you don't know which one it is, so you do have to enter both.

7. "We consider Luis an honorary theorist"

Why? He's a successful academic in CS, but nothing really makes him a theorist (except personal connections to theorists at CMU).

8. "Why? He's a successful academic in CS, but nothing really makes him a theorist (except personal connections to theorists at CMU)."

You mean nothing aside from a paper in STOC and a paper in Crypto early on as a grad student?

9. One could look at reCAPTCHA, the ESP game, etc. from a social welfare point of view.

If there are no fixed costs and all costs are marginal, then this kind of work may not contribute to social welfare. (But this kind of work does contribute to social welfare, because there are huge fixed costs in the alternatives.)

For example, the confidence in the verification is only as high as the confidence in knowing beforehand what the words were. So if one word is completely unknown, the verification obtained is equivalent to that of one word, yet users have to solve two words. This takes away all the gains in social welfare.

Similarly, in the ESP game, the game boards, i.e., the pictures, are usually somebody's property. So somebody's property is used to create entertainment, which is paid as wages for the work done. If the costs are all marginal costs, ESP is not generating any additional social welfare, as the picture owner would claim that this is an additional commercial use of their pictures, and they would rather make the commercial benefit themselves.

To note again, in practice there are huge fixed costs, and therefore this kind of human computation may actually add value to social welfare.

10. >It's also a way to pay Google for providing the service.

11. I am afraid I did not understand Kamal's comments at all. Exactly which costs are "fixed" and which are "marginal", and for whom?

12. Basically, when you do a transaction there is a certain fee which is independent of the transaction. Take the example of the ESP game. Suppose I have an image. The ESP game uses that image to create entertainment, which is exchanged for the work done (image tagging).

The image that generated the entertainment is mine, and that entertainment is eventually given to the players who tagged the images. The tags would be used for commercial purposes, say for improving image search and thereby generating advertising money. I do not have any share in that money, though the image that was used as a game board in the ESP game belonged to me. So this causes disutility.

The problem is that this disutility is so tiny that if I wanted to collect money from the users for the use of my image for entertainment, most of the cost would be fixed cost (the logistics of receiving money). Imagine that somebody uses your image in a game of Monopoly that sells for $20 apiece. Would you not want a $5 royalty? Suppose the company, instead of asking for $20, asks for $20 worth of work. Does that mean your claim on the $5 is gone? These equations remain true for the ESP game, except that the scale of the transactions is so small that it is not worth transacting. $20 is basically replaced by 20 micro cents.

The reCAPTCHA case is even easier to argue. If you give a captcha to another person, your confidence that the captcha was solved correctly is only as high as what you already knew about the captcha to start with. So basically, as Lance points out, reCAPTCHA is not generating any extra value as such, but it is asking us to do the work. This work causes a certain inconvenience beyond the verification of our being human that the captcha provider obtains.

An alternative way of compensating for this inconvenience is to actually pay us micro cents, but then most of it would be used up as the transaction cost (fixed costs).

So if recaptcha provider says that solving those texts (beyond verification) adds value to the world more than our inconvenience, then a higher social welfare would be obtained by actually paying us money for it, provided there is no fixed transactional cost.

15. Captcha is worse than a virus, which can be isolated and removed. You go on.. This is a flagrant waste of time. If it's to prove a human is behind the process, then it should be a learning tool (improve vocabulary, foreign language, or math skills, e.g., E=mc², y=mx+b). Whatever Luis was paid was too much! Hope Google does something worthwhile with the opportunity.

16. Google's method of harnessing people to do a tiny bit of work for them is much better, IMHO, than Microsoft's approach.