Friday, June 11, 2004

Favorite Theorems: Connections

May Edition

I have always loved results that find connections between previously-thought different areas of complexity. This month we highlight one of the best.

Extractors and Pseudorandom Generators by Luca Trevisan

Informally a pseudorandom generator takes a small random seed and generates strings that can fool every probabilistic algorithm. To describe an extractor we start with some distribution D over strings of length n. Let p be the maximum probability of any string in D and let k = log(1/p). An extractor uses D and a small number of truly random bits to create a new uniform distribution of strings of length close to k.

Both pseudorandom generators and extractors have many uses in complexity and many papers in the field show various constructions to improve the parameters of both. Trevisan showed that one can view any pseudorandom generator as an extractor and then derives better extractors from known pseudorandom generator constructions.

Pseudorandom generators fool resource-bounded algorithms while extractors nearly uniform distributions in an information-theoretic sense. That makes this connection all the more amazing. Trevisan's paper has affected the how researchers think about and prove results in both areas.


  1. Out of curiousity, why do researchers insist on putting PostScript versions of their papers on the web? I would imagine that they would get a larger readership if they would adopt PDF. For those of us not working in TeX and PS-equipped environments, having to track down and install GhostScript to just read the paper seems foolish.


  2. PDF vs. PostScript files
    While portable document format (PDF) and the PostScript language are related, there are some fundamental differences and some nice similarities.

    * Both share the same imaging model, which is great for converting from one to the other.
    * PDF is not a programming language, it is a highly structured file format that carries page descriptions.
    * The PostScript language has always had the ability to address the needs of high-end prepress.
    * PDF was originally designed for screen-oriented business document portability. Prepress capability has been added since and is still evolving.
    * PostScript files must be processed linearly, from start to finish.
    * PDF pages can be viewed randomly.
    * PostScript files can be executed (RIPed) on different platforms.
    * PDF pages can be viewed on different platforms.
    * PDF files are not always smaller in size than their original PostScript counterparts. In fact, sometimes they can be much larger.
    * PostScript files contain document structuring conventions which provide valuable information about the structure and content of the print file.
    * PDF files contain no document structuring conventions. Thus, for example, you are not able to tell where one internal EPS ends and another begins.

  3. Trevisan (and many other computer scientists) put some effort into making their research papers freely available. Don't get greedy.

  4. I work at a university library, so I'm well aware of the effort that people go through to get their papers out freely. However, I'm just remarking that it seems like generating a PDF version is a relatively simple thing for an author to do, and it reduces sizable hurdle for the reader. If I were doing PhD research and depended upon this paper, I'd have no quibbles installing a PS reader. However, as a casual reader who may decide whether to read based upon how much they have to do just to get at the content, may simply skip this one because of the time required to track down a PS viewer to open the thing.

    I'm grateful that researchers put these things out freely, but after going through the trouble of clearing the papers, uploading the papers, checking the permissions and so forth, is the little extra effort to make the paper accessible to everyone using a modern stock computer all that unreasonable?

    FWIW, I also got really annoyed at my undergraduate CS profs who would post reading lists and assignments as PS when text, HTML, or PDF would have worked just as well. I also get annoyed with those who do the same thing with .DOC files, so I'm not just picking on the PostScript crowd.

  5. One last PS gripe to my post above... (Forgot to identify myself in it.)

    Significant work has been done with respect to screen readibility and PDF. I went ahead and used a converter to take the paper above and turn it into a readable PDF. If you're viewing this on the screen (using MacOS X's built-in PS->PDF converter), the on-screen readibility is atrocious when viewing the document at an otherwise readable 3/4 of a page visible. You have to zoom in significantly for something that can be read without being bothered by the fuzziness caused by the aliasing. The text is also not selectable for cut and paste operations. It is also unsearchable.

    Now, the point to all this griping is that while I appreciate the prepress quality inherent in PostScript, why is paper publishing be optimized for dead-tree distribution in a field such as computer science? I understand the historical inertia argument, but doesn't electronic media offer enough advantages in terms of storage, transmission, and use to justify optimizing these processes for electronic distribution first, and dead-tree distribution last?

    Again, apologies if this is seen as senseless griping about a paper. I'm happy that the content is freely available online, but I just get annoyed when I see things released by professionals (computer science experts in this case) who either should know better and don't, or do know better and choose not to. This is probably a silly thing to get annoyed about, but I'm probably not alone in my irkedness.


  6. I recommend you save it to your desktop, open up Cygwin, and do the ps2pdf command there.

    Also, to avoidd blocky fonts, I recommend everyone use this rule to turn the dvi into a pdf:

    dvips -P cmz -t letter -q -f < MYDVI.dvi | gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=MYPDF.pdf -c save pop -

    Naturally, in a make file the rule would be:

    .SUFFIXES: .tex .dvi .pdf

    DVIPS=dvips -P cmz -t letter -q -f
    GS=gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite

    ${DVIPS} ${<} | ${GS} -sOutputFile=${@} -c save pop -

    (I hope the blog renders the newlines correctly)

  7. I think one reason many researchers use ps files is that they are used to it (perhaps having started doing this before pdf). Also, essentially every serious CS researcher works in an environment with TeX and postscript readily available; I'd be surprised if even 1% don't. The net effect is that postscript files are easily accessible for 99+% of the desired audience for the papers. I don't intend this as a snide comment, just an explanation. (I personally think more people should put in effort to reach out beyond the research community, but most people don't.)

  8. I use You upload a .ps file and you can download the converted PDF. Great if you're at a machine without a postscript viewer.

  9. I must admit that I envy anyone for whom the greatest barrier to understanding "cutting-edge" TCS is the file format of the publications.

    "Do not worry about your own difficulties in opening files. I assure you that mine are far greater."