Thursday, June 23, 2011

Creating an Email System at Cornell

Email celebrates its fortieth anniversary, so let me tell the story of my job for three summers, and part-time during the academic year, while an undergrad at Cornell University: creating an email system from scratch.

In my sophomore year (1982) I took a computer structure course. I had a heavy set of final exams and papers, so I did the final program for this course early and turned it in on the last day of class to the instructor, Steve Worona. In that class you could scale assignments and tests from 0.75 to 1.5 to make them count more or less. When I turned it in, Worona asked me why, if I was turning it in a week early, I had scaled it at 0.75. "You never give me A+'s on the programs and I didn't want to lower my grade."

That was perhaps my most obnoxious moment, but it got me noticed, and Worona, who worked for computer services, offered me a programming job. We would create a new email system for Cornell. Cornell already had an email system, written in some scripting language; it was slow and clunky. We wouldn't use any fancy high-level language; we would code directly in IBM 370 assembly language. We would do it all ourselves, user interface, database for storing messages, interactions with SMTP servers, etc., to maximize efficiency. No small task, which is why it took me nearly three years.

IBM 370 assembly language was quite bloated with instructions. There was a command called "Edit and Mark" that went through a range of data making modifications based on some other data. This was a single assembly-language instruction. We used to joke that there was a single instruction to do your taxes.
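For the curious: EDMK ("Edit and Mark") formatted a packed-decimal number through an "edit pattern" in one instruction, and also recorded where the first significant digit landed so you could plant a currency sign in front of it. Here is a rough Python sketch of the idea, not the exact 370 semantics; the simplified pattern language is mine, not IBM's:

```python
def edit_and_mark(digits, pattern, fill=" "):
    """Rough sketch of the 370 "Edit and Mark" (EDMK) idea.

    'd' in the pattern consumes the next digit; any other character
    is an insertion character (comma, period, ...).  Until the first
    significant (nonzero) digit appears, digits and insertion
    characters are replaced by the fill character.  Returns the
    edited string plus the index of the first significant digit --
    the "mark" the real instruction handed back."""
    out, significant, mark = [], False, None
    it = iter(digits)
    for ch in pattern:
        if ch == "d":
            d = next(it)
            if d != "0" or significant:
                if not significant:
                    mark, significant = len(out), True
                out.append(d)
            else:
                out.append(fill)
        else:
            out.append(ch if significant else fill)
    return "".join(out), mark

s, mark = edit_and_mark("0012345", "dd,ddd.dd")
print(repr(s), mark)   # -> '   123.45' 3
```

One pattern formats the whole number, commas and decimal point included, which gives some flavor of why we joked about a single instruction doing your taxes.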

Cornell at the time was a gateway between BITNET ("Because It's Time NETwork", connecting about 30 universities in the US and Europe) and a fledgling ARPANET, the precursor to the Internet. BITNET moved whole files, while ARPANET protocols worked one line at a time, so there was a special file-based Batch SMTP to transmit email between the two. The fun I had working this all out!
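The idea behind Batch SMTP was to write the whole SMTP dialogue out as a file, ship the file over the file-based network, and replay it against a mail server on the far side. A minimal sketch in Python, with made-up host names; real BSMTP had more commands and stricter rules:

```python
def batch_smtp(sender, recipients, body_lines):
    """Write the client half of an SMTP dialogue out as one file, so
    a file-based network like BITNET can carry it and a server on
    the far side can replay it.  Illustrative only: the host names
    are hypothetical and real Batch SMTP was stricter."""
    lines = ["HELO gateway.example.bitnet",     # hypothetical gateway
             f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{r}>" for r in recipients]
    lines.append("DATA")
    # SMTP dot-stuffing: double a leading '.' so a body line can't
    # be mistaken for the end-of-message marker.
    lines += ["." + b if b.startswith(".") else b for b in body_lines]
    lines += [".", "QUIT"]          # end of message, end of batch
    return "\n".join(lines)

print(batch_smtp("alice@cornell.bitnet", ["bob@mit.arpa"],
                 ["Hello from Ithaca"]))
```

The receiving gateway just feeds the file to its SMTP machinery line by line, bridging the file world and the line-at-a-time world.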

As a test bed, my email system was used in only one building, Day Hall, which held the university administration: the President, the Provost, and so on. There was great pressure to make sure there were no bugs.

One day a company that helps get people green cards sent an email to everyone on BITNET. My first piece of spam.

As a side project I helped write an ARPANET interface into CUINFO, an early electronic information system at Cornell. That was pretty simple, we just used the Telnet interface into a different port. This is basically what HTTP does now. I could have invented the Web!
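The "Telnet interface into a different port" point generalizes: early application protocols were just lines of ASCII over TCP, so you could type a request by hand. A small self-contained Python sketch of the idea, with a toy one-shot server standing in for a real site:

```python
import socket
import threading

def http_get(host, port, path="/"):
    """Speak HTTP the way you would type it over telnet: open a TCP
    connection to the port and send plain-text lines.  An HTTP/1.0
    request is just a few lines of ASCII."""
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:            # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")

# A toy one-shot server on a spare local port, standing in for a real site:
def serve_once(srv):
    conn, _ = srv.accept()
    conn.recv(4096)                 # read (and ignore) the request
    conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))          # port 0 = let the OS pick one
srv.listen(1)
threading.Thread(target=serve_once, args=(srv,), daemon=True).start()

reply = http_get("127.0.0.1", srv.getsockname()[1])
print(reply.splitlines()[0])        # -> HTTP/1.0 200 OK
```

Swap the port and the first line of text and you have the CUINFO-style interface described above: same transport, different conversation.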

In my senior year I told Steve Worona that I was planning to go to graduate school in theoretical computer science.

"You really want to spend your life shaving log n factors off algorithms?"

"Yes I do." (But I never did, since I went into computational complexity)

"Well the world just lost a great programmer."

As soon as I left Cornell my email system was scrapped for a commercial product. C'est la vie!


  1. "we would code directly in IBM 370 assembly language. We would do it all ourselves, user interface, database for storing messages, interactions with SMTP servers, etc to maximize efficiency."

    I am 22 years old, a CS major, and I feel terrified just hearing this. I have programmed small programs in assembly... I cannot even imagine what it would be like to write an e-mail program in it.

    And after all this, they are saying that theoreticians are not savvy enough...

  2. Several theorists have, at one time or another, done some real programming. (I am not one of them.)
    Does it help them as theorists?

    Does anyone code in Assembly anymore? How about binaries?
    My guess would be YES to both, but why is that? Are modern compilers so good that the code you get is fast enough?

    A concept that terrifies me: self-modifying code. This was used in real assembly code programs to get some speedup.

    LANCE- your code is gone but your theorems will last forever.

  3. "Several theorists have, at one time or another, done some real programming."

    Three examples are Turing, von Neumann, and Shannon (pretty considerable examples, to be sure).

    Moreover, Turing was known as a dab hand with a soldering iron, von Neumann was gifted at wiring plugboards, and Shannon's childhood hero was Thomas Edison.

    Unsurprisingly, all three wrote essays arguing that good mathematics can arise from empirical experience:

    "Some of the best inspirations of modern mathematics (I believe, the best ones) clearly originated in the natural sciences." (von Neumann)

  4. Nice bit of personal history :)

    Don Knuth of course has done some awesome programming (TeX!). And he even proved theorems about algorithms specified in an assembly language.

  5. Doesn't e-mail pre-date the "Internet" and have its origins in the 1960s?

  6. The Ubiquitous Anonymous, 1:28 PM, June 23, 2011

    SNDMSG was not really e-mail, and if you want to say that it was, then there had already been e-mail before; it was just single-machine e-mail (even Ray Tomlinson notes that he was making improvements to the existing SNDMSG program when he writes about his work from 1971).

  7. Lance, I'm impressed.

    I did assembly as well, but only on lowly microprocessors.

    Bill, COBOL also has self-modifying features: you can ALTER the destination of a GOTO statement. Sometime you can ask me how I know.

  8. Another pretty amazing hack by Knuth: D. Knuth, "Minimizing drum latency time," Journal of the ACM, 8:119–150. The assembly language of the IBM 650 had instructions of the form

    OP operand next-instruction-address

    The memory was a rotating magnetic drum. Knuth wrote a program to optimize the location of the next instruction, so drum latency would have minimal effect.
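    A toy Python sketch of the placement idea, in the spirit of the comment above rather than Knuth's actual algorithm; the greedy rule and the drum size are assumptions for illustration:

```python
def place_instructions(exec_times, drum_size=50):
    """Greedy sketch of drum-latency-aware instruction placement.

    The drum has `drum_size` word positions and rotates one position
    per time unit.  After an instruction at position p finishes (t
    time units later), the ideal home for the next instruction is
    (p + t) % drum_size: it arrives under the read head exactly when
    needed.  If that slot is taken, take the next free one, paying a
    little extra rotational latency."""
    used, placement, pos = set(), [], 0
    for t in exec_times:
        ideal = (pos + t) % drum_size
        # scan forward from the ideal slot to the first free position
        for delay in range(drum_size):
            slot = (ideal + delay) % drum_size
            if slot not in used:
                break
        used.add(slot)
        placement.append(slot)
        pos = slot
    return placement

# Three instructions taking 3, 5, and 2 time units on a 50-word drum:
print(place_instructions([3, 5, 2]))   # -> [3, 8, 10]
```

Naive sequential placement would make every instruction wait almost a full revolution; choosing the next-instruction address to match the drum's rotation eliminates most of that latency.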

  9. According to a history of e-mail by one of the Multics crew, networked e-mail started in 1971 but e-mail between users on the same machine started several years earlier.

    370 assembler - matched in COBOL and PL/I - had a surprising array of data types for numbers.
    I recall that they included the EBCDIC character format (IBM's extended binary-coded-decimal standard, which predated ASCII back when ASCII wasn't yet the standard it is today), packed decimal, zoned decimal, etc.

    COBOL had some other oddities (besides verbosity) and self-modifying code (which I don't recall seeing used) that made it painful indeed to debug. In addition to the PERFORM command, which executed a paragraph of statements like a macro rather than a subroutine (a paragraph ran from a line label to a terminating period, and PERFORM saw more use once GOTOs were deprecated), COBOL had PERFORM THRU.

    A PERFORM THRU command was just as insidious as self-modifying code. It would start execution at one line label and keep executing until another specified line label was reached. That terminating label could be anywhere, even in the middle of a paragraph. At least with a GOTO statement you knew when you were leaving the current flow of execution; with a PERFORM THRU there is no indication at the terminating line that control will move back to an earlier point in the code. I have seen it used in 100,000-line COBOL programs.

    Y2K as a threat might have been overblown BUT at least it allowed companies to wash away years of COBOL programs. I wonder how many people still have to maintain old COBOL programs.

  10. RE: Does anyone code in assembly anymore? How about binaries?

    The answer is "yes" to both of these questions, but for reasons other than speed. One can safely assume that most people out there cannot write assembly which will produce a binary that is faster than one produced by an optimizing compiler.

    The real reason one would want to code in assembly (and machine code) is to get access to the underlying hardware that a high-level language would otherwise deny you, or to do something squirrelly (e.g., shellcoding).