Thursday, September 20, 2018

Why wasn't email built securely?

Recently I talked with Ehsan Hoque, one of the authors of the ACM Future of Computing Academy report that suggested "Peer reviewers should require that papers and proposals rigorously consider all reasonable broader impacts, both positive and negative." which I had satirized last May.

Ehsan said that "if email had sender authentication built in from the beginning then we wouldn't have the phishing problems we have today". Leaving aside whether this statement is fully true, why didn't we put sender authentication and encryption in the first email systems?

Email goes back to the 60's but I did get involved on the early side when I wrote an email system for Cornell in the early 80's. So let me take a crack at answering that question.

Of course there are the technical reasons. RSA was invented just a few years earlier and there were no production systems and the digital signatures needed for authentication were just a theory back then. The amount of overhead needed for encryption in time and bandwidth would have stopped email in its tracks back then.

But it's not like we said we wish we could have added encryption to email if we had the resources. BITNET which Cornell used and the ARPANET gateway only connected with other universities, government agencies and maybe some industrial research labs. We generally trusted each other and didn't expect anyone to fake email for the purpose of getting passwords. It's not like these emails could have links to fake login pages. We had no web back then.

But we did all receive an email from a law firm offering green card help. My first spam message. We had a mild panic but little did we guess that spam would nearly take down email at the turn of the century. Nor would we have guessed the solution would come from machine learning which kills nearly all spam and much of the phishing emails today.

I don't disagree with the report that we shouldn't think about the negative broader impacts, but the true impacts negative and positive are nearly impossible to predict. Computer Science works best when we experiment with ideas, get things working and fix problems as they arise. We can't let the fear of the future prevent us from getting there.

Monday, September 17, 2018

What is a Physicist? A Mathematician? A Computer Scientist?

 Scott Aaronson recently won the Tomassoni-Chisesi Prize in Physics (yeah Scott!).
In his post (here) about it he makes a passing comment:

I'm of course not a physicist

I won't disagree (does that mean I agree? Darn Logic!) but it raises the question of how we identify ourselves. How to answer the question:

Is X a Y?

(We will also consider why we care, if we do.)

Some criteria below. Note that I may say thinks like `Dijkstra is obviously a computer scientist'
but this is cheating since my point is that it may be hard to tell these things (though I think he is).

1) If X in a Y-dept then X is a Y. While often true, there are some problems: MIT CS is housed in Mathematics, some people change fields. Readers- if you know someone who is in dept X but really does Y, leave a comment. (CORRECTION- I really don't know how MIT is structured. I do know that the Math Dept has several people who I think of as Computer Scientists: Bonnie Burger,  Michael Goemans, Tom Leighton, Peter Shor, Michael Sipser. There may be others as well. The point being that I would not say `Sipers is a mathematician because he is in the MIT Math Dept')

2) If X got their degree in Y then they are Y. Again, people can change fields. Also, some of the older people in our field got degrees in Physics or Math since there was no CS (I am thinking Dijkstra-Physics, Knuth-Math). Even more recently there are cases. Andrew Child's degree is in Physics, but he did quantum computing. Readers- if you know someone who got there degree in X but is now donig Y, leave a comment.

3) Look at X's motivation. If Donald Knuth does hard math but he does it to better analyze algorithms, then he is a computer scientist. One problem -- some people don't know their own motivations, or it can be hard to tell. And people can get distracted into another field.

4) What does X call himself? Of course people can be wrong. The cranks he email me their proofs that R(5) is 40 (its not) think the are mathematicians. They are not- or are they? see next point

5) What X is interested in, ind. of if they are good at it or even know any. Not quite right- if an 8 year old  Bill Gasarch is interested in the Ketchup problem that does not make him a mathematician.

6) What X is working on right now. Fine but might change. And some work is hard to classify.

7) If you win an award in X, then you are an X. Some exceptions

Scott is a computer scientist who won the Tomassoni-Chisesi Physics Prize

Ed Witten is a Physicist who won the Fields Medal (Math)

John Nash is a mathematician who won a Nobel prize in Economics.

I want to make a full circle- so if you know other X won a prize in Y then leave a comment and
we'll see what kind of graph we get. Bipartite with people on one side and fields on the other.

8) What they can teach? Helpful in terms of hiring when you want to fill teaching needs.

Does any of this matter? We use terms like `mathematician' `physicist' `computer scientist' as shorthand for what someone is working on, so its good to know we have it right.

Thursday, September 13, 2018

P = NP and Cancer

Often when the question comes to what happens if P = NP, one typically hears the response that it kills public-key cryptography. And it does. But that gives the impression that given the choice we would rather not have P = NP. Quite the opposite, P = NP would greatly benefit humanity from solving AI (by finding the smallest circuit consistent with the data) and curing cancer. I've said this before but never explained why.

Of course I don't have a mathematical proof that P = NP cures cancer. Nor would an efficient algorithm for SAT immediately give a cancer cure. But it could work as follows:
  1. We need an appropriately shaped protein that would inhibit the cancer cells for a specific individual without harming the healthy cells. P = NP would help find these shapes perhaps just the DNA of the person and the type of cancer.
  2. At this point we don't understand the process that takes a ACGT protein sequence and describes that shape that it forms. But it must be a simple process because it happens quickly. So we can use P = NP to find a small circuit that describes this process.
  3. Use P = NP to find the protein sequence that the circuit from #2 will output the shape from #1.
We'll need an truly efficient algorithm for NP problems for this to work. A n50 algorithm for SAT won't do the job. All this steps may happen whether or not P = NP but we'll need some new smart algorithmic ideas.

Please note this is just a thought exercise since I strongly believe that P ≠ NP. I do not want to give false hope to those with friends and loved ones with the disease. If you want to cure cancer your first step should not be "Prove P = NP". 

Tuesday, September 11, 2018

The Tenure system is broken but not in the way that you think (Anon Guest Post)

This is an ANON guest post. Even I don't know who it is! They emailed me asking if they
could post on this topic, I said I would need to see the post. I did and it was fine.

I have written many tenure/promotion letters before. But this summer, I was especially inundated with requests. Thinking about my past experiences with such letters, I started to question their value.

For those unfamiliar with the process, let me explain. When someone is applying for a research job, they typically need to have recommendation letters sent on their behalf. Once someone is hired in
a tenure-track position, they then need to get additional letters each time they are promoted (in the US, this will typically occur when someone is granted tenure and again when they are promoted to full

Now, I know from experience that recommendation letters are scrutinized very carefully, and often contain useful nuggets of information. I am not questioning the value of such letters (though
they may have other problems). I am focusing here only on tenure/promotion letters.

Let me fill in a bit more detail about the tenure/promotion process,since it was a mystery to me before I started an academic position. (I should note that everything I say here is based only on how things are
done at my institution; I expect it does not differ much at other US universities, but it may be different in other countries.) First, the department makes a decision as to whether to put forward someone's
case for promotion. If they do, then a package is prepared that includes, among other things, the external recommendation letters I am talking about. After reviewing the candidate's package, the department holds an official vote; if positive, then the package is reviewed and
voted on by higher levels of administration until it is approved by the president of the university.

The external letters appear very important, and they are certainly discussed when the department votes on the candidate's case. However, I am not aware of any cases (in computer science) where someone who was put forward for tenure was denied tenure. (In contrast, I am aware of a very small number cases where a department declined to put someone forward for tenure. In such cases, no letters are ever
requested.) Perhaps more frustrating, this seems to be the case even when there are negative letters. In fact, I have written what I consider to be "negative" letters in the past only to see the candidate still get tenure.(To be clear, by academic standards a negative letter does not mean saying anything bad, it just means not effusively praising the candidate.) This makes be believe that these letters are simply being used as "checkboxes" rather than real sources of information to take into account during the decision-making process. Essentially, once a department has decided to put someone forward for promotion, they have effectively also decided to vote in favor of their promotion.

Letters take a long time to write, especially tenure/promotion letters, and especially when you are not intimately familiar with someone's work (even if they are in the same field). But if they are
basically ignored, maybe we can all save ourselves some time and just write boilerplate letters (in favor of tenure) instead?

Thursday, September 06, 2018

Are Conferences Discriminatory?

Glencora Borradaile wrote a blog post in June about how conferences discriminate.
Let me spell it out. In order to really succeed in most areas of computer science, you need to publish conference papers and this, for the most part, means attendance at those conferences. But because of the institutional discrimination of border control laws and the individual discrimination that individuals face and the structural discrimination that others face, computer science discriminates based on nationality, gender identity, disability, and family status, just to name a few aspects of identity.
Suresh Venkatasubramanian follows up with a tweet storm (his words) echoing Glencora's points.
Ryan Williams had a twitter thread defending conferences.
Not much difference these day between blog posts, tweet storms and twitter threads and I recommend you read through them all.

Much as I think conferences should not serve as publication venues, they do and should play a major role in connecting people within the community. We should do our best to mitigate the real concerns of Glencora and Suresh, create an environment that everyone feels comfortable, have travel support and child care to make it easier and have meetings in different countries so those with visa issues can still attend at times. But we cannot eliminate the conference without eliminating the community. Personal interactions matter.

Monday, September 03, 2018

The Rule of Threes/Astrology

On Aug 16, 2018 Aretha Franklin died. A famous singer.

On Aug 18 2018 Kofi Anan died. A famous politician.

On Aug 25, 2018 John McCain died. A famous politician.

On Aug 26, 2018 Neil Simon died, a famous playwright.

For 12 famous people who died between Aug 5 and Aug 26 see here (be careful- there are a few more on the list who died in August but a different year).

One could group those 12 into four sets of three and claim the rule of threes that celebrities die in threes. There was an episode of  30 Rock   where two celebrities had died and Tracy Jordan (a celeb) tried to kill a third one so he would not be a victim of the rule of threes. (see the short video clip: here.)

How would one actually test the rule of threes? We would need to define the rule carefully. I have below a well defined rule, with parameters you can set, and from that you could do data collection (this could be a project for a student though you would surely prove there is no such rule).

  1. Decide on a definite time frame: T days. The deaths only count if they are within T days.
  2. Define celebrity. This may be the hardest part. I'll start with they must have a Wikipedia page of length W and they must have over H  hits on Google. This may be hard to discern for people with common names or alternative spellings. You might also look into Friends on Facebook and  Followers on Twitter. A big problem with all of this is that if you want to do a study of old data, before there was Google, Wikipedia, Facebook, and Twitter, you will need other criteria (ask your grandparents what it was like in those days).
  3. Decide whether or not to have a cutoff on age. You may decide that when Katherine Hepburn, Bob Hope, and Strom Thurmond died less than a month apart, at the ages of 96, 100, 100 this doesn't qualify. Hence you may say that the celebrities who die must be younger than Y  years.

I doubt anybody  will ever do the experiment--- those that believe its true (are there really such people?) have no interest in defining it carefully or testing it. And people who don't believe would not bother, partially because so few people believe it that its not worth debunking. But I wonder if a well thought out experiment might reveal something interesting. Also contrast the data to all deaths and see if there is a difference. For example, you might find that more celebs die in August then would be expected based on when all people die. Or that celebs live longer. Or shorter. Actually with enough p-hacking I am sure you could find something. But would you find something meaningful?

Astrology is in the same category- people who believe (there ARE such people!)  could do well defined experiments but have no interest in doing so. I doubt they would find anything of interest if they did. Here there are enough people who believe it in to be worth debunking, but would a well designed science experiment convince them that astrology does not have predictive powers? Has such been done?

I once DID do such an experiment to disprove a wild theory. In 2003 a cab driver once told me (1) there is no Gold in Fort Know, and Ian Fleming was trying to tell us this in the book Goldfinger,  (2)  Reagan was shot since he was going to tell,  (3) a small cohort of billionaires  runs the world. I challenged him-- if that is the case then how come in 1992 Bill Clinton beat George Bush, who was surely the billionaires  pick. He responded that Bill Clinton was a Rhodes Scholar and hence he is in-the-club. I challenged him- OKAY, predict who will get the Democratic Nomination in 2004. This was a well defined experiment (though only one data point) He would give me a prediction and I could test it. He smiled and said Wesley Clark was a Rhode Scholar. Oh well.

Thursday, August 30, 2018

What is Data Science?

The Simons Institute at Berkeley has two semester long programs this fall, Lower Bounds on Computational Complexity and Foundations of Data Science. The beginning of each program features a "boot camp" to get people up to speed in the field, complexity last week and data science this week. Check out the links for great videos on the current state of the art.

Data Science is one of those terms you see everywhere but not well understood. Is the the same as machine learning? Data analytics? Those pieces only play a part of the field.

Emmanuel Candès, a Stanford statistician, gave a great description during his keynote talk at the recent STOC theoryfest. I'll try to paraphrase.

The basic scientific method works as follows: You make an hypothesis consistent with the world as you know it. Design an experiment that would distinguish your hypothesis from the current models that we have. Run the experiment and accept, reject or refine your hypothesis as appropriate. Repeat.
The Higgs Boson followed this model as a recent example.

Technological Advances have given us a different paradigm.
  1. Our ability to generate data has greatly increased whether it be from sensors, DNA, telescopes, computer simulations, social media and oh so many other sources.
  2. Our ability to store, communicate and compress this data saves us from having to throw most of it away.
  3. Our ability to analyze data through machine learning, streaming and other analysis tools has greatly increased with new algorithms, faster computers and specialized hardware.
All this data does not lend itself well to manually creating hypotheses to test. So we use the automated analysis tools, like ML, to create models of the data and use other data for testing those hypotheses. Data science is this process writ large.

We are in the very early stages of data science and face many challenges. Candès talked about one challenge: how to prevent false claims that arise from the data not unrelated to the current reproducibility crisis in science.

We have other scientific issues. How can we vouch for the data itself and what about errors in the data? Many of the tools remain adhoc, how can we get theoretical guarantees? Not to mention the various ethical, legal, security, privacy and fairness issues that vary in different disciplines and nations.

We sit at a time of exciting change in the very nature of research itself, but how can we get it right when we still don't know all the ways we get it wrong. 

Monday, August 27, 2018

Is Trivium (the Stream Cipher) used?

This Fall I am teaching the senior course in Crypto at UMCP. Its a nice change of pace for me since REAL people REALLY use this stuff! Contrast to last Spring when I taught

                   Ramsey Theory and its `Applications'

There is one topic in the Crypto course that LOOKS really useful but I can't tell if it IS being used, so I inquire of my readers. (I will probably come across others topics like that in the future.)

A Secure Stream Cipher is (informally) a way to, given a seed and optionally an Init Vector (IV), generate bits that look random. Alice and Bob communicate the seed either in person or over a private channel or perhaps by using RSA (or some other public key system) and they then both effectively have a very long string of random bits. They send the IV in the clear. They can then do one-time-pad (really a psuedo-one-time-pad). There are other uses for random-looking bits as well.

So what is needed is a Secure Stream Cipher.  Trivium seems to be one such. According to the Trivium wiki

It was submitted to the Profile II (hardware) of the eSTREAM compeition by its authors Christophe De Canniere and Bart Preneel, and has been selected as part of the portfolio for low area hardware ciphers (Profile 2) by the eSTREAM project. It is not patented.

According to these papers: here and here, and the Wikipedia entry, here the following are true:

1) Trivium takes an 80 bits seed and an 80 bit IV

2) The implementation is simple and is already in hardware. Around 3000 logic gates.

3) There are reasons to think its random-looking but no rigorous proof.

4) So far it has not been broken, though its not clear how many people have tried. Thats goes to my question-- how widely used it is it?

5) Trivium need 1152 steps in the init phase. If it only does 799 then The Cube Attack can break it in 2^68   which is better than the naive algorithm of trying every key and IV (2^160) but still not feasible.

6) Trivium is also An American Metal Band and a Medieval theory of education. Its a good name for a band. See my post What Rock Band Name Would you Choose? for fictional good names for bands with a math or theoretical cs connection.

OKAY, back to the main topic:

SO my questions:

Is Trivium used?

If so then by whom and for what (for the psuedo 1-time pad?) ?

If not then why not (e.g., some of of my points above are incorrect)? and should it be instead
of what is being used?