Thursday, April 19, 2018

Memory is Hot

A good number of the faculty candidates interviewing at Georgia Tech have a common theme: Memory. Memory connected to databases, to programming languages, to architecture, to operating systems, to networks and to security. Why all the interest in memory?

I started asking the candidates why. The short answer: we no longer get faster performance from the CPU, but new memory technologies can make a large difference.

Intel and Micron developed a new memory technology they call 3D XPoint ("3D Cross-Point"), in which each memory cell sits at a crossing and is addressed by selecting the adjacent horizontal and vertical bars. 3D XPoint gives fully bit-addressable, high-density, high-speed non-volatile memory, built without transistors and without storing bits as electric charge. Non-volatile means the memory does not disappear when the power goes off, like a "flash" drive. Intel has announced a 3D XPoint main memory card under the Optane brand, available this fall. One could use this memory as a full replacement for DRAM or in conjunction with traditional DRAM in a heterogeneous system.

What's the big deal? The non-volatility means we can reduce the power needed for memory, power being perhaps the biggest bottleneck in computing today. We can keep large-scale databases entirely in memory, getting fast performance with quick crash recovery since the memory isn't lost. 3D XPoint can enable edge or fog computing that brings the power of the cloud closer to the user, for applications like virtual reality or self-driving cars where the time to reach a data center causes unacceptable lag. Like most transformative technologies it will bring opportunities and challenges we can't even imagine now.

As theorists we need to take a leading role. How can we model 3D XPoint-like memories so we can properly develop algorithms and analyze complexity to understand what these memories can or cannot enable? Theoretical Computer Science can play a large role in adapting to new technologies if it is willing to get into the game.
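One natural starting point, purely as a hypothetical sketch of what such a model might look like (the costs and the interface below are my own illustration, not a published model), is an asymmetric cost model: reads from non-volatile memory are cheap, writes are expensive, and algorithms are charged accordingly.

```python
# A minimal sketch of an asymmetric-cost memory model for 3D XPoint-like
# memories.  The constants and the interface are hypothetical illustrations.

class AsymmetricMemory:
    def __init__(self, read_cost=1, write_cost=10):
        # Writes to non-volatile memory are assumed to cost more than reads.
        self.read_cost = read_cost
        self.write_cost = write_cost
        self.cells = {}
        self.cost = 0

    def read(self, addr):
        self.cost += self.read_cost
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.cost += self.write_cost
        self.cells[addr] = value

# Example (call each on a fresh AsymmetricMemory): summing n values by writing
# the accumulator back to non-volatile memory each step is charged n expensive
# writes; accumulating locally and persisting once is charged only one.
def sum_writing_each_step(mem, addrs):
    mem.write("acc", 0)
    for a in addrs:
        mem.write("acc", mem.read("acc") + mem.read(a))
    return mem.cost

def sum_persisting_once(mem, addrs):
    acc = sum(mem.read(a) for a in addrs)   # kept in a register, not charged
    mem.write("acc", acc)                   # a single write to persist
    return mem.cost
```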

Monday, April 16, 2018

Is DTIME(n) closed under concat? star? of course not but...

(STOC 2018 will offer child care for the first time. I was emailed the following and asked to pass it on:

We are pleased to announce that we will provide pooled, subsidized
child care at STOC 2018. The cost will be $40 per day per child for
regular conference attendees, and $20 per day per child for students.
For more detailed information, including how to register for STOC 2018
childcare, see http://acm-stoc.org/stoc2018/childcare.html

Ilias Diakonikolas and David Kempe, local arrangements chairs.)

I tell my class that poly time is nice mathematically since it's closed under lots of operations, including concatenation and *. That is:

L1, L2 ∈ P implies L1L2 ∈ P.

unlike DTIME(n) which, as you can see, is NOT closed under concat! After all, the proof that P is closed under concat uses the fact that if p(n) is a polynomial then n·p(n) is a polynomial, which does not hold for linear time: if p(n) is O(n) then n·p(n) is NOT necessarily O(n).
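To make the n·p(n) concrete, here is a minimal sketch of the standard splitting argument (my own illustration; in_L1 and in_L2 are hypothetical deciders, each assumed to run in time p(n)):

```python
# Sketch of the standard closure-under-concatenation argument for P.
def in_concat(x, in_L1, in_L2):
    # Try all |x|+1 ways to split x = u v; accept if some split works.
    # That is n+1 membership tests, each costing at most p(n), so time O(n * p(n)).
    for i in range(len(x) + 1):
        if in_L1(x[:i]) and in_L2(x[i:]):
            return True
    return False

# Toy example: L1 = a*, L2 = b*, so L1L2 = a*b*.
print(in_concat("aabbb", lambda u: set(u) <= {"a"}, lambda v: set(v) <= {"b"}))  # True
print(in_concat("aba",   lambda u: set(u) <= {"a"}, lambda v: set(v) <= {"b"}))  # False
```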

But--- that's not a proof that DTIME(n) is not closed under concat! That's just the observation that the argument for P being closed under concat does not extend to DTIME(n). Perhaps some other argument does.

I do not believe that. I believe there exist L1, L2 ∈ DTIME(n) such that L1L2 is not in DTIME(n).

I have not been able to prove this. In fact, the question I pose is not well defined since I need to specify a machine model.

I pose the following question which may well be known - if so then please leave a comment:

Find a reasonable machine model (RAM? k-tape TM?) such that DTIME(n) on that model is NOT closed under concat. (Probably one should use DTIME(O(n)).)

Similarly for *.

These are likely hard questions: if L is in DTIME(n) then L* is in NTIME(n) (and similarly for concat), so I would be separating DTIME(n) from NTIME(n), which HAS been done, but not with nice natural problems of the type that I seek.

Friday, April 13, 2018

Lance and Bill Gather for Gardner

Every two years in Atlanta, recreational mathematicians gather to honor Martin Gardner, whose Scientific American column Mathematical Games ran through the 60's and 70's. Those columns inspired budding mathematicians of a certain age, including Bill and me.

Bill came down to this year's Gathering for Gardner 13. Talks are only six minutes long. Bill talked on the Muffin Problem right after an 8-year-old and right before Stephen Wolfram.

We also did a short vidcast from the exhibition room.

Monday, April 09, 2018

When a deep theorem of your Uncle's becomes standard, should you be sad?

(An exposition of Nash-Williams's proof of the Kruskal Tree Theorem is here.)

Andrew Vazsonyi (the mathematician, see here, not the folklorist; see here for that folklorist's wife) conjectured that the set of trees, under the minor ordering, is a well quasi order. I do not know when he made the conjecture, but he was born in 1916 and the conjecture was solved in 1960, so you can take a guess based on that. We only know he conjectured it because when Joe Kruskal proved it, he credited Vazsonyi and called it `Vazsonyi's conjecture'. I do not know if anyone else ever called it that.
After a conjecture is solved it's usually called by the solver's name, not the poser's, so it's hard to find out anything about Vazsonyi's conjecture from when it was still a conjecture.
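For reference, here are the standard statements the post assumes (stated in the unlabeled form; Kruskal's full theorem allows labels from a wqo):

```latex
% Well quasi order and the Kruskal Tree Theorem (standard statements).
A quasi-order $(Q,\le)$ is a \emph{well quasi order} (wqo) if every infinite
sequence $q_1, q_2, q_3, \ldots$ of elements of $Q$ contains indices $i < j$
with $q_i \le q_j$.

\textbf{Kruskal Tree Theorem.} The finite trees form a wqo under homeomorphic
(topological) embedding, and hence also under the minor ordering mentioned
above: for every infinite sequence of finite trees $T_1, T_2, T_3, \ldots$
there exist $i < j$ such that $T_i$ embeds into $T_j$.
```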

Joe Kruskal's proof was quite hard and quite deep (are those necessarily the same? Probably not). See here for that paper. Nash-Williams, three years later, in 1963, provided a much easier, though still deep, proof. (I could not find a free online copy to point to; if you know of one please email me or comment.) Nash-Williams's proof is the one in my writeup.

Joe Kruskal is Clyde Kruskal's uncle (Joe is also known in our circles for MST). I told Clyde that I made his Uncle's Theorem a problem on my TAKE HOME midterm in graduate Ramsey Theory. He PONDERED: is it sad that this once-great theorem is now merely a problem in a course?

I asked some random students from both my Ramsey Theory class and my Aut Theory class for their take on this. Here are the responses.

Dolapo: (Aut Student) Clyde should stop worrying about his Uncle's legacy and start building his own!

Ben:   (Ramsey Student) Bill proved in class that SUBSEQUENCE is a wqo. GIVEN that, the problem wasn't that hard. Had he not given it to us the problem would be impossible.

Clyde: Bill, next time you teach the course give them the problem cold- without any prior theorems about wqo.

Bill: I'd rather not

Nishant: (Ramsey Student) Clyde should be happy, and know it, and clap his hands! People are still talking about his Uncle's Theorem!

Bill: If you're happy and you know it, clap your hands! Alice is not clapping her hands. Bob is. What can you deduce about Alice and Bob? Never mind, that will be a different post.

Bill: If I gave the problem on an in-class exam in a grad course or a take-home exam in an ugrad course THEN you should be sad. But a take-home exam in a grad course? That's just right.

Ben: (Ramsey Student) When are you going to do a post about this?

Bill: Since my post will include a pointer to a proof of the Kruskal Tree Theorem, I won't post until after you all hand in your take-home midterms.

Nishant and Ben together: Darn!

Joshua (TA for Aut Theory): I heard that grad students complaining about a problem being too hard is a sign of a great mathematician. If your grad students complain, then kudos to Joe Kruskal!

Bill: They didn't complain.

Clyde: Darn!

Ajeet (Ramsey Student): It's much harder to CREATE new math than to LEARN new math. I feel our working out this problem, given what you already gave us, was more a LEARNING thing than a CREATING thing. Like P vs NP: easier to verify than to generate.

Katherine (Ramsey Student and Clyde's Algorithms TA): A bigger problem for Joe Kruskal's legacy is that people in the algorithms class refer to the Kruskal MST algorithm as Clyde's Algorithm.

Saddiq (Aut Theory Student): If Clyde can test Kruskal's MST algorithm on his final (which he did) then you can test the Kruskal Tree Theorem on your midterm (which you did).

Anyway, one lesson here is that fame is fleeting. Very few people are remembered 100 years after their death. So Clyde - I am helping keep your Uncle's memory alive!

Clyde: Oh Joy

SO- what do you think? Should Clyde be happy that his Uncle's theorem is on my take-home midterm? Should he know it? Should he clap his hands?



Thursday, April 05, 2018

Challenge about NFA for { a^y : y ≠ 1000 } answered.


Recall that in a prior post I asked

Is there an NFA for { a^y : y ≠ 1000 } with substantially fewer than 1000 states?

I will now show that any NFA for this set requires 999 states, so essentially 1000. The proof uses Ramsey Theory. I will tell you the little bit of Ramsey Theory that you need.

NO- the above is false.

There is an NFA with 60 states. I have a complete exposition here and I have a paper (with coauthors) on cofinite unary languages and NFAs here. Generally, if you want to avoid only a^n, an NFA can be done with sqrt(n) + O(log n) states, but sqrt(n) states are required.

I sketch the ideas here for the case n = 1000.

Convention: `reject 988' means rejecting a^988.

It is easy to show the following:

For all n ≥ 992 there exist x, y ∈ N such that n = 32x + 33y.

For all x, y ∈ N, 32x + 33y ≠ 991.
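These two facts are the 32/33 instance of the Chicken McNugget (Frobenius) theorem; a quick brute-force check (my own, not from the post) confirms them:

```python
# Which n are representable as 32x + 33y with x, y >= 0?
def representable(n):
    return any((n - 33 * y) % 32 == 0 for y in range(n // 33 + 1))

assert not representable(991)                            # 991 is not representable
assert all(representable(n) for n in range(992, 5000))   # 992, 993, ... all are
print("checks pass")
```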

If you have an NFA with a 33-loop and a shortcut edge so you can also do a loop of 32 and return to the start state, this NFA

accepts all y ≥ 992

rejects 991

and we have no comment on anything else.

So if you prepend 9 states to that NFA you will have an NFA that

accepts all y ≥ 1001

rejects 1000

How to get all the numbers < 1000?

Use mod 3, mod 5, mod 7, and mod 11 loops that accept only if the number is NOT equivalent to 1000 mod 3, 5, 7, or 11. Since 3*5*7*11 = 1155 > 1000 we have:

if y is rejected then y ≤ 1000, but y ≡ 1000 mod 3*5*7*11, so we must have y = 1000.

so 1000 is the only reject.

Number of states: 1 + 33 + 3 + 5 + 7 + 11 = 60 states.
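Here is a quick sanity check of the construction (my own code; it treats the accepted set abstractly rather than building the states): a^y is accepted iff y is 9 plus something of the form 32x + 33z, or y disagrees with 1000 modulo 3, 5, 7, or 11.

```python
# Sanity check of the construction described above.
def branch_32_33(y):
    # The "prepend 9, then loops of 32 and 33" branch: accept iff y = 9 + 32x + 33z.
    n = y - 9
    return n >= 0 and any((n - 33 * z) % 32 == 0 for z in range(n // 33 + 1))

def branch_mod(y, m):
    # A mod-m loop that accepts iff y is NOT congruent to 1000 mod m.
    return y % m != 1000 % m

def nfa_accepts(y):
    return branch_32_33(y) or any(branch_mod(y, m) for m in (3, 5, 7, 11))

# The NFA should accept a^y exactly when y != 1000.
assert all(nfa_accepts(y) == (y != 1000) for y in range(5000))
print("construction verified for y < 5000")
```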

I think you can do this in 58 states, but what is the BEST you can do? My paper has a lower bound of sqrt(n), so in this case 32.

This is a GREAT theorem to teach to students in a course that covers regular languages and also P vs NP, since the students THINK that an NFA cannot do better - AH BUT IT CAN! - so it gives a concrete example of the following:

lower bounds are hard since someone may come along with a very clever trick you didn't think of

This semester I had the class VOTE on whether or not there was a small NFA:

48 thought that ANY NFA would require around 1000 states

One of the best students proved this! Or perhaps ``proved'' this.

Only 2 students knew that it could be done with MUCH LESS than 1000 states, and those two are exceptions: one is a co-author on the paper (and he tells me that he originally thought it required 1000 states when I first showed him the problem), and the other is someone who often goes to my old course websites looking for more information and problems to work on (I like that ambition!) and came across material on this.

I tell them: `You thought this was the last lecture on regular languages. It was not. It was the first lecture on P vs NP.'






Tuesday, April 03, 2018

Challenge: Is there a small NFA for { a^i : i ≠ 1000 }?


(Added later: a reader left a comment pointing to a paper with the answer and saying that the problem is not original. My apologies; upon rereading I can see why one would think I was claiming it was my problem. It is not. I had heard the result was folklore but now I have a source! So I thank the commenter and reiterate that I am NOT claiming it is my problem.)


Consider the language

L = { a^i : i ≠ 1000 }

There is a DFA for L of size 1002 and one can prove that there is no smaller DFA.
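For concreteness, a minimal sketch (my own illustration) of that 1002-state DFA: states 0 through 1000 count the a's read so far, state 1001 is a sink for anything longer, and every state except 1000 is accepting.

```python
# Sketch of the obvious 1002-state DFA for L = { a^i : i != 1000 }.
def dfa_accepts(word):
    state = 0
    for ch in word:
        if ch != 'a':
            return False              # alphabet is {a}; anything else is rejected
        state = min(state + 1, 1001)  # states 0..1000 count, 1001 is the sink
    return state != 1000              # every state is accepting except 1000

assert dfa_accepts('a' * 999) and dfa_accepts('a' * 1001) and not dfa_accepts('a' * 1000)
```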

What about an NFA?  Either:

a) Show that any NFA for L requires roughly 1000 states

or

b) Show that there is a small NFA for L, say less than 500 states

or

c) State that you think the question is unknown to science.

I will reveal the answer in my next post, though it's possible that (c) is the answer and the comments will convert it to either (a) or (b).

Feel free to leave comments with your answers. If you want to work on it without other information, then do not read the comments.






Thursday, March 29, 2018

Almost Famous Quantum Polynomial Time

I have been playing with a new complexity class AFQP, defined in a yet-to-be-published manuscript by Alagna and Fleming. A language L is in AFQP if there is a polynomial-time quantum Turing machine Q such that for all inputs x,
  • If x is in L, then Q(x) accepts with high probability.
  • If x is not in L, then Q(x) rejects with high probability.
  • Q(x) only has O(log |x|) quantumly entangled bits as well as a polynomial amount of "deterministic" memory. 
AFQP is meant to capture the state of the art of current quantum technology.

AFQP has several nice properties, capturing the complexity of many open problems.
  1. If BQP is in AFQP then factoring is in BPP.
  2. If one can solve satisfiability in AFQP then the polynomial-time hierarchy collapses.
  3. AFQP is contained in the fifth level of the polynomial-time hierarchy.
  4. If AFQP is in log-space then matching has a deterministic NC algorithm.
  5. Graph isomorphism is in a quasi-polynomial time version of AFQP.
  6. AFQP = PPAD iff Nash Equilibrium has solutions we can find in polynomial time.
The proofs use a clever combination of Fourier analysis and semidefinite programming. 

Where does the name AFQP come from? The authors claim that they didn't name the class after themselves, and instead say it stands for "Almost Famous Quantum Polynomial Time" as it won't get the fame of BQP. More likely it is because it's April the first and I'm feeling a bit Foolish making up a new complexity class that is just P in disguise. 

A Reduced Turing Award

A Turing machine has an extremely simple instruction set: Move left, move right, read and write. If you want to do real programming, you need something a bit more powerful. So computer architects created chips with more and more operations. As an undergrad at Cornell in the early 80's I worked on coding an email system written in IBM 370 assembly language. As I wrote in a 2011 post,
IBM Assembly language was quite bloated with instructions. There was a command called "Edit and Mark" that went through a range of data making modifications based on some other data. This was a single assembly language instruction. We used to joke that there was a single instruction to do your taxes.
After Cornell I spent one year at Berkeley for graduate school where I heard about this new idea: Let's move in the other direction and have a "reduced instruction set computer". Make the chip as simple as possible to simplify the design, save power and increase performance. Fast forward to last week and ACM tapped RISC pioneers John Hennessy and David Patterson for the Turing Award, the highest honor in computer science. They won an award named after the man who created the simplest instruction set of them all.

RISC didn't take over--Intel never really embraced the concept and kept adding new features, from real-number operations to AES encryption/decryption built into the processor. GPUs added vector operations and Tensor Processing Units can do machine learning instructions. Added complexity has its complications, such as the famous 1994 Pentium division bug.

RISC processors do play a role in mobile devices, which need low power; one can find ARM (Advanced RISC Machines) processors in many mobile phones and tablets. Based on quantity alone there are far more RISC processors than CISC (C = complex) ones.

These days the lines between the processor and the operating system blur and we are programming chips. David Patterson continues to work on RISC, with the recent open RISC-V architecture that also supports chip programming. John Hennessy was otherwise occupied--he was president of Stanford from 2000 to 2016.

Monday, March 26, 2018

Why do we give citations? How should we give citations?


Why do we cite past work? There are many reasons, and they lead to advice on how we should cite past work.

  1. Give credit where credit is due. Some people over-cite, and that diminishes any one citation. I once saw a paper that had in the first paragraph: ''Similar work in this field has been done by [list of 20 citations].'' One of the papers in that list was extremely relevant, the rest much less so. This is not really helpful. There should be fewer citations and more said about each one.
  2. If you are using a result from a prior paper, the reader should be able to read that paper. For that reason I try to give the website where the paper is. (This might be less crucial now than it used to be, since if a paper is online someplace for free it's usually easy to find.) Some students ask me if it's okay to cite papers on arXiv. Of course it is, especially if it's to guide the reader to a place to read the result. Note also that papers on arXiv are not behind paywalls.
  3. At some point a result is so well known that it does not need a citation. It's not clear when this is. I think people write `by the Cook-Levin theorem' without citing the original source. Nor do people ever cite Ramsey's original paper. See the next item for why this might be a mistake.
  4. A reader might want to know WHEN a result was discovered. For this reason, perhaps people should give references for Cook-Levin or for Ramsey. The original source is often NOT a good place to read a result, so I often write ``by a theorem of Curly [1] (see also Moe's simplification [2] or Larry's survey [3]),'' giving the reader WHO did it first and when, but also an alternative place to read it. However, where does it end? `By the Pythagorean theorem [1] (see also [2])'?
  5. A reader should know the context of the result.  Is the problem new? Is the problem related to other problems? Has there been much work in this field? Inquiring minds want to know!