Monday, February 27, 2023

I wish we had fewer students in a class. Demographics say I may get my wish.

According to this article, in the near future FEWER people will be going to college. There is even a name for this upcoming shift: The Enrollment Cliff. Why?

Is it Covid-related? Is it that college has gotten too expensive? Too liberal? Too much cancel culture? Too many dead white males in the core? The core is too multicultural? Online learning is stealing our students?

No. The reason is actually very boring and does not serve anyone's political agenda. (That's not quite right: it doesn't serve any agenda at all.) And you can probably guess the cause from the title of this blog post.

For some years up until 2007 the birth rate was slowly dropping. Then there was a large drop in the birth rate after the recession of 2007, and the birth rate has never really recovered. And the recession might not have that much to do with it: the long-term move from an agricultural society (where kids are an economic gain) to an industrial one (where, after child-labor laws and the expense of college, kids are an economic loss, though that can be debated) has resulted in a very long-term decline in births.

And from personal experience, I know that (a) very few people I know have 4 or more kids, and (b) there is NO stigma about having 0 kids as there once was. Of course the sample of people I know may be skewed.

ANYWAY, what will this mean for colleges? 

a) Harvard, Yale, etc. will not be affected. Plenty of people will still apply. Note that they draw from all of America and also internationally.

b) Colleges that draw from a local area may be affected a lot since they depend on locals, and that population may be shrinking. 

c) Schools in between the Harvards and the small local colleges: hard to say.

d) The sports betting companies paying schools to allow them to promote on campus (and in some cases helping them promote it) may find far fewer students to sucker into this loser's game. See my blog on this topic here.

Univ of MD has around 4000 Computer Science majors (depending on who tells you this, it's either a brag or a complaint). In the Spring of 2023 there are three lectures of discrete math of sizes 240, 270, and 90. Each of those also has recitations of 30 (or so) students each. If the decline is gradual (whether from demographics, from the CS-majors bubble finally bursting, or from the other reasons above) then I am sure we can handle it. If it declines very suddenly, we may have a problem adjusting.

One caveat to this that I've heard is that immigration will save us. Maybe. But America is politically going in the opposite direction. The counterargument that without immigration there will be fewer students going to college is not that compelling to most Americans. There are other, more intelligent and compelling pro-immigration arguments. However, American politics is no longer interested in compelling and logical arguments. (The notion that it once was may be nostalgia for a time that never was.)


Thursday, February 23, 2023

The Virtual Grad Student

Martin Haug, who is working on Typst, a LaTeX alternative, asked me if I had updates on a LaTeX rant from 2011. I haven't seen any new serious backward-compatibility problems. We have easier collaboration through online editors like Overleaf. We have gotten closer to WYSIWYG thanks to quick compiling, but still not at the level of Word or Google Docs. The big problem of user-friendliness remains. There's a reason LaTeX has its own Stack Exchange.

But we live in a new machine learning world. Can we use generative AI to make LaTeX easier to use?

Mandatory Disclaimer: Generative AI can sometimes create inaccurate, inappropriate or previously-published material. You are ultimately responsible for the contents of your paper no matter how you produced it.

Since I sometimes think of LaTeX as a programming language for papers, I tweeted

Thanks for the responses. The answer to the question is yes: GitHub Copilot works for LaTeX if you edit LaTeX in a programming environment like VS Code, Neovim or JetBrains. It helps with formatting of formulas and pictures, less so with the text itself. I made a video so you can see how it works.

Latext AI offers a Chrome extension that will let you generate text via GPT in Overleaf based on a prompt or previous text, though Latext requires a subscription after a one-week trial. You can also just cut and paste between any text editor and ChatGPT.

ChatGPT notoriously makes up references if you ask for them. Can we have a good system that finds relevant articles to cite and adds them automatically into your bibliography?

Ideally all of these would work together seamlessly, with suggestions that happen as you type. A true co-pilot for research papers.

There are many more tools out there; feel free to add them in the comments. I expect the integration to improve over time as we develop new APIs and models.

I look forward to the days of a virtual grad student: Here's a research goal and an idea to get there. Now go figure out the details and write the paper. 

It will be a long wait.

Sunday, February 19, 2023

It is more important than ever to teach your students probability (even non-STEM students)

(This topic was also covered here.) 

You are a college president. An online betting company says: We will give you X dollars if you allow us to promote online gambling at your university.

I suspect you would say NO.

Too late: it's already happening. A link to a NY Times article about this is here. I urge you to read the entire article. It's worse than it sounds.

My thoughts

0) I wondered if  a company needed permission to promote a product on a campus. I am not sure of the answer; however, in some cases a school HELPED with the promotion: 

a) During a game there are announcements reminding students that they can place a sports bet! It's easy! It's fun!

b) Links on the school's website to sports gambling sites.

c) References to sports betting in emails that go to students.

This is WAY BEYOND  allowing a company to promote.

1) Some points from the article 

Some aspects of the deals also appear to violate the gambling industry's own rules against marketing to underage people. The ``Responsible Marketing Code'' published by the American Gaming Association, the umbrella group for the industry, says sports betting should not be advertised on college campuses. 

``We are not seeing enough oversight, transparency, and education to support the rollout of these kinds of deals,'' said Michael Goldman, who teaches sports marketing at the University of San Francisco.

During the pandemic, many universities struggled financially... To fill those holes, public and private universities nationwide have been struggling to line up new revenue sources, including by arranging sponsorship deals. (MY THOUGHTS: They don't quite say it, but it seems like the extra money is going back to sports programs. I would be happier if it went into academics, and to be fair, maybe some of it does.)

2) Online gambling is more addictive than in-person gambling. And it's easier since you don't have to leave your dorm room to do it. 

3) The school gets money and  teaches the students that everything is for sale. So it's a win-win (I am kidding.) 

4) Should a college take  money to allow the promotion of tobacco or alcohol or (if it becomes legal) heroin? I see NO difference between those and online gambling. (See here)

5) I am in favor of all of those things being legal (maybe not heroin, but I am open to debate on that); however, there is a big difference between making something legal and promoting it.

6) Silver Lining: This may encourage more students, even non-STEM students, to learn probability. Either advertise it honestly:


Take Probability to find out that Sports Betting is a Loser's Game


Or advertise it dishonestly:


Take Probability to find out how you can win at Sports Betting!
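Either way, the first lecture writes itself. Here is a minimal sketch of the expected-value calculation such a course would start with, assuming the standard -110 point-spread bet (risk $110 to win $100) and a bettor who wins half the time; the specific odds are my assumption, not from the article:

```python
# Expected profit of one standard -110 point-spread bet:
# you risk $110 to win $100, and the bookmaker sets the line
# so that each side wins roughly half the time (assumed here).
stake = 110      # dollars risked
payout = 100     # dollars won on a winning bet
p_win = 0.5      # assumed probability of winning

ev = p_win * payout - (1 - p_win) * stake
print(ev)           # -5.0: an expected loss of $5 per bet
print(ev / stake)   # about -0.045: the bookmaker keeps ~4.5% of each stake
```

Negative expected value per bet, every bet: that is the whole lesson.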


Thursday, February 16, 2023

Blurry JPEG or Frozen Concentrate



Ted Chiang in a recent New Yorker article likened ChatGPT to a blurry JPEG, i.e., a "lossy compression" of the web. It's a good article, but the analogy isn't quite right: there's a different kind of compression happening. Think of all human written knowledge as a random example of what could have been generated, and we remove the randomness, just as water is removed to make concentrated orange juice. We then add water (or randomness) back to get some version of the original.

Lossless compression, like gzip, gives a compressed version of some data with the ability to reconstruct it exactly. It corresponds nicely to Kolmogorov complexity, where K(x) is the length of the smallest program p that generates the string x; that p is a lossless compression of x.
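A quick illustration of the lossless case in Python, with gzip standing in for the ideal (and uncomputable) compressor behind K:

```python
import gzip

# Highly repetitive data, so a lossless compressor can shrink it a lot.
text = b"the quick brown fox jumps over the lazy dog. " * 200

compressed = gzip.compress(text)
restored = gzip.decompress(compressed)

assert restored == text              # lossless: the reconstruction is exact
assert len(compressed) < len(text)   # and much smaller on repetitive input
print(len(text), "->", len(compressed))
```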

Lossy compression, like JPEG, often allows much higher compression but with some error. In Kolmogorov terms you are trading off the size of the program p and some error function between x and the output of p. Most compression programs for pictures, music and video use algorithms designed for the specific medium. You can also use machine learning to get lossy compression by training both the compression and decompression algorithms.
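A toy version of the lossy trade-off, quantizing numerical data before compressing it. The quantization step size plays the role of the error function; this is an illustration of the idea, not how JPEG actually works:

```python
import gzip

def lossy_compress(xs, step=0.25):
    # Quantize each value to the nearest multiple of `step` (the lossy part),
    # then losslessly compress the resulting small integer codes.
    codes = ",".join(str(round(x / step)) for x in xs)
    return gzip.compress(codes.encode())

def lossy_decompress(blob, step=0.25):
    # Undo the quantization; the original values are only approximately recovered.
    return [int(c) * step for c in gzip.decompress(blob).decode().split(",")]

data = [0.10, 0.13, 0.12, 0.91, 0.88]
approx = lossy_decompress(lossy_compress(data))
# Each recovered entry is within step/2 = 0.125 of the original,
# but the reconstruction is not exact: that is the error we traded for size.
assert all(abs(a - b) <= 0.125 for a, b in zip(data, approx))
```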

Lossy compression tries to recreate the original picture. Generative AI, like ChatGPT, takes a different approach. Let's consider Wikipedia, as this is the example used by Chiang. For any specific topic, there are many different ways to write a Wikipedia article as good as or better than the article that currently exists. ChatGPT doesn't need to recreate anything close to the original article, just one that explains the topic well. What we want is a description of a program p that corresponds to a set of possible Wikipedia articles, of which the real article is a random example. An ideal version of ChatGPT would choose a random article from this set. Dall-E, generative AI for art, works in a similar way, creating art that is a random example of what art might have been.

In terms of Kolmogorov complexity, this corresponds to the Kolmogorov structure function: basically, the smallest program p such that p describes a set S of size m that contains x, with |p| + log m ≈ K(x). The string x is just a random element of S; you can get a string like it by picking an element of S at random.

There is no recursive algorithm that will find p, and we also need to limit ourselves to p that are computationally efficient, which means that generative AI algorithms may never be ideal and will sometimes make mistakes. That doesn't mean we shouldn't use them, just that we need to be wary of their limitations. As the saying attributed to George Box goes, "All models are wrong, but some are useful."

Sunday, February 12, 2023

When is a paper `Easily Available'?

I was looking at the paper 

                                PSPACE-Completeness of reversible deterministic systems

by Erik Demaine, Robert Hearn, Dylan Hendrickson, and Jayson Lynch (see here) and came across the following fascinating result, which I paraphrase:

Given balls on a pool table (the table can be one of your own devising, not the standard one), each ball's initial position and velocity, a particular ball, and a particular place, it is PSPACE-complete to determine whether that ball ever reaches that place.

Demaine et al. stated that this was proven by Edward Fredkin and Tommaso Toffoli in 1982 (see here for a link to the 1982 paper, not behind a paywall). Demaine et al. gave an easier proof with some nice properties. (Just in case the link goes away I downloaded the paper to my files and you can find it here.) 

I needed the bib reference for the FT-1982 paper, and rather than copy it from Demaine et al., I wanted to cut and paste, so I looked for it in DBLP. I didn't find the 1982 paper, but I did find a book from 2002 that reprinted it. The book, Collision-based computing, has a website here. The book itself is behind a paywall.

On the website is the following curious statement:

[This book] Gives a state-of-the-art overview of an emerging topic, on which there is little published literature at the moment. [The book] Includes 2 classic papers, both of which are widely referred to but are NOT EASILY AVAILABLE (E. Fredkin and T. Toffoli: Conservative Logic, and N. Margolus: Physics-Like Models of Computation).

The caps are mine.

Not easily available? I found a link in less than a minute, and I used it above when I pointed to the paper. 

But the book IS behind a paywall. 

Perhaps Springer does not know that the article is easily available. That would be odd since the place I found the article is also a Springer website. 

The notion of EASILY AVAILABLE is very odd. While not quite related, it reminds me of when MIT Press had to pay a few thousand dollars for permission (that might not be the legal term) to reprint Turing's 1936 paper where he defined Turing machines (he didn't call them that), which is online here (and other places), for Harry Lewis's book Ideas That Created the Future.




Thursday, February 09, 2023

Why Can't Little Chatty Do Math?

Despite OpenAI's claim that ChatGPT has improved mathematical capabilities, we don't get far multiplying large numbers.

L: What is 866739766 * 745762645?  C: 647733560997969470

Typical for ChatGPT, the answer passes the smell test: it has the right number of digits, and the first and last couple of digits are correct. But the real answer is 646382140418841070, quite different from the number given.

As far as I know, multiplication isn't known to be in TC0, the complexity class that roughly corresponds to neural nets. [Note Added: Multiplication is in TC0. See comments.] Also functions learned by deep learning can often be inverted by deep learning. So if AI can learn how to multiply, it might also learn how to factor. 

But what about addition? Addition is known to be in TC0 and ChatGPT performs better.


The correct answer is 1612502411; ChatGPT was only one digit off but still wrong. The TC0 algorithm needs to do some tricks for carry lookahead that are probably hard to learn. Addition is easier if you work from right to left, but ChatGPT has trouble reversing numbers. There's a limit to its self-attention.
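For what it's worth, the right-to-left algorithm is trivial to write down; a sketch (note the two numbers from the multiplication example above do sum to 1612502411, the correct answer quoted here):

```python
def add_digit_strings(a, b):
    # Schoolbook addition: scan the digits right to left, propagating a carry.
    # This sequential carry chain is exactly what a TC0 circuit cannot do;
    # it must instead compute all carries in parallel via carry lookahead.
    a, b = a[::-1], b[::-1]
    digits, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = carry
        if i < len(a):
            s += int(a[i])
        if i < len(b):
            s += int(b[i])
        digits.append(str(s % 10))
        carry = s // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_digit_strings("866739766", "745762645"))  # 1612502411
```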



ChatGPT can't multiply but it does know how to write a program to multiply.


It still claims the result will be the same as before. Running the program gives the correct answer 646382140418841070. 
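The program it produces is along these lines (my reconstruction, not ChatGPT's exact output); since Python's built-in integers are arbitrary precision, the product is exact:

```python
def multiply(a, b):
    # Python ints are arbitrary precision, so there is no overflow or rounding.
    return a * b

print(multiply(866739766, 745762645))  # 646382140418841070
```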

ChatGPT is run on a general-purpose computer, so one could hope a later version could determine when it's given a math question, write a program, and run it. That's probably too dangerous--we would want to avoid a code-injection vulnerability. But it could still use an API to WolframAlpha or some other math engine. Or a chess engine to play chess. Etc.

Monday, February 06, 2023

After you are notified that an article is accepted...

After just one round of referees' reports (they sent me the reports, I made the corrections, they were happy) I got an email saying my paper on proving the primes are infinite from Schur's theorem in Ramsey theory was ACCEPTED. Yeah! Now what?

1) The journal sent me an email with a link GOOD FOR ONLY FIFTY DAYS to help me publicize the article. Here is the link:

https://authors.elsevier.com/a/1gTyD,H-cWw6X


Will this really help? The article is already on arXiv. (ADDED LATER: the link on arXiv is here.) Also, I can blog about it, but how do non-bloggers publicize their work? Do they need to?

(ADDED LATER: A commenter wanted to know why I am publishing in an Elsevier journal. This was a memorial issue in honor of Landon Rabern (see here), a combinatorialist who died young. I was invited to submit an article.)

QUESTION: Is this common practice? If so, what do you do with those links? Email them to all the people who should care about the article?

2) I got some forms to fill out that asked how many offprints I wanted. While my readers can probably guess what that means, I will remind you: paper copies of the article. I filled out the form:

I want 0 of them.

They still wanted to know the address to send the 0 copies to, so I gave that as well.

Does anyone actually get offprints anymore? That seems so 1990s. With everything on the web, I tend to email pointers to people who want my articles. In fact, that happens rarely: either nobody wants to read my articles (quite possible) or they find them on my website (quite possible).

In 1991 when I went up for tenure the dept wanted 15 copies of every article I wrote so they could send my letter writers (and others) all my stuff. Rumor is that the Governor of Maryland got a copy of every article I ever wrote. I hoped he was a fan of oracle constructions. 

In 1998 when I went up for full prof they did not do this, assuming that the letter writers could find what they needed on the web. I do wonder about that: it might have been a nice courtesy to send them stuff directly, and that would be a use for offprints. It depends on whether my letter writers prefer reading online or on paper. They could of course print out my papers, but again, as a courtesy, perhaps we should have supplied the papers.

QUESTION: Do you order a non-zero number of offprints and if so why? 

3) The journal offered to make my article open access at their site for a price. I did not do this since, again, the article is already on arXiv.

QUESTION: Is there a reason to have your article formally open access given that it's already on arXiv?

4) One of my co-authors on a different article asked me When will it appear IN PRINT? I can't imagine caring about that. It's already on arXiv, and I doubt having it in a journal behind paywalls will increase its visibility AT ALL. The only reason to care about when it appears IN PRINT is so I can update my resume from TO APPEAR to the actual volume and number and year.

QUESTION: Aside from updating your resume do you care when an article that was accepted appears IN PRINT? And if so why? 



Thursday, February 02, 2023

Responsibility

Nature laid out their ground rules for large language models like ChatGPT including

No LLM tool will be accepted as a credited author on a research paper. That is because any attribution of authorship carries with it accountability for the work, and AI tools cannot take such responsibility.

Let's focus on the last word "responsibility". What does that mean for an author? It means we can hold an author, or set of authors, responsible for any issues in the paper such as

  • The proofs, calculations, code, formulas, measurements, statistics, and other details of the research.
  • Any interpretations or conclusions made in the article.
  • Properly citing related work, especially work that calls into question the novelty of this research.
  • Ensuring the article does not contain text identical or very similar to previous work.
  • Anything else described in the article.
The authors should take reasonable measures to ensure that a paper is free from any of the issues above. Nobody is perfect, and if you make a mistake in a paper you should, as with all mistakes, take responsibility: acknowledge the problems, do everything you can to rectify the issues, such as publishing a corrigendum if needed, and work to ensure you won't make similar mistakes in the future.

Mistakes can arise outside of an author's actions. Perhaps a computer chip made faulty calculations, you relied on a faulty theorem in another paper, your main result appeared fifteen years ago in an obscure journal, a LaTeX package for the journal created some mistakes in the formulas, a student who helped with the research or exposition took a lazy way out, or you put too much trust in AI-generated text. Nevertheless, the responsibility remains with the authors.

Could an AI ever take responsibility for an academic paper? Would a toaster ever take responsibility for burning my breakfast?