Sunday, March 08, 2026

How does AI do on Baseball-Brothers-Pitchers

In my graduate Ramsey Theory class I taught Kruskal's tree theorem (KTT) which was proven by Joe Kruskal in his PhD thesis in 1960.  (Should that be in a graduate Ramsey Theory class? There are not enough people teaching such a course to get a debate going.) A simpler proof was discovered (invented?) by Nash-Williams in 1963.

The theorem is that the set of trees under the homeomorphism ordering is a well quasi order.

But this blog post is not about well quasi orderings. It's about baseball brothers and AI.

The Kruskals are one of the best math families of all time. See my post on math families.The Bernoullis are another great math family.  What makes both families so great is that they had at least THREE great math people. Most have two.

Having taught the KTT and talked briefly about math families, I was curious how ChatGPT would do on the better-defined question of largest number of wins by a pair of brothers in baseball.  So I asked my students to look that up and include which tools they used,  as a HW problem  (worth 0 points but they had to do it). 

I wrote up 9 pages on what the answer is (there are some issues) and what the students' answers were. See my write up.

In case those 9 pages are tl;dr, here are the main takeaways

1) 7 of the answers given were just WRONG no matter how you look at it.

2) 7 had either the Clarksons who played in the 1880's, so don't count as modern baseball, or there were three of them, or both.  Even so, one could argue these are correct

3) 1 got it right. (That's   one got it right  not  I, bill g, got it right. It can be hard to distinguish the numeral for one, the letter capital i, and the letter small L. I blogged about L vs I here in the context of Weird AI vs Weird AL).


4) There were 13 different answers, which I find amazing.

As usual, when I study some odd issue, I learn a few other things of interest, at least to me, which may also have life lessons, though YMMV. 

a) Around 85% of pitchers in the Hall of Fame have won over 200 games. Dizzy Dean (his brother was also a pitcher which is why I was looking at this) got into the Hall of Fame with only 150 wins. Why? For 6 years he was the most dominant pitcher in the game. In addition (1) there was some sympathy for him since his career was cut short by an injury he got in an All-star game, and (2) he had a colorful personality and people liked him. The four least-wins for a HOF are: Candy Cummings (W-L 145-94).  [he is said to have invented the curveball], Dizzy Dean (W-L 150-83) , Addie Joss (W-L 160-97 )[he only played 9 years, which is less than the 10 needed for eligibility in the HOF, but he died so he was given an exception], Sandy Koufax (W-L 165-87) [he was dominant, like Dean, for a short time]. Sandy is the only one who is still alive. (W-L means Win-Loss. For example, Candy won 145 games and lost 94.)


b) Modern baseball starts in 1900. But this is arbitrary. In 1904 Jack Chesbro won 40 games which would never happen in modern baseball. But you have to draw the line someplace.  History is hard because there are fuzzy boundaries.

c) I had not known about the Clarkson brothers. 

d) My interest in this subject goes back to 1973. In that year I heard the following during a baseball game and, ever since I began blogging, I wondered how I could fit it into a blog post:


An old baseball trick question is now gone!  Just last week [in 1973] if you asked

What pair of brothers in baseball won the most games?

the answer was Christy and Henry Mathewson. Christy played for 16 seasons and had a W-L record of 373-188, while his brother Henry played in 3 games and had a W-L record of 0-1. So their total number of wins is 373 which was the record.

But this last week [in 1973] the Perry brothers, Gaylord and Jim, got 374 wins between them.  Hence the question

What pair of brothers in baseball won the most games?

is no longer a trick question since both Perrys are fine pitchers [Gaylord made it into the Hall of Fame; Jim didn't, but Jim was still quite good.]

As you will see in my writeup, the Perrys' record was broken by the Niekros.

e) Christy and Henry are a trick answer because Henry wasn't much of a player with his 0-1 W-L record. Are there other pairs like that? Greg (355 wins) and Mike (39 wins) Maddux might seem that way but they are not. While Mike only had 39 wins he had a 14-year career as a relief pitcher. Such pitchers can be valuable to the team, but because of their role they do not get many wins.



SO the usual question: Why did AI get this one wrong? Lance will say that the paid version wouldn't have and he may be right. David in Tokyo might say that ChatGPT is a random word generator. Others tell me that ChatGPT does not understand the questions it is asked (my proofreader thinks this is obvious).  I'll add that the pitcher-brother  question has more ambiguity than I thought- do you count pitchers before 1900? Do you count a brother who won 0 games? What about 3 brothers (not the restaurant)?

4 comments:

  1. Cy Young https://en.wikipedia.org/wiki/Cy_Young had a carreer that straddled the year 1900 and so also illustrates the fuzzyness of historical boundaries. He and his brother Alonzo together had 511 wins, behind the Neikro's but still worth a mention (Alonzo did not play in the majors, but the statement is mathematically correct :-) ).

    ReplyDelete
    Replies
    1. 1) Indeed- in the modern era it would be impossible to win 511 games. Walter Johnson played 1907-1927 and won 417 games. I am not sure if I want to call that `modern' Again Fuzzy Boundaries.
      2) Implicit is that there has to BE a brother who played SOME major league baseball. Bill James had a way around this issue when he used WAR (Wins above Replacement, a measure of how good a player is). If there are three players, Alice, Bob, Carol and that is their ranking of how good they are, map to
      Alice-WAR + 2(Bob-WAR) + 3(Carol-WAR). Cy Young may still top the list.

      Delete
  2. _Why did ChatGPT get this one wrong_.

    Well, not looking at the question or answer, the obvious point is that ChatGPT is great at coughing up answers that are in (or similar to something in) it's training data, and not so great at stuff that isn't. Just as it's great at "solving" math problems for which there's a solution in it's training data, and very bad at math problems that aren't.

    The _underlying technology_ has problems with logic and counting and the like, but there's so much stuff in the training data that it's hard to find things that stress said underlying technology.

    Once you've found a problem that isn't in the training data, writing about it changes that state, and next week it'll be able to "solve" said problem.

    But you knew that. And still enjoy playing with it. That, I don't understand.



    ReplyDelete
    Replies
    1. Actually I am surprised both with what ChatGPT can do, and can't do. And I find it hard to predict.As you point out- I don't know what they trained it on.

      Delete