Sunday, February 08, 2026

I used to think historians in the future will have too much to work with. I could be wrong

(I thought I had already posted this, but the blogger system we use says I didn't. Apologies if I did. Most likely I posted something similar. When you blog for X years you forget what you've already blogged on.)

Historians who study ancient Greece often have to work with fragments of text or just a few pottery shards. Nowadays we preserve so much that historians 1000 years from now will have an easier time. Indeed, they may have too much to look at and have to sort through news, fake news, opinions, and satires to figure out what was true.

The above is what I used to think. But I could be wrong. 

1) When technology changes, stuff is lost. E.g., floppy disks.

2) (This is the inspiration for this blog post) Harry Lewis gave a talk in Zurich on 

The Birth of Binary: Leibniz and the origins of computer arithmetic

on Dec. 8, 2022, from 1:15PM to 3:30PM Zurich time. I didn't watch it live (too early in the morning, east coast time) but it was taped and I watched a recording later. Yeah!

His blog about it (see here) had a pointer to the video, and my blog about it (see here) had a pointer to both the video and to his blog.

A while back I was writing a blog post where I wanted to point to the video. My link didn't work. His link didn't work. I emailed him asking where it was. IT IS LOST FOREVER! Future historians will not know about Leibniz and binary! Or they might--- he has a book on the topic that I reviewed here. But what if the book goes out of print and the only information on this topic is my review of his book?

3) Entire journals can vanish. I blogged about that here.

4) I am happy that the link to the Wikipedia entry on Link Rot (see here) has not rotted.

5) I did a post on what tends to NOT be recorded and hence may be lost forever here.

6) (This is a bigger topic than my one point here.) People tend to OWN less than they used to.


DVDs- don't bother buying! Whatever you want is on streaming. (I recently watched, for the first time, Buffy the Vampire Slayer, one episode a day, on the treadmill, and it was great!)

CDs- don't bother buying! Use Spotify. I do that and it's awesome- I have found novelty songs I didn't know about! Including a song by The Doubleclicks which I thought was about Buffy: here. I emailed them about it and they responded with: Hello! Buffy, hunger games, divergent, Harry Potter, you name it.

JOURNALS- don't bother buying them, it's all on arXiv. (Very true in TCS, might be less true in other fields.)

CONFERENCES- Not sure. I think very few have paper proceedings. At one time they gave out memory sticks with all the papers on them, so that IS ownership, though it depends on technology that might go away. Not sure what they do now.

This may make it easier to lose things since nobody has a physical copy. 

7) Counterargument: Even given the points above, far more today IS being preserved than used to be. See my blog post on that here. But will that be true in the long run? 

8) I began by saying that I used to think future historians would have too much to look at and have to sort through lots of stuff (using quicksort?) to figure out what's true. Then I said they may lose a lot. Oddly enough, both might be true: of the stuff they DO have, they will have a hard time figuring out what's true (e.g., Was Pope Leo's ugrad thesis on Rado's Theorem for Non-Linear Equations? No. See my blog about that falsehood getting out to the world here. Spoiler alert- it was my fault.)

QUESTIONS:

1) Am I right--- will the future lose lots of stuff?

2) If so, what can we do about this? Not clear who "we" is in that last sentence.



11 comments:

  1. Not to worry. It's all been put into LLM training data, and it'll all come back (with interesting hallucinations added) if you just feed it the right query.

    My particular horror here is that about half my library is in Amazon ebooks and if Amazon goes bankrupt (or decides not to talk to me*), they all disappear.

    More generally, things like YouTube and Google search and Twitter are (i.e. should be seen as) _necessary public services_. And if you let private companies run your necessary public services, it's real easy for very bad things to happen.

    *: Which actually has happened: I live in Japan and need to talk to Amazon Japan. Which no longer accepts credit cards not issued in Japan. Fortunately, I happen to have a Japanese-bank issued credit card. Which was unused until recently. Luckily, they were happy to renew it.

  2. I once wanted to show someone a talk I gave at ICML 2018 and it is gone. Almost all of the talks are gone.

  3. I have lived long enough to notice some occurrences where the "mainstream historic record" is quite different from my memory of what really happened, also during the internet age (and moving from "Web 1.0" to social networks made things much worse). History will continue being a guessing game played in a sea of half-truths and untruths.

  4. Here is a fun one, "More photos are lost each day now than were taken in the entire 19th century."

    People take billions of photos each day nowadays. It is easy to imagine at least a few million of those being lost/deleted each day.

    Of course the vast majority of those will be gone 100 years from now. Does that really matter?

    One question worth exploring might be how many of the billions of photos taken and printed during the 20th Century (perhaps even a few trillion) still exist today in practice (not in landfills).

    A central question might be whether the "important" things will still exist for historians. What organizations are preserving our media? Will those organizations continue to do so and keep the tools needed to view them alive? As one example, how is the National Archives handling this?

    A professor once held up a document written on animal skin hundreds of years ago and pointed to how it survived and told us that our digital documents would not. However, he had no idea what was written on it when asked, because trying to open the document would have caused it to crack into pieces. His animal skin document felt more like my blue floppy disk which my drive could no longer read.

    Replies
    1. I wonder how many photos were actually taken in the 19th century. Offhand, I'd think it would have been nowhere near one million. Photos were largely one at a time in a monster camera with nasty chemicals.

      But.I'm.Wrong.

      The 1888 Kodak (the first roll film camera, with a 100 shot (!!!) roll of film) was an instant success, sold 70,000 units the first year, and Eastman was processing 70 rolls (of 100 shots each) every day. That's a lot of photos.

      That's easily well over 2 million photos in the first 10 years.

    2. For your consideration...

      A site called 1000memories, which claimed, "We aspire to create a place that visitors can trust, a place where visitors know that their loved ones will be treated with respect and where their memory will be maintained forever," had a blog post around 15 years ago where they tried to estimate how many photos had ever been taken and how many were "confined to our shoeboxes as lost relics of a pre-digital era."

      The site is now dead. An equally lost relic, but of a digital era?

      The Internet Archive still has the post, perhaps just like there is a box of print photos in my closet.
      https://web.archive.org/web/20110923154948/http://1000memories.com/blog/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox

  5. People throw things out that are later important or (in rare cases) valuable. Sally Hemings had no idea that her life would be studied many years later. People who bought the first issue of Superman had no idea it would become valuable.

    Replies
    1. The first issue of Superman only became so valuable because people had no idea ;)

  6. The Arctic World Archive stores some data in an abandoned mine in Svalbard, Norway. It has limited capacity, but it is remote, so it sounds fairly permanent.

    In the particular case of software, GitHub's Arctic Code Vault archived a 2020 snapshot of all code hosted on it (in the above-mentioned AWA). This would track changes to a chunk of software. That one's probably only interesting to a limited number of historians, though.

    I'm not a Git expert, but in reading its documentation, many commands warn about "losing commit history"...

    I'm sure that future historians will care about things besides source code (and data), in any case. As you say, it's hard to predict what'll be important.

  7. I've been rereading the Foundation books, inspired by the recent TV series. There, in the far future, the basic facts about the origin of humanity have been lost to the galaxy-wide civilization. (There are various entities causing this to happen, of course.) When the Internet came into being, I formed the idea that this was totally implausible, since the important facts are stored in a really distributed survivable way and would last as long as the civilization that built the Internet.

    But I've come to think that the way facts get lost is that rather than being destroyed, they are masked by other mostly incorrect statements. I'm reminded of Lasswitz' Universal Library, which has a copy of every possible book of a particular length. So all the true histories of every event are there, along with all conceivable incorrect histories of those events, syntactically correct or not.

    I'm afraid that the consequence of our new AI era is that more of the incorrect ones are going to be created and propagated...

  8. The Wayback Machine is often the best way to recover content from the web, but the switch to dynamic web-pages (and the sheer size of it all) makes this very tough and often impossible. As more and more people who have created great content have retired or moved on we are losing more access. The content just disappears unless it has been captured by the Wayback Machine. If that goes, then we will definitely be sunk.
