Sunday, November 30, 2025

Does ChatGPT really help programmers?

 
BILL: I honestly do not know whether ChatGPT will make programmers more productive. (I am not touching the question of whether it puts programmers out of work. That's a problem for Future Bill.) Who can I ask? I found two people who disagree on the issue:

Alice supports developers in industry. She doesn't write code full time now, but she has written plenty before. She thinks NO, ChatGPT and LLMs DO NOT help programmers.

Bob is a friend of a friend who writes code for a living and owns a small software/database company. He thinks YES, ChatGPT and LLMs DO help programmers.

I asked Alice and Bob to discuss the issue over email and make a blog post out of it, or have ChatGPT do it for them. Below is the result.

---------------------------------------------

ALICE: Recently I needed to add CSV import to my apps. The columns in the file needed to be optional and allowed in any order. I could have figured it out myself, but I thought: why not let an LLM do the boring part?

The first two answers it gave me ignored my requirement on the columns. If a human had done this, I would have asked them whether they had read what I wrote. On the third try, it produced something reasonable. But in the end I went with a different approach anyway.

So yes, the LLM gave me code---just not the code I wanted. Sort of like ordering a salad and getting a smoothie: technically vegetable, but not what I asked for.
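For concreteness, here is a minimal sketch of the kind of thing I wanted. (I'm assuming Python and its standard csv module here; the column names are made up.) DictReader keys each row by the header, so column order doesn't matter, and optional columns that are absent simply never appear:

    import csv

    # Hypothetical column names; any subset may be present, in any order.
    KNOWN_COLUMNS = {"name", "email", "phone"}

    def import_csv(path):
        records = []
        with open(path, newline="") as f:
            # DictReader takes field names from the header row,
            # so the file's column order is irrelevant.
            reader = csv.DictReader(f)
            for row in reader:
                # Keep only the columns we recognize; missing optional
                # columns just won't show up in the row dict.
                records.append({c: row[c] for c in KNOWN_COLUMNS if c in row})
        return records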

BOB: Alice, you’re basically proving my point for me. You gave the model a fuzzy spec, and it gave you fuzzy code. That’s not an indictment of LLMs — that’s just computers doing what computers have done for 70 years.

When I give an LLM a tight spec, it behaves. When I’m vague, it hallucinates. Honestly, that’s not that different from hiring a junior dev, except that the LLM doesn't steal your sparkling water out of the refrigerator or ask what CRUD stands for (Create, Read, Update, Delete).

BILL: I recommend keeping a small refrigerator in your office rather than use a communal one.

BOB: Good idea! Anyway, this (writing code, not giving advice on cutting down on office theft) is exactly why LLMs save me time: I can hand them the dull bits (scaffolding, boilerplate, adapters) — all the stuff I can write but would rather not spend a day on when a decent model will spit out 80% of it in 20 seconds.


ALICE: We tried having an LLM add some error checking to one of my apps. It put the checks in a plausible place, but not the best place, because it didn't understand that another method already did some of the error checking. It was able to modify a method in a way similar to another method it was told to use as a sample, but I didn't find this too useful. Trying to get it to make such small changes to the code just interrupted my train of thought and required me to spend effort carefully reviewing what it wanted to do.
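To give a flavor of the problem, here's a made-up sketch (not my actual app, and again in Python): the suggested check is plausible on its own, but redundant given the method it calls.

    DATABASE = []  # stand-in for the real storage layer

    def save_record(record):
        # This method already does some of the error checking.
        if "id" not in record:
            raise ValueError("record is missing an id")
        DATABASE.append(record)

    def import_record(record):
        # The LLM wanted a check here too: a plausible place for it,
        # but redundant, since save_record validates the same thing.
        if "id" not in record:
            raise ValueError("record is missing an id")
        save_record(record)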

BOB: I get the train of thought thing — but that’s because you’re trying to use the model as a scalpel. I use it as a bulldozer.

I don’t ask it to make tiny edits inside an existing method; I ask it to refactor an entire 20-year-old unit. That is where it shines.

Recently I refactored code I hadn’t touched since the Middle Ages. Without an LLM? A week of muttering at ancient comments I’m embarrassed to admit I wrote. With an LLM? Half a day.

It identified the structure, spotted duplications, clarified naming, and turned an archaeological dig into a Tuesday morning. That’s the sort of productivity gain you don’t shrug off.

ALICE: We had an LLM review one of my apps. Some of what it wrote was correct, but some of it was just nonsense. It was writing plausible-sounding things and picking bits of my code to justify them, but it didn't understand the code. Its suggestions were vague (e.g., "add unit tests"), which is the code equivalent of your dentist saying "floss more."

BOB: If you ask an LLM to "review my app," you’ll get vague motherhood statements, the same as you would from a human. Ask it something specific, like "Focus only on concurrency issues in this module" or "Explain where this user’s error could be triggered based on these logs," and it becomes very sharp.


This is, frankly, where it’s scarily good: tracking down where a user-reported issue is probably happening. Humans tell me symptoms; the LLM translates that into “It’s almost certainly in this 40-line cluster over here”, and it’s right far more often than chance. That alone pays for the subscription.

BILL: I've read through your 194-email saga, and I still don't know what the upshot is. So, like in a court case, each of you give your closing statement.

BOB: LLMs are not magic interns who understand your architecture. They're incredibly powerful amplifiers when you use them properly. For me, the time savings are real — especially in refactoring legacy code, debugging from cryptic user reports, and generating UI scaffolding that would otherwise be tedious.

Do I trust them to invent a brand-new algorithm out of whole cloth? Of course not — I write NP-complete solvers for fun. But for 70% of my day-to-day coding, they turn chores into quick wins. And that makes me a more productive programmer, not a lazier one.

ALICE: LLMs may be helpful for writing small, self-contained chunks of code or for suggesting approaches. But they do not understand code, and most of what programmers do requires understanding the code. So LLMs will be of little, if any, benefit to most competent programmers. Coaxing LLMs to produce something useful takes more time than people realize.

Or as I like to put it, they're great at writing code-shaped objects, not actual code.

BILL: Thank you. I am enlightened. And don't forget to floss before brushing.




8 comments:

  1. any chance we'll get to read the original emails?

2. While Alice and Bob were arguing, I used Gemini Antigravity to fix old links in the blog. You can now navigate through the Foundations of Complexity Lessons again.

    Saved me days of work. I not only didn't write the code, I didn't even look at it.

    1. Does this mean that your old blog posts could now have been arbitrarily overwritten by a mad AI?

  3. my take:

It is helpful and saves time in some tasks, but it is very far from replacing a junior software developer.

It can do code transformations; after all, Transformers were originally invented for natural language translation.

    It can do semantic search on data.

    It can do copy-paste-edit.

    It can tell you how to use some library.

It can tell you which library can be used for foo.

It can tell you how to fix simple code compile errors, though it makes a lot of mistakes as well.

    It can write unit tests.

    It can build simple prototypes and basic structure for an application.

    And similar things.

    Code understanding: small coffee yes. Big complex software no.

Software engineering is dealing with imperfect code, imperfect documentation, undocumented hacks, and complex systems working together in ways that have lots of nuances. It is often hard even for humans to understand what is going on. You need to spend considerable time to understand how some software works, with all its nuances and imperfections.

Based on the experience of our colleagues, AI coding agents generate a lot of code that then needs to be rejected.

The metrics that matter are not published by companies like Google or OpenAI or Anthropic, or the other companies claiming that AI is going to replace programmers.

What percent of suggestions are rejected by humans? How much time are humans spending on guiding them? How many AI iterations and human feedback loops happen before the AI generates code that is accepted? What percentage is accepted without any human modifications?

For different kinds of tasks, how long does it take an AI-augmented engineer to complete them vs. just the engineer?

What is the productivity gain? How fast are customer-reported issues solved vs. before AI? How much code is written vs. before AI? How long does it take to fix a typical customer-reported issue? How long does it take to implement a new feature request?

How does AI-generated code impact maintenance costs? How good is the AI-generated code from a quality perspective? How concise is AI-generated code vs. that of good developers? How easy is it to understand AI-generated code? ....

So the reality is somewhere between the hyped claims of executives and the claims of those saying AI is useless. It is useful, but far from the level needed to replace junior software engineers at this point.

``code understanding: small coffee yes'': is the word coffee a typo or a metaphor, or is it young people slang that I have not caught up with yet?

It is a typo, an AI autocomplete mistake:

      s/coffee/code
