These articles keep popping up, analyzing a hypothetical usage of AI (and guessing it won’t be useful) as if it weren’t something already being used in practice. It’s kinda weird to me.
“It won’t deal with abstractions” -> try asking cursor for potential refactors or patterns that could be useful for a given text.
“It doesn’t understand things beyond the code” -> try giving them an abstract jira ticket or asking what it thinks about certain naming, with enough context
“Reading code and understanding whether it’s wrong will take more time than writing it yourself” -> ask any engineer that saves time with everything from test scaffolding to run-and-forget scripts.
It’s as if I wrote an article today arguing that exercise won’t make you able to lift more weight - every gymgoer would raise an eyebrow, and it’s hard to imagine even the non-gymgoers would be sheltered enough to buy the argument either.
While I tend to agree with your premise that the linked article seems to be reasoning to the extreme from a very small code snippet, I think the core critique the author wants to make stands:
AI agents alone, unbounded, currently cannot provide huge value.
> try asking cursor for potential refactors or patterns that could be useful for a given text.
You, the developer, will be selecting this text.
> try giving them an abstract jira ticket or asking what it thinks about certain naming, with enough context
You still selected a JIRA ticket and provided context.
> ask any engineer that saves time with everything from test scaffolding to run-and-forget scripts.
Yes, that is true, but again, what you are providing as counterexamples are very bounded, aka easy, contexts.
In any case, the industry (both the LLM providers as well as tooling builders and devs) is clearly going in the direction of constantly eking out small improvements by refining which context is deemed relevant for a given problem and finding the most efficient ways to feed it to LLMs.
And let's not kid ourselves, Microsoft, OpenAI, hell Anthropic all have 2027-2029 plans where these things will be significantly more powerful.
Here's an experience I've had with Claude Code several times:
1. I'll tell Claude Code to fix a bug.
2. Claude Code will fail, and after a few rounds of explaining the error and asking it to try again, I'll conclude this issue is outside the AI's ability to handle, and resign myself to fixing it the old fashioned way.
3. I'll start actually looking into the bug on my own, and develop a slightly deeper understanding of the problem on a technical level. I still don't understand every layer to the point where I could easily code a solution.
4. I'll once again ask Claude Code to fix the bug, this time including the little bit I learned in #3. Claude Code succeeds in one round.
I'd thought I'd discovered a limit to what the AI could do, but just the smallest bit of digging was enough to un-stick the AI, and I still didn't have to actually write the code myself.
(Note that I'm not a professional programmer and all of this is happening on hobby projects.)
> I'll once again ask Claude Code to fix the bug, this time including the little bit I learned in #3. Claude Code succeeds in one round.
Context is king, which makes sense since LLM output is based on probability. The more context you can provide it, the more aligned the output will be. It's not like it magically learned something new. Depending on the problem, you may have to explain exactly what you want. If the problem is well understood, a sentence will most likely suffice.
>If the problem is well understood, a sentence will most likely suffice.
I feel this falls flat for the rather well-bounded use case I really want: a universal IDE that can set up my environment with a buildable/runnable boilerplate "hello world" for arbitrary project targets. I tried vibe coding an NES 6502 "hello world" program with Cursor and it took way more steps (and missteps) than me finding an existing project on GitHub and cloning that.
I had Claude go into a loop because I have cat aliased as bat
It wanted to check a config json file, noticed that it had missing commas between items (because bat prettifies the json) and went into a forever loop of changing the json to add the commas (that were already there) and checking the result by 'cat'ing the file (but actually with bat) and again finding out they weren't there. GOTO 10
The actual issue was that Claude had left two overlapping configuration parsing methods in the code. One with Viper (The correct one) and one 1000% idiotic string search system it decided to use instead of actually unmarshaling the JSON :)
I had to use pretty explicit language to get it to stop fucking with the config file and look for the issue elsewhere. It did remember it, but forgot on the next task of course. I should've added the fact to the rule file.
(This was a vibe coding experiment, I was being purposefully obtuse about not understanding the code)
Why does it matter that you're doing the thinking? Isn't that good news? What we're not doing any more is any of the rote recitation that takes up most of the day when building stuff.
I think "AI as a dumb agent for speeding up code editing" is kind of a different angle and not the one I wrote the article to address.
But, if it's editing that's taking most of your time, what part of your workflow are you spending the most time in? If you're typing at 60WPM for an hour then that's over 300 lines of code in an hour without any copy and paste which is pretty solid output if it's all correct.
But that’s just it: 300 good lines of reasonably complex working code in an hour, versus o4-mini churning out 600 lines of perfectly compilable code in less than 2 minutes, including the time it takes me to assemble the context with a tool such as repomix (run locally) or pulling markdown docs with Jina Reader.
The reality is, we humans just moved one level up the chain. We will continue to move up until there isn’t anywhere for us to go.
Isn't that the bare minimum attribute of working code? If something is not compilable, it is WIP. The difficulty is having correct code, then efficient enough code.
Which is why you dictate a series of tests for the LLM to generate, and then it generates way more test coverage than you ordinarily would have. Give it a year, and LLMs will be doing test coverage and property testing in closed-loop configurations. I don't think this is a winnable argument!
Certainly, most of the "interesting" decisions are likely to stay human! And it may never be reasonable to just take LLM vomit and merge it into `main` without reviewing it carefully. But this idea people have that LLM code is all terrible --- no, it very clearly is not. It's boring, but that's not the same thing as bad; in fact, it's often a good thing.
Don't get hung up on the word "coverage". We all know test coverage isn't a great metric.
I just used IntelliJ AI to generate loads of tests for some old code I couldn't be bothered to finish.
It wrote tests I wouldn't have written even if I could be bothered. So the "coverage" was certainly better. But more to the point, these were good tests that dealt with some edge cases that were nice to have.
You don't do it for bugs, you do it for features in this case.
Contrived example: You want a program that prints out the weather for the given area.
First you write the tests (using AI if you want) that test for the output you want.
Then you tell the AI to implement the code that will pass the tests and explicitly tell it NOT to fuck with the tests (as Claude 3.7 specifically will happily do; it'll mock things out so far that it's not touching a line of the actual code to be tested...)
With bugs you always write a test that confirms the exact case the bug caused so that it doesn't reappear. This way you'll slowly build a robust test suite. 1) find bug 2) write test for correct case 3) fix code until test passes
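A minimal sketch of that flow in Python/pytest (all names here are hypothetical, not from any real project): the tests are written and reviewed by the human first and stay frozen, while the model iterates on the implementation until they pass.

```python
# test_weather.py -- written before the implementation exists; the LLM is told
# to make it pass without editing this file.
import pytest

from weather import get_weather_report  # hypothetical module the LLM must implement


def test_report_mentions_the_requested_area():
    report = get_weather_report("Helsinki")
    assert "Helsinki" in report


def test_blank_area_is_rejected():
    with pytest.raises(ValueError):
        get_weather_report("")


def test_regression_unicode_area_names():
    # The bug-fix pattern from above: reproduce the exact failing case once,
    # then keep the test forever so the bug can't silently come back.
    assert "Åre" in get_weather_report("Åre")
```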
In lots of jobs, the person doing work is not the one selecting text or the JIRA ticket. There's lots of "this is what you're working on next" coding positions that are fully managed.
But even if we ignored those, this feels like goalpost moving. They're not selecting the text - ok, ask LLM what needs refactoring and why. They're not selecting the JIRA ticket with context? Ok, provide MCP to JIRA, git and comms and ask it to select a ticket, then iterate on context until it's solvable. Going with "but someone else does the step above" applies to almost everyone's job as well.
Yesterday I needed to import a 1GB CSV into ClickHouse. I copied the first 500 lines into Claude and asked it for a CREATE TABLE and CLI to import the file. Previous day I was running into a bug with some throw-away code so I pasted the error and code into Claude and it found the non-obvious mistake instantly. Week prior it saved me hours converting some early prototype code from React to Vue.
I do this probably half a dozen times a day, maybe more if I'm working on something unfamiliar. It saves at a minimum an hour a day by pointing me in the right direction - an answer I would have reached myself, but slower.
Over a month, a quarter, a year... this adds up. I don't need "big wins" from my LLM to feel happy and productive with the many little wins it's giving me today. And this is the worst it's ever going to be.
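To make that first example concrete, the shape of what I'm asking for is roughly this (a hand-written sketch, not what Claude produced; the file and table names are made up and every column is lazily typed as String just to get the import going):

```python
# Sketch: derive a ClickHouse CREATE TABLE statement from a CSV header.
import csv

def create_table_sql(csv_path: str, table: str) -> str:
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    columns = ",\n  ".join(f"`{name}` String" for name in header)
    return (
        f"CREATE TABLE {table} (\n  {columns}\n) "
        "ENGINE = MergeTree ORDER BY tuple();"
    )

print(create_table_sql("events.csv", "events"))
```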
Out of interest, what kind of codebases are you able to get AI to do these things on? Every time I have tried it with even simpler things than these it has failed spectacularly. Every example I see of people doing this kind of thing seems to be some kind of web development, so I have a hypothesis that AI might currently be much worse for the kinds of codebases I work on.
I currently work for a finance-related scaleup. So backend systems, with significant challenges related to domain complexity and scalability, but nothing super low level either.
It does take a bit to understand how to prompt in a way that the results are useful. Can you share what you tried so far?
I have a codebase in Zig and it doesn't understand Zig at all.
I have another which is embedded C using zephyr RTOS. It doesn't understand zephyr at all and even if it could, it can't read the documentation for the different sensors nor can it plug in cables.
I have a tui project in rust using ratatui. The core of the project is dealing with binary files and the time it takes to explain to it how specific bits of data are organised in the file and then check it got everything perfectly correct (it never has) is more than the time to just write the code. I expect I could have more success on the actual TUI side of things but haven't tried too much since I am trying to learn rust with this project.
I just started an android app with flutter/dart. I get the feeling it will work well for this but I am yet to verify since I need to learn enough flutter to be able to judge it
My dayjob is a big C++ codebase making a GUI app with Qt. The core of it is all dealing with USB devices and Bluetooth protocols, which it doesn't understand at all. We also have lots of very complicated C++ data structures; I had hoped that the AI would be able to at least explain them to me, but it just makes stuff up every time. This also means that getting it to edit any part of the codebase touching this sort of thing doesn't work. It just rips up any thread safety or allocates memory incorrectly etc. It also doesn't understand the compiler errors at all; I had a circular dependency and tried to get it to solve it, but I had to give so many clues I basically told it what the problem was.
I really expected it to work very well for the Qt interface since building UI is what everyone seems to be doing with it. But the amount of hand holding it requires is insane. Each prompt feels like a monkey's paw. In every experiment I've done it would have been faster to just write it myself. I need to try getting it to write an entirely new piece of UI from scratch since I've only been editing existing UI so far.
Some of this is clearly a skill issue since I do feel myself getting better at prompting it and getting better results. However, I really do get the feeling that it either doesn't work or doesn't work as well on my code bases as other ones.
> I have a codebase in Zig and it doesn't understand Zig at all.
> I have another which is embedded C using zephyr RTOS. It doesn't understand zephyr at all and even if it could, it can't read the documentation for the different sensors nor can it plug in cables.
If you use Cursor, you can let it index the documentation for whatever language or framework you want [0], and it works exceptionally well. Don't rely solely on the LLM's training data, allow it to use external resources. I've done that and it solves many of the issues you're talking about.
The Cursor docs indexing works very well and it’s probably the biggest thing missing from Windsurf. The other key is to stop the response when you see something going wrong and go back to your first message to add more context, like adding docs or links to library source files (a URL to GitHub is just fine) or attaching more files with types and annotations. Restarting your request with more context works better than asking it to fix things because the wrong code will pollute the probability space of future responses.
I suppose saying that I've only seen it in web development is a bit of an exaggeration. It would be more accurate to say that I haven't seen any examples of people using AI on a codebase that looks like one of the ones I work on. Clearly I am biased and just lump all the types of coding I'm not interested in into "web development".
It's not about backend or frontend, it's mostly about languages. If you're using niche stuff or languages that don't have an online or digital presence, the LLMs will be confused. Stuff like assembler or low-level C aren't really in their vocabulary.
C#, Go, Python for example work perfectly well and you can kinda test the LLMs preference by asking them to write a program to solve a problem, but don't specify the language.
That’s my experience too. It also fails terribly with ElasticSearch probably because the documentation doesn’t have a lot of examples. ChatGPT, copilot and claude were all useless for that and gave completely plausible nonsense.
I’ve used it with most success for writing unit tests and definitely shell scripts.
Agreed. It isn’t like crypto, where the proponents proclaimed some value-proving use case that was always on the verge of arriving. AI is useful right now. People are using these tools now and enjoying them.
> Observer bias is the tendency of observers to not see what is there, but instead to see what they expect or want to see.
Unfortunately, people enjoying a thing and thinking that it works well doesn't actually mean much on its own.
But, more than that I suspect that AI is making more people realize that they don't need to write everything themselves, but they never needed to to begin with, and they'd be better off to do the code reuse thing in a different way.
I'm not sure that's a convincing argument given that crypto heads haven't just been enthusiastically chatting about the possibilities in the abstract. They do an awful lot of that, see Web3, but they have been using crypto.
Even in 2012 bitcoin could very concretely be used to order drugs. Many people have used it to transact and preserve value in hostile economic environments. Etc etc. Ridiculous comment.
Personally i have still yet to find LLMs useful at all with programming.
I don't (use AI tools), I've tried them and found that they got in the way, made things more confusing, and did not get me to a point where the thing I was trying to create was working (let alone working well/safe to send to prod)
I am /hoping/ that AI will improve, to the point that I can use it like Google or Wikipedia (that is, have some trust in what's being produced)
I don't actually know anyone using AI right now. I know one person on Bluesky has found it helpful for prototyping things (and I'm kind of jealous of him because he's found how to get AI to "work" for him).
Oh, I've also seen people pasting AI results into serious discussions to try and prove the experts wrong, but only to discover that the AI has produced flawed responses.
Essentially the same for me. I had one incident where someone was arguing in favor of it and then immediately embarrassed themselves badly because they were misled by a ChatGPT error. I have the feeling that this hype will collapse as this happens more and people see how bad the consequences are when there are errors.
If AI gives a bad experience 20% of the time, and if there are 10M programmers using it, then about 3000 of them will have a bad experience 5 times in a row. You can't really blame them for giving up and writing about it.
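The arithmetic, for anyone who wants to check it (the numbers are the hypothetical ones from above):

```python
programmers = 10_000_000   # hypothetical population using AI tools
p_bad = 0.2                # assumed chance of a bad experience per attempt
unlucky = programmers * p_bad ** 5  # five bad experiences in a row
print(round(unlucky))      # 3200, i.e. "about 3000" people
```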
It’s all good to me - let these folks stay in the simple times while you and i arbitrage our efforts against theirs? I agree, there’s massive value in using these tools and it’s hilarious to me when others don’t see it. My reaction isn’t going to be convince them they’re wrong, it’s just to find ways to use it to get ahead while leaving them behind.
I need some information/advice -> I feed that into an imprecise aggregator/generator of some kind -> I apply my engineering judgement to evaluate the result and save time by reusing someone's existing work
This _is_ something that you can do with AI, but it's something that a search engine is better suited to because the search engine provides context that helps you do the evaluation, and it doesn't smash up results in weird and unpredictable ways.
Y'all think that AI is "thinking" because it's right sometimes, but it ain't thinking.
If I search for "refactor <something> to <something else>" and I get good results, that doesn't make the search engine capable of abstract thought.
AI alone can't replace a search engine very well at all.
AI with access to a search engine may present a more useful solution to some problems than a bare search engine, but the AI isn't replacing a search engine, it is using one.
The "Deep Research" modes in web-based LLMs are quite useful. They can take a days worth of reading forums and social media sites and compress into about 10 minutes.
For example, I found a perfect 4K 120Hz capable HDMI switch by using ChatGPT's research mode. It did suggest the generic Chinese random-named ones off Amazon, but there was one brand with an actual website and a history - based in Germany.
I hadn't seen it recommended anywhere on my attempts, but did find it by searching for the brand specifically and found only good reviews. Bought it, love it.
This seems like a great example of someone reasoning from first principles that X is impossible, while someone else doing some simple experiments with an open mind can easily see that X is both possible and easily demonstrated to be so.
> Y'all think that AI is "thinking" because it's right sometimes, but it ain't thinking.
I know the principles of how LLMs work, I know the difference between anthropomorphizing them and not. It's not complicated. And yet I still find them wildly useful.
YMMV, but it's just lazy to declare that anyone who sees it differently than you just doesn't understand how LLMs work.
Anyway, I couldn't care less if others avoid coding with LLMs, I'll just keep getting shit done.
Weird metaphor, because a gym goer practices what they are doing by putting in the reps in order to increase personal capacity. It's more like you're laughing at people at the gym, saying "don't you know we have forklifts already lifting much more?"
That’s a completely different argument, however, and a good one to have.
I can buy “if you use the forklift you’ll eventually lose the ability to lift weight by yourself”, but the author is going for “the forklift is actually not able to lift anything” which can trivially be proven wrong.
More like, "We had a nice forklift, but the boss got rid of it replaced it with a pack of rabid sled dogs which work sometimes? And sometimes they can also sniff out expiration dates on the food (although the boxes were already labeled?). And, I'm pretty sure one of them, George, understands me when I talk to him because the other day I asked him if he wanted a hotdog and he barked (of course, I was holding a hotdog at the time). But, anyway, we're using the dogs, so they must work? And I used to have to drive the forklift, but the dogs just do stuff without me needing to drive that old forklift"
I see it as almost the opposite. It’s like the pulley has been invented but some people refuse to acknowledge its usefulness and make claims that you’re weaker if you use it. But you can grow quite strong working a pulley all day.
"If you want to be good at lifting, just buy an exoskeleton like me and all my bros have. Never mind that your muscles will atrophy and you'll often get somersaulted down a flight of stairs while the exoskeleton makers all keep trying, and failing, to contain the exoskeleton propensity for tossing people down flights of stairs."
It's the barstool economist argument style, on long-expired loan from medieval theology. Responding to clear empirical evidence that X occurs: "X can't happen because [insert 'rational' theory recapitulation]"
There are people at the gym I go to benching 95 lbs and asking what does it take to get to 135, or 225? The answer is "lift more weight" not "have someone help you lift more weight"
If you already know how to code, yes AI/LLMs can speed you along at certain tasks, though be careful you don't let your skills atrophy. If you can bench 225 and then you stop doing it, you soon will not be able to do that anymore.
> If you already know how to code, yes AI/LLMs can speed you along at certain tasks, though be careful you don't let your skills atrophy.
This isn't a concern. Ice-cutting skills no longer have value, and cursive writing is mostly a 20th century memory. Not only have I let my assembly language skills atrophy, but I'll happily bid farewell to all of my useless CS-related skills. In 10 years, if "app developer" still involves manual coding by then, we'll talk about coding without an AI partner like we talk about coding with punch cards.
Maybe. I've seen a lot of "in 10 years..." predictions come and go and I'm still writing code pretty much the same way I did 40 years ago: in a terminal, in a text editor.
I don’t think the argument from such a simple example does much for the authors point.
The bigger risk is skill atrophy.
Proponents say it doesn’t matter. We shouldn’t have to care about memory allocation or dependencies. The AI system will eventually have all of the information it needs. We just have to tell it what we want.
However, knowing what you want requires knowledge about the subject. If you’re not a security engineer you might not know what funny machines are. If someone finds an exploit using them you’ll have no idea what to ask for.
AI may be useful for some but at the end of the day, knowledge is useful.
I don't know. Cursor is decent at refactoring. ("Look at x and ____ so that it ____." With some level of elaboration, where the change is code or code organization centric.)
And it's okay at basic generation - "write a map or hash table wrapper where the input is a TZDB zone and the output is ______" will create something reasonable and get some of the TZDB zones wrong.
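For reference, the deterministic version of that kind of wrapper is tiny, which is what makes the occasional wrong zone stand out. A sketch assuming Python 3.9+ zoneinfo with the system tz database (or the tzdata package) available, and with the blanked-out output arbitrarily chosen to be the zone's current UTC offset:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, available_timezones

def tzdb_offset_map() -> dict[str, str]:
    """Map every TZDB zone name to its current UTC offset string, e.g. '+0200'."""
    now = datetime.now(timezone.utc)
    return {
        zone: now.astimezone(ZoneInfo(zone)).strftime("%z")
        for zone in sorted(available_timezones())
    }

offsets = tzdb_offset_map()
print(offsets["Europe/Helsinki"])  # '+0300' in summer, '+0200' in winter
```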
But it hasn't been that great for me at really extensive conceptual coding so far. Though maybe I'm bad at prompting.
Might be there's something I'm missing w/ my prompts.
For me, the hard part of programming is figuring out what I want to do. Sometimes talking with an AI helps with that, but with bugs like “1 out of 100 times a user follows this path, a message gets lost somewhere in our pipeline”, which are the type of bugs that really require knowledge and skill, AIs are completely worthless.
There really is a category of these posts that are coming from some alternate dimension (or maybe we're in the alternate dimension and they're in the real one?) where this isn't one of the most important things ever to happen to software development. I'm a person who didn't even use autocomplete (I use LSPs almost entirely for cross-referencing --- oh wait that's another thing I'm apparently never going to need to do again because of LLMs), a sincere tooling skeptic. I do not understand how people expect to write convincingly that tools that reliably turn slapdash prose into median-grade idiomatic working code "provide little value".
> I do not understand how people expect to write convincingly that tools that reliably turn slapdash prose into median-grade idiomatic working code "provide little value".
Honestly, I'm curious why your experience is so different from mine. Approximately 50% of the time for me, LLMs hallucinate APIs, which is deeply frustrating and sometimes costs me more time than it would have taken to just look up the API. I still use them regularly, and the net value they've imparted has been overall greater than zero, but in general, my experience has been decidedly mixed.
It might be simply that my code tends to be in specialized areas in which the LLM has little training data. Still, I get regular frustrating API hallucinations even in areas you'd think would be perfect use cases, like writing Blender plugins, where the documentation is poor (so the LLM has a relatively higher advantage over reading the documentation) and examples are plentiful.
Edit: Specifically, the frustrating pattern is: (1) the LLM produces some code that contains hallucinated APIs; (2) in order to test (or even compile) that code, I need to write some extra supporting code to integrate it into my project; (3) I discover that the APIs were hallucinated because the code doesn't work; (4) now I not only have to rewrite the LLM's code, but I also have to rewrite all the supporting code I wrote, because it was based around a pattern that didn't work. Overall, this adds up to more time than if I had just written the code from scratch.
You're writing Rust, right? That's probably the answer.
The sibling comment is right though: it matters hugely how you use the tools. There's a bunch of tricks that help and they're all kind of folkloric. And then you hear "vibe coding" stories of people who generate their whole app from a prompt, looking only at the outputs; I might generate almost my whole project from an LLM, but I'm reading every line of code it spits out and nitpicking it.
"Hallucination" is a particularly uninteresting problem. Modern LLM coding environments are closed-loop ("agentic", barf). When an LLM "hallucinates" (ie: is wrong, like I am many times a day) about something, it figures it out pretty quick when it tries to build and run it!
I haven’t had much of a problem writing Rust code with Cursor but I’ve got dozens of crates docs, the Rust book, and Rustinomicon indexed in Cursor so whenever I have it touch a piece of code, I @-include all of the relevant docs. If a library has a separate docs site with tutorials and guides, I’ll usually index those too (like the cxx book for binding C++ code).
I also monitor the output as it is generated because Rust Analyzer and/or cargo check have gotten much faster and I find out about hallucinations early on. At that point I cancel the generation and update the original message (not send it a new one) with an updated context, usually by @-ing another doc or web page or adding an explicit instruction to do or not to do something.
One of the frustrating things about talking about this is that the discussion often sounds like we're all talking about the same thing when we talk about "AI".
We're not.
Not only does it matter what language you code in, but the model you use and the context you give it also matter tremendously.
I'm a huge fan of AI-assisted coding, it's probably writing 80-90% of my code at this point, but I've had all the same experiences that you have, and still do sometimes. There's a steep learning curve to leveraging AIs effectively, and I think a lot of programmers stop before they get far enough along on that curve to see the magic.
For example, right now I'm coding with Cursor and I'm alternating between Claude 3.7 max, Gemini 2.5 pro max, and o3. They all have their strengths and weaknesses, and all cost for usage above the monthly subscription. I'm spending like $10 per day on these models at the moment. I could just use the models included with the subscription, but they tend to hallucinate more, or take odd steps around debugging, etc.
I've also got a bunch of documents and rules setup for Cursor to guide it in terms of what kinds of context to include for the model. And on top of that, there are things I'm learning about what works best in terms of how to phrase my requests, what to emphasize or tell the model NOT to do, etc.
Currently I usually start by laying out as much detail about the problem as I can, pointing to relevant files or little snippets of other code, linking to docs, etc, and asking it to devise a plan for accomplishing the task, but not to write any code. We'll go back and forth on the plan, then I'll have it implement test coverage if it makes sense, then run the tests and iterate on the implementation until they're green.
It's not perfect, I have to stop it and backup often, sometimes I have to dig into docs and get more details that I can hand off to shape the implementation better, etc. I've cursed in frustration at whatever model I'm using more than once.
But overall, it helps me write better code, faster. I never could have built what I've built over the last year without AI. Never.
> tools that reliably turn slapdash prose into median-grade idiomatic working code
This may be the crux of it.
Turning slapdash prose into median-grade code is not a problem I can imagine needing to solve.
I think I'm better at describing code in code than I am in prose.
I Want to Believe. And I certainly don't want to be "that guy", but my honest assessment of LLMs for coding so far is that they are a frustrating Junior, who maybe I should help out because mentoring might be part of my job, but from whom I should not expect any near-term technical contribution.
Sorry, are you saying "the only place where there's slapdash prose is right before it would be super cool to have an alpha version of the code magically appear, that we can iterate on based on the full context of the team, company, and industry"?
Well said. It's not that there would not be much to seriously think about and discuss – so much is changing, so quickly – but the stuff that a lot of these articles focus is a strange exercise in denial.
> “It won’t deal with abstractions” -> try asking cursor for potential refactors or patterns that could be useful for a given text.
That is not what abstraction is about. Abstraction is having a simpler model to reason about, not simply code rearranging.
> “It doesn’t understand things beyond the code” -> try giving them an abstract jira ticket or asking what it thinks about certain naming, with enough context
Again, that is still pretty much coding. What matters is the overall design (or at least the current module).
> “Reading code and understanding whether it’s wrong will take more time than writing it yourself” -> ask any engineer that saves time with everything from test scaffolding to run-and-forget scripts.
Imagine having a script and not checking the man pages for expected behavior. I hope the backup games are strong.
I have not had the same experience as the author. The code I have my tools write is not long. I write a little bit at a time, and I know what I expect it to generate before it generates it. If what it generates isn't what I expect, that's a good hint to me that I haven't been descriptive enough with my comments or naming or method signatures.
I use Cursor not because I want it to think for me, but because I can only type so fast. I get out of it exactly the amount of value that I expect to get out of it. I can tell it to go through a file and perform a purely mechanical reformatting (like converting camel case to snake case) and it's faster to review the results than it is for me to try some clever regexp and screw it up five or six times.
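For that particular example, the "clever regexp" is short but full of exactly the edge cases I'd screw up; a rough sketch of what the mechanical rewrite boils down to:

```python
import re

def camel_to_snake(name: str) -> str:
    # Insert an underscore before an uppercase letter that follows a
    # lowercase letter or digit, then lowercase the whole thing.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

assert camel_to_snake("userId") == "user_id"
assert camel_to_snake("parseHTTPResponse") == "parse_httpresponse"  # acronym runs: one of those edge cases
```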
And quite honestly, for me that's the dream. Reducing the friction of human-machine interaction is exactly the goal of designing good tools. If there was no meaningful value to be had from being able to get my ideas into the machine faster, nobody would buy fancy keyboards or (non-accessibility) dictation software.
I'm like 80% sure people complaining about AI doing a shit job are just plain holding it wrong.
The LLM doesn't magically know stuff you don't tell it. They CAN kinda-sorta fetch new information by reading the code or via MCP, but you still need to have a set of rules and documentation in place so that you don't spend half your credits on the LLM figuring out how to do something in your project.
I was wanting to build a wire routing for a string of lights on a panel. Looked up TSP, and learned of the Christofides heuristic. Asked Claude to implement Christofides. Went on to do stuff I enjoy more than mapping Wikipedia pseudocode to runnable code.
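For anyone curious, the hand-off is even smaller if a library already ships the heuristic; a sketch assuming networkx (>= 2.6, which added a Christofides implementation), with made-up panel coordinates:

```python
import itertools
import math

import networkx as nx
from networkx.algorithms.approximation import christofides

# Made-up positions of the lights on the panel.
points = {"A": (0, 0), "B": (3, 1), "C": (1, 4), "D": (5, 3), "E": (2, 2)}

# Christofides needs a complete weighted graph, so connect every pair.
G = nx.Graph()
for (u, p), (v, q) in itertools.combinations(points.items(), 2):
    G.add_edge(u, v, weight=math.dist(p, q))

tour = christofides(G, weight="weight")  # closed tour, e.g. ['A', 'B', 'D', 'C', 'E', 'A']
print(tour)
```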
Sure, it would be really bad if everyone just assumes that the current state of the art is the best it will ever be, so we stop using our brains. The thing is, I'm very unlikely to come up with a better approximation to TSP, so I might as well use my limited brain power to focus on domains where I do have a chance to make a difference.
This is exactly the way I succeed. I ask it to do little bits at a time. I think that people have issues when they point the tools at a large code base and say "make it better". That's not the current sweet spot of the tools. Getting rid of boiler plate has been a game changer for me.
I think my current average code writing speed is 1 keyword per hour or something, as nearly all my time coding is spent either reading the doc (to check my assumptions) or copy-pasting another block of code I have. The very short bursts of writing code like I would write prose happen so rarely I don't even bother remembering them.
I've never written boilerplate. I copy them from old projects (the first time was not boilerplate, it was learning the technology) or other files, and do some fast editing (vim is great for this).
On Day-0, AI is great, but by Day-50 there are preferences and nuance that aren't captured through textual evidence. The productivity gains mostly vanish.
Ultimately AI coding efficacy is an HCI relationship and you need different relationships (workflows) at different points in time.
That's why, currently, as time progresses you use AI less and less on any feature and fall back to human. Your workflow isn't flexible enough.
So the real problem isn't the Day-0 solution, it's solving the HCI workflow problem to get productivity gains at Day-50.
Smarter AI isn't going to solve this. Large enough code becomes internally contradictory, documentation becomes dated, tickets become invalid, design docs are based on older conceptions. Devin, plandex, aider, goose, claude desktop, openai codex, these are all Day-0 relationships. The best might be a Day-10 solution, but none are Day-50.
Day-50 productivity is ultimately a user-interface problem - a relationship negotiation and a fundamentally dynamic relationship. The future world of GPT-5 and Sonnet-4 still won't read your thoughts.
You pinpoint a truly important thing. Even though I cannot quite put words to it, I think that getting lost with AI coding assistants is far worse than getting lost as a programmer. It is like writing vanilla code versus trying to make a framework suit your needs.
AI coding assistants provide, 90% of the time, more value than the good old Google search. Nothing more, nothing less. But I don't use AI to code for me, I just use it to optimize very small fractions (ie: methods/functions at most).
> The future world of GPT-5 and Sonnet-4 still won't read your thoughts.
Chills ahead. For sure, it will happen some day. And there won't be any reason to not embrace it (although I am, for now, absolutely reluctant to such idea).
It's why these no-code/vibe-code solutions like bolt, lovable, and replit are great at hackathons, demos, or basic front-ends but there's a giant cliff past there.
There's this utility threshold due to a 1967 observation by Melvin Conway:
> [O]rganizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.
> It's why these no-code/vibe-code solutions like bolt, lovable, and replit are great at hackathons, demos, or basic front-ends but there's a giant cliff past there.
Back in the day, basically every "getting started in Ruby on Rails" tutorial involved making a Twitter-like thing. This seemed kind of magic at the time. Now, did Rails ultimately end up totally changing the face of webdev, allowing anyone to make Twitter in an afternoon? Well, ah, no, but it made for a good tech demo.
4 lines of JS. A screenful of “reasoning”. Not much I can agree with.
Meanwhile I just asked Gemini in VS Code Agent Mode to build an HTTP-like router using a trie and then refactor it as a Python decorator, and other than a somewhat dumb corner case it failed at, it generated a pretty useful piece of code that saved me a couple of hours (I had actually done this before a few years ago, so I knew exactly what I wanted).
Replace programmers? No. Well, except front-end (that kind of code is just too formulaic, transactional and often boring to do), and my experiments with React and Vue were pretty much “just add CSS”.
Add value? Heck yes - although I am still very wary of letting LLM-written code into production without a thorough review.
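For context, the shape of what I asked for is roughly this (a hand-written minimal sketch, not Gemini's actual output; the names are mine):

```python
class TrieRouter:
    """Minimal HTTP-like router: path segments are stored in a trie."""

    def __init__(self):
        self._root: dict = {}

    def route(self, path: str):
        """Decorator that registers a handler under a path like '/users/list'."""
        def decorator(handler):
            node = self._root
            for segment in path.strip("/").split("/"):
                node = node.setdefault(segment, {})
            node["__handler__"] = handler
            return handler
        return decorator

    def dispatch(self, path: str):
        node = self._root
        for segment in path.strip("/").split("/"):
            if segment not in node:
                raise LookupError(f"no route for {path!r}")
            node = node[segment]
        return node["__handler__"]()


router = TrieRouter()

@router.route("/users/list")
def list_users():
    return ["alice", "bob"]

print(router.dispatch("/users/list"))  # ['alice', 'bob']
```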
Not even front end, unless it literally is a dumb thin wrapper around a back end. If you are processing anything on that front end, AI is likely to fall flat as quickly as it would on the backend.
My own experience writing a web-based, SVG-based 3D modeler. No traditional back end, but when working on the underlying 3D engine it shits the bed from all the broken assumptions and uncommon conventions used there. And in the UI, the case I have in mind involved pointer capture and event handling, it chases down phantoms declaring it's working around behavior that isn't in the spec. I bring it the spec, I bring it minimal examples producing the desired behavior, and it still can't produce working code. It still tries to critique existing details that aren't part of the problem, as evidenced by the fact it took me 5 minutes to debug and fix myself when I got tired of pruning context. At one point it highlighted a line of code and suggested the problem could be a particular function getting called after that line. That function was called 10 lines above the highlighted line, in a section it re-output in a quote block.
So yes, it's bad for front end work too if your front end isn't just shoveling data into your back end.
AI's fine for well-trodden roads. It's awful if you're beating your own path, and especially bad at treading a new path just alongside a superhighway in the training data.
It built the meat of the code; you spent 5 minutes fixing the more complex and esoteric issues. Is this not the desired situation? You saved time, but your skillset remained viable.
> AI's fine for well-trodden roads. It's awful if you're beating your own path, and especially bad at treading a new path just alongside a superhighway in the training data.
I very much agree with this, although I think that it can be ameliorated significantly with clever prompting
I sincerely wish that had been the case. No, I built the meat of the code. The most common benefit is helping to reduce repetitive typing, letting me skip writing 12 minor variations of `x1 = sin(r1) - cos(r2)`.
Similar to that, in this project it's been handy translating whole mathematical formulas to actual code processes. But when it comes out of that very narrow box it makes an absolute mess of things that almost always ends in a net waste of time. I roped it into that pointer capture issue earlier because it's an unfamiliar API to me, and apparently for it, too, because it hallucinated some fine wild goose chases for me.
wrt unfamiliar APIs, I don't know if this would have worked in your case (perhaps not) but I find that most modern LLMs are very comfortable with simply reading and using the docs or sample code here and now if you pass them the link or a copy paste or html containing the relevant info
> that kind of code is just too formulaic, transactional and often boring to do
No offense, but that sounds like every programmer that hasn't done front-end development to me. Maybe for some class of front-ends (the same stuff that Ruby on Rails could generate), but past that things tend to get not boring real fast.
This is a funny opinion, because tools like Claude Code and Aider let the programmer spend more of their time thinking. The more time I spend diddling the keyboard, the less time I have available to be thinking about the high-level concerns.
If I can just think "Implement a web-delivered app that runs in the browser and uses local storage to store state, and then presents a form for this questionnaire, another page that lists results, and another page that graphs the results of the responses over time", and that's ALL I have to think about, I now have time to think about all sorts of other problems.
That's literally all I had to do recently. I have chronic sinusitis, and wanted to start tracking a number of metrics from day to day, using the nicely named "SNOT-22" (Sino-Nasal Outcome Test, I'm not kidding here). In literally 5 minutes I had a tool I could use to track my symptoms from day to day. https://snot-22.linsomniac.com/
I asked a few follow-ups ("make it prettier", "let me delete entries in the history", "remember the graph settings"). I'm not a front-end guy at all, but I've been programming for 40 years.
I love the craft of programming, but I also love having an idea take shape. I'm 5-7 years from retirement (knock on wood), and I'm going to spend as much time thinking, and as little time typing in code, as possible.
I think that's the difference between "software engineer" and "programmer". ;-)
A programmer's JOB is not to think. It's to deliver value to their employer or customers. That's why programmers get paid. Yes, thinking hard about how to deliver that value with software is important, but when it comes to a job, it's not the thought that counts; it's the results.
So if I, with AI augmentation, can deliver the same value as a colleague with 20% less thought and 80% less time, guess whose job is more secure?
I know, I know, AI tools aren't on par with skilled human programmers (yet), but a skilled human programmer who uses AI tools effectively to augment (not entirely replace) their efforts can create value faster while still maintaining quality.
The value is in working and shipped features. This value increase when there's no technical debt dragging it down. Do the 20% less thought and 80% less time still hold?
I haven't been using AI for coding assistance. I use it like someone I can spin around in my chair, and ask for any ideas.
Like some knucklehead sitting behind me, sometimes, it has given me good ideas. Other times ... not so much.
I have to carefully consider the advice and code that I get. Sometimes, it works, but it does not work well. I don't think that I've ever used suggested code verbatim. I always need to modify it; sometimes, heavily.
It seems like the traditional way to develop good judgement is by getting experience with hands-on coding. If that is all automated, how will people get the experience to have good judgement? Will fewer people get the experiences necessary to have good judgement?
Compilers, for the most part, made it unnecessary for programmers to check the assembly code. There are still compiler programmers that do need to deal with that, but most programmers get to benefit from just trusting that the compilers, and by extension the compiler programmers, are doing a good job
We are in a transition period now. But eventually, most programmers will probably just get to trust the AIs and the code they generate, maybe do some debugging here and there at the most. Essentially AIs are becoming the English -> Code compilers
In my experience, compilers are far more predictable and consistent than LLMs, making them suitable for their purpose in important ways that LLMs are not.
I honestly think people are panicking massively over nothing with AI. Even wrt graphic design, which I think people are most worried about, the main, central skill of a graphic designer is not the actual graft of sitting down and drawing the design; it's having the taste and skill and knowledge to make design choices that are worthwhile and useful and aesthetically pleasing. I can fart around all day on Stable Diffusion or telling an LLM to design a website, but I don't know shit about UI/UX design or colour theory or simply what appeals to people visually, and I doubt an AI can teach me it to any real degree.
Yes, there are now likely going to be fewer billable hours and perhaps less joy in the work, but at the same time I suspect that managers who decide they can forgo graphic designers and just get programmers to do it are going to lose a competitive advantage.
I disagree. It's all about how you're using them. AI coding assistants make it easy to translate thought to code. So much boilerplate can be given to the assistant to write out while you focus on system design, architecture, etc, and then just guide the AI system to generate the code for you.
Call it AI, ML, Data Mining, it does not matter. Truth is these tools have been disrupting the SWE market and will continue to do so. People working with it will simply be more effective. Until even them are obsolete. Don't hate the player, hate the game.
This is becoming even more of a consensus now as in it feels like the tech is somewhat already there, or just about to come out.
As a software professional what makes it more interesting is that the "trick" (reasoning RL in models) that unlocked disruption of the software industry isn't really translating to other knowledge work professions. The disruption of AI is uneven. I'm not seeing in my circles other engineers (e.g. EE's, Construction/Civil, etc), lawyers, finance professionals, anything else get disrupted as significantly as software development.
The respect of the profession has significantly gone down as well. From "wow you do that! that's pretty cool..." to "even my X standard job has a future; what are you planning to do instead?" within a 3 year period. I'm not even in SV, NY or any major tech hubs.
Software ate the world, it's time for AI to eat the software :)
Anything methodical is exactly what the current gen AI can do. It's phenomenal at translations, be it human language to human language or an algorithm description into computer language.
People like to make fun of "vibe coding", but that's actually a purification process where humans are getting rid of the toolset that we used to have to master to make the computer do what we tell it to do.
Most of todays AI developer tools are misguided because they are trying to orchestrate tools that were created to help people write and manage software.
IMHO the next-gen tools will write code that is not intended for human consumption. All the frameworks, version management, coding paradigms etc will be relics of the past. Curiosities for people who are fascinated for that kind of things, not production material.
I am a very sceptical and cautious user of AI tools, but this sounds like someone who hasn't figured out a workflow that works for him:
> Nothing indicates how this should be run.
That's why I usually ask it to write a well defined function or class, with type annotations and all that. I already know how to call it.
Also you can ask for calling examples.
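Concretely, the kind of ask that works for me is a signature plus a docstring, with the body left to the model (everything below is a hypothetical example, not from a real codebase):

```python
from collections.abc import Iterable

def top_n_by_score(items: Iterable[dict], n: int, key: str = "score") -> list[dict]:
    """Return the n items with the largest value under `key`; ties broken arbitrarily."""
    return sorted(items, key=lambda item: item[key], reverse=True)[:n]

# The "calling example" I'd also ask it to produce:
print(top_n_by_score([{"score": 3}, {"score": 9}, {"score": 5}], n=2))
# -> [{'score': 9}, {'score': 5}]
```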
> ... are not functions whose definitions are available within the script. Without external context, we don't know what they do.
These are already solved by having a proper IDE or LSP.
> run in E environments with V versions
Fair enough, stick to "standard" libraries which don't change often. Use boring technology.
> The handler implicitly ignores arguments
Because you probably didn't specify how arguments are to be handled.
In general, AI is very helpful to reduce tedium in writing common pieces of logic.
In an ideal world, programming languages and libraries are as expressive as natural language, and we don't need AI. We can marshal our thoughts into code as fast as we marshal them into English, and as succinctly.
But until that happens "AI" helps with tedious logic and looking up information. You will still have to confirm the code, so being at least a bit familiar with the stack is a good thing.
I think there's some truth here in that AI can be used as a band-aid to sweep issues of bad abstractions or terse syntax under the rug.
For example, I often find myself reaching for Cursor/ChatGPT to help me with simple things in bash scripts (like argument parsing, looping through arrays, associative maps, handling spaces in inputs) because the syntax just isn't intuitive to me. But I can easily do these things in Python without asking an AI.
I'm not a web developer but I imagine issues of boilerplate or awkward syntax could be solved with more "thinking" instead of using the AI as a better abstraction to the bad abstractions in your codebase.
In the past I've worked at startups that hired way too many bright junior developers and at companies that insisted on only hiring senior developers. The arguments for/against AI coding assistants feel very reminiscent of the arguments that occur around what seniority balance we want on an engineering team. In my experience it's a matter of balancing between doing complex work yourself and handing off simple work.
If AI coding assistants provide little value, then why is Cursor IDE a 300m company, and why does this study say it makes people 37% more productive?
I no longer need to worry about a massive amount of annoying, but largely meaningless implementation details. I don’t need to pick a random variable/method/class name out of thin air. I don’t need to plan ahead on how to DRY up a method. I don’t need to consider every single edge case up front.
Sure, I still need to tweak and correct things but we’re talking about paint by number instead of starting with a blank canvas. It’s such a massive reduction in mental load.
I also find it reductionist to say LLMs don’t think because they’re simply predicting patterns. Predicting patterns is thinking. With the right context, there is little difference between complex pattern matching and actual thinking. Heck, a massive amount of my actual, professional software development work is figuring out how to pattern match my idea into an existing code base. There’s a LOT of value in consistency.
>But AI doesn't think -- it predicts patterns in language.
We've moved well beyond that. The above sentence tells me you haven't used the tools recently. That's a useful way to picture what's happening, to remove the magic so you can temper your expectations.
The new tooling will "predict patterns" at a higher level, a planning level, then start "predicting patterns" in the form of strategy, etc... This all, when you start reading the output of "thinking" phases. They sound a lot likea conversation I'd have with a colleague about the problem, actually.
There's a percentage of developers, who due to fear/ego/whatever, are refusing to understand how to use AI tooling. I used to debate but I've started to realize that these arguments are mostly not coming from a rational place.
Title is a bit provocative and begs the question (is thinking the part being replaced?), but the bigger issue is what “little” means here. Little in absolute terms? I think that’s harsh. Little in relation to how it’s touted? That’s a rational conclusion, I think.
You need three things to use LLM based tools effectively: 1) an understanding of what the tool is good at and what it isn’t good at; 2) enough context and experience to input a well formulated query; and 3) the ability to carefully verify the output and discard it if necessary.
This is the same skillset we’ve been using with search engines for years, and we know that not everyone has the same degree of Google-fu. There’s a lot of subjectivity to the “value”.
It's nice that the author was kind enough to make his obviously wrong thesis right in the title.
If you write code professionally, you're really doing yourself a disservice if you aren't evaluating and incorporating AI coding tools into your process.
If you've tried them before, try them again. The difference between Gemini 2.5 Pro and what came before is as different as between GPT 3.5 and 4.
If you're a hobbyist, do whatever you want: use a handsaw, type code in notepad, mill your own flour, etc.
I prefer a more nuanced take. If I can’t reliably delegate away a task, then it’s usually not worth delegating. The time to review the code needs to be less than the time it takes to write it myself. This is true for people and AI.
And there are now many tasks which I can confidently delegate away to AI, and that set of tasks is growing.
So I agree with the author for most of the programming tasks I can think of. But disagree for some.
If you're the 1% of Earth's population for which this is true, then this headline makes sense. If you're the 99% for which this isn't at all true, then don't bother reading this, because AI coding assistance will change your life.
"Writing code is easy" once you learn the tools and have thought through the design. But most people are skipping the latter two and complain about the first one.
It's like doing math proofs. It's easy when you know maths and have a theoretical solution. So, the first steps is always learning maths and think about a solution. Not jump head first into doing proofs.
On the contrary. Just yesterday, we've had here on HN one of the numerous reposts of "Notation as a tool of thought" by Ken Iverson, the creator of APL.
Does it do things wrong (compared to what I have in my mind?). Of course. But it helps to have code quicker on screen. Editing / rolling back feels faster than typing everything myself.
A programmers job is to provide value to the business. Thinking is certainly a part of the process, but not the job in itself.
I agree with the initial point he's making here - that code takes time to parse mentally, but that does not naturally lead to the conclusion that this _is_ the job.
As a SWE, the comments on this page scare me, if I'm being honest. If we can't define the value of a programmer vs an AI in a forum such as this, then there's an obvious question to ask from an employer's perspective - in the world of AI, is a programmer/SWE no longer worth employing/investing in long term? This equally applies to any jobs in tech where the job is "to do" vs "to own" (e.g. DevOps, Testing, etc etc)
Many defenders of AI tools in this thread are basically arguing against the end conclusion of the article which is that "to think" is no longer the moat it once was. I don't buy into the argument either that "people who know how to use AI tools" will somehow be safe - logically that's just a usability problem that has a lot of people seem to be interested in solving.
The impression I'm getting is that even the skill of "using/programming LLM's" is only a transitory skill and another form of cope from developers pro AI - if AI is smart enough you won't need to "know how to use it" - it will help you. That's what commoditization of intelligence is by definition - anything like "learning/intelligence/skills" is no longer required since the point is to artificially create this.
To a lay person reading this thread - in a few years (maybe two) there won't be a point of doing CS/SWE anymore.
And yet I keep meeting programmers who say AI coding assistants are saving them tons of time or helping them work through problems they otherwise wouldn't have been able to tackle. I count myself among that group at this point. Maybe that means I'm just not a very good programmer if I need the assistance, but I'd like to think my work speaks for itself at this point.
Some things where I've found AI coding assistants to be fantastic time savers:
- Searching a codebase with natural language
- Quickly groking the purpose of a function or file or module
- Rubber duck debugging some particularly tricky code
- Coming up with tests exercising functionality I hadn't yet considered
- Getting up to speed with popular libraries and APIs
It is _because_ a programmer's job is to think that AI coding assistants may provide value. They would (and perhaps already do) complete the boilerplate, and perhaps help you access information faster. They also have detriments: they may atrophy some of your capabilities, may tempt you to go down more simplistic paths, etc. But still.
Reading the post as well: It didn't change my mind. As for what it actually says, my reaction is a shrug, "whatever".
It doesn't seem like the author has ever used AI to write code. You definitely can ask it to refactor. Both ChatGPT and Gemini have done excellent work for me on refactors, and they have also made mistakes. It seems like they are both quite good at making lengthy, high-quality suggestions about how to refactor code.
His argument about debugging is absolutely asinine. I use both GDB and Visual Studio at work. I hate Visual Studio except for the debugger. GDB is definitely better than nothing, but only just. I am way, way, way more productive debugging in Visual Studio.
Using a good debugger can absolutely help you understand the code better and faster. Sorry but that's true whether the author likes it or not.
I am the first to criticize LLMs and dumb AI hype. There is nothing wrong with using an LSP, and a coding assistant is just an enhanced LSP if that is all you want it to be. My job is to solve problems, and AI can slightly speed that up.
It's more like an assistant that can help you write a class to do something you could write on your own but are feeling too lazy to. Sometimes it's good, other times it's idiotically bad.
You need to keep it in check and keep telling it what it needs to do, because it has a tendency to dig holes it can't get out of. Breaking things up into smaller classes helps to a degree.
Now this isn't a killer argument, but your examples are about readability and safety, respectively - the quality of the result. LLMs seem to be more about shoveling the same or worse crap faster.
I have seen other people's results. Code LLMs seem to do some annoying stuff more quickly than doing it manually, and are sometimes able to improve prose in comments and such. But they also mess up when it gets moderately difficult, especially when there are "long distance" connections between pieces.
That and the probably seductive (to some) ability to crank out working, but repetitive or partially nonsensical code, is what I call shoveling crap faster.
I dunno what to tell you, I am able to get consistently good quality work out of e.g. 3.7 Sonnet and it's saved me a ton of time. Garbage in, garbage out; maybe the people you've observed don't know how to write good prompts.
I guess I should play around with that one then. My general impression was that we're already in the diminishing returns part of the sigmoid curve (calendar time or coefficient array size vs quality) for LLMs, until there's maybe a change other than making them bigger.
Honestly, o3 has completely blown my mind in terms of its ability to come up with useful abstractions beyond what I would normally build. Most people claiming LLMs are limited just aren't using the tools enough, and can't see the trajectory of increasing ability.
> Most people claiming LLMs are limited just aren't using the tools enough
The old quote might apply:
~"XML is like violence. If it's not working for you, you need to use more of it".
(I think this is from Tim Bray -- it was certainly in his .signature for a while -- but oddly a quick web search doesn't give me anything authoritative. I asked Gemma3, which suggests Drew Conroy instead)
I'm sorry to say, but the author of this post doesn't appear to have much, if any, experience with AI, and sounds like he's just trying to justify not using it and pretend he's better without it.
Seriously. If you want to say that AI will likely reduce wages and supply for certain more boilerplate jobs; or that these tools are comparatively much worse for the environment than normal coding; or that they are not particularly good once you get into something quite esoteric or complex; or that they've led certain companies to think that developing AGI is a good idea; or that they're mostly centralised into the hands of a few unpleasant actors; then any of those criticisms, and certainly others, are valid to me. But to say that they're not actually useful or provide little value? It's just nonsense or ragebait.
Couldn't agree more. And in regards to some of the comments here: generating the text isn't the hard OR time-consuming part of development, and that's even assuming the generated code were immediately trustworthy. Given that it isn't and must be checked, it's really just not very valuable.
I wish people would realize you can replace pretty much any LLM with GitHub code search. It's a far better way to get example code than anything I've used.
Which models have you tried to date? Can you come up with a top 3 ranking among popular models based on your definition of value?
What can be said about the ability of an LLM to translate your thinking represented in natural language to working code at rates exceeding 5-10x your typing speed?
Mark my words: Every single business that has a need for SWEs will obligate their SWEs to use AI coding assistants by the end of 2026, if not by the end of 2025. It will not be optional like it is today. Now is the time you should be exploring which models are better at "thinking" than others, and discerning which thinking you should be doing vs. which thinking you can leave up to ever-advancing LLMs.
I've had to yank tokens out of the mouths of too many thinking models stuck in loops of (internally, within their own chain of thought) rephrasing the same broken function over and over again, realizing each time that it doesn't meet constraints, and trying the same thing again. Meanwhile, I was sat staring at an opaque spinner wondering if it would have been easier to just write it myself. This was with Gemini 2.5 Pro for reference.
Drop me a message on New Year's Day 2027. I'm betting I'll still be using them optionally.
I've experienced Gemini getting stuck as you describe a handful of times. With that said, my prediction is based on the observation that these tools are already force multipliers, and they're only getting better with each passing quarter.
You'll of course be free to use them optionally in your free time and on personal projects. It won't be the case at your place of employment.
This reminds me of the story a few days ago about "what is your best prompt to stump LLMs", and many of the second level replies were links to current chat transcripts where the LLM handled the prompt without issue.
I think there are a couple of problems at play: 1) people who don't want the tools to have value, for various reasons, and have therefore decided the tools don't have value; 2) people who tried the tools six months or a year ago and had a bad experience and gave up; and 3) people who haven't figured out how to make good use of the tools to improve their productivity (this one seems to be heavily impacted by various grifters who overstate what the coding assistants can do, and people underestimating the effort they have to put in to get good at getting good output from the models.)
4) People who like having reliable tools, which frees them from "reviewing" the output of those tools to check whether the tool made an error.
Using AI is like driving a car that decides to turn even if you keep the steering wheel straight. Randomly. To varying degrees. If you like this because sometimes it lets you take a curve without having to steer, you do you. But some people do prefer having a car turn when and only when they turn the wheel.
That's covered under point #1. I'm not claiming these tools are perfect. Neither are most people, but from the standpoint of an employer, the question is going to be "does the tool, after accounting for errors, make my employees more or less productive?" A lot of people are seeing that the answer to that - today - is that the tools offer a productivity advantage.
> Every single business that has a need for SWEs will obligate their SWEs to use AI coding assistants by the end of 2026, if not by the end of 2025.
If businesses mandated speed like that, then we’d all have been forced to use emacs decades ago. Businesses mandate correctness and AI doesn’t as clearly help to that end.
>If the problem is well understood, a sentence would most likely be suffice.
I feel this falls flat for the rather well-bounded use case I really want: a universal IDE that can set up my environment with a buildable/runnable boilerplate "hello world" for arbitrary project targets. I tried vibe coding an NES 6502 "hello world" program with Cursor and it took way more steps (and missteps) than me finding an existing project on GitHub and cloning that.
Absolutely! What surprises me is how rarely I actually have to get all the way down to writing the code myself.
I had Claude go into a loop because I have cat aliased as bat
It wanted to check a config JSON file, noticed that it had missing commas between items (because bat prettifies the JSON), and went into a forever loop of changing the JSON to add the commas (that were already there), then checking the result by 'cat'ing the file (but actually with bat) and again finding out they weren't there. GOTO 10.
The actual issue was that Claude had left two overlapping configuration parsing methods in the code. One with Viper (The correct one) and one 1000% idiotic string search system it decided to use instead of actually unmarshaling the JSON :)
I had to use pretty explicit language to get it to stop fucking with the config file and look for the issue elsewhere. It did remember it, but forgot on the next task of course. I should've added the fact to the rule file.
(This was a vibe coding experiment, I was being purposefully obtuse about not understanding the code)
Why does it matter that you're doing the thinking? Isn't that good news? What we're not doing any more is any of the rote recitation that takes up most of the day when building stuff.
I think "AI as a dumb agent for speeding up code editing" is kind of a different angle and not the one I wrote the article to address.
But, if it's editing that's taking most of your time, what part of your workflow are you spending the most time in? If you're typing at 60WPM for an hour then that's over 300 lines of code in an hour without any copy and paste which is pretty solid output if it's all correct.
But that's just it: 300 good lines of reasonably complex working code in an hour, versus o4-mini churning out 600 lines of perfectly compilable code in less than 2 minutes, including the time it takes me to assemble the context with a tool such as repomix (run locally) or by pulling markdown docs with Jina Reader.
The reality is, we humans just moved one level up the chain. We will continue to move up until there isn’t anywhere for us to go.
> perfectly compilable code
Isn't that the bare minimum attribute of working code? If something is not compilable, it is WIP. The difficulty is having correct code, and then efficient enough code.
Which is why you dictate series of tests for the LLM to generate, and then it generates way more test coverage than you ordinarily would have. Give it a year, and LLMs will be doing test coverage and property testing in closed-loop configurations. I don't think this is a winnable argument!
Certainly, most of the "interesting" decisions are likely to stay human! And it may never be reasonable to just take LLM vomit and merge it into `main` without reviewing it carefully. But this idea people have that LLM code is all terrible --- no, it very clearly is not. It's boring, but that's not the same thing as bad; in fact, it's often a good thing.
> it generates way more test coverage than you ordinarily would have.
Test coverage is a useless metric. You can cover the code multiple times and not test the right values, nor test the right behavior.
Don't get hung up on the word "coverage". We all know test coverage isn't a great metric.
I just used IntelliJ AI to generate loads of tests for some old code I couldn't be bothered to finish.
It wrote tests I wouldn't have written even if I could be bothered. So the "coverage" was certainly better. But more to the point, these were good tests that dealt with some edge cases that were nice to have.
You don't do it for bugs, you do it for features in this case.
Contrived example: You want a program that prints out the weather for the given area.
First you write the tests (using AI if you want) that test for the output you want.
Then you tell the AI to implement the code that will pass the tests, and explicitly tell it NOT to fuck with the tests (as Claude 3.7 specifically will happily do; it'll mock things out so far that it's not touching a line of the actual code to be tested...).
With bugs you always write a test that confirms the exact case the bug caused so that it doesn't reappear. This way you'll slowly build a robust test suite. 1) find bug 2) write test for correct case 3) fix code until test passes
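A minimal sketch of what I mean, assuming pytest and a hypothetical weather.py module (all names here are illustrative, not prescribed): the human writes, or at least reviews, this file first, and the model only gets to edit the implementation until it passes.

```python
# test_weather.py - the AI edits weather.py, never this file.
import pytest

from weather import format_weather_report  # hypothetical module under test


def test_report_includes_area_and_temperature():
    report = format_weather_report(area="Helsinki", temperature_c=21.5)
    assert "Helsinki" in report
    assert "21.5" in report


def test_missing_area_is_rejected():
    # Regression-style test: once a bug is found, pin down the exact failing case.
    with pytest.raises(ValueError):
        format_weather_report(area="", temperature_c=10.0)
```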
In lots of jobs, the person doing work is not the one selecting text or the JIRA ticket. There's lots of "this is what you're working on next" coding positions that are fully managed.
But even if we ignored those, this feels like goalpost moving. They're not selecting the text - ok, ask LLM what needs refactoring and why. They're not selecting the JIRA ticket with context? Ok, provide MCP to JIRA, git and comms and ask it to select a ticket, then iterate on context until it's solvable. Going with "but someone else does the step above" applies to almost everyone's job as well.
Not OP, but might be an autocorrection for "eking out"
I think maybe you have unrealistic expectations.
Yesterday I needed to import a 1GB CSV into ClickHouse. I copied the first 500 lines into Claude and asked it for a CREATE TABLE and CLI to import the file. Previous day I was running into a bug with some throw-away code so I pasted the error and code into Claude and it found the non-obvious mistake instantly. Week prior it saved me hours converting some early prototype code from React to Vue.
I do this probably half a dozen times a day, maybe more if I'm working on something unfamiliar. It saves at a minimum an hour a day by pointing me in the right direction - an answer I would have reached myself, but slower.
Over a month, a quarter, a year... this adds up. I don't need "big wins" from my LLM to feel happy and productive with the many little wins it's giving me today. And this is the worst it's ever going to be.
Out of interest, what kind of codebases are you able to get AI to do these things on? Every time I have tried it with even simpler things than these, it has failed spectacularly. Every example I see of people doing this kind of thing seems to be some kind of web development, so I have a hypothesis that AI might currently be much worse for the kinds of codebases I work on.
I currently work for a finance-related scaleup. So backend systems, with significant challenges related to domain complexity and scalability, but nothing super low level either.
It does take a bit to understand how to prompt in a way that the results are useful, can you share what you tried so far?
I have tried on a lot of different projects.
I have a codebase in Zig and it doesn't understand Zig at all.
I have another which is embedded C using zephyr RTOS. It doesn't understand zephyr at all and even if it could, it can't read the documentation for the different sensors nor can it plug in cables.
I have a tui project in rust using ratatui. The core of the project is dealing with binary files and the time it takes to explain to it how specific bits of data are organised in the file and then check it got everything perfectly correct (it never has) is more than the time to just write the code. I expect I could have more success on the actual TUI side of things but haven't tried too much since I am trying to learn rust with this project.
I just started an android app with flutter/dart. I get the feeling it will work well for this but I am yet to verify since I need to learn enough flutter to be able to judge it
My day job is a big C++ codebase making a GUI app with Qt. The core of it is all dealing with USB devices and Bluetooth protocols, which it doesn't understand at all. We also have lots of very complicated C++ data structures; I had hoped that the AI would be able to at least explain them to me, but it just makes stuff up every time. This also means that getting it to edit any part of the codebase touching this sort of thing doesn't work. It just rips up any thread safety or allocates memory incorrectly, etc. It also doesn't understand the compiler errors at all: I had a circular dependency and tried to get it to solve it, but I had to give so many clues I basically told it what the problem was.
I really expected it to work very well for the Qt interface, since building UI is what everyone seems to be doing with it. But the amount of hand-holding it requires is insane. Each prompt feels like a monkey's paw. In every experiment I've done it would have been faster to just write it myself. I need to try getting it to write an entirely new piece of UI from scratch, since I've only been editing existing UI so far.
Some of this is clearly a skill issue since I do feel myself getting better at prompting it and getting better results. However, I really do get the feeling that it either doesn't work or doesn't work as well on my code bases as other ones.
> I have a codebase in Zig and it doesn't understand Zig at all.
> I have another which is embedded C using zephyr RTOS. It doesn't understand zephyr at all and even if it could, it can't read the documentation for the different sensors nor can it plug in cables.
If you use Cursor, you can let it index the documentation for whatever language or framework you want [0], and it works exceptionally well. Don't rely solely on the LLM's training data, allow it to use external resources. I've done that and it solves many of the issues you're talking about.
[0] https://docs.cursor.com/context/@-symbols/@-docs
The Cursor docs indexing works very well and it's probably the biggest thing missing from Windsurf. The other key is to stop the response when you see something going wrong and go back to your first message to add more context, like adding docs or links to library source files (a URL to GitHub works just fine) or attaching more files with types and annotations. Restarting your request with more context works better than asking it to fix things, because the wrong code will pollute the probability space of future responses.
I work in Python, Swift, and Objective-C. AI tools work great in all of these environment. It's not just limited to web development.
I suppose saying that I've only seen it in web development is a bit of an exaggeration. It would be more accurate to say that I haven't seen any examples of people using AI on a codebase that looks like one of the ones I work on. Clearly I am biased and just lump all the types of coding I'm not interested in into "web development".
It's not about backend or frontend, it's mostly about languages. If you're using niche stuff or languages that don't have an online or digital presence, the LLMs will be confused. Stuff like assembler or low-level C aren't really in their vocabulary.
C#, Go, and Python, for example, work perfectly well, and you can kinda test an LLM's preference by asking it to write a program to solve a problem without specifying the language.
That's my experience too. It also fails terribly with Elasticsearch, probably because the documentation doesn't have a lot of examples. ChatGPT, Copilot and Claude were all useless for that and gave completely plausible nonsense. I've had the most success using it for writing unit tests and, definitely, shell scripts.
Agreed. It isn't like crypto, where proponents proclaimed some value-proving use case that was always just on the verge of arriving. AI is useful right now. People are using these tools now and enjoying them.
> Observer bias is the tendency of observers to not see what is there, but instead to see what they expect or want to see.
Unfortunately, people enjoying a thing and thinking that it works well doesn't actually mean much on its own.
But, more than that I suspect that AI is making more people realize that they don't need to write everything themselves, but they never needed to to begin with, and they'd be better off to do the code reuse thing in a different way.
I'm not sure that's a convincing argument given that crypto heads haven't just been enthusiastically chatting about the possibilities in the abstract. They do an awful lot of that, see Web3, but they have been using crypto.
Even in 2012 bitcoin could very concretely be used to order drugs. Many people have used it to transact and preserve value in hostile economic environments. Etc etc. Ridiculous comment.
Personally i have still yet to find LLMs useful at all with programming.
bitcoin tracks the stock market
People are using divining rods now and enjoying them: https://en.wikipedia.org/wiki/Dowsing
I don't (use AI tools), I've tried them and found that they got in the way, made things more confusing, and did not get me to a point where the thing I was trying to create was working (let alone working well/safe to send to prod)
I am /hoping/ that AI will improve, to the point that I can use it like Google or Wikipedia (that is, have some trust in what's being produced)
I don't actually know anyone using AI right now. I know one person on Bluesky has found it helpful for prototyping things (and I'm kind of jealous of him because he's found how to get AI to "work" for him).
Oh, I've also seen people pasting AI results into serious discussions to try and prove the experts wrong, but only to discover that the AI has produced flawed responses.
> I don't actually know anyone using AI right now.
I believe you, but this to me is a wild claim.
Ha! I think the same way when I see people saying that AI is in widespread use - I believe that it's possible, but it feels like an outlandish claim
I’d say 500M WAUs on chatGPT alone qualifies as widespread use.
Ok, how much of that is developers using it to help them code?
Essentially the same for me. I had one incident where someone was arguing in favor of it and then immediately embarrassed themselves badly because they were misled by a ChatGPT error. I have the feeling that this hype will collapse as this happens more often and people see how bad the consequences are when there are errors.
If AI gives a bad experience 20% of the time, and if there are 10M programmers using it, then about 3,000 of them (0.2^5 × 10,000,000 ≈ 3,200) will have a bad experience 5 times in a row. You can't really blame them for giving up and writing about it.
It's all good to me - let these folks stay in the simple times while you and I arbitrage our efforts against theirs. I agree, there's massive value in using these tools, and it's hilarious to me when others don't see it. My reaction isn't going to be to convince them they're wrong; it's just to find ways to use it to get ahead while leaving them behind.
I need some information/advice -> I feed that into an imprecise aggregator/generator of some kind -> I apply my engineering judgement to evaluate the result and save time by reusing someone's existing work
This _is_ something that you can do with AI, but it's something that a search engine is better suited to because the search engine provides context that helps you do the evaluation, and it doesn't smash up results in weird and unpredictable ways.
Y'all think that AI is "thinking" because it's right sometimes, but it ain't thinking.
If I search for "refactor <something> to <something else>" and I get good results, that doesn't make the search engine capable of abstract thought.
AI is usually a better search engine than a search engine.
AI alone can't replace a search engine very well at all.
AI with access to a search engine may present a more useful solution to some problems than a bare search engine, but then the AI isn't replacing a search engine, it is using one.
The "Deep Research" modes in web-based LLMs are quite useful. They can take a days worth of reading forums and social media sites and compress into about 10 minutes.
For example I found a perfect 4k120Hz capable HDMI switch by using ChatGPTs research mode. It did suggest the generic Chinese random-named ones off Amazon, but there was one brand with an actual website and a history - based in Germany.
I hadn't seen it recommended anywhere on my attempts, but did find it by searching for the brand specifically and found only good reviews. Bought it, love it.
This seems like a great example of someone reasoning from first principles that X is impossible, while someone else doing some simple experiments with an open mind can easily see that X is both possible and easily demonstrated to be so.
> Y'all think that AI is "thinking" because it's right sometimes, but it ain't thinking.
I know the principles of how LLMs work, I know the difference between anthropomorphizing them and not. It's not complicated. And yet I still find them wildly useful.
YMMV, but it's just lazy to declare that anyone who sees it differently than you just doesn't understand how LLMs work.
Anyway, I could care less if others avoid coding with LLMs, I'll just keep getting shit done.
If you observe it at the right time, a broken clock will appear to be working, because it's right twice a day.
Weird metaphor, because a gymgoer practices what they are doing by putting in the reps in order to increase personal capacity. It's more like you're laughing at people at the gym, saying "don't you know we have forklifts already lifting much more?"
That’s a completely different argument, however, and a good one to have.
I can buy “if you use the forklift you’ll eventually lose the ability to lift weight by yourself”, but the author is going for “the forklift is actually not able to lift anything” which can trivially be proven wrong.
More like, "We had a nice forklift, but the boss got rid of it replaced it with a pack of rabid sled dogs which work sometimes? And sometimes they can also sniff out expiration dates on the food (although the boxes were already labeled?). And, I'm pretty sure one of them, George, understands me when I talk to him because the other day I asked him if he wanted a hotdog and he barked (of course, I was holding a hotdog at the time). But, anyway, we're using the dogs, so they must work? And I used to have to drive the forklift, but the dogs just do stuff without me needing to drive that old forklift"
I see it as almost the opposite. It’s like the pulley has been invented but some people refuse to acknowledge its usefulness and make claims that you’re weaker if you use it. But you can grow quite strong working a pulley all day.
"If you want to be good at lifting, just buy an exoskeleton like me and all my bros have. Never mind that your muscles will atrophy and you'll often get somersaulted down a flight of stairs while the exoskeleton makers all keep trying, and failing, to contain the exoskeleton propensity for tossing people down flights of stairs."
It's the barstool economist argument style, on long-expired loan from medieval theology. Responding to clear empirical evidence that X occurs: "X can't happen because [insert 'rational' theory recapitulation]"
There are people at the gym I go to benching 95 lbs and asking what does it take to get to 135, or 225? The answer is "lift more weight" not "have someone help you lift more weight"
If you already know how to code, yes AI/LLMs can speed you along at certain tasks, though be careful you don't let your skills atrophy. If you can bench 225 and then you stop doing it, you soon will not be able to do that anymore.
> If you already know how to code, yes AI/LLMs can speed you along at certain tasks, though be careful you don't let your skills atrophy.
This isn't a concern. Ice-cutting skills no longer have value, and cursive writing is mostly a 20th century memory. Not only have I let my assembly language skills atrophy, but I'll happily bid farewell to all of my useless CS-related skills. In 10 years, if "app developer" still involves manual coding by then, we'll talk about coding without an AI partner like we talk about coding with punch cards.
Maybe. I've seen a lot of "in 10 years..." predictions come and go and I'm still writing code pretty much the same way I did 40 years ago: in a terminal, in a text editor.
I don’t think the argument from such a simple example does much for the authors point.
The bigger risk is skill atrophy.
Proponents say, it doesn’t matter. We shouldn’t have to care about memory allocation or dependencies. The AI system will eventually have all of the information it needs. We just have to tell it what we want.
However, knowing what you want requires knowledge about the subject. If you’re not a security engineer you might not know what funny machines are. If someone finds an exploit using them you’ll have no idea what to ask for.
AI may be useful for some but at the end of the day, knowledge is useful.
I don't know. Cursor is decent at refactoring. ("Look at x and ____ so that it ____." With some level of elaboration, where the change is code or code organization centric.)
And it's okay at basic generation - "write a map or hash table wrapper where the input is a TZDB zone and the output is ______" will create something reasonable and get some of the TZDB zones wrong.
But it hasn't been that great for me at really extensive conceptual coding so far. Though maybe I'm bad at prompting.
Might be there's something I'm missing w/ my prompts.
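For what it's worth, when I ask for that TZDB wrapper I roughly have something like the sketch below in mind. Enumerating the zones from the runtime's own tzdata, rather than letting the model hard-code zone names, is what avoids the "gets some zones wrong" problem; the mapped value here (current UTC offset) is just an illustrative stand-in for the blank.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, available_timezones  # Python 3.9+; may need the tzdata package


def utc_offsets_by_zone() -> dict[str, int]:
    """Map every TZDB zone name to its current UTC offset in whole minutes."""
    now = datetime.now(timezone.utc)
    return {
        name: int(now.astimezone(ZoneInfo(name)).utcoffset().total_seconds() // 60)
        for name in sorted(available_timezones())
    }


if __name__ == "__main__":
    print(utc_offsets_by_zone()["Europe/Helsinki"])  # 120 or 180 depending on DST
```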
For me, the hard part of programming is figuring out what I want to do. Sometimes talking with an AI helps with that, but with bugs like “1 out of 100 times a user follows this path, a message gets lost somewhere in our pipeline”, which are the type of bugs that really require knowledge and skill, AIs are completely worthless.
There really is a category of these posts that are coming from some alternate dimension (or maybe we're in the alternate dimension and they're in the real one?) where this isn't one of the most important things ever to happen to software development. I'm a person who didn't even use autocomplete (I use LSPs almost entirely for cross-referencing --- oh wait that's another thing I'm apparently never going to need to do again because of LLMs), a sincere tooling skeptic. I do not understand how people expect to write convincingly that tools that reliably turn slapdash prose into median-grade idiomatic working code "provide little value".
> I do not understand how people expect to write convincingly that tools that reliably turn slapdash prose into median-grade idiomatic working code "provide little value".
Honestly, I'm curious why your experience is so different from mine. Approximately 50% of the time for me, LLMs hallucinate APIs, which is deeply frustrating and sometimes costs me more time than it would have taken to just look up the API. I still use them regularly, and the net value they've imparted has been overall greater than zero, but in general, my experience has been decidedly mixed.
It might be simply that my code tends to be in specialized areas in which the LLM has little training data. Still, I get regular frustrating API hallucinations even in areas you'd think would be perfect use cases, like writing Blender plugins, where the documentation is poor (so the LLM has a relatively higher advantage over reading the documentation) and examples are plentiful.
Edit: Specifically, the frustrating pattern is: (1) the LLM produces some code that contains hallucinated APIs; (2) in order to test (or even compile) that code, I need to write some extra supporting code to integrate it into my project; (3) I discover that the APIs were hallucinated because the code doesn't work; (4) now I not only have to rewrite the LLM's code, but I also have to rewrite all the supporting code I wrote, because it was based around a pattern that didn't work. Overall, this adds up to more time than if I had just written the code from scratch.
You're writing Rust, right? That's probably the answer.
The sibling comment is right though: it matters hugely how you use the tools. There's a bunch of tricks that help and they're all kind of folkloric. And then you hear "vibe coding" stories of people who generate their whole app from a prompt, looking only at the outputs; I might generate almost my whole project from an LLM, but I'm reading every line of code it spits out and nitpicking it.
"Hallucination" is a particularly uninteresting problem. Modern LLM coding environments are closed-loop ("agentic", barf). When an LLM "hallucinates" (ie: is wrong, like I am many times a day) about something, it figures it out pretty quick when it tries to build and run it!
I haven't had much of a problem writing Rust code with Cursor, but I've got dozens of crates' docs, the Rust book, and the Rustonomicon indexed in Cursor, so whenever I have it touch a piece of code, I @-include all of the relevant docs. If a library has a separate docs site with tutorials and guides, I'll usually index those too (like the cxx book for binding C++ code).
I also monitor the output as it is generated because Rust Analyzer and/or cargo check have gotten much faster and I find out about hallucinations early on. At that point I cancel the generation and update the original message (not send it a new one) with an updated context, usually by @-ing another doc or web page or adding an explicit instruction to do or not to do something.
One of the frustrating things about talking about this is that the discussion often sounds like we're all talking about the same thing when we talk about "AI".
We're not.
Not only does it matter what language you code in, but the model you use and the context you give it also matter tremendously.
I'm a huge fan of AI-assisted coding, it's probably writing 80-90% of my code at this point, but I've had all the same experiences that you have, and still do sometimes. There's a steep learning curve to leveraging AIs effectively, and I think a lot of programmers stop before they get far enough along on that curve to see the magic.
For example, right now I'm coding with Cursor and I'm alternating between Claude 3.7 max, Gemini 2.5 pro max, and o3. They all have their strengths and weaknesses, and all cost for usage above the monthly subscription. I'm spending like $10 per day on these models at the moment. I could just use the models included with the subscription, but they tend to hallucinate more, or take odd steps around debugging, etc.
I've also got a bunch of documents and rules setup for Cursor to guide it in terms of what kinds of context to include for the model. And on top of that, there are things I'm learning about what works best in terms of how to phrase my requests, what to emphasize or tell the model NOT to do, etc.
Currently I usually start by laying out as much detail about the problem as I can, pointing to relevant files or little snippets of other code, linking to docs, etc, and asking it to devise a plan for accomplishing the task, but not to write any code. We'll go back and forth on the plan, then I'll have it implement test coverage if it makes sense, then run the tests and iterate on the implementation until they're green.
It's not perfect. I have to stop it and back up often, and sometimes I have to dig into docs to get more details that I can hand off to shape the implementation better. I've cursed in frustration at whatever model I'm using more than once.
But overall, it helps me write better code, faster. I never could have built what I've built over the last year without AI. Never.
> tools that reliably turn slapdash prose into median-grade idiomatic working code
This may be the crux of it.
Turning slapdash prose into median-grade code is not a problem I can imagine needing to solve.
I think I'm better at describing code in code than I am in prose.
I Want to Believe. And I certainly don't want to be "that guy", but my honest assessment of LLMs for coding so far is that they are a frustrating Junior, who maybe I should help out because mentoring might be part of my job, but from whom I should not expect any near-term technical contribution.
It is most of the problem of delivering professional software.
Not in my experience.
The only slapdash prose in the cycle is in the immediate output of a product development discussion.
And that is inevitably too sparse to inform, without the full context of the team, company, and industry.
Sorry, are you saying "the only place where there's slapdash prose is right before it would be super cool to have an alpha version of the code magically appear, that we can iterate on based on the full context of the team, company, and industry"?
I didn't say anything about "slapdash".
Well said. It's not that there isn't much to seriously think about and discuss - so much is changing, so quickly - but the stuff that a lot of these articles focus on is a strange exercise in denial.
> “It won’t deal with abstractions” -> try asking cursor for potential refactors or patterns that could be useful for a given text.
That is not what abstraction is about. Abstraction is having a simpler model to reason about, not simply rearranging code.
> “It doesn’t understand things beyond the code” -> try giving them an abstract jira ticket or asking what it things about certain naming, with enough context
Again, that is still pretty much coding. What matters is the overall design (or at least the current module).
> “Reading code and understanding whether it’s wrong will take more time than writing it yourself” -> ask any engineer that saves time with everything from test scaffolding to run-and-forget scripts.
Imagine having a script and not checking the man pages for expected behavior. I hope the backup games are strong.
I have not had the same experience as the author. The code I have my tools write is not long. I write a little bit at a time, and I know what I expect it to generate before it generates it. If what it generates isn't what I expect, that's a good hint to me that I haven't been descriptive enough with my comments or naming or method signatures.
I use Cursor not because I want it to think for me, but because I can only type so fast. I get out of it exactly the amount of value that I expect to get out of it. I can tell it to go through a file and perform a purely mechanical reformatting (like converting camel case to snake case) and it's faster to review the results than it is for me to try some clever regexp and screw it up five or six times.
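As a concrete example, the camel-case-to-snake-case conversion I mentioned is exactly the kind of regexp I'd fumble a few times by hand; something like this sketch is what I end up reviewing the tool's output against:

```python
import re

_STEP1 = re.compile(r"(.)([A-Z][a-z]+)")
_STEP2 = re.compile(r"([a-z0-9])([A-Z])")


def camel_to_snake(name: str) -> str:
    """Convert camelCase / PascalCase identifiers to snake_case."""
    # Two passes keep runs of capitals (acronyms) together before lowercasing.
    return _STEP2.sub(r"\1_\2", _STEP1.sub(r"\1_\2", name)).lower()


assert camel_to_snake("camelCase") == "camel_case"
assert camel_to_snake("parseHTTPResponse") == "parse_http_response"
```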
And quite honestly, for me that's the dream. Reducing the friction of human-machine interaction is exactly the goal of designing good tools. If there was no meaningful value to be had from being able to get my ideas into the machine faster, nobody would buy fancy keyboards or (non-accessibility) dictation software.
I'm like 80% sure people complaining about AI doing a shit job are just plain holding it wrong.
The LLM doesn't magically know stuff you don't tell it. They CAN kinda-sorta fetch new information by reading the code or via MCP, but you still need to have a set of rules and documentation in place so that you don't spend half your credits on the LLM figuring out how to do something in your project.
I wanted to build wire routing for a string of lights on a panel. Looked up TSP, and learned of the Christofides heuristic. Asked Claude to implement Christofides. Went on to do stuff I enjoy more than mapping Wikipedia pseudocode to runnable code.
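For reference, the "mapping Wikipedia pseudocode to runnable code" step looks roughly like the sketch below if you lean on networkx primitives. This is my own illustrative version, not the code Claude produced; it assumes a complete, weighted, undirected graph whose weights satisfy the triangle inequality, and I believe recent networkx also ships a ready-made approximation.christofides helper.

```python
import networkx as nx


def christofides_tour(G: nx.Graph, weight: str = "weight") -> list:
    """Approximate TSP tour (within 1.5x optimal) on a complete weighted graph."""
    # 1. Minimum spanning tree of G.
    mst = nx.minimum_spanning_tree(G, weight=weight)
    # 2. Vertices with odd degree in the MST (there is always an even number of them).
    odd = [v for v, deg in mst.degree() if deg % 2 == 1]
    # 3. Minimum-weight perfect matching among the odd-degree vertices.
    matching = nx.min_weight_matching(G.subgraph(odd), weight=weight)
    # 4. MST + matching: a connected multigraph in which every vertex has even degree.
    multi = nx.MultiGraph(mst)
    multi.add_edges_from(matching)
    # 5. Eulerian circuit, shortcutting repeated vertices into a Hamiltonian tour.
    tour, seen = [], set()
    for u, _ in nx.eulerian_circuit(multi):
        if u not in seen:
            seen.add(u)
            tour.append(u)
    tour.append(tour[0])
    return tour
```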
Sure, it would be really bad if everyone just assumes that the current state of the art is the best it will ever be, so we stop using our brains. The thing is, I'm very unlikely to come up with a better approximation to TSP, so I might as well use my limited brain power to focus on domains where I do have a chance to make a difference.
This is exactly the way I succeed. I ask it to do little bits at a time. I think that people have issues when they point the tools at a large code base and say "make it better". That's not the current sweet spot of the tools. Getting rid of boilerplate has been a game changer for me.
I think my current average code-writing speed is 1 keyword per hour or something, as nearly all my time coding is spent either reading the docs (to check my assumptions) or copy-pasting another block of code I have. The very short bursts of writing code like I would write prose happen so rarely I don't even bother remembering them.
I've never written boilerplate. I copy them from old projects (the first time was not boilerplate, it was learning the technology) or other files, and do some fast editing (vim is great for this).
It's the "Day-50" problem.
On Day-0, AI is great, but by Day-50 there are preferences and nuances that aren't captured through textual evidence. The productivity gains mostly vanish.
Ultimately AI coding efficacy is an HCI relationship and you need different relationships (workflows) at different points in time.
That's why, currently, as time progresses you use AI less and less on any feature and fall back to human. Your workflow isn't flexible enough.
So the real problem isn't the Day-0 solution, it's solving the HCI workflow problem to get productivity gains at Day-50.
Smarter AI isn't going to solve this. Large enough code becomes internally contradictory, documentation becomes dated, tickets become invalid, design docs are based on older conceptions. Devin, plandex, aider, goose, claude desktop, openai codex, these are all Day-0 relationships. The best might be a Day-10 solution, but none are Day-50.
Day-50 productivity is ultimately a user-interface problem - a relationship negotiation and a fundamentally dynamic relationship. The future world of GPT-5 and Sonnet-4 still won't read your thoughts.
I talked about what I'm doing to empower new workflows over here: https://news.ycombinator.com/item?id=43814203
You pinpoint a truly important thing. Even though I cannot quite put it into words, I think that getting lost with AI coding assistants is far worse than getting lost as a programmer. It is like the difference between writing vanilla code and trying to make a framework suit your needs.
AI coding assistants provide, 90% of the time, more value than the good old Google search. Nothing more, nothing less. But I don't use AI to code for me, I just use it to optimize very small fractions (i.e. methods/functions at most).
> The future world of GPT-5 and Sonnet-4 still won't read your thoughts.
Chills ahead. For sure, it will happen some day. And there won't be any reason not to embrace it (although I am, for now, absolutely reluctant to such an idea).
It's why these no-code/vibe-code solutions like bolt, lovable, and replit are great at hackathons, demos, or basic front-ends but there's a giant cliff past there.
Scroll through things like https://www.yourware.so/ which is a no-code gallery of apps.
There's this utility threshold due to a 1967 observation by Melvin Conway:
> [O]rganizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.
https://en.wikipedia.org/wiki/Conway%27s_law
The next step only comes from the next structure.
Lovable's multiplayer mode (https://lovable.dev/blog/lovable-2-0) combined with Agno teams (https://github.com/agno-agi/agno) might be a suitable solution if you can define the roles right. Some can be non or "semi"-human (if you can get the dynamic workflow right)
> It's why these no-code/vibe-code solutions like bolt, lovable, and replit are great at hackathons, demos, or basic front-ends but there's a giant cliff past there.
Back in the day, basically every "getting started in Ruby on Rails" tutorial involved making a Twitter-like thing. This seemed kind of magic at the time. Now, did Rails ultimately end up fundamentally changing the face of webdev, allowing anyone to make Twitter in an afternoon? Well, ah, no, but it made for a good tech demo.
4 lines of JS. A screenful of “reasoning”. Not much I can agree with.
Meanwhile I just asked Gemini in VS Code Agent Mode to build an HTTP-like router using a trie and then refactor it as a Python decorator, and other than a somewhat dumb corner case it failed at, it generated a pretty useful piece of code that saved me a couple of hours (I had actually done this before a few years ago, so I knew exactly what I wanted).
Replace programmers? No. Well, except front-end (that kind of code is just too formulaic, transactional and often boring to do), and my experiments with React and Vue were pretty much “just add CSS”.
Add value? Heck yes - although I am still very wary of letting LLM-written code into production without a thorough review.
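To give a sense of the shape of the thing, here's a minimal sketch of a trie-backed router exposed as a decorator - illustrative names only, not the code Gemini actually generated, and deliberately skipping the corner cases:

```python
class TrieRouter:
    """Toy HTTP-like router that stores handlers in a path trie."""

    def __init__(self):
        self._root = {}

    def route(self, path):
        """Decorator form: @router.route("/users/list")."""
        def decorator(handler):
            node = self._root
            for part in path.strip("/").split("/"):
                node = node.setdefault(part, {})
            node["__handler__"] = handler
            return handler
        return decorator

    def dispatch(self, path):
        node = self._root
        for part in path.strip("/").split("/"):
            if part not in node:
                raise LookupError(f"no route for {path!r}")
            node = node[part]
        return node["__handler__"]()


router = TrieRouter()


@router.route("/hello/world")
def hello():
    return "hi"


print(router.dispatch("/hello/world"))  # -> "hi"
```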
Not even front end, unless it literally is a dumb thin wrapper around a back end. If you are processing anything on that front end, AI is likely to fall flat as quickly as it would on the backend.
based on what?
My own experience writing a web-based, SVG-based 3D modeler. No traditional back end, but when working on the underlying 3D engine it shits the bed from all the broken assumptions and uncommon conventions used there. And in the UI, the case I have in mind involved pointer capture and event handling, it chases down phantoms declaring it's working around behavior that isn't in the spec. I bring it the spec, I bring it minimal examples producing the desired behavior, and it still can't produce working code. It still tries to critique existing details that aren't part of the problem, as evidenced by the fact it took me 5 minutes to debug and fix myself when I got tired of pruning context. At one point it highlighted a line of code and suggested the problem could be a particular function getting called after that line. That function was called 10 lines above the highlighted line, in a section it re-output in a quote block.
So yes, it's bad for front end work too if your front end isn't just shoveling data into your back end.
AI's fine for well-trodden roads. It's awful if you're beating your own path, and especially bad at treading a new path just alongside a superhighway in the training data.
It built the meat of the code, and you spent 5 minutes fixing the more complex and esoteric issues. Is this not the desired situation? You saved time, but your skillset remained viable.
> AI's fine for well-trodden roads. It's awful if you're beating your own path, and especially bad at treading a new path just alongside a superhighway in the training data.
I very much agree with this, although I think that it can be ameliorated significantly with clever prompting
I sincerely wish that had been the case. No, I built the meat of the code. The most common benefit is helping to reduce repetitive typing, letting me skip writing 12 minor variations of `x1 = sin(r1) - cos(r2)`.
Similar to that, in this project it's been handy translating whole mathematical formulas to actual code processes. But when it comes out of that very narrow box it makes an absolute mess of things that almost always ends in a net waste of time. I roped it into that pointer capture issue earlier because it's an unfamiliar API to me, and apparently for it, too, because it hallucinated some fine wild goose chases for me.
WRT unfamiliar APIs, I don't know if this would have worked in your case (perhaps not), but I find that most modern LLMs are very comfortable with simply reading and using the docs or sample code on the spot if you pass them the link, a copy-paste, or HTML containing the relevant info.
> I had actually done this before a few years ago, so I knew exactly what I wanted
Oh :) LLMs do work sometimes when you already know what you want them to write.
> that kind of code is just too formulaic, transactional and often boring to do
No offense, but that sounds like every programmer that hasn't done front-end development to me. Maybe for some class of front-ends (the same stuff that Ruby on Rails could generate), but past that things tend to get not boring real fast.
I do a fair amount of dashboards and data handling stuff. I don't really want to deal with React/Vue at all, and AI takes most of the annoyance away.
> I had actually done this before a few years ago, so I knew exactly what I wanted
Why not just use the one you already wrote?
> Why not just use the one you already wrote?
Might be owned by a previous employer.
This is a funny opinion, because tools like Claude Code and Aider let the programmer spend more of their time thinking. The more time I spend diddling the keyboard, the less time I have available to be thinking about the high-level concerns.
If I can just think "Implement a web-delivered app that runs in the browser and uses local storage to store state, and then presents a form for this questionnaire, another page that lists results, and another page that graphs the results of the responses over time", and that's ALL I have to think about, I now have time to think about all sorts of other problems.
That's literally all I had to do recently. I have chronic sinusitis, and wanted to start tracking a number of metrics from day to day, using the nicely named "SNOT-22" (Sino-Nasal Outcome Test, I'm not kidding here). In literally 5 minutes I had a tool I could use to track my symptoms from day to day. https://snot-22.linsomniac.com/
I asked a few follow-ups ("make it prettier", "let me delete entries in the history", "remember the graph settings"). I'm not a front-end guy at all, but I've been programming for 40 years.
I love the craft of programming, but I also love having an idea take shape. I'm 5-7 years from retirement (knock on wood), and I'm going to spend as much time thinking, and as little time typing in code, as possible.
I think that's the difference between "software engineer" and "programmer". ;-)
> But AI doesn't think -- it predicts patterns in language.
Boilerplate code is a pattern, and code is a language. That's part of why AI-generated code is especially effective for simple tasks.
It's when you get into more complicated apps that the pros/cons of AI coding start to be more apparent.
not even necessarily complicated, but also obscure
A programmer's JOB is not to think. It's to deliver value to their employer or customers. That's why programmers get paid. Yes, thinking hard about how to deliver that value with software is important, but when it comes to a job, it's not the thought that counts; it's the results.
So if I, with AI augmentation, can deliver the same value as a colleague with 20% less thought and 80% less time, guess whose job is more secure?
I know, I know, AI tools aren't on par with skilled human programmers (yet), but a skilled human programmer who uses AI tools effectively to augment (not entirely replace) their efforts can create value faster while still maintaining quality.
The value is in working and shipped features. This value increases when there's no technical debt dragging it down. Do the 20% less thought and 80% less time still hold?
I haven't been using AI for coding assistance. I use it like someone I can spin around in my chair, and ask for any ideas.
Like some knucklehead sitting behind me, sometimes, it has given me good ideas. Other times ... not so much.
I have to carefully consider the advice and code that I get. Sometimes, it works, but it does not work well. I don't think that I've ever used suggested code verbatim. I always need to modify it; sometimes, heavily.
So I still have to think.
It seems like the traditional way to develop good judgement is by getting experience with hands-on coding. If that is all automated, how will people get the experience to have good judgement? Will fewer people get the experiences necessary to have good judgement?
Compilers, for the most part, made it unnecessary for programmers to check the assembly code. There are still compiler programmers that do need to deal with that, but most programmers get to benefit from just trusting that the compilers, and by extension the compiler programmers, are doing a good job
We are in a transition period now. But eventually, most programmers will probably just get to trust the AIs and the code they generate, maybe do some debugging here and there at the most. Essentially AIs are becoming the English -> Code compilers
In my experience, compilers are far more predictable and consistent than LLMs, making them suitable for their purpose in important ways that LLMs are not.
I honestly think people are so massively panicking over nothing with AI. even wrt graphic design, which I think people are most worried about, the main, central skill of a graphic designer is not the actual graft of sitting down and drawing the design, it's having the taste and skill and knowledge to make design choices that are worthwhile and useful and aesthetically pleasing. I can fart around all day on Stable Diffusion or telling an LLM to design a website, but I don't know shit about UI/UX design or colour theory or simply what appeals to people visually, and I doubt an AI can teach me it to any real degree.
Yes, there are now likely going to be fewer billable hours and perhaps less joy in the work, but at the same time I suspect that managers who decide they can forgo graphic designers and just get programmers to do it are going to lose a competitive advantage.
I disagree. It's all about how you're using them. AI coding assistants make it easy to translate thought to code. So much boilerplate can be given to the assistant to write out while you focus on system design, architecture, etc, and then just guide the AI system to generate the code for you.
Call it AI, ML, Data Mining, it does not matter. Truth is these tools have been disrupting the SWE market and will continue to do so. People working with them will simply be more effective. Until even they are obsolete. Don't hate the player, hate the game.
This is becoming even more of a consensus now; it feels like the tech is somewhat already there, or just about to come out.
As a software professional what makes it more interesting is that the "trick" (reasoning RL in models) that unlocked disruption of the software industry isn't really translating to other knowledge work professions. The disruption of AI is uneven. I'm not seeing in my circles other engineers (e.g. EE's, Construction/Civil, etc), lawyers, finance professionals, anything else get disrupted as significantly as software development.
The respect of the profession has significantly gone down as well. From "wow you do that! that's pretty cool..." to "even my X standard job has a future; what are you planning to do instead?" within a 3 year period. I'm not even in SV, NY or any major tech hubs.
So there's no value in dealing with the repeatable stuff to free the programmer up to solve new problems? Seems like a stretch.
There is no new value that we didn't already recognize. We've known for many decades that programming languages can help programmers.
Software ate the world, it's time for AI to eat the software :)
Anything methodical is exactly what the current gen of AI can do. It's phenomenal at translation, be it human language to human language or an algorithm description into a computer language.
People like to make fun of "vibe coding", but it's actually a purification process in which humans shed the toolset we used to have to master in order to make the computer do what we tell it to do.
Most of today's AI developer tools are misguided because they are trying to orchestrate tools that were created to help people write and manage software.
IMHO the next-gen tools will write code that is not intended for human consumption. All the frameworks, version management, coding paradigms etc. will be relics of the past. Curiosities for people who are fascinated by that kind of thing, not production material.
I am a very sceptical and cautious user of AI tools, but this sounds like someone who hasn't figured out a workflow that works for him:
> Nothing indicates how this should be run.
That's why I usually ask it to write a well-defined function or class, with type annotations and all that; I already know how to call it. (A sketch of the kind of thing I mean is below.)
Also you can ask for calling examples.
> ... are not functions whose definitions are available within the script. Without external context, we don't know what they do.
That is already solved by having a proper IDE or LSP.
> run in E environments with V versions
Fair enough, stick to "standard" libraries which don't change often. Use boring technology.
> The handler implicitly ignores arguments
Because you probably didn't specify how arguments are to be handled.
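To make that concrete, here is a minimal sketch of the shape of output I ask for; the function and its name are hypothetical, not taken from the article:

    from collections import Counter
    from typing import Iterable

    def top_tokens(lines: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
        """Return the n most common whitespace-separated tokens across lines."""
        counts = Counter(token for line in lines for token in line.split())
        return counts.most_common(n)

    # A calling example, so there is no guessing about how it should be run:
    print(top_tokens(["a b a", "b c"], n=2))  # [('a', 2), ('b', 2)]

With the type annotations and a calling example in hand, the "nothing indicates how this should be run" complaint mostly disappears.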
In general, AI is very helpful to reduce tedium in writing common pieces of logic.
In an ideal world, programming languages and libraries would be as expressive as natural language, and we wouldn't need AI. We could marshal our thoughts into code as fast as we marshal them into English, and as succinctly.
But until that happens "AI" helps with tedious logic and looking up information. You will still have to confirm the code, so being at least a bit familiar with the stack is a good thing.
I think there's some truth here in that AI can be used as a band-aid to sweep issues of bad abstractions or terse syntax under the rug.
For example, I often find myself reaching for Cursor/ChatGPT to help me with simple things in bash scripts (like argument parsing, looping through arrays, associative maps, handling spaces in inputs) because the syntax just isn't intuitive to me. But I can easily do these things in Python without asking an AI (see the sketch below).
I'm not a web developer but I imagine issues of boilerplate or awkward syntax could be solved with more "thinking" instead of using the AI as a better abstraction to the bad abstractions in your codebase.
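To make the contrast concrete, here is a minimal, hypothetical Python sketch of the bash chores mentioned above (argument parsing, looping, an associative map, spaces in inputs); it is an illustration, not code from the comment:

    import argparse
    from pathlib import Path

    # Count input paths by file extension.
    parser = argparse.ArgumentParser(description="Count input paths by file extension")
    parser.add_argument("paths", nargs="+", help="input paths; spaces in names need no special quoting inside the script")
    args = parser.parse_args()

    counts: dict[str, int] = {}          # associative map
    for raw in args.paths:               # looping through an array
        ext = Path(raw).suffix or "(none)"
        counts[ext] = counts.get(ext, 0) + 1

    for ext, n in sorted(counts.items()):
        print(f"{ext}\t{n}")

The bash equivalents (getopts, careful "$@" quoting, declare -A) are exactly the sort of syntax that is easy to hand off to an assistant and forget.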
In the past I've worked at startups that hired way too many bright junior developers and at companies that insisted on only hiring senior developers. The arguments for/against AI coding assistants feel very reminiscent of the arguments that occur around what seniority balance we want on an engineering team. In my experience it's a matter of balancing between doing complex work yourself and handing off simple work.
If AI coding assistants provide little value, then why is Cursor IDE a 300m company, and why does this study say it makes people 37% more productive?
https://exec.mit.edu/s/blog-post/the-productivity-effects-of...
That study shows nothing of the sort. It essentially showed ChatGPT is better at pumping out boilerplate than humans. Here are the tasks: https://www.science.org/action/downloadSupplement?doi=10.112...
I get massive value out of Agentic coding.
I no longer need to worry about a massive amount of annoying, but largely meaningless implementation details. I don’t need to pick a random variable/method/class name out of thin air. I don’t need to plan ahead on how to DRY up a method. I don’t need to consider every single edge case up front.
Sure, I still need to tweak and correct things but we’re talking about paint by number instead of starting with a blank canvas. It’s such a massive reduction in mental load.
I also find it reductionist to say LLMs don't think because they're simply predicting patterns. Predicting patterns is thinking. With the right context, there is little difference between complex pattern matching and actual thinking. Heck, a massive amount of my actual, professional software development work is figuring out how to pattern match my ideas into an existing code base. There's a LOT of value in consistency.
Some problems require using a different kind of modeling other than language:
https://medium.com/@lively_burlywood_cheetah_472/ai-cant-sol...
>But AI doesn't think -- it predicts patterns in language.
We've moved well beyond that. The above sentence tells me you haven't used the tools recently. That's a useful way to picture what's happening, to remove the magic so you can temper your expectations.
The new tooling will "predict patterns" at a higher level, a planning level, then start "predicting patterns" in the form of strategy, and so on. You can see all of this when you read the output of the "thinking" phases. They sound a lot like a conversation I'd have with a colleague about the problem, actually.
This is a tired viewpoint.
There's a percentage of developers, who due to fear/ego/whatever, are refusing to understand how to use AI tooling. I used to debate but I've started to realize that these arguments are mostly not coming from a rational place.
Title is a bit provocative and begs the question (is thinking the part being replaced?), but the bigger issue is what “little” means here. Little in absolute terms? I think that’s harsh. Little in relation to how it’s touted? That’s a rational conclusion, I think.
You need three things to use LLM based tools effectively: 1) an understanding of what the tool is good at and what it isn’t good at; 2) enough context and experience to input a well formulated query; and 3) the ability to carefully verify the output and discard it if necessary.
This is the same skillset we’ve been using with search engines for years, and we know that not everyone has the same degree of Google-fu. There’s a lot of subjectivity to the “value”.
It's nice that the author was kind enough to make his obviously wrong thesis right in the title.
If you write code professionally, you're really doing yourself a disservice if you aren't evaluating and incorporating AI coding tools into your process.
If you've tried them before, try them again. The difference between Gemini 2.5 Pro and what came before is as different as between GPT 3.5 and 4.
If you're a hobbyist, do whatever you want: use a handsaw, type code in notepad, mill your own flour, etc.
I prefer a more nuanced take. If I can’t reliably delegate away a task, then it’s usually not worth delegating. The time to review the code needs to be less than the time it takes to write it myself. This is true for people and AI.
And there are now many tasks which I can confidently delegate away to AI, and that set of tasks is growing.
So I agree with the author for most of the programming tasks I can think of. But disagree for some.
"Writing code is easy"
If you're the 1% of Earth's population for which this is true, then this headline makes sense. If you're the 99% for which this isn't at all true, then don't bother reading this, because AI coding assistance will change your life.
"Writing code is easy" once you learn the tools and have thought through the design. But most people are skipping the latter two and complain about the first one.
It's like doing math proofs. It's easy when you know maths and have a theoretical solution. So the first step is always learning maths and thinking about a solution, not jumping head first into doing proofs.
On the contrary. Just yesterday, we've had here on HN one of the numerous reposts of "Notation as a tool of thought" by Ken Iverson, the creator of APL.
Think of AI bots as a tool of thought.
The term "assistant" is key.
In any given field of expertise, the assistant isn't supposed to be doing the professional thinking.
Nonetheless, the value that a professional can extract from an assistant can vary from little, to quite significant.
One point AI helps me with is to keep going.
Does it do things wrong (compared to what I have in my mind)? Of course. But it helps to have code on screen quicker. Editing / rolling back feels faster than typing everything myself.
Engineering workflows should be more about thinking and discussing than writing code.
This is also the best case for using AI. You think, you discuss, then instruct the AI to write, then you review.
If it’s of such little value, does he really want to compete against developers trying to do the same thing he is but that have the benefit of it?
I believe we need to take a more participatory approach to intelligence orchestration.
It’s not humans vs machines.
A programmer's job is to provide value to the business. Thinking is certainly a part of the process, but not the job in itself.
I agree with the initial point he's making here - that code takes time to parse mentally, but that does not naturally lead to the conclusion that this _is_ the job.
As a SWE the comments on this page scare me if I'm being honest. If we can't define the value of a programmer vs an AI in a forum such as this then the obvious question is there to ask from an employer's perspective - in the world of AI is a programmer/SWE no longer worth employing/investing in long term? This equally applies to any jobs in tech where the job is "to do" vs "to own" (e.g. DevOps, Testing, etc etc)
Many defenders of AI tools in this thread are basically arguing against the end conclusion of the article which is that "to think" is no longer the moat it once was. I don't buy into the argument either that "people who know how to use AI tools" will somehow be safe - logically that's just a usability problem that has a lot of people seem to be interested in solving.
The impression I'm getting is that even the skill of "using/programming LLMs" is only a transitory skill, and another form of cope from pro-AI developers: if AI is smart enough, you won't need to "know how to use it"; it will help you. That's what commoditization of intelligence is by definition; anything like "learning/intelligence/skills" is no longer required, since the point is to create it artificially.
To a lay person reading this thread - in a few years (maybe two) there won't be a point of doing CS/SWE anymore.
And yet I keep meeting programmers who say AI coding assistants are saving them tons of time or helping them work through problems they otherwise wouldn't have been able to tackle. I count myself among that group at this point. Maybe that means I'm just not a very good programmer if I need the assistance, but I'd like to think my work speaks for itself at this point.
Some things where I've found AI coding assistants to be fantastic time savers:
programmer's job is to think, AI coding assistant's job is to do?
Reading just the title:
It is _because_ a programmer's job is to think that AI coding assistants may provide value. They would (and perhaps already do) complete the boilerplate, and perhaps help you access information faster. They also have detriments, may atrophy some of your capabilities, may tempt you to go down more simplistic paths, etc., but still.
Reading the post as well: It didn't change my mind. As for what it actually says, my reaction is a shrug, "whatever".
It doesn't seem like the author has ever used AI to write code. You definitely can ask it to refactor. Both ChatGPT and Gemini have done excellent work for me on refactors, and they have also made mistakes. It seems like they are both quite good at making lengthy, high-quality suggestions about how to refactor code.
His argument about debugging is absolutely asinine. I use both GDB and Visual Studio at work. I hate Visual Studio except for the debugger. GDB is definitely better than nothing, but only just. I am way, way, way more productive debugging in Visual Studio.
Using a good debugger can absolutely help you understand the code better and faster. Sorry but that's true whether the author likes it or not.
I am the first to criticize LLMs and dumb AI hype. There is nothing wrong with using an LSP, and a coding assistant is just an enhanced LSP if that is all you want it to be. My job is to solve problems, and AI can slightly speed that up.
Do non-AI coding assistants provide value?
It's more like an assistant that can help you write a class to do something you could write on your own but feel too lazy to. Sometimes it's good, other times it's idiotically bad. You need to keep it in check and keep telling it what it needs to do, because it has a tendency to dig holes it can't get out of. Breaking things up into smaller classes helps to a degree.
Object-oriented languages provide little value because a programmer’s job is to think
Memory-safe languages provide little value because a programmer’s job is to think
…
Using deterministic methods as counterarguments against a probabilistic one. Something apples, something oranges…
Now this isn't a killer argument, but your examples are about readability and safety, respectively - the quality of the result. LLMs seem to be more about shoveling the same or worse crap faster.
Have you tried an AI coding assistant, or is that just the impression you get?
I have seen the results of other people. Code LLMs seem to do some annoying stuff more quickly than manually and are sometimes able to improve prose in comments and such. But they also mess up when it gets moderately difficult, especially when there are "long distance" connections between pieces. That and the probably seductive (to some) ability to crank out working, but repetitive or partially nonsensical code, is what I call shoveling crap faster.
I dunno what to tell you, I am able to get consistently good quality work out of eg 3.7 sonnet and it’d saved me a ton of time. Garbage in garbage out, maybe the people you’ve observed don’t know how to write good prompts.
I guess I should play around with that one then. My general impression was that we're already in the diminishing returns part of the sigmoid curve (calendar time or coefficient array size vs quality) for LLMs, until there's maybe a change other than making them bigger.
Not comparable at all.
What if they help you to think ?
I know LLMs are masters of averages and I use that to my advantage.
Honestly, o3 has completely blown my mind in terms of its ability to come up with useful abstractions beyond what I would normally build. Most people claiming LLMs are limited just aren't using the tools enough, and can't see the trajectory of increasing ability.
> Most people claiming LLMs are limited just aren't using the tools enough
The old quote might apply:
~"XML is like violence. If it's not working for you, you need to use more of it".
(I think this is from Tim Bray -- it was certainly in his .signature for a while -- but oddly a quick web search doesn't give me anything authoritative. I asked Gemma3, which suggests Drew Conroy instead)
Last I heard the phrase, it was attributed to Jamie Zawinski.
I'm sorry to say, but the author of this post doesn't appear to have much, if any, experience with AI, and sounds like he's just trying to justify not using it and pretend he's better without it.
It's okay to be a sceptic, I am too, but the logic and reasoning in the post are just really flimsy and make our side of the debate look weak.
Seriously. If you want to say that AI will likely reduce wages and supply for certain more boilerplate jobs; or that these tools are comparatively much worse for the environment than normal coding; or that they are not particularly good once you get into something quite esoteric or complex; or that they've led certain companies to think that developing AGI is a good idea; or that they're mostly centralised in the hands of a few unpleasant actors; then any of those criticisms, and certainly others, are valid to me. But to say that they're not actually useful or provide little value? That's just nonsense or ragebait.
Couldn't agree more. And in regards to some of the comments here: generating the text isn't the hard OR time-consuming part of development, and that's even assuming the generated code were immediately trustworthy. Given that it isn't and must be checked, it's really just not very valuable.
This is just stupid; anyone who's used AI to code knows this is wrong empirically.
I've used it and haven't had much success.
"Spellcheck provides little value because an authors job is to write." - rolls eyes
I wish people would realize you can replace pretty much any LLM with GitHub code search. It's a far better way to get example code than anything I've used.
If you use github, sure
Most developers use languages that lack expressivity. LLMs allow them to generate the text faster, bringing it closer to the speed of thought.
I fear this will not age well.
Which models have you tried to date? Can you come up with a top 3 ranking among popular models based on your definition of value?
What can be said about the ability of an LLM to translate your thinking represented in natural language to working code at rates exceeding 5-10x your typing speed?
Mark my words: Every single business that has a need for SWEs will obligate their SWEs to use AI coding assistants by the end of 2026, if not by the end of 2025. It will not be optional like it is today. Now is the time you should be exploring which models are better at "thinking" than others, and discerning which thinking you should be doing vs. which thinking you can leave up to ever-advancing LLMs.
I've had to yank tokens out of the mouths of too many thinking models stuck in loops of (internally, within their own chain of thought) rephrasing the same broken function over and over again, realizing each time that it doesn't meet constraints, and trying the same thing again. Meanwhile, I was sat staring at an opaque spinner wondering if it would have been easier to just write it myself. This was with Gemini 2.5 Pro for reference.
Drop me a message on New Year's Day 2027. I'm betting I'll still be using them optionally.
I've experienced Gemini getting stuck as you describe a handful of times. With that said, my prediction is based on the observation that these tools are already force multipliers, and they're only getting better each passing quarter.
You'll of course be free to use them optionally in your free time and on personal projects. It won't be the case at your place of employment.
I will mark my calendar!
> Every single business that has a need for SWEs will obligate their SWEs to use AI coding assistants by the end of 2026, if not by the end of 2025.
Exactly why I authored https://ghuntley.com/ngmi - it's already happening...
I was rooting for the strawberry. You delivered!
This reminds me of the story a few days ago about "what is your best prompt to stump LLMs", and many of the second level replies were links to current chat transcripts where the LLM handled the prompt without issue.
I think there are a couple of problems at play: 1) people who don't want the tools to have value, for various reasons, and have therefore decided the tools don't have value; 2) people who tried the tools six months or a year ago and had a bad experience and gave up; and 3) people who haven't figured out how to make good use of the tools to improve their productivity (this one seems to be heavily impacted by various grifters who overstate what the coding assistants can do, and people underestimating the effort they have to put in to get good at getting good output from the models.)
4) People who like having reliable tools, which free them from "reviewing" the output of these tools to see whether the tool made an error.
Using AI is like driving a car that decides to turn even if you keep the steering wheel straight. Randomly. To varying degrees. If you like this because sometimes it lets you take a curve without having to steer, you do you. But some people prefer a car that turns when and only when they turn the wheel.
That's covered under point #1. I'm not claiming these tools are perfect. Neither are most people, but from the standpoint of an employer, the question is going to be "does the tool, after accounting for errors, make my employees more or less productive?" A lot of people are seeing the answer to that - today - is the tools offer a productivity advantage.
> Every single business that has a need for SWEs will obligate their SWEs to use AI coding assistants by the end of 2026, if not by the end of 2025.
If businesses mandated speed like that, then we’d all have been forced to use emacs decades ago. Businesses mandate correctness and AI doesn’t as clearly help to that end.
There's nothing about using emacs that compares to the way an LLM can convert natural language into working code and productivity gains.
For better or worse, you won't find correctness on any business' income statement. Sure, it's a latent variable, but so is efficiency.