- cross-posted to:
- [email protected]
“No Duh,” say senior developers everywhere.
The article explains that vibe code is often close to functional, but not quite, requiring developers to go in and find where the problems are - resulting in a net slowdown of development rather than a productivity gain.
You mean relying blindly on a statistical prediction engine to attempt to produce sophisticated software without any understanding of the underlying principles or concepts doesn’t magically replace years of actual study and real-world experience?
But trust me, bro, the singularity is imminent, LLMs are the future of human evolution, true AGI is nigh!
I can’t wait for this idiotic “AI” bubble to burst.
For most large projects, writing the code is the easy part anyway.
Writing new code is easier than editing someone else’s code, but editing a portion is still better than rewriting the entire program from start to end.
Then there are LLMs, which force you to edit the entire thing from start to end.
This article sums up a Stanford study of AI and developer productivity. TL;DR - the net productivity boost is a modest 15-20%, and it can drop as low as negative 10% in complex, brownfield codebases. This tracks with my own experience as a dev.
https://www.linkedin.com/pulse/does-ai-actually-boost-developer-productivity-striking-çelebi-tcp8f
About that “net slowdown”. I think it’s true, but only in specific cases. If the user already knows how to write code well, an LLM might be only marginally useful or even useless.
However, there are ways to make it useful, though it requires specific circumstances. For example, if you can’t be bothered to write a simple loop, you can use an LLM to do it. Give the boring routine to an LLM, and you can focus on naming the variables in a fitting way or adjusting the finer details to your liking.
Can’t be bothered to look up the exact syntax for a function you use only twice a year? Let an LLM handle that, and tweak the details. Now you didn’t spend 15 minutes reading Stack Overflow posts that don’t answer the exact question you had in mind. Instead, you spent 5 minutes on the whole thing, and that includes the tweaking and troubleshooting.
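To make that concrete, here’s the sort of twice-a-year syntax I mean (a throwaway illustration using Python’s strftime codes, nothing from the article):

```python
from datetime import datetime

# The kind of "look it up twice a year" syntax an LLM can hand you instantly:
# strftime format codes for a human-readable timestamp.
now = datetime.now()
print(now.strftime("%A, %Y-%m-%d %H:%M:%S"))  # e.g. "Friday, 2024-05-17 09:30:00"
```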
If you have zero programming experience, you can use an LLM to write some code for you, but prepare to spend the whole day troubleshooting something that is essentially a black box to you. Alternatively, you could ask a human to write the same thing in 5-15 minutes depending on the method they choose.
This is a sane way to use an LLM. Also, pick your poison: some bots are better than others for a specific task. It’s kinda fascinating to see how other people solve coding problems, and that is essentially on tap with a bot; it will churn out as many examples as you want. It’s a really useful tool for learning the syntax and libraries of unfamiliar languages.
On one extreme of LLMs there is insane hype, and at the other extreme great pessimism, but in the middle is a nice labour-saving educational tool.
So is the profit it was foretold to generate, but it actually costs more money than it’s generating.
It’s great for stupid boobs like me, but only to get you going. It regurgitates old code, it cannot come up with new stuff. Lately there have been fewer Python errors, but again, the stuff you can do is limited. At least with the free stuff that you can get without signing up.
Yeah, I use it for Home Assistant, and it’s amazingly powerful… and so incredibly dumb.
It will take my if/and statements and shrink them to 1/3 the length, while being twice as robust… while missing that one of the arguments is entirely in the wrong place.
It regurgitates old code, it cannot come up with new stuff.
The trick is, most of what you write is basically old code in new wrapping. In most projects, I’d say the new and novel part is maybe 10% of the code. The rest is things like setting up DB models, connecting them to base logic, setting up views and API endpoints, decoding the message on the UI side, displaying it to the user, handling input back, threading things so the UI doesn’t hang, error handling, input data verification, basic unit tests, setting up settings and supporting reading them from a file or env vars, making the UI look not horrible, adding translatable text, and so on and on and on. All of that has been written in some variation a million times before. All of it can be written (and verified) by a half-asleep competent coder.
The actual new interesting part is gonna be a small small percentage of the total code.
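For a taste of what that boilerplate looks like, here’s a minimal settings-loading sketch (the APP_ env prefix and settings.json name are made up for the example):

```python
import json
import os
from pathlib import Path

DEFAULTS = {"host": "127.0.0.1", "port": 8080, "debug": False}

def load_settings(path: str = "settings.json") -> dict:
    """Boilerplate settings loader: file values override defaults, env vars override both."""
    settings = dict(DEFAULTS)
    config = Path(path)
    if config.exists():
        settings.update(json.loads(config.read_text()))
    for key, default in DEFAULTS.items():
        env_value = os.environ.get(f"APP_{key.upper()}")
        if env_value is None:
            continue
        # Coerce env strings back to the default's type.
        if isinstance(default, bool):
            settings[key] = env_value.lower() in ("1", "true", "yes")
        elif isinstance(default, int):
            settings[key] = int(env_value)
        else:
            settings[key] = env_value
    return settings
```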
I totally agree with this. However, you can’t get there without coding experience and knowledge of the problem, as well as an education in computer science or experience in the field. I’m a generalist, and I’m loving what I can do at home. But I still get the runaround using AI. I have to read and understand the code to nudge the AI in the right direction, or I’ll end up going in circles.
I’m not a programmer in any sense. Recently I made a project where I used Python and a Raspberry Pi and had to train some small models on the KITTI dataset. I used AI to write the broad structure of the code, but in the end it took me a lot of time going through the Python documentation, as well as the documentation of the specific tools/modules I used, to actually get the code working. Would an experienced programmer get the same work done in an afternoon? Probably. But the code the AI output still had a lot of flaws. Someone who knows more than me would probably input better prompts and better follow-up requirements, and probably get a better structure from the AI, but I doubt they’d get complete code. In the end, even to use AI efficiently you have to know what you’re doing, and you still have to polish the code into something that actually works.
From my experience, AI just seems to be a lesson in overfitting. You can’t use it to do things nobody has done before. Furthermore, you only really get good responses from prompts related to JavaScript.
Perhaps it should read “All AI is overhyped, overdone, and we should be over it.”
I think maybe a good comparison is to written papers/assignments. It can generate those just like it can generate code.
But it is not about the words themselves; it is about the content.
These types of articles always fail to mention how well trained the developers were on techniques and tools. In my experience that makes a big difference.
My employer mandates we use AI and provides us with any model, IDE, or service we ask for. But where it falls short is providing training or direction on ways to use it. Most developers seem to go for results prompting and get a terrible experience.
I, on the other hand, provide a lot of context through documents and various MCP tooling. I talk about the existing patterns in the codebase and provide other repositories as examples; then we come up with an implementation plan and execute on it with a task log to stay on track. I spend very little time fixing bad code because I spent the setup time nailing down context.
So if a developer is just prompting “Do XYZ”, it’s no wonder they’re spending more time untangling a random mess.
Another aspect is that everyone always seems to be working under the gun, and they just don’t have the time to figure out all the best practices and techniques on their own.
I think this should be considered when we hear things like this.
I have 3 questions, and I’m coming from a heavily AI-skeptic position, but am open:
- Do you believe that providing all that context, describing the existing patterns, creating an implementation plan, etc., allows the AI to write better code, and faster, than if you just did it yourself? To me, this just seems like you have to re-write your technical documentation in prose each time you want to do something. You are saying this is better than “Do XYZ”, but how much twiddling of your existing codebase do you need to do before an AI can understand the business context of it? I don’t currently do development on an existing codebase, but every time I try to get these tools to do something fairly simple from scratch, they just flail. Maybe I’m just not spending the hours to build my AI-parsable functional spec. Every time I’ve tried this, asking something as simple as (and paraphrased for brevity) “write an Asteroids clone using JavaScript and HTML 5 Canvas” results in a full failure, even with multiple retries chasing errors. I wrote something like that a few years ago to learn JavaScript, and it took me a day-ish to get something that mostly worked.
- Speaking of that context: are you running your models locally, or do you have some cloud service? If you give your entire codebase to a 3rd party as context, how much of your company’s secret sauce have you disclosed? I’d imagine most sane companies are doing something to keep their models local, but we see regular news articles about how ChatGPT is training on user input and leaking sensitive data if you ask it nicely, and I can’t imagine all the pro-AI CEOs are aware of the risks here.
- How much pen-testing time are you spending on this code: error handling, edge cases, race conditions, data sanitation? An experienced dev understands these things innately, having fixed these kinds of issues in the past and knowing the anti-patterns and how to avoid them. In all seriousness, I think this is going to be the thing that actually kills AI vibe coding, but it won’t be fast enough. There will be tons of new exploits in what used to be solidly safe places. Your new web front-end? It has a really simple SQL injection attack. Your phone app? You can tell it your username is admin'-- and it’ll let you order stuff for free since you’re an admin (see the sketch just below this list).
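That injection sketch, for anyone who hasn’t seen it (a hypothetical users table; the point is the string concatenation, not the schema):

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable: a username of "admin'--" closes the string literal and
    # comments out the password check entirely.
    query = f"SELECT * FROM users WHERE name = '{username}' AND password = 'x'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized: the driver escapes the value, so the same input is inert.
    return conn.execute(
        "SELECT * FROM users WHERE name = ? AND password = ?", (username, "x")
    ).fetchall()
```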
I see a place for AI-generated code: instant functions that do something between simple and complex. “Hey Claude, write a function to take a string and split it at the end of every sentence containing an uppercase A.” I had to write weird functions like that constantly as a sysadmin, and transforming data seems like a thing an AI could help me accelerate. I just don’t see that working on a larger scale, though, or trusting an AI enough to let it integrate a new function like that into an existing codebase.
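For scale, that one-off helper really is small; here’s roughly what it might look like (my own naive sketch, with naive sentence segmentation):

```python
import re

def split_after_a_sentences(text: str) -> list[str]:
    """Split text at the end of every sentence containing an uppercase 'A'.

    Sentence boundaries are naively taken to be '.', '!', or '?' followed by whitespace.
    """
    chunks, buffer = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        buffer = f"{buffer} {sentence}".strip()
        if "A" in sentence:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        chunks.append(buffer)
    return chunks

print(split_after_a_sentences("Alice left. Bob stayed. Ask again later. Fine."))
# ['Alice left.', 'Bob stayed. Ask again later.', 'Fine.']
```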
Thank you for reading my comment. I’m on the train headed to work and I’ll try to answer completely. I love talking about this stuff.
Do you believe that providing all that context, describing the existing patterns, creating an implementation plan, etc., allows the AI to write better code, and faster, than if you just did it yourself?
For my work, absolutely. My work is a lot of tickets that were set up from multiple stories and multiple epics. It would be like asking me if I am really framing a house faster with a nail gun and compressor. If I were just hanging up a picture or a few pictures in the hallway, it’s probably faster to use a hammer than to set up the compressor and nail gun, plus cleanup.
However, a lot of that documentation already exists by the time it gets to me. All of the Design Documents and Product Requirement Documents have already been formed, discussed, and approved by our architecture team and team leads. Imagine if you already had this documentation for the Asteroids game; how much better do you think your LLM would do? Maybe this is the benefit of using LLMs for development at an established company. Btw, a lot of those documents were also created with the assistance of AI by the Product Team, Architects, and Principal/Staff/Leads anyway.
how much twiddling of your existing codebase do you need to do before an AI can understand the business context of it?
With the help of our existing documents and codebase(s), I feel I don’t have any issues with the model knowing what we’re doing. I do have to set up my own context for how I want it to be done. To me this is like explaining to a junior engineer what I need them to help me with. If you’re familiar with “Know when to Direct, when to Delegate, or when to Develop,” I would say it lands in between Direct and Delegate. I have markdown files with my rules and guidelines and provide those as context. I use Augment Code, which is pretty good with codebase context.
write an Asteroids clone using JavaScript and HTML 5 Canvas
I would try “Let’s plan out the steps needed to write an Asteroids game using JavaScript and HTML 5. Identify and explain each step of the development plan. The game must build with no errors, be playable, and pass all tests. Do not write code until our plan is approved.” Then, once it comes back with the initial steps, I would guide it further if needed. Finally, I would approve the plan and tell it to execute while tracking its steps (Augment Code uses a task log).
Are you running your models locally, or do you have some cloud service? If you give your entire codebase to a 3rd party as context, how much of your company’s secret sauce have you disclosed?
We are required to use the frontier models that my employer has contracts with and are forbidden from using local models. In our enterprise contracts we have negotiated for no training on our data. I imagine we pay for that. I’m not involved in that level of interaction on the accounts.
How much pen-testing time are you spending on this code: error handling, edge cases, race conditions, data sanitation? An experienced dev understands these things innately, having fixed these kinds of issues in the past and knowing the anti-patterns and how to avoid them. In all seriousness, I think this is going to be the thing that actually kills AI vibe coding, but it won’t be fast enough. There will be tons of new exploits in what used to be solidly safe places.
We have other teams that handle a lot of these tasks, and those teams are also using AI tools to get the job done. In addition, we have static analysis tools on our repo, like CodeRabbit and another one I can’t remember the name of that looks specifically for security concerns. They comment on the PR directly, and our merge is blocked until the findings are handled. Code coverage for testing must be at 85% or it blocks the merge, and we have a full QA department of Analysts and SDETs. On top of that, we still require human approvals (2 devs + Sr+). All of the people involved are still using AI tools to help them at each step.
I hope that answers your questions and gives you some insight into how I’ve found success in my experience with it. I will say that on my personal projects I don’t go this far with process and I don’t experience the same AI output that I do at work.
Thanks for your reply, and I can still see how it might work.
I’m curious if you have any resources with some end-to-end examples. This is where I struggle. If I have an atomic piece of code I need, I can maybe get it started with an LLM and finish it by hand, but anything larger seems to always fail. So far the best video I found for a start-to-finish demo was this: https://www.youtube.com/watch?v=8AWEPx5cHWQ
He spends plenty of time describing the tools and how to use them, but when we get to the actual work, 20 minutes go to telling the LLM that it’s doing stuff wrong. There’s eventually a prototype, but to get there he had to alternate between “I still can’t jump” and “here’s the new error.” He eventually modified code himself, so even getting a “Mario clone” running required an actual developer, and the final result was underwhelming at best.
For me, a “game” is this tiny product that could be a viable unit. It doesn’t need to talk to other services; it just needs to react to user input. I want to see a speed-run of someone using LLMs to make a game that is playable. It doesn’t need to be “fun”, but the video above only got to the “player can jump and gets a game over on hitting an enemy” stage. How much extra effort would it take to make the background not flat blue? Is there a win condition? How would you refactor so that the level is not hard-coded? Multiple enemy types? Shoot a fireball that bounces? Power-ups? And does doing any of those break the jump functionality again? How much time do I have to spend telling the LLM that the fireball still goes through the floor and doesn’t kill an enemy when it hits them?
I could imagine that if the LLM were handed a well-described design document and technical spec it could do better, but I have yet to see that demonstrated. Given what it produces for people publishing tutorials online, I would never let it handle anything business-critical.
The video is an hour long and spends about 20 minutes in the middle actually working on the project. I probably couldn’t do better, but I’ve mostly forgotten my JavaScript and HTML canvas. If kaboom.js were my focus, though, I imagine I could knock out what he did in well under 20 minutes and have a better-architected design that handled the above questions.
I’ve luckily not yet been mandated to embed AI into my pseudo-developer role, but they are asking.
Even though this shit was apparent from day fucking 1, at least the Tech Billionaires were able to cause mass layoffs, destroy an entire generation of new programmers’ careers, introduce an endless amount of tech debt and security vulnerabilities, all while grifting investors/businesses and making billions off of all of it.
Sad excuses for sacks of shit, all of them.
Look on the bright side, in a couple of years they will come crawling back to us, desperate for new things to be built so their profit machines keep profiting.
Current ML techniques literally cannot replace developers for anything but the most rudimentary of tasks.
I wish we had true apprenticeships out there for development and other tech roles.
I mean… at best it’s a Stack Overflow/Google replacement.
There are some real perks to using AI to code - it helps a ton with templatable or repetitive code and with setting up tedious tasks. I hate doing that stuff by hand, so being able to pass it off to Copilot is great. But we already had tools that gave us 90% of the functionality Copilot adds there, so it’s not super novel, and I’ve never had it successfully handle anything properly complicated at all (asking GPT-5 to do your dynamic SQL calls is inviting disaster, for example. Requires hours of reworking just to get close.)
But we already had tools that gave us 90%
More reliable ones.
Deterministic ones
Fair. I’ve used it recently to translate a translations.ts file to Spanish.
But for repetitive code, I feel like it is kind of a slowdown sometimes. I should have refactored instead.
Some code is boilerplate and can’t be distilled down further. It’s nice to point an AI at a database schema and say “write the Django models, admin, forms, and API for this schema, using these authentication permissions.” Yeah, I’ll have to verify it’s done right, but that gets a lot of the boring typing out of the way.
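The output is the kind of thing you can verify at a glance, something like this (a minimal sketch with a made-up Customer table standing in for the real schema):

```python
# models.py
from django.db import models

class Customer(models.Model):
    name = models.CharField(max_length=200)
    email = models.EmailField(unique=True)
    created = models.DateTimeField(auto_now_add=True)

# admin.py
from django.contrib import admin
from .models import Customer

@admin.register(Customer)
class CustomerAdmin(admin.ModelAdmin):
    list_display = ("name", "email", "created")
    search_fields = ("name", "email")
```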
I use it for writing code to call APIs, and it is a huge boon.
Yeah, you have to check the results, but it’s way faster than me.
That’s fair.
This is a thing people miss. “Oh it can generate repetitive code.”
OK, now who’s going to maintain those thousands of lines of repetitive unit tests, let alone check them for correctness? Certainly not the developer who was too lazy to write their own tests and to think about how to refactor or abstract things to avoid the repetition.
If someone’s response to a repetitive task is copy-pasting poorly-written code over and over we call them a bad engineer. If they use an AI to do the copy-paste for them that’s supposed to be better somehow?
Similarly, I find it very useful when I’ve written a tool script and really don’t want to write the command-line interface for it.
“Here’s a well-documented function - write an argparser for it”
…then I fix its rubbish assumptions and mistakes. It’s probably not drastically quicker, but it doesn’t require as much effort from me, meaning I can go harder on the actual function (rather than keeping some effort in reserve to get over the final hump).
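The pattern, roughly (resize_image here is a made-up stand-in for whatever the tool script actually does):

```python
import argparse

def resize_image(path: str, width: int, height: int, keep_aspect: bool = True) -> None:
    """Resize the image at `path` to width x height, optionally preserving aspect ratio."""
    ...  # the hand-written tool logic lives here

def main() -> None:
    # The part the LLM writes: a CLI derived from the documented function.
    parser = argparse.ArgumentParser(description=resize_image.__doc__)
    parser.add_argument("path", help="path to the image file")
    parser.add_argument("--width", type=int, required=True, help="target width in pixels")
    parser.add_argument("--height", type=int, required=True, help="target height in pixels")
    parser.add_argument("--stretch", dest="keep_aspect", action="store_false",
                        help="stretch to fit instead of preserving aspect ratio")
    args = parser.parse_args()
    resize_image(args.path, args.width, args.height, args.keep_aspect)

if __name__ == "__main__":
    main()
```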
I’ve had plenty of success using it to build things like Docker Compose YAMLs and the like, but for anything functional, it does often take a few tries to get it right. I never use its raw output for anything in production - only as a jumping-off point to structure things.
For the missing 10%: the folder with copies of the code you have already written doing that.
(asking GPT-5 to do your dynamic SQL calls is inviting disaster, for example. Requires hours of reworking just to get close.)
Maybe it’s the dynamic SQL calls themselves that are inviting disaster?
Dynamic SQL in and of itself is not an issue, but the consequences of mistakes (exacerbated by SQL’s inherent irrecoverability - hope you have backups) have stigmatized its use heavily. With an understanding of good practice, a proper development environment and a close eye on the junior devs, there’s no inherent issue to using it.
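What “good practice” means here, in one small sketch (hypothetical table and columns; the rule is whitelist identifiers, parameterize values):

```python
import sqlite3

ALLOWED_SORT_COLUMNS = {"name", "email", "created"}  # identifier whitelist

def fetch_customers(conn: sqlite3.Connection, sort_by: str, min_id: int):
    # Identifiers can't be bound as parameters, so validate them against a whitelist...
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by}")
    # ...and bind every value as a parameter, never via string formatting.
    query = f"SELECT id, name, email FROM customers WHERE id >= ? ORDER BY {sort_by}"
    return conn.execute(query, (min_id,)).fetchall()
```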
With an understanding of good practice, a proper development environment and a close eye on the junior devs, there’s no inherent issue to using it.
My feelings about C/C++ are the same. I’m still switching to Rust, because that’s what the company wants.
So much of the AI hype has been pointing to ten year old technology repackaged in a slick new interface.
AI is the iPod to the Zune of yesteryear.
Repackaging old technology in slick new interfaces is what we have been calling progress in computer software for 40+ years.
I mean… I like to think we’ve done a bit more than that. FFS, file compression alone has made leaps and bounds since the 3.5" floppy days.
Also, as a T-SQL guy, I gotta say there’s a world of difference between SQL 2008 and SQL 2022.
But I’ll spot you that a lot of the last 10-15 years has produced herculean efforts in answering the question “How can we squeeze a few more ads into your GUI?”
There have been a few “milestone moments” like map-reduce, Hadoop, etc. Still, there’s a whole lot of eye candy wrapped around the same old basic concepts.
I found that it only does well if the task is already well covered by the usual sources. Ask for anything novel and it shits the bed.
That’s because it doesn’t understand anything and is just vomiting forth output based on the code that was fed into it.
At absolute best.
My experience is that it’s like the bottom Stack Overflow answers: making up bullshit and nonexistent commands, etc.
They should make it more like SO and have it chastise you for asking a stupid question you should already know the answer to lol
If you know what you want, its automatic code completion can save you some typing in those cases where it gets it right (for repetitive or trivial code that doesn’t require much thought). It’s useful if you use it sparingly and can see through its bullshit.
For junior coders, though, it could be absolute poison.
At least when I’m cleaning up after shit devs who used Stack Overflow, I can usually search using a fragment of their code and find where they swiped it from and get some clue what the hell they were thinking. Now that they’re all using AI chatbots, there’s no trace.
In the beginning there were manufacturer’s manuals, spec sheets, etc.
Then there were magazines, like Byte, InfoWorld, Compute! that showed you a bit more than just the specs
Then there were books, including the X for Dummies series that purported to teach you theory and practice
Then there was Google / Stack Overflow and friends
Somewhere along there, where depends a lot on your age, there were school / University courses
Now we have “AI mode”
Each step along that road has offered a significant speedup, connecting ideas to theory to practice.
I agree, all the “magic bullet” AI hype is far overblown. However, with AI, something new I can do is interactively develop a specification and a program together: throw out the code several times while the spec gets refined, re-implement it, try it in different languages with different libraries. It’s still only good for “small” projects, but less than a year ago “small” meant less than 1000 lines of code. These days I’m seeing 300 lines of specification turn into 1500-3000 lines of code and have it running successfully within half a day.
I don’t know if we’re going to face a Kurzweilian singularity where these things start improving themselves at exponential rates, or if we’ll hit another 30-year plateau like neural nets did back in the 1990s… As things are, Claude helps me make small projects several times faster than I ever could with Google and Stack Overflow. And you can build significant systems out of cooperating small projects.
“No Duh,” say senior developers everywhere.
I’m so glad this was your first line in the post
No duh, says a layman who never wrote code in his life.
And many in between “senior developers everywhere” and “a layman who never wrote code in his life”.
Like me, I’m saying it too. A big ol “No duh”.
Disbelieve the hype.
Heh. That’s a fun chart. If that’s programming aptitude, I scored 80 on that part of the broad-spectrum aptitude test I got a sneak-peek chance to do several parts of. Well, now I know why I’m so easily in agreement with “senior coders”, if it is a programming aptitude quotient. If it’s just IQ… pulls hood up to block the glare.
Daunting that there may be a middling bias getting apparent advantages. Evolution may not serve us well like that.
Oddly enough, my grasp of coding is probably the same as the guy in the middle but I still know that LLM generated code is garbage.
All the best garbage to learn from, to debug, debug, debug, sharpening those skills.
That’s kinda wrong, though. I’ve seen LLMs write pretty good code, in some cases even doing something clever I hadn’t thought of.
You should treat it like any junior, though, and read the code changes and give feedback if needed.
Yeah, I actually considered putting the same text on all 3, but we gotta put the idiots who think it’s great somewhere! Maybe I should have put it with the dumbest guy instead.
I guess I’m one of the idiots then, but what do I know? I’ve only been coding since the 90s.
Think this one needs a bimodal curve, with the two peaks representing the “caught up in the hype” average coder and the realistic average coder.
Agreed, that’s why it didn’t feel quite right when I made it.
Yeah, there’s definitely morons out there who never bothered to even read about the theory of good code design.
Thing is both statements can be true.
Used appropriately and in the right context, LLMs can accelerate some select work.
But the hype level is “human replacement is here” (or imminent, depending on whether the company thinks the audience is willing to believe it yet). Recently, Anthropic suggested someone could just type “make a Slack clone” and it would all be done and perfect.
This. Like with any tool, you have to learn how and when to use it. I’ve started to get the hang of which tasks it improves, but I don’t think I’ve regained the hours I’ve spent learning it yet.
But as the tool and my understanding of it improve, it’ll probably happen some day.
If not to editorialize, what else is the text box for? :)
AI coding is the stupidest thing I’ve seen since someone decided it was a good idea to measure code by the number of lines written.
It did solve my impostor syndrome though. Turns out a bunch of people I saw to be my betters were faking it all along.
More code is better, obviously! Why else would a website to see a restaurant menu be 80 MB? It’s all that good, excellent code.
The most immediately understandable example I heard of this was from a senior developer, who pointed out that LLM-generated code will build a different code block every time it has to do the same thing. So if that function fails, you have to look at multiple incarnations of the same function, rather than saying “oh, let’s fix that function in the library we built.” A sketch of that failure mode is below.
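Something like this (an invented example; the details don’t matter, the duplication does):

```python
# What duplicated LLM output tends to look like: the same check re-implemented
# slightly differently at every call site.
def create_user(email: str) -> None:
    if "@" not in email or email.startswith("@"):
        raise ValueError("invalid email")
    ...  # create the user

def update_user(email: str) -> None:
    if "@" not in email or "." not in email:  # subtly different rules this time
        raise ValueError("invalid email")
    ...  # update the user

# What the senior dev wants instead: one function in the shared library,
# so a bug gets fixed once.
def validate_email(email: str) -> None:
    if "@" not in email or email.startswith("@"):
        raise ValueError("invalid email")
```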
Yeah, code bloat with LLMs is fucking monstrous. If you use them, get used to immediately scouring your code for duplications.
Yeah, if it generates more than 5 lines of code, I now just immediately cancel it, because I know it’s not worth even reading. It’s so prone to repeating itself and failing to reasonably break things down into logical pieces…
With that, I only have to read some of its suggestions; I still throw out probably 80% entirely, fix up another 15%, and actually use 5% without modification.
There are tricks to getting better output from it, especially if you’re using Copilot in VS Code and your employer is paying for access to models, but it’s still asking for trouble if you’re not extremely careful, extremely detailed, and extremely precise with your prompts.
And even then, it absolutely will fuck up. If it actually succeeds at building something that technically works, you’ll spend considerable time afterwards going through its output and removing unnecessary crap it added, fixing duplications, securing insecure garbage, removing mocks (God… so many fucking mocks), and so on.
I think about what my employer is spending on it a lot. It can’t possibly be worth it.