> I think what we should really ask ourselves is: “Why do LLM experiences vary so much among developers?”
Some possible reasons:
* different models used by different folks: free vs paid, various reasoning effort levels, quantization under the hood, and other parameters like samplers and temperature (see the sketch after this list)
* different tools used: in my case I've found Continue.dev to be surprisingly bad, Cline to be pretty decent, and RooCode to be really good; I've also had good experiences with JetBrains Junie, and GitHub Copilot is *okay*. There are lots of different options and settings out there
* different system prompts and tool use cases (e.g. letting the model run the tests and fix failures itself), as well as everything from simple, straightforward codebases that are a dime a dozen out there (and in the training data) to something genuinely new that would trip up both your average junior dev and the LLMs
* open-ended vs well-specified tasks, feeding in the proper context, starting new conversations/tasks when things go badly, and offering examples so the model has more to go on (it can predict something closer to what you actually want); most of my prompts at this point are multiple sentences, up to a dozen, alongside code/data examples, plus prompting the model to ask me questions about what I want before doing the actual implementation
* also, individual models sometimes handle specific use cases badly; I generally rotate between Sonnet 4.5, Gemini 2.5 Pro, and GPT-5, and I use Qwen 3 Coder 480B running on Cerebras for simpler tasks that I need done quickly
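To make the first bullet concrete, here's a minimal sketch assuming an OpenAI-compatible endpoint and the `openai` Python package (the model name and prompt are placeholders, not anything specific from this thread): the same request can behave very differently depending on sampling parameters alone.

```python
# Minimal sketch, assuming an OpenAI-compatible endpoint and the `openai` package.
# Model name and prompt are placeholders; the point is only that sampling parameters
# (temperature, top_p, etc.) change the behavior of an otherwise identical request.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY (and optionally OPENAI_BASE_URL) from the environment

prompt = "Write a Python function that parses ISO 8601 timestamps."

for temperature in (0.0, 0.7, 1.2):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; substitute whatever model/provider you use
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # lower = more deterministic sampling
        top_p=0.95,               # nucleus sampling cutoff
    )
    print(f"--- temperature={temperature} ---")
    print(response.choices[0].message.content)
```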
With all of that, my success rate is pretty good, and the claim that the tech can "...barely follow a simple instruction" doesn't hold up. Then again, most of my projects are webdev adjacent in mostly mainstream stacks, YMMV.
> Then again, most of my projects are webdev adjacent in mostly mainstream stacks
This is probably the most significant part of your answer. You are asking it to do things for which there are a ton of examples in the training data. You also described narrowing the scope of your requests, which tends to help.