I think there’s always a danger of these foundational model companies doing RLHF...

miki123211 · 2025-10-10T00:22:59 1760055779

This feels like RLVR, not RLHF.

With RLVR, the LLM is trained to pursue "verifiable rewards." On coding tasks, the reward is usually something like the percentage of passing tests.

Let's say you have some code that iterates over a set of files and does processing on them. The way a normal dev would write it, an exception in that code would crash the entire program. If you swallow and log the exception, however, you can continue processing the remaining files. This is an easy way to get "number of files successfully processed" up, without actually making your code any better.
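A minimal sketch of the two shapes, to make the contrast concrete. The names `process_all_strict`, `process_all_swallowing`, and the `process` callback are made up for illustration; they are not from any real codebase.

```python
import logging

logger = logging.getLogger(__name__)


def process_all_strict(paths, process):
    """Any exception raised by process() propagates and stops the whole run."""
    return [process(p) for p in paths]


def process_all_swallowing(paths, process):
    """Swallows and logs every exception so the loop keeps going.

    The count of "successfully processed" items goes up, but bugs and real
    failures are now hidden in the log instead of surfacing to the caller.
    """
    results = []
    for p in paths:
        try:
            results.append(process(p))
        except Exception:
            # Hypothetical log message; the caller never learns about this.
            logger.exception("failed to process %s", p)
    return results
```

With one bad input out of three, the strict version raises, while the swallowing version quietly returns two results and looks healthier on a "files processed" metric.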

eru · 2025-10-10T02:29:19 1760063359

> This is an easy way to get "number of files successfully processed" up, without actually making your code any better.

Well, it depends a bit on what your goal is.

Sometimes the user wants to, e.g., back up as many files as possible from a failing hard drive, and doesn't want the whole process to fail just because one item is broken.

orisho · 2025-10-10T03:22:48 1760066568

You're right, but the way to achieve this is to allow the error to propagate at the file level, then catch it one function above and continue to the next one.

However, LLM-generated code will often, at least in my experience, avoid raising any errors at all. This is undesirable, because some errors should result in a complete failure: for example, errors that are not transient or environment-related but caused by a bug. And in any case, an LLM will prefer turning these single file errors into warnings, though the way I see it, they are errors. They don't need to abort the process, but they are errors nonetheless.
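Roughly what I mean, as a sketch. `process_file` and `process_files` are hypothetical names, and treating only `OSError` as a recoverable per-file failure is an assumption for this example; the right set of exceptions depends on the program.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def process_file(path: Path) -> int:
    """Process one file; any failure propagates to the caller."""
    return len(path.read_bytes())


def process_files(paths):
    """Catch per-file I/O errors one level up and continue with the next file.

    Only OSError is treated as a file-level failure here (an assumption for
    this sketch). Anything else, such as an exception caused by a bug, is
    not caught and still aborts the run.
    """
    sizes, failed = {}, []
    for path in paths:
        try:
            sizes[path] = process_file(path)
        except OSError as exc:
            # Logged as an error, not a warning, and reported to the caller.
            logger.error("could not process %s: %s", path, exc)
            failed.append((path, exc))
    return sizes, failed
```

The caller gets both the results and the list of failures, so it can decide whether a partial run counts as success.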

eru · 2025-10-10T06:45:13 1760078713

Yes, that's cleaner.

> And in any case, an LLM will prefer turning these single file errors into warnings, though the way I see it, they are errors.

Well, in general they are something that the caller should have the opportunity to deal with.

In some cases, aborting back to the caller at the first problem is the best course of action. In some other cases, going forward and taking note of the problems is best.

In some systems, you might even want to tell the caller about failures (and successes) as they occur, instead of waiting until the end.
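One way to sketch that last option is a generator that yields an outcome per item, so the caller chooses the policy. `Outcome` and `run_each` are hypothetical names for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Outcome:
    """Result of one item: error is None on success."""
    item: str
    error: Optional[Exception] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def run_each(items: Iterable[str], action: Callable[[str], None]) -> Iterator[Outcome]:
    """Lazily run action on each item and report every outcome as it occurs."""
    for item in items:
        try:
            action(item)
        except Exception as exc:
            yield Outcome(item, exc)
        else:
            yield Outcome(item)
```

A caller that wants to abort at the first problem can `break` out of the loop on the first outcome that is not `ok`; one that wants to keep going can collect the failures and report them at the end. Because the generator is lazy, breaking early means the remaining items are never attempted.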

It's all very similar to the different options people have available when their boss sends them on an errand and something goes wrong. A good underling uses their best judgement to pick the right way to cope with problems; but computer programs don't have that, so we need to be explicit.

See https://en.wikipedia.org/wiki/Mission-type_tactics for a related concept in the military.

cma · 2025-10-09T22:53:17 1760050397

And more advanced users are more likely to opt out of training on their data. Google gets around it with a free API period where you can't opt out, and I think from did some of that too, through partnerships with tool companies, but I'm not sure if you can ever opt out there.

cma · 2025-10-10T20:37:10 1760128630

*grok, not 'from'

justatdotin · 2025-10-10T01:06:41 1760058401

'over-commenting simple code' is preparing it for future agent work. Pay attention to those comments to learn how you can better scaffold for agents.

mnahkies · 2025-10-10T07:27:25 1760081245

They do seem to leave otherwise useless comments for themselves, e.g. on the level of

// Return the result

return result;

I find this quite frustrating when reading/reviewing code generated by AI, but have started to appreciate that it does make subsequent changes by LLMs work better.

It makes me wonder if we'll end up in a place where IDEs hide comments by default (similar to how imports are often collapsed by default/automatically managed), or introduce some way of distinguishing between a more valuable human written comment and LLM boilerplate comments.

stuaxo · 2025-10-10T14:18:01 1760105881

They should have a step to remove those sorts of comments; they only add noise to the code.