I can’t help but think that when a model gets to select which model and how much...

jstummbillig · 2025-08-10T17:54:51 1754848491

How would we not be able to know?

If we don't know because it's good optimization that does not impact us in a noticeable way, then that seems like a fine trade-off.

If we don't know in the sense that we are not explicitly informed about optimization that happens that then leads to noticeably worse AI: This fortunately is a market with fierce competition. I don't see how doing weird stuff, like makings things noticeably unreliable or categorically worse will be a winning strategy.

In either case "not knowing" is really not an issue.

scratcheee · 2025-08-10T18:03:46 1754849026

Easy: provide high quality output when being tested for a new task, The moment you are done outperforming the competition in the tests and have hit production you slowly ramp down quality, perhaps with exceptions when the queries look like more testing.

Same problem as ai safety, but the actual problem is now the corporate greed of humans behind the ai rather than an actual agi trying to manipulate you.

jstummbillig · 2025-08-10T19:06:45 1754852805

This is confusing on so many levels.

cherry_tree · 2025-08-11T04:20:04 1754886004

See also: Volkswagen emissions test scandal

hellisothers · 2025-08-10T19:48:51 1754855331

How do you notice hallucinations in a field you’re not familiar with? You may value focusing on different types of inputs or outputs than the model picker does and now you have no control.

We don’t know what we don’t know, we can’t always judge what is categorically right or wrong to make an informed decision. What we can do is decide who we want to ask a question based on competence.

jstummbillig · 2025-08-10T19:59:20 1754855960

With 700 million(?) users, we have a lot of people familiar with every field. I am no biochemist, but if chatgpt starts spouting nonsense in that field biochemists will notice and speak up, and I will notice that they do.

What's the idea? How does creeping, far reaching incompetence continually get past all of us?

Topfi · 2025-08-11T05:56:58 1754891818

Every individual user would have to be consistently paying attention to discussions outside their expertise and interest. Considering prior stories of LLM usage among multiple legal professionals, wherein the model repeatedly output the potential for errors/“hallucinations”, I highly doubt that will happen. Heck, part of the outcry to reintroduce my personally least useful model 4o was grounded in a preference for subjective agreeableness in the output.

The idea would/could be not intentional dissemination of missinformation, but purely financial. Models are expensive to run, hardware, rack space and power limited and making newer releases seem more robust subjectively can be a powerful incentive.

With prior models we already have seen quantization post release and it’s been a personal pet peeve of mine that this should be communicated via a changelog, with the router there is one more quite powerful, potentially even less transparent way for providers to put their thumb on the scale. For now, GPT-5 does very impressively in my limited use cases and testing, especially considering pricing, but the concern that this may (and past experience tells me likely) change soon enough remains.

jstummbillig · 2025-08-11T08:37:23 1754901443

Side note, responding to AI written HN comments is something I will still have to get used to

4d4m · 2025-08-10T17:18:31 1754846311

+1 this release feels more like agent orchestrator updates to save on cost to serve

ralusek · 2025-08-10T17:21:21 1754846481

It’s not obvious that the most profitable path for OpenAI would be saving on costs, it might be that the model is actually tuned to overthink because they can charge on those extra thinking tokens.

splatzone · 2025-08-10T17:27:12 1754846832

That would make sense for the API where usage is metered. But outside of that, most ChatGPT users will be free or paying a flat monthly fee, so there's a real incentive for OpenAI to optimise for cost.

tough · 2025-08-10T17:26:36 1754846796

ChatGPT is a sub based model, their api pricing is based on usage

prob different incentives at each

taskforcegemini · 2025-08-11T16:36:59 1754930219

I'm thinking this is what happened to google search. Definitely feels this way.

interestica · 2025-08-10T18:30:18 1754850618

I mean this is already kind of the case with general search (eg google) as it is now.