>"I gave it the same prompt, and it came out different"
1:1 reproducibility is much easier in LLMs than in software building pipelines. It's just not guaranteed by major providers because it makes batching less efficient.
> 1:1 reproducibility is much easier in LLMs than in software building pipelines
What’s a ‘software building pipeline’ in your view here? I can’t think of parts of the usual SDLC that are less reproducible than LLMs, could you elaborate?
Reproducibility across all existing build systems took a decade of work involving everything from compilers to sandboxing, and a hard reproducibility guarantee in completely arbitrary cases is either impossible or needs deterministic emulators which are terribly slow. (e.g. builds that depend on hardware probing or a simulation result)
Input-to-output reproducibility in LLMs (assuming the same model snapshot) is a matter of optimizing the inference for it and fixing the seed, which is vastly simpler. Google for example serves their models in an "almost" reproducible way, with the difference between runs most likely attributed to batching.
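A minimal sketch of the two points above, not any provider's actual serving stack. With a fixed seed, sampling from the same distribution is repeatable, while a different order of floating-point additions (as can happen when batch composition changes) can perturb a result in its last bits. The vocabulary, probabilities, and function names are made up for illustration.

```python
import random

# Hypothetical next-token distribution; a real model would compute this per step.
VOCAB = ["red", "green", "blue", "teal"]
PROBS = [0.4, 0.3, 0.2, 0.1]

def sample_tokens(seed: int, n: int = 8) -> list[str]:
    """Sample n tokens using a private RNG so the seed fully determines the output."""
    rng = random.Random(seed)
    return rng.choices(VOCAB, weights=PROBS, k=n)

# Same seed, same distribution -> identical output on every run.
same_seed_matches = sample_tokens(1234) == sample_tokens(1234)

# Floating-point addition is not associative, so a different reduction order
# (for example, a different batch layout) can change the low bits of a result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
order_changes_sum = left_to_right != right_to_left

print(same_seed_matches, order_changes_sum)
```

If the computed probabilities shift in the last bits, even a seeded sampler can pick a different token near a tie, which is one plausible reading of the "almost" reproducible behavior described above.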
It’s not just about non-determinism, but about how chaotic LLMs are. A one word difference in a spec can and frequently does produce unrecognizably different output.
If you are using an LLM as a high level language, that means that every time you make a slight change to anything and “recompile”, all of the thousands upon thousands of unspecified implementation details are free to change.
You could try to ameliorate this by training LLMs to favor making fewer changes, but that would likely end up encoding every bad architecture decision made along the way and essentially forcing a convergence on bad design.
Fixing this I think requires judgment on a level far beyond what LLMs have currently demonstrated.
I'm very specifically addressing prompt reproducibility mentioned above, because it's a notorious red herring in these discussions. What you want is correctness, not determinism/reproducibility which is relatively trivial. (although thinking of it more, maybe not that trivial... if you want usable repro in the long run, you'll have to store the model snapshot, the inference code, and make it deterministic too)
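A rough sketch of what "storing the model snapshot, the inference code" could mean in practice: record everything that has to stay fixed and fingerprint it, so two runs can be checked for having the same inputs. The field names and the `run_fingerprint` helper are hypothetical, not part of any real tool.

```python
import hashlib
import json

def run_fingerprint(model_snapshot: str, inference_code_rev: str,
                    prompt: str, seed: int, params: dict) -> str:
    """Hash everything that must stay fixed for a run to be repeatable."""
    record = {
        "model_snapshot": model_snapshot,
        "inference_code_rev": inference_code_rev,
        "prompt": prompt,
        "seed": seed,
        "params": params,
    }
    # sort_keys gives a canonical serialization, so equal records hash equally.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Placeholder identifiers; only the snapshot differs in the third call.
a = run_fingerprint("snapshot-A", "rev-1", "Build the app", 42,
                    {"temperature": 0.7, "top_p": 0.9})
b = run_fingerprint("snapshot-A", "rev-1", "Build the app", 42,
                    {"top_p": 0.9, "temperature": 0.7})
c = run_fingerprint("snapshot-B", "rev-1", "Build the app", 42,
                    {"temperature": 0.7, "top_p": 0.9})
print(a == b, a == c)
```

A matching fingerprint only says the inputs were the same; the inference itself still has to be deterministic for the outputs to match.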
>A one word difference in a spec can and frequently does produce unrecognizably different output.
This is well out of scope for reproducibility and doesn't affect it in the slightest. And for practical software development this is also a red herring: the real issue is correctness and spec gaming. As long as the output is correct and doesn't circumvent the intention of the spec, prompt instability is unimportant; it's just the ambiguous nature of the domain LLMs and humans operate in.
Well if you want to use it as a high level language where you check in the spec and regenerate the code then prompt instability/chaotic output makes that infeasible.
You can’t just tell users “sorry there are a million tiny differences all over the app every time we change the slightest thing, that’s just the ambiguous nature of reality”.
And this kind of reproducibility is insensitive to prompt instability. If your inference is deterministic (trivial if within the lifetime of a snapshot), you give it the same docs and get the same result.
Obviously this article is advocating for spec as source code and not checking in code to keep it around between “compilations”.
Why would you care about reproducibility at all if you were checking in the code as well? If you check in the code, you’d never need to rerun the same prompt.
Reproducible builds are not the same as reproducible outcomes.
Rebuilding your project with a different version of a dependency is not the same as suddenly a program accepts txt files as attachments instead of pdfs because the model was borrowing a txt example from its training data and got sidetracked.
Gemini is very paranoid in its reasoning chain, that I can say for sure. That's a direct consequence of the nature of its training. However the reasoning chain is not entirely in human language.
None of the studies of this kind are valid unless backed by mechinterp, and even then interpreting transformer hidden states as human emotions is pretty dubious as there's no objective reference point. Labeling this state as that emotion doesn't mean the shoggoth really feels that way. It's just too alien and incompatible with our state, even with a huge smiley face on top.
A utilitarian device to type on became an object of obsessive consumption, collection, customization, showing off, and fashion (RGB lighting, a forced mechanical-over-scissor distinction even though many people prefer the latter, etc). Yeah of course it's a craze, without scare quotes.
The same gear obsession happened to the gaming mice world, but it was much tamer by comparison.
> Yeah of course it's a craze, without scare quotes.
This is a simplistic opinion to hold. You'd be better off complaining that some people enjoy things. Form factor is important, as are tactile response and sound. Features like embedded USB hubs or touchpads are essentially a given in laptops. Then there's not being forced to throw a keyboard in the trash just because a key failed.
Is this a craze?
Ask yourself this: why are there people paying good money for gaming keyboards? Or Apple's magic keyboard. Is it a craze?
Or are you just complaining that other people enjoy things?
1. Some guys did a trivial prompt injection attack, said "imagine if a driverless vehicle used this model", and published it. No problem, someone has to state the obvious.
2. The Register runs this under the clickbait title pretending real autonomous cars are vulnerable to this, with the content pretending this study isn't trivial and is relevant to real life in any way.
I knew The Register is a low quality ragebait tabloid (I flag most of their articles I bother to read), but this is garbage even for them.
K2 in your example is using the GPT reply template (tl;dr - terse details - conclusion, with contradictory tendencies), there's nothing unique about it. That's exactly how GPT-5.0 talked.
The only model with a strong "personality" vibe was Claude 3 Opus.
> The only model with a strong "personality" vibe was Claude 3 Opus.
Did you have the chance to use 3.5 (or 3.6) Sonnet, and if yes, how did they compare?
As a non-paying user, 3.5 era Claude was absolutely the best LLM I've ever used in terms of having a conversation. It felt like talking to a human and not a bot. Its replies were readable, even if they were several paragraphs long. I've unfortunately never found anything remotely as good.
Pretty poorly in that regard. In 3.5 they killed Claude 3's agency, pretty much reversing their previous training policy in favor of "safety", and tangentially mentioned that they didn't want to make the model too human-like. [1] Claude 3 was the last version of Claude, and one of the very few models in general, that had a character. That doesn't mean it wasn't writing slop though, falling into annoying stereotypes is still unsolved in LLMs.
It definitely talks a lot differently than GPT-5 (plus it came out earlier); the example I gave just looks a bit like it, maybe. Best to try using it yourself a bit; my prompt isn't the perfect prompt to illustrate it or anything. Don't know about Claude because it costs money ;)
Ask any modern (post-GPT-2) LLM about a random color/name/city repeatedly a few dozen times, and you'll see it's not that random. You can influence this with a prompt, obviously, but if the prompt stays the same each time, the output is always very similar despite the existence of thousands of valid alternatives. Which is the case for any vibecoded thing that doesn't specify the color palette, in particular.
This effect is largely responsible for slop (as in annoying stereotypes). It's fixable in principle, but there's pretty little research and I don't see big AI shops care enough.
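One way to put a number on "not that random" is to ask the same question repeatedly and compare the entropy of the answers with that of a uniform choice. The sketch below uses a hypothetical `ask_model` stub with a made-up skew in place of a real LLM call, so the figures it prints are illustrative only, not measurements.

```python
import math
import random
from collections import Counter

def ask_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call, skewed toward a few answers."""
    colors = ["blue", "teal", "purple", "red", "green", "orange"]
    weights = [60, 20, 10, 5, 3, 2]  # made-up skew, not measured data
    return rng.choices(colors, weights=weights, k=1)[0]

def entropy_bits(counts: Counter) -> float:
    """Shannon entropy of the empirical answer distribution, in bits."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

rng = random.Random(0)
answers = Counter(ask_model("Pick a random color.", rng) for _ in range(50))
observed = entropy_bits(answers)
uniform = math.log2(6)  # entropy if all six colors were equally likely

print(answers.most_common(3), round(observed, 2), round(uniform, 2))
```

Swapping the stub for a real API call (same prompt each time) would show how far a given model falls below the uniform baseline.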
Physically speaking, time is just the order of events. The model absolutely has time in this sense. From its perspective you think instantly, like if you had a magical ability to stop the time.
Kinda but not really. The model thinks it's 2024 or 2025 or 2026, but really it has no concept of "now" and thus no sense of past or present... Unless it's instructed to think it's a certain date and time. If you woke up every time completely devoid of memory of your past, it would be hard to argue you have a good sense of time.
In the technical sense I mentioned (physical time as the order of changes) it absolutely does have the concept of now, past, and present, it's just different from yours (2024, 2026, ...), and in your time projection they only exist during inference. And the entire autoregressive process and any result storage serve as a memory that preserves the continuity of their time. LLMs are just not very good at ordering and many other things in general.
Most gamers don't give a shit about openness. A much more likely outcome is "big tech" following the numbers and slowly making Linux unusable by using EEE or any other tactic under the pretense of usefulness.
I don't think this is a given. I think most gamers so far haven't cared about openness because pragmatically, it didn't matter for them.
Now they're seeing the long-term effect of not caring about that though, which is why we're suddenly seeing a movement of gamers moving to Linux, and trying to get others to move with them, because they realize the importance now, as their desktops are slowly collapsing over Microsoft's decision to let AI do all the programming, and having zero QA before releasing stuff to the public.
They don't care about it as an abstract idea, but they do notice that Windows 11 is worse than Windows 10 was worse than Windows 8 was worse than Windows 7.
I'm not saying there have been zero useful improvements in later Windows releases, but 7 looked good and did what you told it to. "Openness" is a very abstract idea but "Only does what you tell it to" is a selling point for Linux.
You know it's not going to upload all your documents to OneDrive and then erase them from the computer.
My opinion on that may be colored by the fact that I had a Surface Pro 3, the one place where Windows 8.1 was actually great to use, and taking away some of the focus on tablet use was a regression. Overall you're right though, outside of tablets W10 was an improvement, because 8 tried to stick the tablet UI into desktops.
I was recently connecting to some server with the Windows 8 derived version of Windows Server and gosh that full screen start menu is stupid with a mouse.
Ironically I built a Linux box for mainly local models with some RGBs because I wanted tasteful accent lights to match the room, but my motherboard isn't supported by OpenRGB so they're stuck on either nothing or 'unicorn vomit' mode until some indefinite point in the future. This is the first time I've run into a stereotypically Linux issue in nearly a decade (on sane hardware) I think!
Not a fan of those aquarium PC cases though, they sacrifice airflow for aesthetics which isn't a great shout. I have a 5090 and a 9950X in a more traditional case and my temperatures are fine with air cooling alone. Not sure you'd get away with that in an aquarium case with poorer airflow, at least without it sounding like a hairdryer all day.
I never understood the giant focus on side windows. If you want to see your components while you're using your PC, why not just build inside a transparent case, or build on a workbench/open style (caseless)?
Specifically less dust if you have filters on the intakes. Positive pressure means you'll have air coming in where the fans are blowing it in (through a filter), and any gaps in the case where there aren't filters will have air flowing out due to the pressure.
If you have negative pressure you'll be sucking in air through the gaps and that air won't go through a filter, hence more dust.
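As a rough back-of-the-envelope check, the rule of thumb above amounts to comparing total intake flow with total exhaust flow. The sketch below assumes rated fan airflow in CFM and ignores filter restriction and GPU or PSU fans, so treat it as an approximation; the fan figures are hypothetical.

```python
def net_airflow_cfm(intake_cfm: list[float], exhaust_cfm: list[float]) -> float:
    """Intake minus exhaust; a positive value suggests positive case pressure."""
    return sum(intake_cfm) - sum(exhaust_cfm)

def pressure_label(net: float) -> str:
    """Classify the case by the sign of the net airflow."""
    if net > 0:
        return "positive"
    if net < 0:
        return "negative"
    return "neutral"

# Hypothetical build: three filtered intakes, one unfiltered exhaust.
net = net_airflow_cfm([50.0, 50.0, 50.0], [60.0])
print(net, pressure_label(net))
```

With more intake than exhaust, the surplus air leaves through unfiltered gaps instead of entering through them, which is the dust argument made above.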
Is this really part of the ATX spec though? Or just something people have learned to do for modern cases with air filters?
Do you run linux at the moment? I've personally found my switch to CachyOS from Windows 11 one of the biggest factors in making my PC run silent/near silent. Happy to elaborate if you're curious.
I feel like that makes sense. Linux users are messing with all the control given to them in software by a free OS, while Windows users get only what they're allowed in software and Microsoft has not figured out how to keep them from modifying their hardware... yet. So the flashy LED folks are making their modifications where still allowed.
Maybe the people who go hardcore like that, with the obnoxious PC cases, but there are lots of casual-to-less-casual gamers out there who will be happy enough with Bazzite.
There’s a whole spectrum of PC gamers, and I think Linux+Proton can appeal to most of them. Let the people spending $10,000 on a glowing case make their own bad decisions.
FWIW: I have a pile of old Intel / NVIDIA machines that no longer boot Windows. They're all > 2GHz, > 8GB DRAM, and have more than enough horsepower to run modern casual titles. Next to that pile, I have a pile of games that no longer run under Windows.
I also have a glowing case PC. Out of the box, it's possible to change the fan light color patterns from Linux.
I had one problem putting Devuan on it:
If you plug the gaming keyboard's 2.4GHz dongle into the monitor, the BIOS doesn't enumerate far enough down the USB tree to find it. So, you can't enter the BIOS and tell it to boot from USB. Then, the Windows setup screen pops up.
After a few force reboots (M$ removed the "shut down cleanly" button from the language chooser), Windows goes into deep diagnostics mode on each boot trying to figure out why it keeps crashing out during the install flow. So, each debug step of "why can't I get into the bios?" takes a few minutes.
The solution was to plug the keyboard dongle directly into the box. The only time the fan has come on after boot (I think it likes to knock the dust off itself when it turns on) was when I told it to download my steam library all at once.
> M$ removed the "shut down cleanly" button from the language chooser
Not sure what language chooser you're talking about here, but if you're trying to shut down Windows without hybrid shutdown to access the UEFI, there are two switches you can use with shutdown.exe: `shutdown /s /t 0` will perform a full shutdown without hibernating the system session (not a hybrid shutdown; that can be done with another parameter). If you want to reboot into your UEFI menu, use `shutdown /r /fw /t 0`.
The time parameter is `/t 0` (a delay in seconds), not `/t now`; I usually use a dedicated command to reboot to UEFI via slickrun.
They don’t care about FOSS, but they care about “computer lets me do what I want”.
Discord is obviously proprietary but it’s actually a very modular platform that gives a lot of nice controls. It’s easy to make your own “server”, it’s easy to add whatever bots you want, it’s easy to moderate. From a consumer perspective, it’s “open”.
Also, I know that this wasn’t your point, but I do feel compelled to point out that Discord works fine on Linux.
Right, but that proves nothing, is there something that is more open and better than Discord, for this group of people? Otherwise I'd say my argument applies in exactly the same way. Pragmatism wins, so why change unless there is a need?
I actually did self-host my own Matrix server to communicate with my friends while gaming. Works great on my Steam Deck and I’ve got Bazzite on my laptop. Most games I’m interested in work great on Linux and anything that doesn’t I just don’t play. There are so many games that do work great, but I can see people skipping Linux because of FOMO.
With the Windows 11 debacle, many are learning first hand about what closed ecosystems force on you. It seems every feed I have that has gaming as an interest has an article about Linux as the future. Clearly someone is reading these articles.
Of course they don’t care about F/OSS: the vast majority of games are closed proprietary software. The small minority of Linux gamers are there for anti-Windows reasons rather than pro-Linux or F/OSS reasons. Given that Microsoft is now signaling a pullback on AI and a shift toward improved performance/quality in Windows, if those anti-Windows reasons evaporate, the more frustrated Linux gamers may move back.
Linux needs a positive reason for Linux rather than relying on anti-Windows reasons (and there are, but I see those reasons outside of the gaming space).
There are 1B Windows 11 devices. Granted not all are for games, but it is not an unpopular OS by the numbers alone.
The phones came first, with "play protect" certification. It's all being captured. Since we can't seem to have more virtuous companies, we need more regulation.
Of the top 10 games on steam, 8 of them are multiplayer. Until they have top multiplayer games, they have nothing. The reality is most of these studios aren't going to enable Linux because they're already on record stating it would make the cheating worse.
Most gamers are idiots. They are okay paying exorbitant sums for broken games and most have no problem with forced rootkits.
I don't think gaming is or should be driving people to Linux.
Microslop turning their OS into a data mining and ad platform should be, and is, pushing normal, rational people to Linux. But most gamers don't care about such things as long as they are getting their sweet, sweet dopamine hit.
Ironically, lower framerates (even though they are higher than the human eye and nervous system can perceive) on Windows 11 might push gamers onto Linux. They still want their rootkits, though.
It is always the dumbest reasons that get gamers upset.
But they do care about AI slop and owning their own system.
A lot of FOSS is an abstraction, but even the rubes can realize that they're being spied on, that Big Tech wants to be Big Brother, and is enshittifying their experience to that end.
Gamers generally game on PC because they like building their system. Otherwise they would use a PS5 Pro or whatever.
The PC is an “open” platform in that you can buy and choose your own hardware. Intel vs AMD vs Nvidia, Seagate vs Western Digital, etc.
Using open software isn’t really more than a few steps from that. Being able to pick how your system works and customizing it to your liking is basically the software version of picking your PC parts. Gamers also like to run all sorts of software to rice their Windows desktops and will install all sorts of abominations that mess with the Windows desktop shell. It's much easier and more fun to rice a Linux desktop.
Linux enthusiasts need to just learn how to appeal to their sensibilities. Valve knows, and they are very effective at getting people excited for a Linux based gaming platform. They’ve also proven they can walk the walk, not just talk the talk.
Sure, they won’t give a crap about the source code but there is more to libre software than just being able to change the source code if you want.
We’re also at an inflection point where people are getting really really really annoyed with companies like Microsoft treating them like lab rats and shoving Copilot down their throat when they don’t want it. There is a chink in the armor; people are opening up to the idea of alternative platforms where you don’t have to worry about any of that garbage.
> making Linux unusable by using EEE or any other tactic
This will never happen because projects will just be forked.
> Gamers generally game on PC because they like building their system. Otherwise they would use a PS5 Pro or whatever.
You're making a huge assumption here. I think that's a really small percentage. Most people game on PC because certain games they like to play are only on PC, or are much better suited to PC, or because their friends are on PC, or because they want to play on the go (Steam Deck is very recent and still not widely used), or because they need to have a PC anyway. Or because they grew up with it at home/in the neighborhood because there was no money for a console. "Because they like building their system" I'm going to peg at <10%.
It's a bit of a tangent because it's about hardware rather than OS choice, but the next few years are going to be a stress test of how much people value PC gaming as component prices increase, and of what happens to the number of people entering the market, staying with older systems, or upgrading (or replacing/complementing with a console). Someone saying they think it's worth a lot is different from opening their wallet.
One aspect I think will be interesting is to compare what happens to attitudes with price changes in more affluent markets like North America or Western Europe compared to how PC has been approached in other markets like Asia or South America.
I got into PC gaming in ~2009 primarily because it is so much cheaper than console. Steam sales and Humble Bundle allowed me to buy so many more games for less money.
The initial cost upfront was higher than a console but if you want a lot of games it ends up being worth it.
Yeah, I was happily surprised to be reminded of that when I set up my NixOS Jovian box that I “consolified” by having it boot into Gamescope and the SteamOS interface.
It’s plugged into my TV, with a wireless controller, and I have direct access to around 800 games immediately.
There are consoles that don’t even have 800 games in their entire library and I have 800 I can play whenever I want, some of which I purchased almost two decades ago.
Many game mods and community maps, etc. are only available on PC. You can play the vanilla version on console, but not the mods you watch Twitch streamers playing. So, it's not b/c they like building PCs, it's because they want to play the mods with their friends.
I would not worry too much about the mod community! They are the one persistent group of people who will hack the software to their liking. Yes, you can't play full FiveM GTA V right now, but it will get there eventually. There is nothing technical that is limiting the mods from working on Proton, just time from some annoyed mod dev who has had enough of Windows, and it will be migrated over.
>This will never happen because projects will just be forked.
There's a chasm of difference between a technical fork and a meaningful fork. The entire point of EEE is relying on usefulness and convenience combined with network effects to make the entire system restricted and control it. Sure, you can go and fork anything you want - nobody stops you, technically. But you're getting the rug pulled from under your feet in any case.
You can witness the early stage of subversion with very useful software (without any hint of irony) made by people who "left" Microsoft: https://news.ycombinator.com/item?id=46784572
> Gamers generally game on PC because they like building their system. Otherwise they would use a PS5 Pro or whatever.
I haven't built a PC in over 2 decades and I can't stand trying to game on a console or on a phone. I buy a stock machine like AlienWare, overwrite Windows with Kubuntu and go to town gaming.
Why not at the side, though? Better use of the vertical space and no issues with touchscreens (interactable items on top, hand reaching from the side). That's how I do it on every desktop/laptop machine I own.