The point is that an "internal database of statistical correlations" is a world model of sorts. We all carry an internal representation of the world that is only probabilistically accurate, after all. I don't think the distinction is as clear as you want it to be.
> "internal database of statistical correlations" [would be] a world model of sorts
Not in the sense used in the article: «memorizing “surface statistics”, i.e., a long list of correlations that do not reflect a causal model of the process generating the sequence».
A very basic example: when asked "two plus two", would the interface reply "four" because it memorized a correlation between the two ideas, or because it counted at some point (at many points during its development) and in that way assessed reality? That is a dramatic difference.
So humans don't typically have world models, then. Ask most people how they arrived at their conclusions (outside of very technical fields) and they will confabulate, just like an LLM.
The best example is phenomenology, where people will grant themselves skills they don't have in order to reach conclusions. See also heterophenomenology, which is aimed at working around that: https://en.wikipedia.org/wiki/Heterophenomenology
Let me rephrase it, since there could be a misunderstanding: "Surely many people cannot think properly, but some have much more ability than others: the proficient ability to think well is a potential (expressed in some and not expressed in many)".
To transpose that to LLMs, you should present one that systematically gets it right, not occasionally.
And anyway, the point was about two different processes preceding the formulation of a statement: some output the strongest correlated idea ("2+2" → "4"); some look at the internal model and check its contents ("2, 2" → "1 and 1, 1 and 1: 4").
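To make the contrast concrete, here is a deliberately crude sketch (purely illustrative; the names and structure are invented and make no claim about how any actual model is implemented): one function merely recalls a memorized pairing, the other derives the answer by actually counting.

```python
# Illustrative contrast between the two processes described above.

# Process 1: emit the most strongly associated reply, i.e. a memorized correlation.
MEMORIZED = {"two plus two": "four"}

def answer_by_correlation(prompt: str) -> str:
    # No notion of quantity is consulted; the pairing is simply recalled.
    return MEMORIZED.get(prompt, "no idea")

# Process 2: consult an internal model and check its contents, i.e. actually count.
def answer_by_counting(a: int, b: int) -> str:
    total = 0
    for _ in range(a):   # "1 and 1..."
        total += 1
    for _ in range(b):   # "...and 1 and 1"
        total += 1
    return str(total)    # the answer is derived, not recalled

print(answer_by_correlation("two plus two"))  # "four" - recalled
print(answer_by_counting(2, 2))               # "4"    - computed
```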
Did (could) Einstein think about things long and hard? Yes - that is how he explained having solved problems ("How did you do it?" // "I thought about it long and hard").
The artificial system in question should (1) be able to do it, and (2) do it systematically, because it is artificial.
In a way the opposite, I'd say: the archetypes in Plato are the most stable reality and are akin to the logos that the tradition before and after him pursued - to know it is to know how things are (how things work), hence knowledge of the state of things, hence a faithful world model.
To utter conformist statements spawned from surface statistics would be "doxa" - repeating "opinions".
If you mean that, just like the prisoner in the cave who sees shadows instead of things (really, things instead of Ideas), the machine sees words instead of things, then that would be very right in a way.
But one could argue that it is not impossible to create an ontology (a very descriptive ontology - "this is said to be that, and that, and that...") from language alone. Hence the question of whether the ontology is there. (Actually, the question at this stage remains: "How do they work, in sufficient detail? Why the appearance of some understanding?")
Yeah, what I'm saying is that something very similar to an ontology is there. (It's incomplete but extensive, not coherent, and richer in detail than anything anybody has ever created.)
It's just that it's kind of a useless ontology, because the reality it describes is language. Well, only "kind of useless", because it should be very useful for parsing, synthesizing and transforming language. But it doesn't have the kind of "knowledge" that most people expect an intelligence to have.
Also, its world isn't composed only of words: all of these models got a very strong "Am I fooling somebody?" signal during training.