LLMs have not "read" social science research and they do not "know" about the outcomes; they have been trained to replicate the exact text of social science articles.
The articles will not be mutually consistent, and what output the LLM produces will therefore depend on what article the prompt most resembles in vector space and which numbers the RNG happens to produce on any particular prompt.
I don’t think essentialist explanations about how LLMs work are very helpful. They don’t give any meaningful explanation of the high-level nature of the pattern matching that LLMs are capable of. And they draw a dichotomous line between basic pattern matching and knowledge and reasoning, when it is much more complex than that.
It's especially important not to anthropomorphise when there is a risk people actually mistake something for a humanlike being.
What is least helpful is using misleading terms like this, because it makes reasoning about this more difficult. If we assume the model "knows" something, we might reasonably assume it will always act according to that knowledge. That's not true for an LLM, so it's a term that should clearly be avoided.
They present a statistical model of an existing corpus of text.
If this existing corpus includes useful information it can regurgitate that.
It cannot, however, synthesize new facts by combining information from this corpus.
The strongest thing you could feasibly claim is that the corpus itself models the world, and that the LLM is a surrogate for that model. But this is not true either. The corpus of human-produced text is messy, containing mistakes, contradictions, and propaganda; it has to be interpreted by someone with an actual world model (a human) in order for it to be applied to any scenario; your typical corpus is also biased towards internet discussions, the English language, and Western prejudices.
If we focus on base models and ignore the tuning steps after that, then LLMs are "just" a token predictor. But we know that pure statistical models aren't very good at this. After all, we tried for decades to get Markov chains to generate text, and it always became a mess after a couple of words. If you tried to come up with the best way to actually predict the next token, a world model seems like an incredibly strong component. If you know what the sentence so far means, and how it relates to the world, human perception of the world, and human knowledge, that makes guessing the next word/token much more reliable than just looking at statistical distributions.
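To make the Markov chain comparison concrete, here's a minimal sketch of the kind of pure next-word statistics I mean. The toy corpus is made up for illustration and nothing here reflects any real system:

```python
import random
from collections import defaultdict

# Toy bigram Markov chain: it only knows P(next word | previous word),
# with no notion of what the sentence so far means.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog around the mat ."
).split()

transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start="the", length=15):
    word, out = start, [start]
    for _ in range(length):
        word = random.choice(transitions[word])  # purely statistical next-word pick
        out.append(word)
    return " ".join(out)

print(generate())  # something like "the dog around the mat . the cat chased the rug ..."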
The bet OpenAI has made is that if this is the optimal final form, then given enough data and training, gradient descent will eventually build it. And I don't think that's entirely unreasonable, even if we haven't quite reached that point yet. The issues are more in how language is an imperfect description of the world. LLMs seem to be able to navigate the mistakes, contradictions and propaganda with some success, but fail at things like spatial awareness. That's why OpenAI is pushing image models and 3D world models, despite making very little money from them: they are working towards LLMs with more complete world models, unchained from language.
I'm not sure if they are on the right track, but from a theoretical point of view I don't see an inherent fault.
1) People only speak or write down information that needs to be added to a base "world model" that a listener or receiver already has. This context is extremely important to any form of communication and is entirely missing when you train a pure language model. The subjective experience required to parse the text is missing.
2) When people produce text, there is always a motive to do so, which influences the contents of the text. This subjective component of producing the text is interpreted no differently from any "world model" information.
A world model should be as objective as possible. Using language, the most subjective form of information, is a bad fit.
The other issue in this argument is that you're inverting the implication. You say an accurate world model will produce the best word model, but then suddenly this is used to imply that any good word model is a useful world model. This does not compute.
> People only speak or write down information that needs to be added to a base "world model" that a listener or receiver already has
Which companies try to address with image, video and 3D world capabilities, to add that missing context. "Video generation as world simulators" is what OpenAI once called it.
> When people produce text, there is always a motive to do so which influences the contents of the text. This subjective information component of producing the text is interpreted no different from any "world model" information.
Obviously you need not only a model of the world, but also of the messenger, so you can understand how subjective information relates to the speaker and the world. Similar to what humans do.
> The other issue in this argument is that you're inverting the implication. You say an accurate world model will produce the best word model, but then suddenly this is used to imply that any good word model is a useful world model. This does not compute
The argument is that training neural networks with gradient descent acts as a universal optimizer. It will always try to find weights for the neural network that produce the "best" results on your training data, within the constraints of your architecture, training time, random chance, etc. If you give it training data that is best solved by learning basic math, with a neural architecture that is capable of learning basic math, gradient descent will teach your model basic math. Give it enough training data that is best solved with a solution that involves building a world model, and a neural network that is capable of encoding this, and gradient descent will eventually create a world model.
Of course, in reality this is not simple. Gradient descent loves to "cheat" and find unexpected shortcuts that apply to your training data but don't generalize. Just because it should be possible in principle doesn't mean it's easy, but it's at least a path that can be monetized along the way, and for the moment it seems to have captivated investors.
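Here's the "gradient descent finds whatever the data demands" idea at toy scale. Everything below (the data, the one-weight-per-input "network") is made up for illustration; the point is only that the task hidden in the data is what the optimizer ends up encoding:

```python
import numpy as np

# The training pairs are sums of two numbers; the only linear model that fits
# them is one whose weights implement addition, so that's what gradient descent finds.
rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(1000, 2))   # pairs of numbers
y = X.sum(axis=1)                           # their sums: the "task" hidden in the data

w = rng.normal(size=2)                      # model: y_hat = w1*x1 + w2*x2
lr = 0.01
for _ in range(500):
    err = X @ w - y                         # prediction error on the whole batch
    grad = 2 * X.T @ err / len(X)           # gradient of mean squared error
    w -= lr * grad                          # gradient descent step

print(w)  # roughly [1. 1.], i.e. the weights now "do addition"
```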
You did not address the second issue at all. You are inverting the implication in your argument. Whether gradient descent helps solve the language model problem or not does not help you show that this means it's a useful world model.
Let me illustrate the point using a different argument with the same structure:
1) The best professional chefs are excellent at cutting onions
2) Therefore, if we train a model to cut onions using gradient descent, that model will be a very good professional chef
I think the commenter is saying that they will combine a world model with the word model. The resulting combination may be sufficient for very solid results.
Note humans generate their own incomplete world model. For example, there are sounds and colors we don’t hear or see, odors we don’t smell, etc. We have an incomplete model of the world, but we still have a model that proves useful for us.
> they will combine a world model with the word model.
This takes "world model" far too literally. Audio-visual generative AI models that create non-textual "spaces" are not world models in the sense the previous poster meant. I think what they meant by world model is that the vast majority of the knowledge we rely upon to make decisions is tacit, not something that has been digitized, and not something we even know how to meaningfully digitize and model. And even describing it as tacit knowledge falls short; a substantial part of our world model is rooted in our modes of actions, motivations, etc, and not coupled together in simple recursive input -> output chains. There are dimensions to our reality that, before generative AI, didn't see much systematic introspection. Afterall, we're still mired in endless nature v. nurture debates; we have a very poor understanding about ourselves. In particular, we have extremely poor understanding of how we and our constructed social worlds evolve dynamically, and it's that aspect of our behavior that drives the frontier of exploration and discovery.
OTOH, the "world model" contention feels tautological, so I'm not sure how convincing it can be for people on the other side of the debate.
Really all you're saying is the human world model is very complex, which is expected as humans are the most intelligent animal.
At no point have I seen anyone here ask the question of "What is the minimum viable state of a world model?"
We as humans with our ego seem to state that because we are complex, any introspective intelligence must be as complex as us to be as intelligent as us. Which doesn't seem too dissimilar to saying a plane must flap its wings to fly.
Has any generative AI been demonstrated to exhibit the generalized intelligence (e.g. achieving in a non-simulated environment complex tasks or simple tasks in novel environments) of a vertebrate, or even a higher-order non-vertebrate? Serious question--I don't know either way. I've had trouble finding a clear answer; what little I have found is highly qualified and caveated once you get past the abstract, much like attempts in prior AI eras.
> Planning: We demonstrate that V-JEPA 2-AC, obtained by post-training V-JEPA 2 with only 62 hours of unlabeled robot manipulation data from the popular Droid dataset, can be deployed in new environments to solve prehensile manipulation tasks using planning with given subgoals. Without training on any additional data from robots in our labs, and without any task-specific training or reward, the model successfully handles prehensile manipulation tasks, such as Grasp and Pick-and-Place with novel objects and in new environments.
There is no real bar any more for generalized intelligence. The bars that existed prior to LLMs have largely been met. Now we’re in a state where we are trying to find new bars, but there are none that are convincing.
ARC-AGI 2 private test set is one current bar that a large number of people find important and will be convincing to a large amount of people again if LLMs start doing really well on it. Performance degradation on the private set is still huge though and far inferior to human performance.
The Erdős problem was solved by interacting with a formal proof tool, and the problem was trivial. I also don't recall whether this was the problem someone had already solved but not reported, but that does not matter.
The point is that the LLM did not model maths to do this; it made calls to a formal proof tool that did model maths, and was essentially working as the step function of a search algorithm, iterating until it found the zero of the function.
That's clever use of the LLM as a component in a search algorithm, but the secret sauce here is not the LLM but the middleware that operated both the LLM and the formal proof tool.
That middleware was the search tool that a human used to find the solution.
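Roughly the shape of the loop I'm describing, with made-up stand-ins: llm_propose and verify below are toy placeholders for the LLM and the formal proof tool, not anyone's actual API. The point is only the division of labour:

```python
import random

def llm_propose(feedback):
    # Stand-in for "ask the LLM for a candidate, given the verifier's feedback so far".
    # Here it's just a random guesser.
    return random.randint(0, 1000)

def verify(candidate):
    # Stand-in for the formal proof tool: it, not the proposer, knows what counts as correct.
    return candidate == 742

def search(max_steps=100_000):
    feedback = []
    for _ in range(max_steps):
        candidate = llm_propose(feedback)  # proposer = step function of the search
        if verify(candidate):              # checker = the thing that actually models the maths
            return candidate
        feedback.append(candidate)
    return None

print(search())  # the middleware loop, not the proposer, is what finds the answer
```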
This is not the same as a synthesis of information from the corpus of text.
Regurgitating facts kind of assumes it is a language model, as you're assuming a language interface. I would assume a real "world model" or digital twin to be able to reliably model relationships between phenomena in whatever context is being modeled. Validation would probably require experts in whatever thing is being modeled to confirm that the model captures phenomena to some standard of fidelity. Not sure if that's regurgitating facts to you -- it isn't to me.
But I don't know what you're asking exactly. Maybe you could specify what it is you mean by "real world model" and what you take fact-regurgitating to mean.
You said this:
> If this existing corpus includes useful information it can regurgitate that. It cannot, however, synthesize new facts by combining information from this corpus.
So I'm wondering if you think world models can synthesize new facts.
A world model can be used to learn something about the real system. I said synthesize because in the context that LLMs work in (using a corpus to generate sentences), that is what that would look like.
They model the part of the world that (linguistic models of the world posted on the internet) try to model. But what is posted on the internet is not IRL. So, to be glib: LLMs trained on the internet do not model IRL, they model talking about IRL.
His point is that human language and the written record is a model of the world, so if you train an LLM you're training a model of a model of the world.
That sounds highly technical if you ask me. People complain if you recompress music or images with lossy codecs, but when an LLM does that suddenly it's religious?
An LLM has an internal linguistic model (i.e. it knows token patterns), and that linguistic model models humans' linguistic models (a stream of tokens) of their actual world models (which involve far, far more than linguistics and tokens, such as logical relations beyond mere semantic relations, sensory representations like imagery and sounds, and, yes, words and concepts).
So LLMs are linguistic (token pattern) models of linguistic models (streams of tokens) describing world models (more than tokens).
It thus does not in fact follow that LLMs model the world (as they are missing everything that is not encoded in linguistic semantics).
At this point, anyone claiming that LLMs are "just" language models isn't arguing in good faith. LLMs are a general-purpose computing paradigm. LLMs are circuit builders; the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that well reproduce the input sequence. Tokens can represent anything, not just words. Roughly the same architecture can generate passable images, music, or even video.
In this case this is not so. The primary model is not a model at all, and the surrogate has bias added to it. It's also missing any way to actually check the internal consistency of statements or otherwise combine information from its corpus, so it fails as a world model.
I've found Wiki Edu-edited pages with pages of creative writing exercises. When I read their sources, they were clumsily paraphrasing and misunderstanding the source.
LLMs definitely fit the use-case of Wiki Edu students, who are just looking to pass a grade, not to look into a topic because of their interest.
APL's inventor says that he was developing not a programming language, but a notation to express as many problems as one can. He found that expressing more and more problems with the notation first made the notation grow, then the notation's size started to shrink.
To develop conceptual knowledge (when one's "notation" starts to shrink), one has to have a good memory (for re-expressing more and more problems).
The point is that this particular type of exceptional memory has nothing to do with conceptual knowledge, it's all about experiences. This particular condition also makes you focus on your own past to an excessive amount, which would distract you from learning new technologies.
You can't model systems in your mind using past experiences, at least not reliably and repeatedly.
Your lived experience is not a systematic model of anything, what this type of memory gives you is a vivid set of anecdotes describing personally important events.
Richard Dawkins is a weirdo crank these days who's co-authoring questionable books with sex offenders about transgender issues. And the one thing Christopher Hitchens was most right about was Israel; he was an anti-Zionist.
And the neoliberal West has more in common with Israel than Iran. I don't quite understand why you choose to write broad political comments if you don't have the basic background knowledge that would be needed in this discussion.
It is not a broad political comment. If you read the original texts and sunnahs, as well as follow the interpretations of a lot of scholars like Zakir Naik and others who are unapologetic, the truth that is conveniently hidden in discussions easily comes out.
The entirety of the world does not run on a Western neoliberal lens, and every region has had its own history and challenges and fights that, due to cognitive limits during discussion, are never given their legitimate space.
This can apply to grooming gangs in the UK, or the conditions of minorities in the Middle East (Yazidis or others).
An individual who might have an issue with a broad ideology that considers all non-believers subhumans to be converted, killed, or brought into the said ideology by hook or by crook can be motivated by their own experiences.
There's no activism because everybody agrees it's terrible. If your govt is already cutting out Iran and sanctioning them, there's no need to demand action.
This is very different from Israel, where our govts are actively supporting a genocide. That requires activism to change course.
Why would people demonstrate if everyone is aligned?
Protests were about the US's inaction in Gaza as much as its support for Israel. Why no such protests now? Why aren't there thousands of people gathering, demanding the US do something to help Iran's people?
The US was not inactive in Gaza. It was actively supporting, funding and arming a genocide. Currently the Trump administration is actively engaged in a process to clean up the Gaza strip, rebuild it with the money of other countries, and finally hand it over to Israel for free (for whom do you think those nice skyscrapers would be built, for the Palestinians? Lol).
Try and follow. The Palestinians weren't the ones who built their original infrastructure and it wasn't "hand[ed] ... over to Israel". Other than your antipathy towards Israel, what makes you think that whatever other countries pay to rebuild for the Palestinians will be handed over to Israel?
What's your point? Palestinians built the first Zionist settlements too. They were hired as workers before Labor-Zionism made ethnicity a necessary condition for employment in the settlements, and later in Israel.
Israel has already grabbed a huge amount of Gaza after the genocide and Trump's "peace". And in the West Bank they have ramped up the annexation as well. Israel has been swallowing Palestine while driving away Palestinians for more than 75 years now.
An apartheid state is not going to give the second class ethnic group any concessions.
His point is to insert in the conversation some disparaging remarks to paint Palestinians as parasites, pretending the occupation, oppression and blockade don't exist. Beyond this there is no connection at all between his remarks and the point at hand, which is that the "peace" is a ploy to take the heat off Israel, free it from any obligation to pay reparations for the destruction it caused, and eventually hand over to it the Gaza strip.
Do tell how the fraction of the billions that were given to Gazans that weren't used to build tunnels to attack Israel were "hand[ed] over to Israel for free".
Are you just shaking a Magic Eight Ball and replying with whatever non-responsive bullshit it comes up with? Open air prison, genocide, international law, occupation...
No, you missed the point. They have been indicted "as co-perpetrators for committing the acts jointly with others: the war crime of starvation as a method of warfare; and the crimes against humanity of murder, persecution, and other inhumane acts". Physical destruction can occur without being a war crime, and those war crimes can occur without any destruction. So it didn't add any useful information; in fact it was actively misleading, because some people might think they were indicted for destruction.
Because someone is wrong on the internet, isn't that enough? I already explained it, and I'm unsure what I can add aside from some examples: the Siege of Jerusalem (starvation without destruction), the Battle of Raqqa (destructive urban warfare, no allegations of starvation or war crimes), the Siege of Mariupol (destructive urban warfare, many alleged war crimes).
If all this does is give you the data from a contact API, why not just let the users directly interact with the API? The LLM is just extra bloat in this case.
Surely a fuzzy search by name or some other field is a much better UI for this.
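Something like this would already cover it. A minimal sketch: the contact records and the "name"/"email" field names are made up for illustration, standing in for whatever the contacts API actually returns:

```python
import difflib

contacts = [
    {"name": "Alice Johnson", "email": "alice@example.com"},
    {"name": "Bob Smith", "email": "bob@example.com"},
    {"name": "Carol Jones", "email": "carol@example.com"},
]

def fuzzy_find(query, records, cutoff=0.5):
    # Case-insensitive fuzzy match on the name field using stdlib string similarity.
    by_lower = {r["name"].lower(): r for r in records}
    matches = difflib.get_close_matches(query.lower(), list(by_lower), n=5, cutoff=cutoff)
    return [by_lower[m] for m in matches]

print(fuzzy_find("alice jonson", contacts))  # typo-tolerant, deterministic, no LLM in the loop
```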
Did you read my post? The AI is just an expensive extra component that complicates the flow. Why would I want a chat interface for something that should give me a structured response in a clean table UI with customizable columns?
For most contributions, contributors are required to assign copyright to the FSF, so it's not actually particularly open.
If the FSF is the sole copyright owner, and no one else has any controlling interest in the copyright, they're free to relicense it however they please. The GPL doesn't restrict you from relicensing something you're the sole owner of (and it's doubtful there's a legal mechanism to give away rights to something you continue to own).
Again, the FSF under Stallman isn't about freedom, it's about control.
I do feel this divide, but from what I've read, and what I've observed, it's more a divide between people who understand the limited use-cases where machine learning is useful, and people who believe it should be used wherever possible.
For software engineering, it is useless unless you're writing snippets that already exist in the LLM's corpus.
> For software engineering, it is useless unless you're writing snippets that already exist in the LLM's corpus.
If I give something like Sonnet the docs for my JS framework, it can write code "in it" just fine. It makes the occasional mistake, but if I provide proper context and planning up front, it can knock out some fairly impressive stuff (e.g., helping me to wire up a shipping/logistics dashboard for a new ecom business).
That said, this requires me policing the chat (preferred) vs. letting an agent loose. I think the latter is just opening your wallet to model providers but shrug.
If you need a shipping dashboard, then yeah, that's a very common, very simple use-case. Just hook up an API to a UI. Even then I don't think you'll make a very maintainable app that way, especially if you have multiple views (because LLMs are not consistent in how they use features; they're always generating from scratch and matching whatever's closest).
What I'm saying is that whenever you need to actually do some software design, i.e. tackle a novel problem, they are useless.