They are learning a grammar, finding structure in the text. In the case of Othello, the rules for what moves are valid are quite simple, and can be represented in a very small model. The slogan is "a minute to learn, a lifetime to master". So "what is a legal move" is a much simpler problem than "what is a winning strategy".
It's similar to asking a model to only produce outputs corresponding to a regular expression, given a very large number of inputs that match that regular expression. The RE is the most compact representation that matches them all, and the model can figure this out.
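To make the analogy concrete, here is a minimal sketch (the pattern and sample strings are hypothetical, not from the original discussion): a few characters of regular expression describe arbitrarily many matching strings, so the pattern is a far more compact representation of the language than any list of examples.

```python
import re

# A short pattern stands in for an unbounded family of strings.
pattern = re.compile(r"ab+c")

# Many distinct inputs, one compact rule.
samples = ["a" + "b" * n + "c" for n in range(1, 6)]
assert all(pattern.fullmatch(s) for s in samples)

# Strings outside the language are rejected.
assert pattern.fullmatch("ac") is None
assert pattern.fullmatch("abbd") is None
```

A model trained only on strings from such a language can, in principle, recover the rule itself rather than memorizing the examples, which is the claimed parallel to Othello-GPT recovering the rules of legal play.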
But we aren't building a "world model", we're building a model of the training data. In artificial problems with simple rules, the model might be essentially perfect, never producing an invalid Othello move, because the problem is so limited.
I'd be cautious about generalizing from this work to a more open-ended situation.
I don't think the point is that Othello-GPT has somehow modelled the real world by training only on games, but that tasking it with predicting the next move forces it to model its data in a deep way. There's nothing special about Othello games versus internet text, except that the latter forces the model to capture far more.