
This shows nicely what's going on.

If you ask a human, they will answer 3. Sometimes they say 4. Or 2. That's it.

An LLM produces text based on the examples it was trained on. It was trained with these elaborate responses, so that's what it produces.

Whenever ChatGPT gets something wrong, someone at OpenAI will analyse it, create a few correct examples, and put those on the pile for retraining. That's why it gets better: not because it is smarter, but because it's retrained on your specific test cases.
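For concreteness, here's a minimal sketch of what that "put correct examples on the pile" step could look like, assuming OpenAI's chat fine-tuning JSONL format; the specific question, answers, and file name are hypothetical, made up for illustration.

    import json

    # A failed test case collected from user reports (hypothetical).
    failure = {
        "prompt": "How many r's are in 'strawberry'?",
        "wrong_answer": "2",
        "corrected_answer": "3",
    }

    # Turn it into a supervised training example: the prompt paired
    # with the answer we want the model to produce next time.
    example = {
        "messages": [
            {"role": "user", "content": failure["prompt"]},
            {"role": "assistant", "content": failure["corrected_answer"]},
        ]
    }

    # Append to the fine-tuning dataset -- "the pile for retraining".
    with open("retraining_examples.jsonl", "a") as f:
        f.write(json.dumps(example) + "\n")

Nothing about this makes the model reason better in general; it just makes this one input (and near-identical ones) come out right after the next fine-tuning run.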


