
Matching braces is part of written natural language and would be beyond the power of regular languages, wouldn't it? I thought there was something special about transformers that lets them learn it, at least within their context window, in a generalizable way.

It also sounds, from the paper, like transformers are formally proven to be capable of it, but training can't find the weights that would make them capable of it, even though such weights should exist, if I'm reading it right.
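For a concrete sense of why this sits beyond regular languages, here's a minimal sketch (plain Python, names are my own) of a brace checker with an explicit stack. The stack grows with nesting depth, which is exactly the unbounded memory a finite automaton doesn't have.

  PAIRS = {')': '(', ']': '[', '}': '{'}

  def is_balanced(s: str) -> bool:
      # Push openers, pop on closers; any mismatch or leftover opener means unbalanced.
      stack = []
      for ch in s:
          if ch in '([{':
              stack.append(ch)
          elif ch in PAIRS:
              if not stack or stack.pop() != PAIRS[ch]:
                  return False
      return not stack

  assert is_balanced('{ ( ) [ ] }')
  assert not is_balanced('{ ( ( ) }')   # unbalanced: one '(' never closed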



I've seen humans mess up matching braces. I've even been that human already this year, only finding out because the compiler told me.


But we still generalize it to bigger and bigger inputs without having seen matching-brace problems of every size we can handle.

In the paper I don't think they are looking for perfection, but they show which models can't seem to learn significantly beyond the size of the examples they explicitly saw.


The {specific (number I (messed up was a depth of 3)}.

I missed it because it was spaced over a few lines and I was tired.


But not because you only ever trained on examples up to depth two.


I don't know what point you're making; GPT-3 almost certainly has training examples of greater depth.

Pattern recognition can fail. Mine does, GPT's does.

I count further than I subitize, but from experience my mind may wander and I may lose track after 700 of a thing even if there's literally nothing else to focus on.

But I still didn't notice an error at depth 3.


It seems transformers don't generalize in this class, while neural Turing machines and neural stack machines do.

You get the general rule (known in linguistics as competence) but can't flawlessly execute it (performance). Transformers can't seem to get competence here.

https://en.m.wikipedia.org/wiki/Linguistic_competence


Ah, gotcha. Thanks for the clear explanation and the link! :)


this may well be the only real skill of programmers who learned C-style syntaxes and/or Lisps

on the other hand, thinking with parentheses takes language up to the level of functional thinking. computable stuff is all about substituting everything inside a parenthesis for a single thing, and this is the essence of all classic computation


If there is a finite input it is regular.


The paper covers that. They only test up to a finite size, but the models that fail can't generalize significantly beyond the size of the explicit examples they were trained on.

Formally, transformers should be capable of modeling it with the right weights, but they show good evidence that those weights can't be learned with gradient descent, whereas a neural Turing machine or stack machine can learn them with gradient descent.
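As a purely hypothetical illustration of that kind of test (not the paper's actual setup, and the function names are made up): generate balanced-bracket strings, train on shallow nesting only, and evaluate on deeper nesting.

  import random

  def random_dyck(depth_budget: int) -> str:
      # Random balanced parentheses with nesting depth at most depth_budget.
      if depth_budget == 0 or random.random() < 0.3:
          return ''
      return '(' + random_dyck(depth_budget - 1) + ')' + random_dyck(depth_budget)

  def max_depth(s: str) -> int:
      depth = best = 0
      for ch in s:
          depth += 1 if ch == '(' else -1
          best = max(best, depth)
      return best

  samples = [random_dyck(6) for _ in range(1000)]
  train = [s for s in samples if max_depth(s) <= 3]  # shallow examples for training
  test  = [s for s in samples if max_depth(s) > 3]   # deeper, unseen depths for evaluation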


You might mean something like: if you restrict the set of all valid inputs to N characters, then the language is regular. But that's a much stronger condition than requiring the input to be finite. Furthermore, such a restriction makes the language non-recursive; it's effectively a trivial language.

In standard automata theory, all inputs to automata are finite.


Yes, I was hoping people would understand that from the context.


As in, you construct an FSM that recognizes all strings of a context-free language shorter than a certain length?
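For what it's worth, a minimal sketch of that construction for a single bracket type (Python, written as a bounded counter rather than an explicit state table): the states are just depths 0..N plus a dead state, so the machine is finite and the language it accepts is regular, but only because the depth is capped.

  def make_bounded_matcher(max_depth: int):
      DEAD = -1  # dead state: input can never be accepted after this

      def accepts(s: str) -> bool:
          state = 0  # current nesting depth doubles as the FSM state
          for ch in s:
              if state == DEAD:
                  return False
              if ch == '(':
                  state = state + 1 if state < max_depth else DEAD
              elif ch == ')':
                  state = state - 1 if state > 0 else DEAD
          return state == 0

      return accepts

  dfa = make_bounded_matcher(3)
  assert dfa('((()))')        # depth 3: accepted
  assert not dfa('(((())))')  # balanced, but depth 4 exceeds the cap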



