
Matching braces is part of written natural language and would be beyond the power of regular languages, wouldn't it? I thought there was something special about transformers that lets them learn it, at least within their context window, in a generalizable way.

It also sounds, from the paper, like transformers are formally proven to be capable of it, but training can't find the weights that would make them capable of it, even though such weights should exist, if I'm reading it right.
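For a concrete sense of why this sits beyond regular languages, here's a minimal sketch (plain Python, names are my own) of a brace checker with an explicit stack. The stack grows with nesting depth, which is exactly the unbounded memory a finite automaton doesn't have.

  PAIRS = {')': '(', ']': '[', '}': '{'}

  def is_balanced(s: str) -> bool:
      # Push openers, pop on closers; any mismatch or leftover opener means unbalanced.
      stack = []
      for ch in s:
          if ch in '([{':
              stack.append(ch)
          elif ch in PAIRS:
              if not stack or stack.pop() != PAIRS[ch]:
                  return False
      return not stack

  assert is_balanced('{ ( ) [ ] }')
  assert not is_balanced('{ ( ( ) }')   # unbalanced: one '(' never closed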



I've seen humans mess up matching braces. I've even been that human already this year, only finding out because the compiler told me.


But we still generalize it to bigger and bigger inputs without having seen matching-brace problems of every size we can handle.

In the paper I don't think they are looking for perfection, but they show which models can't seem to learn significantly beyond the size of the examples they explicitly saw.


The {specific (number I (messed up was a depth of 3)}.

I missed it because it was spaced over a few lines and I was tired.


But not because you only ever trained on examples up to depth two.


I don't know what point you're making; GPT-3 almost certainly has training examples of greater depth.

Pattern recognition can fail. Mine does, GPT's does.

I count further than I subitize, but from experience my mind may wander and I may lose track after 700 of a thing even if there's literally nothing else to focus on.

But I still didn't notice an error at depth 3.


It seems transformers don't generalize in this class, while neural Turing machines and neural stack machines do.

You get the general rule (known in linguistics as competence) but can't flawlessly execute it (performance). Transformers can't seem to get competence here.

https://en.m.wikipedia.org/wiki/Linguistic_competence


Ah, gotcha. Thanks for the clear explanation and the link! :)


this may well be the only real skill of programmers who learned C-style syntaxes and/or Lisps

on the other hand, thinking with parentheses takes language up to the level of functional thinking. computable stuff is all about substituting everything inside a parenthesis for a single thing, and this is the essence of all classic computation


If there is a finite input it is regular.


The paper covers that. They only test up to a finite size, but the models that fail can't generalize significantly beyond the size of the explicit examples they were trained on.

Formally, transformers should be capable of modeling it with the right weights, but they show good evidence that those weights can't be learned with gradient descent, whereas a neural Turing machine or stack machine can learn them with gradient descent.
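As a purely hypothetical illustration of that kind of test (not the paper's actual setup, and the function names are made up): generate balanced-bracket strings, train on shallow nesting only, and evaluate on deeper nesting.

  import random

  def random_dyck(depth_budget: int) -> str:
      # Random balanced parentheses with nesting depth at most depth_budget.
      if depth_budget == 0 or random.random() < 0.3:
          return ''
      return '(' + random_dyck(depth_budget - 1) + ')' + random_dyck(depth_budget)

  def max_depth(s: str) -> int:
      depth = best = 0
      for ch in s:
          depth += 1 if ch == '(' else -1
          best = max(best, depth)
      return best

  samples = [random_dyck(6) for _ in range(1000)]
  train = [s for s in samples if max_depth(s) <= 3]  # shallow examples for training
  test  = [s for s in samples if max_depth(s) > 3]   # deeper, unseen depths for evaluation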


You might mean something like: if you restrict the set of all valid inputs to N characters, then the language is regular. But that's a much stronger condition than requiring the input to be finite. Furthermore, such a restriction makes the language non-recursive; it's effectively a trivial language.

In standard automata theory, all inputs to automata are finite.


Yes, I was hoping people would understand that from the context.


As in, you construct an FSM that recognizes all strings of a context-free language shorter than a certain length?
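For what it's worth, a minimal sketch of that construction for a single bracket type (Python, written as a bounded counter rather than an explicit state table): the states are just depths 0..N plus a dead state, so the machine is finite and the language it accepts is regular, but only because the depth is capped.

  def make_bounded_matcher(max_depth: int):
      DEAD = -1  # dead state: input can never be accepted after this

      def accepts(s: str) -> bool:
          state = 0  # current nesting depth doubles as the FSM state
          for ch in s:
              if state == DEAD:
                  return False
              if ch == '(':
                  state = state + 1 if state < max_depth else DEAD
              elif ch == ')':
                  state = state - 1 if state > 0 else DEAD
          return state == 0

      return accepts

  dfa = make_bounded_matcher(3)
  assert dfa('((()))')        # depth 3: accepted
  assert not dfa('(((())))')  # balanced, but depth 4 exceeds the cap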



