Did you separate the test dataset from the training one?

gwern · on March 26, 2024

Does it matter? It's not a research project trying to rigorously evaluate novel architectural modifications or something, just a project trying to be useful within the limited resources of a hobbyist. If someone labeled a bunch of the remaining errors, that data would be better used as more training data than as a benchmark.

In practice, the accuracy, whatever it is, appears to be very high and more than adequate to justify its use.

klntsky · on March 29, 2024

Yes, it does. You will always get close to 100% accuracy on small datasets if you evaluate the model on the training dataset, due to overfitting.
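
To make the overfitting point concrete, here is a minimal sketch in plain Python. It is a toy, not the project's actual model or data: the dataset is synthetic, the labels are coin flips unrelated to the features, and the "model" is a 1-nearest-neighbor classifier chosen only because it memorizes its training set. All names (`nearest_neighbor_predict`, `accuracy`, the 100/100 split) are made up for illustration.

```python
import random


def nearest_neighbor_predict(train, point):
    """Return the label of the training example closest to `point` (1-NN)."""
    closest = min(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], point)),
    )
    return closest[1]


def accuracy(train, examples):
    """Fraction of `examples` whose label the 1-NN model predicts correctly."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in examples)
    return correct / len(examples)


rng = random.Random(0)

# Assumption for illustration: labels are random, so there is nothing to learn.
data = [([rng.random() for _ in range(5)], rng.randint(0, 1)) for _ in range(200)]
train, test = data[:100], data[100:]

# Evaluating on the training set: each point's nearest neighbor is itself.
train_acc = accuracy(train, train)
# Evaluating on held-out data the model has never seen.
test_acc = accuracy(train, test)

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

The training-set score is perfect even though the labels are pure noise, while the held-out score should land near chance. That gap is why a number measured on the training data says little about how the model does on new inputs.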