This is neat and promising. I agree with other commenters that the weakness is in telling the user what to do in the form of text like "Move cursor to beginning of line". Routing the muscle signals through text processing in the brain is slow and isn't what goes on when you actually use an editor. That is, there's no text in a marquee in one's head telling one what one needs to do next. Since keyboard shortcuts are all about habit and speed, you definitely don't want to require any atypical or slow mental processes.
For cursor movement you could show a buffer of text with some kind of target thingy that moves around, and you have to chase it using keyboard commands. That might be fun. It would also be simple to create an efficiency score based on how many commands the user used to get where they wanted to go.
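To make the efficiency score concrete, here's a rough sketch. Everything in it is my own assumption rather than anything from the OP: the buffer is treated as a plain rectangular grid, only Vim-style h/j/k/l moves count, and the name `chase` is made up. Under those assumptions the fewest possible keystrokes is just the Manhattan distance to the target, so the score is that minimum divided by the keystrokes actually used.

```python
# Assumed key map: Vim-style h/j/k/l as (row delta, column delta).
MOVES = {"h": (0, -1), "j": (1, 0), "k": (-1, 0), "l": (0, 1)}


def chase(cursor, target, keys, rows, cols):
    """Replay `keys` from `cursor` on a rows x cols grid.

    Returns (reached, efficiency), where efficiency is the minimum number
    of moves divided by the number of moves used (1.0 is optimal).
    The grid is a simplification: real lines have different lengths.
    """
    if cursor == target:
        return True, 1.0

    row, col = cursor
    used = 0
    for key in keys:
        if key not in MOVES:
            continue  # keys outside the assumed map are ignored
        d_row, d_col = MOVES[key]
        # Clamp to the grid so the cursor cannot leave the buffer.
        row = min(max(row + d_row, 0), rows - 1)
        col = min(max(col + d_col, 0), cols - 1)
        used += 1
        if (row, col) == target:
            # With only single-step moves, the optimum is the Manhattan distance.
            best = abs(cursor[0] - target[0]) + abs(cursor[1] - target[1])
            return True, best / used

    return False, 0.0
```

A real trainer would also have to count word and line jumps, which makes "minimum commands" a shortest-path search instead of a simple distance.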
For text manipulation you could provide some auxiliary visual indicator of what needs doing. For example, to practice deletion, have a buffer of text and some visual indicator of a character, word, or line that needs deleting. Say it flashes (probably a bad choice, but whatever). Then the user's job is to use the right shortcut to delete the flashing thing. At first, the software could move the cursor to the right place before each new command. For more advanced users, you could tell them "First move the cursor to the right place and then invoke the right command to delete the flashing thing". As soon as they delete one bit, another bit starts flashing. You could measure how long it takes to delete everything in the buffer this way. That might be fun too.
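A minimal sketch of that deletion drill, with loud caveats: the Vim commands (`x` for a character, `dw` for a word, `dd` for a line) are just my pick for illustration, and `run_drill` and `read_command` are hypothetical names. It models the beginner mode, where the software has already put the cursor in the right place, so only the delete command itself is checked.

```python
import time

# Assumed mapping from the kind of flashing target to the command that deletes it.
EXPECTED = {"char": "x", "word": "dw", "line": "dd"}


def run_drill(targets, read_command, clock=time.monotonic):
    """Run one drill over `targets`, a list of "char" / "word" / "line".

    `read_command(kind)` is a hypothetical callback that returns the command
    the user typed while a target of that kind is flashing.
    Returns (elapsed_seconds, mistakes).
    """
    start = clock()
    mistakes = 0
    for kind in targets:
        # The same target keeps flashing until the right command arrives.
        while read_command(kind) != EXPECTED[kind]:
            mistakes += 1
    return clock() - start, mistakes
```

Passing the clock in as a parameter is only there so the timing can be faked when testing; a real version would read keystrokes from the terminal and redraw the buffer after each deletion.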
That looks fun, but also intimidating. It makes me think that maybe tracking efficiency is a bad idea for learning after all. To stop and think about whether one's doing X in the most efficient way stymies flow.
Come to think of it, in Emacs I often do things inefficiently – in the sense of using many more commands than the technical minimum to do a task – because the basic commands are already "compiled" in my head. If I have to stop and grope for a less familiar command that could do the job more directly, that's like switching to interpreter mode, which is much slower. So in the short run, it can be more efficient in time to be less efficient with commands. This is the chicken-and-egg problem where one doesn't invest the effort to acquire new tricks because one's too busy doing one's job with the tricks one already knows.
The goal is to get more tricks into the compiled set (muscle memory) more easily. I suspect this is a "don't make me think" kind of challenge. One has a limited budget of thinking energy and typically needs to spend it on more important things, so one doesn't have anything left to invest in getting better at Emacs or whatever, even though one knows one "should". The challenge is how to move this kind of knowledge into muscle memory using some cheaper pool of energy.
This is probably a solvable problem because the commands we're talking about are so mechanical. They don't need to go through the most expensive cognitive process; our goal is to forget them on that level anyway. But I haven't seen any teaching tool with a low enough cost in this sense. The OP comes the closest, which is already impressive. And if you can learn editor commands this way, there probably are a lot of other useful things you can learn this way.
I took a look and yes, I didn't have to think much about what to do, and nearly all my effort was spent actually toying with h-j-k-l. I don't know vim at all so I'm good for a newbie test there.
Edit: I just realized what bothered me, though: it doesn't look anything like vim. So there's a context loss, which feels like it might degrade the value of the learning.