Claude wrote a functional NES emulator using my engine's API

worble · 2025-12-31T14:25:59 1767191159

I'd be curious how well it passes 100th Coin's NES accuracy tests https://github.com/100thCoin/AccuracyCoin

utopiah · 2025-12-31T14:37:13 1767191833

Indeed, that's what I kind of hinted at in https://news.ycombinator.com/item?id=46442195, and coincidentally in https://news.ycombinator.com/item?id=46437688 shortly after: OK, one can "generate" a "solution", and that's much easier than before... but until we can somehow verify that it actually does what it says it does (and we know about hallucinations and have no reason to believe that has changed), testing itself, especially against well-known "problems", is more and more important.

That being said, it doesn't answer the "why" in the first place, an even more important question. At least it does help somewhat to compare with existing alternatives.

garciasn · 2025-12-31T15:34:17 1767195257

Isn’t this how all software development works? Folks commit code, it’s tested, and reviewed, and then deployed.

Why would this be any different?

PaulDavisThe1st · 2025-12-31T15:53:05 1767196385

That's not how software development works.

Folks think, they write code, they do their own localized evaluation and testing, then they commit and then the rest of the (down|up)stream process begins.

LLMs skip over the "actually verify that the code I just wrote does what I intended it to" step. Granted, most humans don't do this step as thoroughly and carefully as would be desirable (sometimes through laziness, sometimes because of a belief in (down|up)stream testing processes). But LLMs don't do it at all.

sally_glance · 2025-12-31T16:00:21 1767196821

They absolutely can do that if you give them the tools. Seeing Claude (I use it with opencode agents) run curl and playwright to verify and then fix its implementation was a real 'wow' moment for me.

Q6T46nT668w6i3m · 2025-12-31T17:34:56 1767202496

We have different experiences. Often I’ll see Claude et al. find creative ways to fulfill the task without satisfying my intent, e.g., changing the implementation plan I specifically asked for, changing tolerances or even tests, and frequently disabling tests.

sally_glance · 2026-01-01T04:47:33 1767242853

Yeah, I feel that. If it happens, your only way out is to write a more extensive implementation plan first. For me that is the point where I start regretting having tried to implement something using AI... But admittedly, most of the time revising the implementation plan and running the agent again is still faster than doing it on my own (I try to make implementation tasks explicit in the form of a markdown file; that has worked pretty well so far).

Fr0styMatt88 · 2025-12-31T20:55:40 1767214540

I see these “you had a different experience than me” comments around AI coding agents a lot and can concur; my experience with Copilot differs even from day to day. Sometimes it’s great, and other days it’s so bad that I give up on using it at all.

Makes me honestly wonder — will AGI just give us agents that get into bad moods and not want to work for the day because they’re tired or just don’t feel like it!

ssl-3 · 2025-12-31T22:57:55 1767221875

If part of the goal is to emulate a person's abilities, then surely that includes a person's ability to fuck things up.

DANmode · 2025-12-31T19:48:05 1767210485

Are you a customer?

DANmode · 2025-12-31T22:12:13 1767219133

Don’t downvote because you don’t like the question.

It obviously adds to the discussion: paid and non-paid accounts are being conflated daily in threads like these!

They’re not the same tier account!

Free users, especially ones deemed less interesting to learn from for the future, are given table scraps when the providers feel it’s necessary for load reasons.

nineteen999 · 2025-12-31T23:11:15 1767222675

Exactly. There's an impedance mismatch between those using the free/cheap tiers and those paying a premium, so the discussion gets squirrely because one side is talking about apples and the other oranges.

DANmode · 2026-01-01T06:23:46 1767248626

Right.

More specifically: One side is talking about apples,

and the other is talking about mushy old apples,

that sometimes you need to wait 12 hours for.

baobun · 2026-01-01T01:24:01 1767230641

All user accounts are also customers. Some are paying with data and contributing to metrics going up.

DANmode · 2026-01-01T02:36:19 1767234979

That’s not how words work.

All users are stakeholders.

They’re emphatically not considered customers.

We can disagree with that and create legal protections for those people, but that doesn’t make them customers to OpenAI, Anthropic, et al.

mapontosevenths · 2025-12-31T16:03:15 1767196995

> LLM's skip over the "actually verify that the code I just wrote does what I intended it to" step.

I'm not sure where this idea comes from. Just instruct it to write and run unit tests and document as it goes. All of the ones I've used will happily do so.

You still have to verify that the unit tests are valid, but that's still far less work than skipping them or writing the code/tests yourself.

butlike · 2025-12-31T20:42:33 1767213753

I disagree that it's less work. It rewrites tests carte blanche. I've seen it rewrite and rewrite tests to the point of undermining the original test intention. So now, instead of intentionally writing code and a new unit test, I need to intentionally go and review EVERY unit test it touched. Every. Time.

It also doesn't necessarily update documentation as the implementation changes. I've seen documentation rot happen within the same coding session.

mapontosevenths · 2025-12-31T23:04:07 1767222247

I've seen it do that as well. Especially Gemini 3 lately.

Once I'm happy with the tests, I've started adding an instruction to my GEMINI.md telling it not to edit them, but to still run them.

I solve the documentation issue the same way. By telling it when and what to update in the .md file.

jimmaswell · 2025-12-31T16:35:57 1767198957

> actually verify that the code I just wrote does what I intended it to

That's what the author did when they ran it.

adventured · 2025-12-31T16:33:04 1767198784

Claude Opus 4.5 will routinely test its own code before handing it off to you, even with zero instruction to do so.

PaulDavisThe1st · 2025-12-31T19:06:33 1767207993

One commercial equivalent to the project I work on, called ProTools (a DAW), has a test "harness" that took 6 people more than a year to write and takes more than a week to execute.

Last month, I made a minor change to our own code and verified that it worked (it did!). Earlier this week, I was notified of an entirely different workflow that had been broken by the change I had made. The only sort of automated testing that would have detected this would have been similar in scope and scale to the ProTools test harness, and neither an individual human nor an LLM is going to run that.

Moreover, that workflow was entirely graphically based, so unless Claude Opus 4.5 or whatever today's flavor of vibe coding LLM agent is has access to a testing system that allows it to inject mouse events into a running instance of our application (hint: it does not), there's no way it could run an effective test for this sort of code change.

I have no doubt that Claude et al. can verify that their carefully defined module does the very limited task it is supposed to do, for cases where "carefully defined" and "very limited" are appropriate. If that's the only sort of coding you do, I am sorry for your loss.

utopiah · 2025-12-31T19:51:34 1767210694

> access to a testing system that allows it to inject mouse events into a running instance of our application

FWIW that's precisely what https://pptr.dev is all about. To your broader point, though, designing a good harness itself remains very challenging and requires actually understanding what delivers value to the user, the software architecture (e.g. to bypass user interaction and test the API first), etc.

PaulDavisThe1st · 2025-12-31T22:36:08 1767220568

> Puppeteer is a JavaScript library which provides a high-level API to control Chrome or Firefox

my world is native desktop applications, not in-browser stuff.

nineteen999 · 2025-12-31T23:13:18 1767222798

You suggest a web testing framework as a response to someone working on a real desktop app?

utopiah · 2026-01-01T07:05:19 1767251119

No I was sharing an example of a framework that does include "a testing system that allows it to inject mouse events".

That being said, mouse events and the like aren't hard to do, e.g. start with a fixed resolution (using xrandr), then use xdotool or similar. Ideally, if the application has accessibility features, it won't be as finicky.
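A minimal sketch of that approach in Python, assuming an X11 session with `xrandr` and `xdotool` installed. The output name and coordinates in the usage line are placeholders (run `xrandr` with no arguments to list real outputs), and the helper names are hypothetical:

```python
import shutil
import subprocess


def xrandr_mode_cmd(output, width, height):
    """Build an xrandr command that pins one display output to a fixed mode."""
    return ["xrandr", "--output", output, "--mode", f"{width}x{height}"]


def xdotool_click_cmd(x, y, button=1):
    """Build an xdotool command that moves the pointer to (x, y), then clicks.

    xdotool accepts several commands in one invocation, so mousemove and
    click run in sequence.
    """
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def xdotool_key_cmd(keys):
    """Build an xdotool command that sends a key combination such as "ctrl+s"."""
    return ["xdotool", "key", keys]


def run(cmd):
    """Execute a command built above, failing loudly if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} is not installed or not on PATH")
    subprocess.run(cmd, check=True)
```

For example, `run(xrandr_mode_cmd("HDMI-1", 1920, 1080))` followed by `run(xdotool_click_cmd(640, 360))`. Pinning the resolution first is what keeps hard-coded click coordinates meaningful from one run to the next.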

My point though was just to show that testing with GUI is not infeasible.

Apparently there is even a "UI Testing for devs & agents" tool, https://www.chromatic.com, which I found via Visual TDD https://www.chromatic.com/blog/visual-test-driven-developmen... I can't recommend it, but it does show that even though the person I was replying to can't use Puppeteer in their context, the tooling does exist and the principles would still apply.

PaulDavisThe1st · 2026-01-01T16:42:19 1767285739

> My point though was just to show that testing with GUI is not infeasible.

Indeed, which is why I mentioned the ProTools test harness and the fact that it took 6 people a year to write and takes a week to run (or took a week, at some point in the past; it might be more or less now).

astrange · 2025-12-31T23:44:53 1767224693

Claude can do that, yes.

https://platform.claude.com/docs/en/agents-and-tools/tool-us...

Although if you want to test a UI app, it's better to do it through accessibility APIs rather than actually looking at the screen and clicking.

roger_ · 2025-12-31T15:05:50 1767193550

I’m sure you can point Claude at that page and have it make the necessary changes to pass.

deadbabe · 2025-12-31T15:46:03 1767195963

Or it could loop infinitely, never quite being able to pass all the tests.

hu3 · 2025-12-31T21:50:33 1767217833

which is easily fixable by some human guidance

RAMJAC · 2026-01-01T00:18:21 1767226701

Sorta, I went into this not really knowing how to implement an emulator: https://github.com/RAMJAC-digital/RAMBO

With the NES there are all sorts of weird edge cases, among them NMI flags and resets; the PPU in general is kinda tricky to get right. Claude has had *massive* issues with this, and I've had to take control and completely throw out code it generated. I'm restarting with a clean slate, though, as there are still issues with some of the underlying abstractions. The PPU is still the bane of my existence, and so is DMA; I don't like the instruction pipeline, and I haven't even gotten to the APU. It's getting an 80/130 on AccuracyCoin.
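For readers wondering why NMI handling is fiddly: the PPU signals NMI while both the vblank flag (bit 7 of PPUSTATUS, $2002) and the NMI-enable bit (bit 7 of PPUCTRL, $2000) are set, and the CPU's NMI input is edge-sensitive, so only an off-to-on transition counts. Below is a minimal Python sketch of just that logic. It is a hypothetical model, not code from RAMBO or the posted emulator, and cycle-exact races (such as reading $2002 right as vblank begins) are deliberately left out:

```python
class NmiModel:
    """Hypothetical, simplified model of how the PPU drives the CPU's NMI."""

    def __init__(self):
        self.nmi_enable = False   # PPUCTRL ($2000) bit 7
        self.vblank = False       # PPUSTATUS ($2002) bit 7
        self._line = False        # current level of "vblank and NMI enabled"
        self.nmi_pending = False  # set on a rising edge, cleared by the CPU

    def _update(self):
        level = self.vblank and self.nmi_enable
        # Edge-sensitive: only an off-to-on transition requests an NMI.
        if level and not self._line:
            self.nmi_pending = True
        self._line = level

    def write_ppuctrl(self, value):
        self.nmi_enable = bool(value & 0x80)
        self._update()

    def read_ppustatus(self):
        result = 0x80 if self.vblank else 0x00  # other status bits omitted here
        self.vblank = False  # reading $2002 clears the vblank flag
        self._update()
        return result

    def start_vblank(self):
        self.vblank = True
        self._update()

    def end_vblank(self):
        self.vblank = False
        self._update()
```

In this model, toggling bit 7 of $2000 off and back on while the vblank flag is still set requests a second NMI, which matches documented hardware behavior and is exactly the kind of case that is easy to get wrong.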

Though, when it came to creating a WASM target, Claude was largely able to do it with minimal input on my end. Actually, getting the WASM emulator running in the browser was the least painful part of this project.

You will run into three problems:

1) "The Wall": once any project becomes large enough, you need the context window to be *very* specific and scoped, with explicit details of what is expected, the success criteria, and the deliverables.

2) Ambiguity means Claude is going to choose the path of least resistance, and it will pedantically avoid or add things that are not specced. Stubbed functions, "beyond scope", and "deferred" are some of its favorite excuses for not refactoring or fixing obvious issues. (Anything Claude knows will go beyond the context window gets punted, but it won't tell you that.)

3) Chat bots *loooove* to talk; it will vomit code for days. Removing code or documentation is anathema to Claude, with "backward compatibility", "deprecated", and "legacy" being its favorite words.

deadbabe · 2026-01-01T15:51:53 1767282713

This sounds exhausting. Once the thrill of seeing code rapidly generated wears off, I wonder if it's even worth it. If someone was going to use code they didn't write, why not just pull down some open source implementation from somewhere and build on top of it? It basically gets you the same thing without the LLM hassles, and you can start building on a more sane foundation.

Y_Y · 2025-12-31T14:03:10 1767189790

Git wrote a functional NES emulator for me by simply cloning one of the many publicly available ones!

LunicLynx · 2025-12-31T14:23:21 1767191001

This is the comment.

Give it copy paste / translate tasks and it’s a no brainer (quite literally)

But same can be said of humans.

The question here is: did it implement it because it read the available online documentation about the NES architecture, OR did it just see one too many such implementations?

jacquesm · 2025-12-31T15:43:34 1767195814

> But same can be said of humans.

Indeed, the 'cleanroom' standard always was that one team does the reverse engineering (RE) and writes a spec, and another team that has never seen the original (and has written statements with penalty clauses to prove it) then does the re-implementation. If you were to read the implementation, write the spec, and then write the re-implementation yourself, that would definitely violate the standard for claiming an original work.

cebert · 2025-12-31T14:01:13 1767189673

It’s a shame that the source code isn’t commented and documented more. At the very least, I could see it being helpful to add some documentation for every CPU opcode being emulated.
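A hedged sketch of what per-opcode documentation could look like, in Python with a hypothetical `Cpu` class (the posted emulator's actual structure isn't shown in the thread). The two opcodes are real 6502 instructions, LDA immediate ($A9) and INX ($E8); everything else about the CPU is simplified:

```python
FLAG_Z = 0x02  # zero flag
FLAG_N = 0x80  # negative flag (bit 7 of the result)


class Cpu:
    """Hypothetical, heavily simplified 6502 state for illustration only."""

    def __init__(self, memory):
        self.mem = memory
        self.pc = 0
        self.a = 0
        self.x = 0
        self.p = 0x00  # status flags (real power-up state differs)

    def fetch(self):
        """Read the byte at PC and advance PC, wrapping at 16 bits."""
        value = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def set_zn(self, value):
        """Update Z and N from an 8-bit result, leaving other flags alone."""
        self.p &= ~(FLAG_Z | FLAG_N) & 0xFF
        if value == 0:
            self.p |= FLAG_Z
        self.p |= value & FLAG_N


def op_lda_imm(cpu):
    """$A9 LDA #imm: A <- next byte. Flags: Z, N. 2 bytes, 2 cycles."""
    cpu.a = cpu.fetch()
    cpu.set_zn(cpu.a)
    return 2


def op_inx(cpu):
    """$E8 INX: X <- (X + 1) mod 256. Flags: Z, N. 1 byte, 2 cycles."""
    cpu.x = (cpu.x + 1) & 0xFF
    cpu.set_zn(cpu.x)
    return 2


OPCODES = {0xA9: op_lda_imm, 0xE8: op_inx}


def step(cpu):
    """Fetch one opcode, dispatch it, and return the cycles it took."""
    return OPCODES[cpu.fetch()](cpu)
```

Keeping the byte count, cycle count, and affected flags in each handler's docstring makes it cheap to check an opcode against a reference table.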

112233 · 2025-12-31T14:32:13 1767191533

Forbidding the LLM to write comments and docstrings (preferably enforced by a build and commit hook) is one of the best "hacks" for using that thing. The LLM cannot help but emit poisonous comments.
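One way to enforce that mechanically is a pre-commit hook that rejects staged lines consisting only of a comment. A naive Python sketch, assuming the extension-to-prefix table below (hypothetical; extend it for your languages). It does not catch trailing comments or Python docstrings:

```python
import subprocess
import sys
from pathlib import PurePosixPath

# Hypothetical table: prefixes that start a whole-line comment, per extension.
COMMENT_PREFIXES = {
    ".py": ("#",),
    ".c": ("//", "/*"),
    ".h": ("//", "/*"),
    ".rs": ("//", "/*"),
    ".ts": ("//", "/*"),
}


def added_comment_lines(diff_text):
    """Return (path, line) for every added line in a unified diff that is only a comment."""
    hits = []
    path, prefixes, in_header = None, (), False
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git "):
            in_header = True
        elif raw.startswith("@@"):
            in_header = False
        elif in_header and raw.startswith("+++ "):
            target = raw[4:]
            # Deleted files show "+++ /dev/null"; there is nothing to check then.
            path = target[2:] if target.startswith("b/") else None
            prefixes = COMMENT_PREFIXES.get(PurePosixPath(path).suffix, ()) if path else ()
        elif not in_header and raw.startswith("+") and prefixes:
            body = raw[1:].strip()
            if body.startswith(prefixes):
                hits.append((path, body))
    return hits


def main():
    """Check the staged changes; return non-zero if comment lines were added."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "-U0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = added_comment_lines(diff)
    for path, line in hits:
        print(f"{path}: comment lines are not allowed: {line}", file=sys.stderr)
    return 1 if hits else 0
```

To install, call `sys.exit(main())` from an executable `.git/hooks/pre-commit` script; git aborts the commit when the hook exits non-zero.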

jacquesm · 2025-12-31T15:41:40 1767195700

Or maybe clone the comments from where it cloned the source.

exe34 · 2025-12-31T20:53:22 1767214402

I used to worry that using LLMs to code would let them use my code and train on my hard work. Then I realised how bad my code is, so I'm probably single-handedly holding off an AGI catastrophe.

112233 · 2025-12-31T16:48:18 1767199698

Meh. No human has written the horrors an LLM produces. At least, I have yet to see a codebase like that. Let me attempt a theatrical reenactment:

    // Use buffer that is large enough to hold any possible value. Avoid using JSON configuration, this optimizes codebase and prevents possible security exploits! 
    size_t len = 32;


    // this function does not call "sort" utility using shell anymore, but instead uses optimized library function "sort" for extreme performance improvement!!!
    void get_permutations() {

... and so on. It basically uses comments as a wall to scribble grandiose graffiti about its valiant conquests in following an explicit instruction after the fifth repeat and not committing egregious violence against common sense.

theshrike79 · 2025-12-31T20:37:51 1767213471

I'm guessing "it" is Gemini here? Claude rarely adds comments at that level.

112233 · 2025-12-31T20:59:34 1767214774

It was both Opus and Sonnet, actually. You ask it to add some feature, clonky goes

    // use configuration to support previous database scheme
    // json_data = parse_blah_scheme_yadda ...

You, like, "what are you doing??!! What previous version, there is no previous version!!!"

And it, like, "You are absolutely right! This is an excellent observation! Let me implement this optimization right away!"

    // Optimize feature loading by skipping scheme conversion, because previous version data does not exist!!!
    json_data = parse_blah_do_not_scheme_yadda

And you, like, facetable and crycry

butlike · 2025-12-31T20:45:40 1767213940

And since it's vibe coded, no one knows what the opcodes are. LLM won't remember. Human has no comments. Human can't trust post-hoc LLM-generated comments because they're poisonous.

112233 · 2025-12-31T23:35:21 1767224121

If the function of vibe code is not self-evident, dispose of it.

Or, to put it differently, having vibe comments does not free you of the responsibility to inspect the actual vibe code.

If the code contradicts the comments, the LLM is just as likely to go by the comments. It is bad enough to have heaps of dead, unused code. Comments make everything much worse.

StilesCrisis · 2025-12-31T14:14:43 1767190483

Probably better to look at a human-authored emulator if you want comments containing accurate information anyway.

bugfix · 2025-12-31T15:03:19 1767193399

If you let it, Claude Code will write a comment for almost every single line of code.

mikepurvis · 2025-12-31T15:13:45 1767194025

    # Assign value of x to y
    y = x

ziml77 · 2025-12-31T18:40:38 1767206438

Even if you try to get them not to, they will still overcomment the code. Or at least overcomment it from the perspective of a human. From the perspective of the LLM, I suspect the comments are necessary for it to get the code output right.

theshrike79 · 2025-12-31T20:38:52 1767213532

It's also a discoverability tool. If the code has good docstrings and decent naming for functions/variables it's a lot easier for the LLM to find the correct places to edit.

delduca · 2025-12-31T13:07:22 1767186442

https://github.com/willtobyte/NES

johnisgood · 2025-12-31T14:02:42 1767189762

Why not use the LLM for more meaningful commit titles & messages as well while you are at it?

giancarlostoro · 2025-12-31T14:09:24 1767190164

Surprised there's no README file at all.

rmckayfleming · 2025-12-31T20:57:33 1767214653

Oh neat, I've been working with Claude on an NES emulator in Racket using an SDL3 wrapper also written mostly by Claude.

tabs_or_spaces · 2026-01-01T06:30:47 1767249047

I tried this a while back using Gemini 2.5 Pro, around the time Gemini CLI was released. I never got the emulator to work in the end, so I dropped the idea.

So, to me, this is impressive in terms of how fast things have progressed.

zorked · 2025-12-31T14:12:43 1767190363

Nice, but an NES emulator is one of the most commonly written pet projects anywhere, which makes it considerably less impressive.

StilesCrisis · 2025-12-31T14:17:55 1767190675

Heck, when Satya Nadella wanted to demonstrate Copilot coding, he had it emit an Altair emulator. I guess there's little room for creativity in 8-bit emulator design so LLMs can handle them well. https://thenewstack.io/from-basic-to-vibes-microsofts-50-yea...

ldng · 2025-12-31T15:55:50 1767196550

And said emulator was open-sourced and tested by third parties, right?

Until it is, it's just hearsay to me, from someone with a multi-billion-dollar horse in the race.

pragma_x · 2025-12-31T17:21:17 1767201677

This is a good point. I wonder how much NES emulator code is in Claude's training set? Not to knock what the author has done here, but I wonder if this is more of a softball challenge than it looks.

noident · 2025-12-31T14:53:06 1767192786

Somewhere along the line the AI bros stopped separating training and testing sets. It's great for impressing the villagers

swannodette · 2025-12-31T15:13:26 1767194006

The WASM build's performance seems catastrophically bad (45 ms to render a frame on an M4 laptop). It would be much more impressive if Claude could optimize it into something that someone would actually want to play. Compare this to a random hit from Google, https://jsnes.org/, which has sound, a much smaller payload, and runs really fast (<1 ms to render a frame).

Is the cost of slop a >40x drop in performance? Pick any metric you care about in your domain: perhaps that's what you're going to lose. And is the effort to recover it practical with current vibe-coding strategies?
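For scale, the arithmetic behind those numbers, assuming the NES's roughly 60 Hz NTSC frame rate. The last ratio is a lower bound, since jsnes renders a frame in under 1 ms:

```latex
t_{\text{frame}} \approx \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms}
\qquad
\frac{1000\ \text{ms}}{45\ \text{ms/frame}} \approx 22\ \text{frames/s}
\qquad
\frac{45\ \text{ms}}{16.7\ \text{ms}} \approx 2.7
\qquad
\frac{45\ \text{ms}}{1\ \text{ms}} = 45
```

So at 45 ms per frame the emulator misses the real-time frame budget by about 2.7x on its own, independent of the comparison with jsnes.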

masswerk · 2026-01-01T09:49:13 1767260953

For me on Firefox/macOS it's terribly slow, fails to initialise/resume sound, no keyboard input.

deadbabe · 2025-12-31T15:47:02 1767196022

I will be impressed when new game consoles come to market and it can write the first emulator for them.

luckydata · 2025-12-31T20:53:41 1767214421

a very slow one.

bfrog · 2025-12-31T17:46:00 1767203160

How much was grifted from existing emulators?

endemic · 2025-12-31T20:33:46 1767213226

By definition, all of it.

cgfjtynzdrfht · 2025-12-31T14:33:55 1767191635

Trained on 1000s of NES emulators, it's not really impressive.

GitHub alone has over 4k NES emulator projects: https://github.com/search?q=nes%20emulator&type=repositories

This is more like "wow, it can quote training data".

keyle · 2025-12-31T14:28:20 1767191300

Who cares what it did. What did you learn? To live is to learn.

mikkupikku · 2025-12-31T14:35:49 1767191749

When I consider the utility of a hammer, my first priority is to ask what the hammer can teach me.

pygy_ · 2025-12-31T14:41:19 1767192079

There are NES emulators aplenty, the only value in writing a new one is pedagogic, for the writer.

This endeavor had negative net value.

jimmaswell · 2025-12-31T16:38:59 1767199139

It demonstrated the capabilities of an AI to a potentially on-the-fence audience while giving the author experience using the new tools/environment. That's solid value. I also just find it really cool to see that an AI did this.

butlike · 2025-12-31T20:52:15 1767214335

Yeah, it shows the AI is not capable of writing maintainable projects. I'm off the fence. And it's cool that you find it cool, but reducing the problem space to that of a toy project makes it so much less impressive as to be trivially ignorable.

The new LLM (pattern recognizer/matcher) is not a good tool.

mikkupikku · 2025-12-31T15:28:19 1767194899

How about being entertained by the process?

worthless-trash · 2025-12-31T15:44:45 1767195885

They didn't call it the "Nintendo Entertainment System" for nothing.

NoraCodes · 2025-12-31T15:31:42 1767195102

Do you think that the use of a hammer is an innate skill, and that woodworkers learn nothing from their craft?

mikkupikku · 2025-12-31T17:08:03 1767200883

Okay, so let's say the use of a coding agent isn't an innate skill, so the author was gaining experience with the tool.

philipallstar · 2025-12-31T15:58:18 1767196698

Ask not what your hammer can do for you.

jancsika · 2025-12-31T15:44:22 1767195862

If it's a zillion dollar hammerbot the company is offering to your boss for pennies, that had better be your first priority!

risyachka · 2025-12-31T15:22:55 1767194575

Do you like reading posts about what a hammer can do? Especially when it has been done 100 times already.

mikkupikku · 2025-12-31T17:10:18 1767201018

I'm no carpenter, but I can honestly say I've probably read a hundred articles about vim.

butlike · 2025-12-31T20:49:45 1767214185

You ask what you learned building the house. The hammer hits the nails.

aoeusnth1 · 2026-01-01T01:44:05 1767231845

Is there zero skill in managing agents?

password54321 · 2025-12-31T15:25:25 1767194725

Yeah I think this is the wrong approach. If they were making money out of it, that would be different. But this is pointless.

RcouF1uZ4gsC · 2025-12-31T16:27:58 1767198478

Is this why you only wrote in machine code until you fully understood the entire compiler front end, back end chain?

postalrat · 2025-12-31T16:51:34 1767199894

I learned Claude can write a functional NES emulator. I wonder what else it can do?

jgbuddy · 2025-12-31T14:34:07 1767191647

to live is to build

shriek · 2025-12-31T15:44:59 1767195899

to build what you don't understand is to suffer in future

krapp · 2025-12-31T14:41:42 1767192102

Except OP isn't learning or building. He's telling a computer to do the work for him and padding his resume.

danielbln · 2025-12-31T15:50:50 1767196250

How cynical. Just seeing if the current crop of automation systems can do it can be interesting enough for some of us.

butlike · 2025-12-31T20:53:42 1767214422

It's a waste of time and energy, and when you're older you'll realize energy is the premium here.

skydhash · 2025-12-31T15:55:14 1767196514

A simple git clone is faster.

danielbln · 2025-12-31T16:06:03 1767197163

So is drinking a sip of water, but neither shows what an agentic system can cook up.