With much fanfare, I’ve finally completed a full rewrite of the opcode decoding logic, and re-implemented the bulk of the opcodes themselves to work the same way that they do on a real 6502 processor. This required some rethinking of overall system timing. Previously, the emulator ran a whole instruction and then advanced the rest of the system by some number of clocks. Now the CPU is clocked along with everything else, and can exist in a state halfway between opcodes.
This is a serious boon to overall accuracy, as it means that reads and writes now occur exactly when they should, including extra, dummy, and otherwise wrong reads (and writes) that the processor occasionally performs during certain instructions.
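To make the idea concrete, here is a minimal sketch of a cycle-stepped instruction. It is not the emulator's actual code: `Bus`, `Ppu`, `lda_absolute_x`, and `run_instruction` are hypothetical names made up for illustration. It models `LDA absolute,X` as a Python generator that pauses after every CPU cycle, so the rest of the system (here a stand-in PPU, clocked 3 dots per CPU cycle as on NTSC hardware) can advance in lockstep. When the indexed address crosses a page, the 6502 first reads from the address with the un-carried high byte, and that dummy read shows up in the bus log.

```python
class Bus:
    """Hypothetical memory bus that logs every read address in order."""
    def __init__(self):
        self.mem = bytearray(0x10000)
        self.log = []

    def read(self, addr):
        self.log.append(addr)
        return self.mem[addr]


class Ppu:
    """Stand-in PPU that only counts dots (3 per CPU cycle on NTSC)."""
    def __init__(self):
        self.dots = 0

    def clock(self):
        self.dots += 3


def lda_absolute_x(bus, operand_addr, x):
    """LDA abs,X as a generator; each yield marks the end of one CPU cycle.
    Cycle 1 (the opcode fetch) is assumed to have happened already."""
    lo = bus.read(operand_addr)                  # cycle 2: low address byte
    yield
    hi = bus.read((operand_addr + 1) & 0xFFFF)   # cycle 3: high address byte
    yield
    partial = (hi << 8) | ((lo + x) & 0xFF)      # high byte not yet carried
    value = bus.read(partial)                    # cycle 4: may be a dummy read
    yield
    correct = ((hi << 8) + lo + x) & 0xFFFF
    if correct != partial:
        value = bus.read(correct)                # cycle 5: page crossed, re-read
        yield
    return value


def run_instruction(steps, tick):
    """Drive one instruction, calling tick() once per CPU cycle."""
    tick()                # cycle 1: the opcode fetch
    try:
        while True:
            next(steps)   # the CPU performs one cycle of bus activity
            tick()        # the rest of the system advances alongside it
    except StopIteration as done:
        return done.value


bus, ppu = Bus(), Ppu()
bus.mem[0x8000] = 0xF0   # operand low byte
bus.mem[0x8001] = 0x20   # operand high byte: base address $20F0
bus.mem[0x2110] = 0x42   # the byte we actually want, at $20F0 + $20
value = run_instruction(lda_absolute_x(bus, 0x8000, 0x20), ppu.clock)
```

The read of `$2010` in the log is the interesting part. On a NES that address mirrors a PPU register, so a dummy read like this can have visible side effects, which is why an emulator that skips it can behave differently from hardware.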
The NES is an older system without a lot of extra features, yet many games for the system require some kind of scroll split, usually to separate the playfield from a status area, and occasionally for gameplay reasons. On newer consoles one might use multiple backgrounds, framebuffers, or an hblank interrupt to achieve this, but the NES has just the one background. Thus, splitting the playfield means you need to wait for the NES to arrive at the right point midway through a screen refresh, then change the scrolling or background parameters accordingly.
The catch is that the NES has no hardware timer to wait on, outside of the audio engine’s DMC channel, which is usually too busy playing audio samples to be terribly helpful in this endeavor. Some mappers solve this problem by including IRQ hardware directly in the cartridge that uses trickery to count the scanlines as the PPU accesses them, but many more games simply use a well-timed CPU spin-wait to idle the processor until the correct scanline is reached during the draw step.
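A bit of arithmetic shows why these spin-waits are so sensitive to instruction timing. The sketch below assumes NTSC timing (341 PPU dots per scanline, 3 dots per CPU cycle) and a hypothetical delay loop of the form `LDX #n` / `DEX` / `BNE loop`. No particular game is confirmed to use this exact loop, and the function names are mine.

```python
from fractions import Fraction

# NTSC: 341 PPU dots per scanline, 3 PPU dots per CPU cycle,
# so one scanline lasts 113 and 2/3 CPU cycles.
CPU_CYCLES_PER_SCANLINE = Fraction(341, 3)


def delay_loop_cycles(n):
    """CPU cycles for the hypothetical loop:  LDX #n / loop: DEX / BNE loop
    Assumes 1 <= n <= 255 and that the branch does not cross a page."""
    ldx = 2                  # LDX immediate
    per_iteration = 2 + 3    # DEX, then BNE taken
    # The final BNE falls through, costing 2 cycles instead of 3.
    return ldx + per_iteration * n - 1


def scanlines_elapsed(cpu_cycles):
    """How far down the screen the PPU moves in that many CPU cycles."""
    return float(cpu_cycles / CPU_CYCLES_PER_SCANLINE)


cycles = delay_loop_cycles(100)
lines = scanlines_elapsed(cycles)
```

With `n = 100` the loop burns 501 cycles, a little under four and a half scanlines. An emulator that undercounts the cost of the taken branch by even one cycle per iteration would finish the same loop almost a full scanline early, which is exactly the kind of error that moves a split to the wrong place.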
Now that CPU instructions actually take the correct number of cycles, a lot of these effects are either working correctly or much closer than they were before this change:
As one of my favorite games on the NES, Zelda II has some seriously funky business going on here. CPU timing is only half of the story, though. The beginning of the title card now looks much better, but once it starts scrolling, problems quickly become apparent:
Here, the program is supposed to be using an unusual PPU feature to set the scroll position mid-frame, but I don’t have that implemented correctly. However, this is still much improved compared to the previous title card, which had the scroll split in entirely the wrong place. On the plus side, gameplay is now looking much better, with the scroll split in the status area finally rendering correctly:
This neat little screen wipe effect uses CPU timing to blank out the background on certain scanlines. Before the change, the CPU’s spin-wait was finishing far too fast, shifting the entire effect too far up the screen. With accurate CPU delays in place, the effect is centered as it should be.
Similarly, Final Fantasy uses a scanline trick to animate its dialog boxes into place. With the CPU running too fast, they were missing entirely!
Even with the CPU delays in place, you’ll notice that the very bottom of this dialog box is not quite positioned correctly. This is almost certainly due to OAM transfers being instant in my emulator, rather than stalling the CPU for the expected 513 or 514 cycles while the copy runs.
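For a sense of scale, here is a rough sketch of the missing delay, assuming NTSC timing. On hardware, a write to `$4014` halts the CPU for 256 read/write pairs (512 cycles), plus one wait cycle, plus one more alignment cycle if the transfer begins on an odd CPU cycle. The function names and the cycle-counter argument are hypothetical.

```python
from fractions import Fraction

CPU_CYCLES_PER_SCANLINE = Fraction(341, 3)   # NTSC: 341 dots, 3 dots per cycle


def oam_dma_stall_cycles(cpu_cycle_count):
    """CPU cycles stalled by an OAM DMA started at the given cycle count.
    256 read/write pairs (512) + 1 wait cycle, + 1 more on an odd cycle."""
    return 512 + 1 + (cpu_cycle_count & 1)


def stall_in_scanlines(cpu_cycle_count):
    """The same stall expressed as scanlines of PPU progress."""
    return float(oam_dma_stall_cycles(cpu_cycle_count) / CPU_CYCLES_PER_SCANLINE)


even_start = oam_dma_stall_cycles(1000)
odd_start = oam_dma_stall_cycles(1001)
lines_lost = stall_in_scanlines(1000)
```

That works out to roughly four and a half scanlines. If the emulator treats the copy as instant, any timed code running after it starts that much too early, which fits an effect that is nearly right but off by a few lines.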
This is running just a bit better now, although main gameplay is still substantially broken.
Several cutscene bits and the map were apparently relying on CPU timing to switch the CHR bank in at just the right moment. While not perfect, these are looking a lot better:
Main gameplay is, alas, still very very broken. Battletoads is pulling a trick here where it disables the display for several scanlines that would normally be drawn, then updates the scroll position dynamically when it’s done. Since my PPU implementation is too basic and doesn’t properly emulate the trick, we get this mess instead:
That’s all for now! Next up I’m rewriting the PPU so that it is just as cycle accurate as the CPU, which should greatly improve compatibility with all the games on this page!