PPU - Cycle Accuracy

Emulation is a fickle thing. If every game released followed the platform’s accepted conventions, writing an NES emulator would be a piece of cake. On paper, it’s a simple system. But of course, games defy convention and pull all sorts of weird tricks that a naive approach can't handle. Over the last few months, I threw my own naive approach out the window and decided enough was enough. RusticNES now emulates the PPU down to the last cycle, so that games which rely on obscure hardware behaviors and precise timing have a better chance at behaving correctly.

This was not a straightforward undertaking, and it was kind of scary, since I had to completely destroy the most visible part of the emulator, then replace it with new, untested code. This produced some really fun bugs:

CHR Bank Selection? Bah!

Now instead of rendering the graphics every scanline or every frame, the NES’s PPU exists in software the same way it does in hardware. All the internal registers are implemented, and writes to those registers from the CPU make the appropriate internal changes as on real hardware, including several bugs and unintuitive features. To draw the screen, the PPU is clocked 3 times per CPU cycle (on NTSC) and performs its operations, increments, reads and writes to memory in the same sequence as real hardware. That image above may look broken, but it’s still clearly the Zelda title screen, and that was a major milestone. This evolved rapidly as new features were re-implemented in the PPU:

Better! But the color selection isn't quite right...

Ah, each attribute byte covers a 32x32 pixel area, not 16x16. That's more like it!

A Timeless Classic
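The attribute mix-up is easier to see with a small lookup sketch. This is a hypothetical helper (`palette_for_tile` is not RusticNES code): each attribute byte covers a 32x32 pixel block of 4x4 tiles, and its four 2-bit fields choose the palette for the four 16x16 quadrants inside that block.

```rust
/// Look up the 2-bit palette number for a background tile (hypothetical helper).
/// `attr_table` is the 64-byte attribute table that follows a nametable's 960 tile bytes.
/// `coarse_x` (0-31) and `coarse_y` (0-29) are the tile's column and row.
fn palette_for_tile(attr_table: &[u8; 64], coarse_x: u8, coarse_y: u8) -> u8 {
    // One attribute byte per 4x4-tile (32x32 pixel) block: 8 blocks per row.
    let index = (coarse_y as usize / 4) * 8 + (coarse_x as usize / 4);
    let byte = attr_table[index];
    // Each 2-bit field covers one 2x2-tile (16x16 pixel) quadrant of the block:
    // top-left = bits 0-1, top-right = bits 2-3, bottom-left = bits 4-5, bottom-right = bits 6-7.
    let shift = ((coarse_y & 0x02) << 1) | (coarse_x & 0x02);
    (byte >> shift) & 0x03
}
```

Treating each byte as a single 16x16 area instead would index the wrong byte for most tiles, which produces exactly the "almost right" coloring seen above.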

The new implementation revealed some unusual behavior with PPU register writes from the CPU, which perform some funky partial updates to the internal memory address. I learned that the scrolling “registers” don’t actually exist, and are instead weirdly mapped onto bits of the PPU's internal VRAM address and, crucially, can be updated by a game in real time. This was the missing piece of the graphics puzzle for Zelda II’s title screen, which finally renders correctly:

The most adventuresome PPU yet!
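The way scroll writes land in the internal address can be sketched in a few lines. The field names v, t, x and w follow the widely used "loopy" naming from the NESdev community; the struct and method names are hypothetical, not RusticNES's API. $2005 and $2006 share one write toggle and both edit the temporary address t, and only the second $2006 write copies t into the live address v.

```rust
/// Internal PPU scroll/address state, using the common "loopy" names (hypothetical struct).
struct ScrollRegs {
    v: u16,  // current VRAM address (15 bits), used while rendering
    t: u16,  // temporary VRAM address (15 bits)
    x: u8,   // fine X scroll (3 bits)
    w: bool, // write toggle shared by $2005 and $2006
}

impl ScrollRegs {
    fn new() -> Self {
        ScrollRegs { v: 0, t: 0, x: 0, w: false }
    }

    /// $2000 write: nametable select bits go into t bits 10-11.
    fn write_ppuctrl(&mut self, data: u8) {
        self.t = (self.t & !0x0C00) | (((data as u16) & 0x03) << 10);
    }

    /// $2005 write: first X, then Y.
    fn write_ppuscroll(&mut self, data: u8) {
        if !self.w {
            // Coarse X -> t bits 0-4, fine X -> x.
            self.t = (self.t & !0x001F) | ((data as u16) >> 3);
            self.x = data & 0x07;
        } else {
            // Fine Y -> t bits 12-14, coarse Y -> t bits 5-9.
            self.t = (self.t & !0x73E0)
                | (((data as u16) & 0x07) << 12)
                | (((data as u16) & 0xF8) << 2);
        }
        self.w = !self.w;
    }

    /// $2006 write: high byte first (bit 14 is cleared), then low byte, which also sets v = t.
    fn write_ppuaddr(&mut self, data: u8) {
        if !self.w {
            self.t = (self.t & 0x00FF) | (((data as u16) & 0x3F) << 8);
        } else {
            self.t = (self.t & 0xFF00) | data as u16;
            self.v = self.t;
        }
        self.w = !self.w;
    }

    /// $2002 read side effect: reset the shared write toggle.
    fn read_ppustatus(&mut self) {
        self.w = false;
    }
}
```

Because the second $2006 write copies t straight into v, a game can change the address the PPU is fetching from in the middle of a frame, which is what makes the mid-screen scroll changes possible.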

Now that the PPU is working properly, I can implement a new cartridge type, MMC3! This cartridge has a special feature in the form of a scanline counter, which is used by games for special effects and simpler playing field splits without relying on a sprite zero hit or accurate CPU spin loops. For the first time, Super Mario Bros. 3 and Kirby’s Adventure are playable in RusticNES:

Super Mario Bros. 3
I will show off these cloud platforms or die trying!

It's Kirby!
The game is polished! My skills are not, however...
This awesome effect is accomplished thanks to the cart's fast bank switching.
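The scanline counter on this cartridge (the MMC3 mapper) can be sketched as below. Names are hypothetical, and the reload/decrement order follows the commonly documented behavior; MMC3 revisions differ slightly when the reload value is 0. The counter is not clocked by a timer at all: it counts rising edges on PPU address line A12, which happen once per scanline when backgrounds use $0000 and sprites use $1000.

```rust
/// Sketch of the MMC3 scanline IRQ counter (hypothetical names).
#[derive(Default)]
struct Mmc3Irq {
    latch: u8,     // reload value written to $C000
    counter: u8,
    reload: bool,  // set by a write to $C001
    enabled: bool, // $E001 enables, $E000 disables
    pending: bool, // IRQ line asserted to the CPU
}

impl Mmc3Irq {
    /// CPU write in $C000-$FFFF; only A15-A13 and A0 select these registers.
    fn write(&mut self, addr: u16, data: u8) {
        match addr & 0xE001 {
            0xC000 => self.latch = data,
            0xC001 => {
                self.counter = 0;
                self.reload = true;
            }
            0xE000 => {
                self.enabled = false;
                self.pending = false; // disabling also acknowledges the IRQ
            }
            0xE001 => self.enabled = true,
            _ => {} // bank select and mirroring registers are not modeled here
        }
    }

    /// Called on each rising edge of PPU A12 (roughly once per rendered scanline).
    fn clock(&mut self) {
        if self.counter == 0 || self.reload {
            self.counter = self.latch;
            self.reload = false;
        } else {
            self.counter -= 1;
        }
        if self.counter == 0 && self.enabled {
            self.pending = true;
        }
    }
}
```

With a reload value of N, the IRQ fires on the (N+1)th clocked scanline after a reload, so this only works once the PPU fetches pattern data at the right dots.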

With the CPU and PPU accuracy issues worked out, I’m also pleased to report that Battletoads is now playable! At least, the first couple of levels definitely are. Battletoads is famous for its ridiculous difficulty, and for good reason. I can’t get more than 1-2 levels in. If anyone more practiced on the game wants to give it a whirl and let me know if the later levels are all working, I’d very much appreciate the bug reports.

That's looking MUCH better!

Things are coming right along! Major features left to implement mostly include additional cartridge types and peripherals other than standard controllers. The user interface in rusticnes-sdl is also much improved at this point, and I’ll do another blog post on its new features and tricks later. Then maybe a release? Bah, it’ll be ready when it’s ready!

-Zeta

6502 CPU - Cycle Timing

With much fanfare, I’ve finally completed a full rewrite of opcode decoding logic, and re-implemented the bulk of the opcodes themselves to work the same way that they do on a real 6502 processor. This required some rethinking of overall system timing. Now instead of running an instruction and then advancing the rest of the system some number of clocks, the CPU is also clocked, and can exist in a state halfway between opcodes.

This is a serious boon to overall accuracy, as it means that reads and writes now occur exactly when they should, including extra, dummy, and otherwise wrong reads (and writes) that the processor occasionally performs during certain instructions.
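To make "halfway between opcodes" concrete, here is a minimal sketch of a cycle-stepped core. It is hypothetical (not RusticNES's actual decoder) and implements a single instruction, LDA absolute,X ($BD), because it shows a dummy read: when adding X carries into the high byte, the 6502 first reads from the un-fixed address in the wrong page, then reads again from the correct address on an extra cycle. The `Bus` logs every read so the dummy access is visible.

```rust
/// Memory bus that records every address read, in order.
struct Bus {
    mem: Vec<u8>,
    reads: Vec<u16>,
}

impl Bus {
    fn read(&mut self, addr: u16) -> u8 {
        self.reads.push(addr);
        self.mem[addr as usize]
    }
}

/// Cycle-stepped CPU sketch that only knows LDA absolute,X.
struct Cpu {
    pc: u16,
    a: u8,
    x: u8,
    p: u8,    // status flags; only N (0x80) and Z (0x02) are touched here
    step: u8, // cycle within the current instruction; 0 = fetch the next opcode
    lo: u8,   // operand low byte (later: low byte of the effective address)
    hi: u8,   // operand high byte
}

impl Cpu {
    fn load_a(&mut self, v: u8) {
        self.a = v;
        let z = if v == 0 { 0x02 } else { 0x00 };
        self.p = (self.p & !0x82) | (v & 0x80) | z;
        self.step = 0; // instruction finished
    }

    /// Advance the CPU by exactly one clock cycle.
    fn tick(&mut self, bus: &mut Bus) {
        match self.step {
            0 => {
                let opcode = bus.read(self.pc);
                assert_eq!(opcode, 0xBD, "only LDA abs,X is implemented in this sketch");
                self.pc = self.pc.wrapping_add(1);
            }
            1 => {
                self.lo = bus.read(self.pc);
                self.pc = self.pc.wrapping_add(1);
            }
            2 => {
                self.hi = bus.read(self.pc);
                self.pc = self.pc.wrapping_add(1);
            }
            3 => {
                // Add X to the low byte only; the high byte is not fixed up yet.
                let (lo, crossed) = self.lo.overflowing_add(self.x);
                let v = bus.read(u16::from_le_bytes([lo, self.hi]));
                if !crossed {
                    self.load_a(v); // no page crossing: 4 cycles total
                    return;
                }
                // Page crossed: that read hit the wrong page (a dummy read).
                self.lo = lo;
                self.hi = self.hi.wrapping_add(1);
            }
            4 => {
                let v = bus.read(u16::from_le_bytes([self.lo, self.hi]));
                self.load_a(v); // page crossed: 5 cycles total
                return;
            }
            _ => unreachable!(),
        }
        self.step += 1;
    }
}
```

Since the state lives in `step`, the rest of the system can be clocked between any two of these cycles, and a read from a register with side effects (like $2002) would happen on the same cycle it does on hardware.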

The NES is an older system without a lot of extra features, yet many games for the system require some kind of scroll split, usually to separate the playfield from a status area, and occasionally for gameplay reasons. On newer consoles one might use multiple backgrounds, framebuffers, or an hblank interrupt to achieve this, but the NES has just the one background. Thus, splitting the playfield means you need to wait for the PPU to arrive at the right point midway through a screen refresh, then change the scrolling or background parameters accordingly.

The NES lacks any sort of a hardware timer outside of the Audio engine’s DMC channel, which is usually too busy playing audio samples to be terribly helpful in this endeavor. Some mappers solve this problem by including IRQs directly in the cartridge that use trickery to count the scanline the PPU is currently accessing, but many more games simply use a well-timed CPU spin-wait to idle the processor until the correct scanline is reached during the draw step.
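A bit of arithmetic shows why spin-waits are so sensitive to CPU timing. The sketch below assumes a hypothetical delay loop (`LDX #n` followed by `DEX`/`BNE` back to the `DEX`) and uses documented 6502 cycle counts plus the NTSC figures of 341 PPU dots per scanline and 3 dots per CPU cycle. The function names are made up for illustration.

```rust
// NTSC timing: 341 PPU dots per scanline, 3 PPU dots per CPU cycle.
const DOTS_PER_SCANLINE: f64 = 341.0;
const DOTS_PER_CPU_CYCLE: f64 = 3.0;

// Hypothetical delay loop:
//         LDX #n      ; 2 cycles
//   loop: DEX         ; 2 cycles
//         BNE loop    ; 3 cycles when taken (same page), 2 when not taken
// Assumes 1 <= n <= 255 and that the branch never crosses a page.
fn spin_cycles_real(n: u32) -> u32 {
    2 + 2 * n + 3 * (n - 1) + 2
}

// The same loop on an emulator that charges one cycle per instruction.
fn spin_cycles_one_per_instruction(n: u32) -> u32 {
    1 + 2 * n
}

// Convert a CPU cycle count into how many scanlines the PPU draws meanwhile.
fn cycles_to_scanlines(cpu_cycles: u32) -> f64 {
    cpu_cycles as f64 * DOTS_PER_CPU_CYCLE / DOTS_PER_SCANLINE
}
```

With n = 255, the loop burns 1276 cycles on hardware, about 11.2 scanlines, but only 511 cycles (about 4.5 scanlines) under a one-cycle-per-instruction model, so a split timed this way lands several scanlines too high.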

Now that CPU instructions actually take the correct number of cycles, a lot of these effects are now either working correctly, or are much closer than they were before this change:

Zelda II - Title Screen

Dramatic Music

Zelda II, one of my favorite games on the NES, has some seriously funky business going on here. CPU timing is only half of the story: the beginning of the title card now looks much better, but once it starts scrolling, problems quickly become apparent:

Now with extra space!

Here, the program is supposed to be using an unusual PPU feature to set the scroll position mid-frame, but I don’t have that implemented correctly. However, this is still much improved compared to the previous title card, which had the scroll split in entirely the wrong place. On the plus side, gameplay is now looking much better, with the scroll split in the status area finally rendering correctly:

Maybe we should just let her sleep?

Final Fantasy

BEFORE: Not Centered
AFTER: Centered

This neat little screen wipe effect uses CPU timing to blank out the background on certain scanlines. Before the change, the CPU’s spinwait was finishing far too fast, shifting the entire effect too far up the screen. With accurate CPU delays in place, the effect is centered as it should be.

Similarly, Final Fantasy uses a scanline trick to animate its dialog boxes into place. With the CPU running too fast, they were missing entirely!

BEFORE: Not much of a talker?
AFTER: That's much better!

Even with the CPU delays in place, you’ll notice that the very bottom of this dialog box is not quite positioned correctly. This is almost certainly due to OAM transfers being instant, rather than stalling the CPU for the 513 or 514 cycles the real copy takes.
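A non-instant OAM DMA can be modeled as a small per-cycle process. This sketch is hypothetical, not RusticNES code, and rests on two labeled assumptions: OAMADDR is 0, and the extra alignment cycle applies when the $4014 write lands on an odd CPU cycle (sources describe the parity differently). The source page is read from a flat slice standing in for the CPU bus.

```rust
/// OAM DMA as a per-cycle process (hypothetical sketch).
/// Assumes OAMADDR = 0 and an extra alignment cycle when starting on an odd CPU cycle.
struct OamDma {
    page: usize, // source page: copies page*0x100 ..= page*0x100 + 0xFF
    wait: u32,   // halt/alignment cycles left before copying starts
    step: u32,   // copy cycles done (0..512); even = read, odd = write
    latch: u8,   // byte read on the previous cycle
}

impl OamDma {
    /// Begin a transfer triggered by writing `page` to $4014 on CPU cycle `cpu_cycle`.
    fn start(page: u8, cpu_cycle: u64) -> Self {
        OamDma { page: page as usize, wait: 1 + (cpu_cycle & 1) as u32, step: 0, latch: 0 }
    }

    /// Run one CPU cycle of the transfer; returns false once it has finished.
    fn tick(&mut self, ram: &[u8], oam: &mut [u8; 256]) -> bool {
        if self.wait > 0 {
            self.wait -= 1;
        } else {
            let i = (self.step / 2) as usize;
            if self.step % 2 == 0 {
                self.latch = ram[self.page * 0x100 + i];
            } else {
                oam[i] = self.latch;
            }
            self.step += 1;
        }
        self.wait > 0 || self.step < 512
    }
}
```

While `tick` keeps returning true, the CPU would stay halted but the PPU keeps running, so anything timed after the DMA shifts down the screen by about 4.5 scanlines compared to an instant copy.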

Battletoads

This is running just a bit better now, although main gameplay is still substantially broken.

There were supposed to be logos here? Neat!

Several cutscene bits and the map were apparently relying on CPU timing to switch the CHR bank in at just the right moment. While not perfect, these are looking a lot better:

This was a glitchy mess before this change.

Main gameplay is, alas, still very very broken. Battletoads is pulling a trick here where it disables the display for several scanlines that would normally be drawn, then updates the scroll position dynamically when it’s done. Since my PPU implementation is too basic and doesn’t properly emulate the trick, we get this mess instead:

And... this is STILL a glitchy mess. More work needed!

That’s all for now! Next up, I’m rewriting the PPU so that it is just as cycle accurate as the CPU, which should greatly improve compatibility with all the games on this page!

-Zeta

Audio: DMC

I’ve spent this entire week implementing the 5 audio channels on the NES, and the work has finally paid off:

Red: Pulse 1, Orange: Pulse 2, Green: Triangle, Blue: Noise, Purple: DMC, White: Final Output

Simon’s Quest:

The sample above demonstrates The Silence of the Daylight, from Castlevania II - Simon’s Quest. The DMC channel in particular was tricky to get right. It took an embarrassing amount of debugging for me to realize that I was repeatedly playing the first byte of each sample, and forgetting to advance the current address. That tends to not sound very good!
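Here is a simplified sketch of the DMC's memory reader and output unit, including the address advance that is easy to forget. The struct and method names are hypothetical; the rate timer and IRQ are omitted, so `clock_output` stands in for each timer expiry. The address/length formulas, the $FFFF to $8000 wrap, and the plus-or-minus 2 delta steps follow the documented hardware behavior.

```rust
/// Simplified DMC sketch: memory reader + output unit (timer and IRQ omitted).
struct Dmc {
    start_addr: u16,
    start_len: u16,
    current_addr: u16,
    bytes_remaining: u16,
    buffer: Option<u8>, // one-byte sample buffer
    shift: u8,          // output shift register
    bits_remaining: u8,
    silence: bool,
    level: u8, // 7-bit output level, 0..=127
    looping: bool,
}

impl Dmc {
    /// `addr_reg` and `len_reg` are the values written to $4012 and $4013.
    fn new(addr_reg: u8, len_reg: u8, looping: bool) -> Self {
        let start_addr = 0xC000 + addr_reg as u16 * 64;
        let start_len = len_reg as u16 * 16 + 1;
        Dmc {
            start_addr, start_len, current_addr: start_addr, bytes_remaining: start_len,
            buffer: None, shift: 0, bits_remaining: 8, silence: true, level: 0, looping,
        }
    }

    /// Refill the sample buffer from memory whenever it is empty.
    fn fetch(&mut self, mem: &dyn Fn(u16) -> u8) {
        if self.buffer.is_none() && self.bytes_remaining > 0 {
            self.buffer = Some(mem(self.current_addr));
            // Advance the address; it wraps from $FFFF back to $8000, not $0000.
            self.current_addr = if self.current_addr == 0xFFFF { 0x8000 } else { self.current_addr + 1 };
            self.bytes_remaining -= 1;
            if self.bytes_remaining == 0 && self.looping {
                self.current_addr = self.start_addr;
                self.bytes_remaining = self.start_len;
            }
        }
    }

    /// One output clock: apply one delta bit, then reload the shifter every 8 bits.
    fn clock_output(&mut self, mem: &dyn Fn(u16) -> u8) {
        if !self.silence {
            if self.shift & 1 == 1 {
                if self.level <= 125 { self.level += 2; }
            } else if self.level >= 2 {
                self.level -= 2;
            }
        }
        self.shift >>= 1;
        self.bits_remaining -= 1;
        if self.bits_remaining == 0 {
            self.bits_remaining = 8;
            match self.buffer.take() {
                Some(b) => { self.shift = b; self.silence = false; }
                None => self.silence = true,
            }
            self.fetch(mem);
        }
    }
}
```

Dropping the `current_addr` update in `fetch` reproduces the bug described above: every refill reads the same first byte of the sample.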

Here are some more quick examples:

Zelda II - Now with more Ganon Laughing:

Super Mario Bros:

I’m pretty happy with audio at this point! There are some remaining minor issues, mostly related to mixing and channel muting, but now I can move on to more interesting things. Working audio means a working frame counter, and hooking the APU’s IRQ up should let me run blargg’s timing tests, and put a number on just how overclocked my CPU is. Exciting times!

-Zeta

Audio: Pulse Channels

At long last, we have working audio! Well, kind of anyway. I’ve implemented the register settings and a good bit of the behavior for the two Pulse channels, which is already enough to get decent sounding audio out of most games, despite lacking some features. Because I haven’t figured out Rust’s audio output capabilities yet, I first worked on getting audio on-screen:

Audio Debugger - It's the little white thing under the screen
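A pulse channel at this stage boils down to register decoding, an 11-bit timer, and an 8-step duty sequence. This sketch is hypothetical and mirrors the feature set described here: it treats the low 4 bits of $4000 as a direct volume and leaves out the envelope, sweep, and length counter. The duty waveforms and the frequency formula, f = CPU clock / (16 × (period + 1)), are the documented ones.

```rust
/// Duty-cycle waveforms, one row per duty setting (bits 6-7 of $4000).
const DUTY_TABLE: [[u8; 8]; 4] = [
    [0, 1, 0, 0, 0, 0, 0, 0], // 12.5%
    [0, 1, 1, 0, 0, 0, 0, 0], // 25%
    [0, 1, 1, 1, 1, 0, 0, 0], // 50%
    [1, 0, 0, 1, 1, 1, 1, 1], // 25% negated
];

/// Minimal pulse channel: duty, direct volume, and timer only (hypothetical names).
#[derive(Default)]
struct Pulse {
    duty: usize,
    volume: u8,
    period: u16, // 11-bit timer period
    timer: u16,
    step: usize, // position in the duty sequence
}

impl Pulse {
    /// `reg` is 0-3 for $4000-$4003 (or $4004-$4007 for the second channel).
    fn write(&mut self, reg: u8, data: u8) {
        match reg {
            0 => {
                self.duty = (data >> 6) as usize;
                self.volume = data & 0x0F;
            }
            2 => self.period = (self.period & 0x0700) | data as u16,
            3 => {
                self.period = (self.period & 0x00FF) | (((data & 0x07) as u16) << 8);
                self.step = 0; // writing the high byte restarts the sequence
            }
            _ => {} // reg 1 is the sweep unit, not modeled here
        }
    }

    /// Clock once per APU cycle (every second CPU cycle).
    fn clock_timer(&mut self) {
        if self.timer == 0 {
            self.timer = self.period;
            self.step = (self.step + 1) % 8;
        } else {
            self.timer -= 1;
        }
    }

    fn output(&self) -> u8 {
        if self.period < 8 {
            return 0; // very short periods are muted on real hardware
        }
        DUTY_TABLE[self.duty][self.step] * self.volume
    }
}

/// Tone frequency for a given timer period, using the NTSC CPU clock.
fn pulse_frequency_hz(period: u16) -> f64 {
    1_789_773.0 / (16.0 * (period as f64 + 1.0))
}
```

For example, a period of 253 gives roughly 440 Hz, the A above middle C.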

Despite looking an awful lot like a square wave and animating roughly in time with the Mario theme, this implementation left a lot to be desired. I quickly learned that one sample per CPU cycle, while very, very hardware accurate, results in roughly 1.79 MHz audio, which is just a bit too much. So much that when I tried to dump it out, my emulator ground to a halt just dealing with the data flow. No good.
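One simple way to get from one sample per CPU cycle down to 44.1 kHz is to average the samples that fall into each output period (a box filter). The post doesn't say which method RusticNES uses; this is a hypothetical sketch of that one technique, with the NTSC CPU clock of 1,789,773 Hz giving about 40.58 input samples per output sample.

```rust
// NTSC CPU clock and a typical output sample rate.
const CPU_CLOCK_HZ: f64 = 1_789_773.0;
const OUTPUT_RATE_HZ: f64 = 44_100.0;

/// Averages per-CPU-cycle samples down to the output rate (a box filter).
struct Downsampler {
    cycles_per_sample: f64, // about 40.58 CPU cycles per output sample
    phase: f64,
    acc: f32,
    count: u32,
    out: Vec<f32>,
}

impl Downsampler {
    fn new() -> Self {
        Downsampler {
            cycles_per_sample: CPU_CLOCK_HZ / OUTPUT_RATE_HZ,
            phase: 0.0,
            acc: 0.0,
            count: 0,
            out: Vec::new(),
        }
    }

    /// Feed one mixed sample per CPU cycle; emits an averaged sample when due.
    fn push(&mut self, sample: f32) {
        self.acc += sample;
        self.count += 1;
        self.phase += 1.0;
        if self.phase >= self.cycles_per_sample {
            self.phase -= self.cycles_per_sample;
            self.out.push(self.acc / self.count as f32);
            self.acc = 0.0;
            self.count = 0;
        }
    }
}
```

Averaging is only a crude low-pass filter, so some aliasing remains on high-pitched notes; a band-limited resampler would sound cleaner at the cost of more work per sample.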

Some research and a 44.1 kHz sample rate implementation later, I finally got Zelda II to produce the following raw audio dump:

Not bad! Not good either, this is just the frequency data being turned into pulse clocks with no extra features, but it definitely produces the right notes. I noticed that Zelda II was modulating the pitch all on its own… could it perhaps be setting the volume too? I quickly implemented direct volume control, and was pleasantly surprised by the result:

This is pretty good for not having implemented volume envelopes or the frequency sweep register! As it turns out, a lot of games simply don’t use these features very often, leading to pleasant sounding music with just 2 incomplete channels in place:

Super Mario Bros:

The music sounds pretty great! Sound effects not so much, Mario’s jump in particular is making use of the unimplemented sweep unit. This will be a great test case!
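The sweep unit that Mario's jump relies on slides the pulse period toward a target on every half-frame clock. Below is a hypothetical sketch (names are made up) of the documented rules: the change is the period shifted right by the shift count, pulse 1 subtracts one extra when negating, the channel mutes when the period is under 8 or the target exceeds $7FF, and a shift count of 0 never updates the period. Clamping a negative target to 0 is this sketch's choice.

```rust
/// Sketch of a pulse channel's sweep unit ($4001/$4005), hypothetical names.
struct Sweep {
    enabled: bool,
    divider_period: u8,    // P, bits 4-6
    negate: bool,          // bit 3
    shift: u8,             // bits 0-2
    divider: u8,
    reload: bool,          // set by writes to the sweep register
    ones_complement: bool, // true for pulse 1, which subtracts one extra when negating
}

impl Sweep {
    fn target(&self, period: u16) -> u16 {
        let change = period >> self.shift;
        if self.negate {
            let extra = if self.ones_complement { 1 } else { 0 };
            period.saturating_sub(change + extra) // clamped at 0 in this sketch
        } else {
            period + change
        }
    }

    /// The channel is silenced when the period is too small or the target overflows 11 bits.
    fn mutes(&self, period: u16) -> bool {
        period < 8 || self.target(period) > 0x7FF
    }

    /// Called on each half-frame clock from the frame counter.
    fn clock_half_frame(&mut self, period: &mut u16) {
        if self.divider == 0 && self.enabled && self.shift > 0 && !self.mutes(*period) {
            *period = self.target(*period);
        }
        if self.divider == 0 || self.reload {
            self.divider = self.divider_period;
            self.reload = false;
        } else {
            self.divider -= 1;
        }
    }
}
```

Note that `mutes` applies even when the sweep is disabled, which is how the sweep unit can silence a channel that never uses it.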

Final Fantasy:

The menu’s a bit unfortunate, and I’m pretty sure I’ll need either length counters, or the sweep register and automatic muting to take care of it. More interesting though, battle scenes now crash when selecting an attack! Investigating that will be fun later…

Castlevania II: Simon’s Quest:

The audio here is really good! Unfortunately, my playing is not. I tried to head left to record the overworld audio, and promptly died a LOT. This is NES Hard at its finest.

Solar Jetman:

Overall, I’m pretty happy with this! Now my challenge is to get Rust outputting the audio directly, and then it’ll just be a matter of implementing the rest of the channels. I can also hook up the new Audio IRQ and finally run some of the CPU timing tests, which are badly needed; at this point my CPU is still wildly overclocked and in dire need of work in the timing department.

Cheers,
Zeta

Battletoads - First Boot

Every emulator has its share of tricky games to emulate, and the NES is no exception. Its unusual addressing scheme meant that game writers trying to push the performance of the system had to use every trick in the book. One of the hardest games to get working properly is also one of the most famous “NES Hard” games out there, Battletoads!

This game’s a masterpiece on the original hardware. It’s gorgeous, makes good use of the NES’s limited graphics hardware, and is also tough as nails. It also uses a bunch of weird tricks that rely on proper CPU and PPU timing, and relies on being able to change the state of the PPU at very precise points, making it a real pain in the rear for an emulator to run properly. In order for Battletoads to be playable, you need to be very nearly cycle accurate with your timings.

Naturally, I wanted to see just how badly it would run in RusticNES. At this stage, RusticNES processes one instruction at a time all in one go, and has an overclocked CPU: each instruction takes just one cycle. Let’s boot it and see:

Battletoads Title
Story Sequence

So far so good! That this boots at all means my AxROM mapper is working correctly, for some definition of correct. Things go downhill quickly though:

A ship, I think?

I’m not entirely sure what’s going on up there. AxROM uses 8 KB of CHR RAM, so the game should be in full control of tile updates, but it seems to miss part of the loading sequence and we get title screen tiles instead of the ship.
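For reference, AxROM is about as simple as mappers get. The sketch below is hypothetical (not RusticNES's mapper code) and follows the documented register layout: any write to $8000-$FFFF selects a 32 KB PRG bank with bits 0-2 and picks which single 1 KB nametable page all four nametables map to with bit 4. Pattern data lives entirely in 8 KB of CHR RAM that the game writes itself. Bus conflicts, present on some AxROM boards, are ignored.

```rust
/// AxROM sketch: 32 KiB switchable PRG ROM, 8 KiB CHR RAM, one-screen mirroring.
struct AxRom {
    prg_rom: Vec<u8>,
    chr_ram: [u8; 0x2000],
    prg_bank: usize,
    upper_nametable: bool,
}

impl AxRom {
    fn cpu_read(&self, addr: u16) -> u8 {
        if addr < 0x8000 {
            return 0; // RAM and open bus are not modeled here
        }
        let bank_count = self.prg_rom.len() / 0x8000; // assumes at least one 32 KiB bank
        let bank = self.prg_bank % bank_count;
        self.prg_rom[bank * 0x8000 + (addr as usize - 0x8000)]
    }

    fn cpu_write(&mut self, addr: u16, data: u8) {
        if addr >= 0x8000 {
            self.prg_bank = (data & 0x07) as usize;
            self.upper_nametable = data & 0x10 != 0;
        }
    }

    // The game uploads its own tiles, so the PPU side is plain read/write RAM.
    fn ppu_read(&self, addr: u16) -> u8 {
        self.chr_ram[(addr & 0x1FFF) as usize]
    }

    fn ppu_write(&mut self, addr: u16, data: u8) {
        self.chr_ram[(addr & 0x1FFF) as usize] = data;
    }

    /// Map a nametable address ($2000-$2FFF) to an offset into the console's 2 KiB VRAM.
    fn nametable_offset(&self, addr: u16) -> usize {
        let base = if self.upper_nametable { 0x400 } else { 0 };
        base + (addr & 0x3FF) as usize
    }
}
```

Since tiles come from CHR RAM, a missed or mistimed upload (for example, one done while the game expects rendering to be off) leaves stale tiles on screen rather than garbage.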

Gravity is overrated

This is the meat and potatoes of the problem. I’m nearly certain the game is either relying on unimplemented Audio IRQs, or CPU-based spinwaits to time itself to individual scanlines and decide when to enable the display and adjust scrolling registers. Because my emulated CPU runs way too fast, Battletoads completely misses the mark and we get this instead.

This will be a fun little project! There is no one quick fix I can apply to improve this game. Instead, it should slowly improve and regress as the emulator strives for accuracy. I’ll revisit this game if new emulator features cause it to improve, and we’ll see where the journey takes us!

-Zeta

Hello World

This is the companion blog to RusticNES (name not final), a Nintendo Entertainment System emulator I’m writing to teach myself the Rust programming language. At least, this will eventually be that blog when I get around to writing things.

I’ll try to regularly post new screenshots as games start to work, along with new features and interesting testing and debugging runs. For example, here is the emulator finally running an MMC1 game for the first time:

The Original Metroid
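MMC1 is an interesting first mapper because it is configured serially: each CPU write to $8000-$FFFF shifts in a single bit, and the fifth write's address decides which internal register receives the 5-bit value. This is a hypothetical sketch of that load register; it does not model the quirk where MMC1 ignores writes on consecutive CPU cycles, and it omits the actual bank mapping.

```rust
/// Sketch of MMC1's serial load register (hypothetical names).
#[derive(Default)]
struct Mmc1 {
    shift: u8,
    count: u8,
    control: u8,   // $8000-$9FFF: mirroring, PRG and CHR banking modes
    chr_bank0: u8, // $A000-$BFFF
    chr_bank1: u8, // $C000-$DFFF
    prg_bank: u8,  // $E000-$FFFF
}

impl Mmc1 {
    fn write(&mut self, addr: u16, data: u8) {
        if data & 0x80 != 0 {
            // Bit 7 set: reset the shift register and force PRG banking mode 3.
            self.shift = 0;
            self.count = 0;
            self.control |= 0x0C;
            return;
        }
        // Bits arrive least-significant first, entering at bit 4 and shifting right.
        self.shift = (self.shift >> 1) | ((data & 0x01) << 4);
        self.count += 1;
        if self.count == 5 {
            // Address bits 13-14 of the fifth write pick the destination register.
            match (addr >> 13) & 0x03 {
                0 => self.control = self.shift,
                1 => self.chr_bank0 = self.shift,
                2 => self.chr_bank1 = self.shift,
                _ => self.prg_bank = self.shift,
            }
            self.shift = 0;
            self.count = 0;
        }
    }
}
```

A game selecting PRG bank 0x16 would write bit 0 of the value, then bit 1, and so on, five times in a row to an address in $E000-$FFFF.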

That’s all for now. Hopefully I’ll get around to writing more than just one blog post. Until then, enjoy!

-Zeta