16. Adding Sound Effects

December 1, 2015, 9:05 pm

I made the sound effects in Famitracker, each one as a separate song, all in one file (just like last time).

Apparently, there are no restrictions on Fx and such when making Sound Effects for Famitone2 (except that each one needs to be fairly short). You need to cut the sound off on every channel at the end, and add the effect C00, which halts playback and marks the end of the sound effect. Then export the songs into 1 NSF file.

Now, put it in Famitone2/tools and run nsf2data from the command line…

nsf2data SoundFx.nsf -ca65

…and it will produce a SoundFx.s file that we can include into our reset.s file…

sounds_data:
.include "MUSIC/SoundFx.s"

And, enable Sound Fx for the Famitone2 engine…

.define FT_SFX_ENABLE 1

And, I edited the Famitone2.s file again, adding some underscored labels…

.export _Play_Fx

_Play_Fx:
ldx #0

FamiToneSfxPlay:

(I have only 1 Sound Fx Stream enabled right now).

And I can call the sound effect by passing the sound effect # (corresponds to the song # in the NSF file, first song = 0) to the Play_Fx() function.

void __fastcall__ Play_Fx(unsigned char effect);

In my example, I have 3 sound effects set up. The first is set to trigger anytime you jump. The second and third are tied to the Up/Down buttons.

if (((joypad1old & UP) == 0)&&((joypad1 & UP) != 0))
Play_Fx(1);
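
The check above fires only on the frame where a button goes from released to pressed. Here is a minimal sketch of the same test as a standalone function, written for a desktop compiler rather than cc65. The name new_press and the PAD_UP value are placeholders, not names from the lesson code.

```c
#include <stdint.h>

/* Hypothetical button mask; the real value depends on how the pad is read. */
#define PAD_UP 0x08

/* Returns 1 only on the frame where the masked button changes
   from released (last frame) to pressed (this frame). */
uint8_t new_press(uint8_t pad_old, uint8_t pad_now, uint8_t mask)
{
    return (uint8_t)(((pad_old & mask) == 0) && ((pad_now & mask) != 0));
}
```

Holding the button down gives 1 on the first frame and 0 afterwards, so the sound effect is not retriggered every frame.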

Start button still switches between the 2 songs. Here’s the source code…

http://dl.dropboxusercontent.com/s/q5fvtis646lmh18/lesson13.zip

On a side note, I’m sorry I didn’t use any DPCM samples in any of my examples. I personally think DPCM samples are great for sound effects – even if they use a lot of ROM space. You can use any .wav sound sample, and convert it to a DMC file with Famitracker (inside the instrument options), and you can use Famitone2 to trigger the sample. I really should revisit this subject in the future.

17. Planning a Game

December 1, 2015, 9:15 pm

We’re going to make a simple space shooter game.

It will scroll vertically. You shoot lasers and avoid the enemy ships.

First, we are going to set up several game “states”

Title Screen
Game Mode
Pause Mode
Game Over
Boss Fight Mode
Victory Mode

Here is our outline of the game code…

Initialize
Draw Title Screen
Start Title Loop
1. Get Input (wait for Start)
2. Play Music
Draw Game Screen
Start Game Loop
1. Get Input
2. Move your ship
3. Spawn Ships
4. Move enemy ships and bullets
5. Collisions
6. Play Music
If Run out of Lives, Draw Game Over Screen
-Play Game Over Music
-Loop Back to Title
If reach end, Start Boss Fight Mode
If win, Start Victory Mode, Draw Victory Screen
-and play Victory Music

First, create some graphics for the game.

Then, write some music.

I’m going to get the title screen working, with music, then I’ll get the Game Mode working.

I decided to use a Sprite Zero Hit to split screen the Scoreboard and the action, which will be vertically scrolling. Here’s a quick sketch I did in Photoshop…

Plan

I also decided to use Sprites for some of the text, like ‘Pause’ and ‘Game Over’. I thought that would be easiest.

Let’s get started, I’ll show you some code next time…I’m still trying to stick to the ‘make a game entirely in C’ concept, and this is just example code, so it won’t be too fancy.

18. Game Coding

December 1, 2015, 9:16 pm

First thing is to get a skeleton of the game working. Title screen loads. Press ‘start’ moves to Game Mode. Press ‘start’ sets Pause Mode. ‘Start’ again goes back to Game Mode.

I opened Photoshop, and made a quick sketch of the Title Screen and Game Mode. Then I designed the Title by playing with fonts…Arial Black seems ok. I gave it some ‘perspective’, added a few filters, and resized it to 128 pixels wide. I converted it to 4 colors (color/mode/index/custom), then cut and pasted it into YY-CHR.

SPACYtitle

The ship, I just sketched in YY-CHR. And some BG stars. Then I opened up NES Screen Tool, and randomly placed some stars, and saved nametable as RLE compressed C header. Similarly with the Title Screen, I arranged the text and Title, and saved that nametable as RLE in C.

SPACYgame

Then I added some code to load the screens and switch between them with Start button presses. I also added ‘Game Over Mode’ and ‘Victory Mode’. And, I’m testing them by tying them to the Select button (to be removed later). I’m also testing some Sound Fx that I made, tying them to some other button presses (Up/Down/B/A)…again, to be removed later. I just want to make sure they work.

For some reason I made all of my sound fx with the noise channel. Maybe this is a bad idea. Oh well.

At this point I got ahead of myself, and added the music. Usually, I wait till the very end to add music. I highly recommend waiting till the end, especially if it’s a very long project, because you will get REALLY sick of hearing the same 1 minute song loop OVER and OVER and OVER.

I just happened to be in a music writing mood, and so I added it right away. Again, I used Famitracker and Famitone2. (See earlier post).

At this point, I added a HUD (the scoreboard), which is updated every frame with a new score. The scoreboard is actually written on the opposite nametable at startup. We will be using a sprite zero hit to switch nametables midframe.

Then, I put the ship in, and gave it some basic Left/Right movement physics. It moves a little slippery (you keep moving even if buttons aren’t pressed), which I thought was appropriate for space. This can be changed later, if gameplay feels odd.

Finally, I added the sprite zero hit and vertical scrolling. Switching nametables midframe is no trouble…just write it to $2000. If we were also changing H scroll, we would just need to write to $2005. But, I was doing the more complicated V scroll change. This is the rain dance you have to do to switch V scroll midframe…

Vert_scroll2 = ((Vert_scroll & 0xF8) << 2);
Sprite_Zero(); //wait for sprite zero hit
PPU_ADDRESS = 0;
SCROLL = Vert_scroll;
SCROLL = 0;
PPU_ADDRESS = Vert_scroll2;
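
The four writes above are a special case (nametable 0, X scroll 0) of the usual $2006, $2005, $2005, $2006 split-scroll sequence. Below is a rough sketch that computes all four bytes for any X, Y and nametable, as I understand the sequence described on the NESdev wiki. It runs on a desktop compiler and only calculates values; the struct and function names are made up for illustration.

```c
#include <stdint.h>

typedef struct {
    uint8_t addr_first;   /* 1st write, to $2006 */
    uint8_t scroll_y;     /* 2nd write, to $2005 */
    uint8_t scroll_x;     /* 3rd write, to $2005 */
    uint8_t addr_second;  /* 4th write, to $2006 */
} SplitWrites;

/* Computes the byte for each of the four mid-frame scroll writes. */
SplitWrites split_scroll_writes(uint8_t nametable, uint8_t x, uint8_t y)
{
    SplitWrites w;
    w.addr_first  = (uint8_t)((nametable & 0x03) << 2);
    w.scroll_y    = y;
    w.scroll_x    = x;
    /* Only the low 8 bits are kept, same as storing into an unsigned char. */
    w.addr_second = (uint8_t)(((y & 0xF8) << 2) | (x >> 3));
    return w;
}
```

With x = 0 and nametable = 0, addr_second is the same value as Vert_scroll2 in the code above.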

I ran into some problems here, and had to do more debugging. It would work fine for a few seconds, then glitch, then it would screw up the sprite zero hit. I had to investigate. In the FCEUX debugger, I set breakpoints on writes to $2000, $2005, and $2006, just to make sure.

They were writing the correct numbers. But, then I noticed that the first write to set the top of the screen (which should have happened during V-blank) was happening at Scanline 40! Oh, the music that I added was doing its update in the NMI routine…before anything else was happening. And, it was taking long enough (occasionally) to not get to all the other stuff that needs to happen during V-blank, until well into rendering the screen. (This was code from another game that I cut/pasted in…poorly). And, it was missing the Sprite Zero hit. It didn’t actually crash the game, it was just setting it up WAY TOO late.

I moved the music update to go last on my list of things to do each frame, and it was working perfectly now.

After all that, I’m going to take a break.

Here’s how the different modes work…

void main (void){

  while (1) { // infinite loop

    while (GameMode == TITLE_MODE){
      //Title Mode code
    }

    while (GameMode == RUN_GAME_MODE){
      //Game Mode code
    }

    while (GameMode == PAUSE_MODE){
      //Pause Mode code
    }

    while (GameMode == GAME_OVER_MODE){
      //Game Over code
    }

    while (GameMode == VICTORY_MODE){
      //Victory Mode code
    }
  }
}
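
Each inner loop runs until something changes GameMode. As a small sketch of the Start button transitions described earlier (Title to Game, Game to Pause, Pause back to Game), here is one way to write them as a function. The mode names follow the post, but the numeric values, the function name, and the choice to send the end screens back to the title are assumptions.

```c
#include <stdint.h>

/* Mode names follow the post; the numeric values are arbitrary. */
enum { TITLE_MODE, RUN_GAME_MODE, PAUSE_MODE, GAME_OVER_MODE, VICTORY_MODE };

/* Returns the mode to enter when Start is newly pressed in 'mode'. */
uint8_t mode_after_start(uint8_t mode)
{
    switch (mode) {
    case TITLE_MODE:    return RUN_GAME_MODE;
    case RUN_GAME_MODE: return PAUSE_MODE;
    case PAUSE_MODE:    return RUN_GAME_MODE;
    default:            return TITLE_MODE; /* assumption: end screens go to title */
    }
}
```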

See you next time. Here’s the code I have so far.

http://dl.dropboxusercontent.com/s/vcnifnoooflgilq/spacy.zip

19. Game Coding 2

December 1, 2015, 9:17 pm

Now, I want to add some more sprites. We have the potential for exceeding the 8 sprite per scanline limit… so we’re going to have to create a sprite priority shuffling system. Sprite Zero can’t shuffle, and I’m using Sprites for the Pause Mode text, that needs to stay on top, and I want to keep the player’s ship from flickering. But, everything else will be changing location in the OAM, so it will be flickering rather than disappearing.

I doubled the RAM space for Sprites. I’m filling the first part with sprites, as they come up in the code, and then I copy them to the second part with slight changes every frame. The second RAM section is being sent to the OAM every frame (at the beginning of every V-blank). This is a simple way to shuffle. It could be rewritten so it only takes up $100 bytes of RAM, but I’m not worried about it right now.
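
Here is a rough sketch of that two-buffer idea, not the actual code from the zip. It assumes the first few sprites (sprite zero, the Pause text, the ship) keep their slots and the rest are rotated by an offset that the caller changes every frame. FIXED_SPRITES and the function name are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SPRITE_COUNT  64
#define SPRITE_BYTES  4
#define FIXED_SPRITES 8   /* hypothetical: sprite zero, pause text, player ship */

/* Copies 'src' to 'dst'. The first FIXED_SPRITES entries keep their slots;
   the remaining entries are rotated forward by 'offset' slots. */
void shuffle_sprites(const uint8_t *src, uint8_t *dst, uint8_t offset)
{
    const uint8_t movable = SPRITE_COUNT - FIXED_SPRITES;
    uint8_t i;

    memcpy(dst, src, FIXED_SPRITES * SPRITE_BYTES);
    for (i = 0; i < movable; ++i) {
        uint8_t to = (uint8_t)(FIXED_SPRITES + (i + offset) % movable);
        memcpy(dst + to * SPRITE_BYTES,
               src + (FIXED_SPRITES + i) * SPRITE_BYTES,
               SPRITE_BYTES);
    }
}
```

Bumping the offset each frame means a different set of sprites loses out on a crowded scanline each time, which shows up as flicker instead of sprites vanishing.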

I tested sprite shuffling, by adding some dummy sprites on the screen, to make sure they flicker. They do. I removed the dummy sprites. Now, to add real sprites.

First, I’m going to create some bullets, and use A or B button to trigger. I want it to be auto-repeat, but not every frame of course. So, I set a little timer to not allow new bullet until timer = 0. Eight on the screen at once, max. (I later reduced this).
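
A minimal sketch of the auto-repeat timer, assuming a simple array of bullet slots. The struct layout, the names, and the 10 frame delay are placeholders rather than values from the game.

```c
#include <stdint.h>

#define MAX_BULLETS 8
#define FIRE_DELAY  10   /* hypothetical: frames to wait between shots */

typedef struct { uint8_t active, x, y; } Bullet;

Bullet bullets[MAX_BULLETS];
uint8_t fire_timer;

/* Call once per frame. Returns 1 if a bullet was spawned this frame. */
uint8_t update_fire(uint8_t button_held, uint8_t ship_x, uint8_t ship_y)
{
    uint8_t i;

    if (fire_timer) {          /* still cooling down */
        --fire_timer;
        return 0;
    }
    if (!button_held) return 0;

    for (i = 0; i < MAX_BULLETS; ++i) {
        if (!bullets[i].active) {
            bullets[i].active = 1;
            bullets[i].x = ship_x;
            bullets[i].y = ship_y;
            fire_timer = FIRE_DELAY;
            return 1;
        }
    }
    return 0;                  /* all slots in use */
}
```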

Now I want to add some enemies. I haven’t made their graphics yet, I’m just going to do some test enemies, and make sure collision routines are working. I set Select button to spawn 4 enemy ships, and code it so I can shoot them. And they can collide with you and reduce your life. And, then I added… If Lives == 0, Game Over Mode.

I used the collision code from the earlier lesson. I’m doing lots and lots of collision checks. Every ship needs to be checked with every bullet.
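
To give an idea of the "every ship against every bullet" loop, here is a sketch using plain bounding boxes. This is not the collision routine from the earlier lesson; the Box struct and both function names are invented for the example.

```c
#include <stdint.h>

typedef struct { uint8_t active, x, y, w, h; } Box;

/* Returns 1 if the two rectangles overlap. */
uint8_t boxes_overlap(const Box *a, const Box *b)
{
    return (uint8_t)(a->x < b->x + b->w && b->x < a->x + a->w &&
                     a->y < b->y + b->h && b->y < a->y + a->h);
}

/* Deactivates each ship hit by a bullet (and that bullet).
   Returns the number of hits. */
uint8_t check_hits(Box *ships, uint8_t n_ships, Box *shots, uint8_t n_shots)
{
    uint8_t s, b, hits = 0;

    for (s = 0; s < n_ships; ++s) {
        if (!ships[s].active) continue;
        for (b = 0; b < n_shots; ++b) {
            if (!shots[b].active) continue;
            if (boxes_overlap(&ships[s], &shots[b])) {
                ships[s].active = 0;
                shots[b].active = 0;
                ++hits;
                break;  /* this ship is gone; move on to the next one */
            }
        }
    }
    return hits;
}
```

Skipping inactive slots early keeps the inner loop cheap, which matters when the whole frame budget is a few thousand instructions.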

Now, I want to check how much CPU time I’m using per frame, so I set it to switch to B&W mode at the end of the game mode logic (FCEUX needs to be set to Old PPU to see it). I changed the BG color to blue to see it better.

It looks like we’re ok. We’ve only reached scanline 170, so we could probably fit a few more things in per frame if we wanted to.

SPACY2

Here’s what we have so far.

http://dl.dropboxusercontent.com/s/fczfdpahrdgb7rl/spacy2.zip

20. Game Coding 3

December 1, 2015, 9:18 pm

The only things I still need to do are get the enemies working, add the boss mode, and get the victory mode working. I was going to have the enemies shoot bullets, but I’m skipping that. But, the boss will be shooting lasers at you.

I drew the graphics, and planned the final sprite palette. Now I need a way to time enemies showing up. I have a clock variable that goes up based on the frame counter, about once per second. Then I reference an array of times to spawn new enemies. When the clock == the next time, it then fetches data from another list for how new enemies will appear.

I simplified this by just having 5 different patterns…thus each new enemy pattern is only 1 byte long in the array. You will probably want to have a more complex system, where you define x,y,type, of each enemy as it appears. That would be more interesting.
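
The clock-plus-two-arrays idea could look something like the sketch below. The table contents, the 60 frames per tick, and all of the names are made up for illustration; the real game data is in the zip.

```c
#include <stdint.h>

#define FRAMES_PER_TICK 60   /* roughly one tick per second on NTSC */
#define WAVE_COUNT      4
#define NO_SPAWN        0xFF

/* Hypothetical level data: when each wave appears, and which of 5 patterns. */
const uint8_t wave_time[WAVE_COUNT]    = { 2, 5, 9, 14 };
const uint8_t wave_pattern[WAVE_COUNT] = { 0, 3, 1, 4 };

uint8_t frame_count, game_clock, next_wave;

/* Call once per frame. Returns a pattern number 0-4 to spawn, or NO_SPAWN. */
uint8_t spawn_tick(void)
{
    if (++frame_count < FRAMES_PER_TICK) return NO_SPAWN;
    frame_count = 0;
    ++game_clock;
    if (next_wave < WAVE_COUNT && game_clock == wave_time[next_wave])
        return wave_pattern[next_wave++];
    return NO_SPAWN;
}
```

Once next_wave reaches WAVE_COUNT there is nothing left to spawn, which is also a convenient signal for starting the boss fight.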

SPACY3

And I have it checking the clock, to see if we reached the end of the level, at which point it will init the Boss fight, and set Boss Mode.

The boss moves in a repeating pattern here. You could make it more dynamic by having a boss react (change behavior) based on the player’s moves / position. It counts down a certain amount of frames, then switches moves. Pretty simple.
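
A sketch of a "count down some frames, then switch moves" pattern, assuming each move is just a velocity and a duration. The table values, starting position, and names are placeholders, not the boss data from the game.

```c
#include <stdint.h>

#define MOVE_COUNT 4

/* Hypothetical pattern: velocity and duration (in frames) of each move. */
const int8_t  move_dx[MOVE_COUNT]     = { 1, 0, -1, 0 };
const int8_t  move_dy[MOVE_COUNT]     = { 0, 1, 0, -1 };
const uint8_t move_frames[MOVE_COUNT] = { 30, 10, 30, 10 };

uint8_t boss_x = 100, boss_y = 40;
uint8_t boss_move;          /* index of the current move */
uint8_t boss_timer = 30;    /* frames left in the current move */

/* Call once per frame during the boss fight. */
void boss_update(void)
{
    boss_x = (uint8_t)(boss_x + move_dx[boss_move]);
    boss_y = (uint8_t)(boss_y + move_dy[boss_move]);
    if (--boss_timer == 0) {
        boss_move = (uint8_t)((boss_move + 1) % MOVE_COUNT);
        boss_timer = move_frames[boss_move];
    }
}
```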

I turned off sprite shuffling on the Boss Mode. It actually does have more than 8 sprites per scanline, so some of the Boss’s sprites disappear, but I actually thought it looks cool, because it kind of contrasts the lasers as they come out.

If you were going to make a Boss bigger than this 6×6, you would have to render it as BG tiles, and shift the Scroll to make it look like it’s moving. You would have to have a solid background behind the boss to do that.

SPACY4

Now, the most important last step…Game Testing. Play it in multiple emulators, make sure there are no bugs. In FCEUX, play the game with all scanlines displayed (Config/Video/Drawing Area/NTSC…set from scanlines 0 to 239). That way you can make sure there won’t be any weird things happening at the very top or bottom.

If we made it any more complicated, we would have to consider rewriting everything to optimize for speed, because we are almost out of CPU time per frame. We can also squeeze a little more in just before the sprite zero hit, but very carefully, because 1 missed sprite zero hit can crash the game.

Another option is to only do half the logic one frame and half the logic the next frame. If you plan it right, you could do this without noticeable slowdown. Make sure that the music is updated every frame, for example.

This took me about a week. If I made the enemies more complicated, added enemy missiles, and made it 5 levels long, I could easily see this taking 2-3 months (maybe 60 hours of work). I think that’s a good length for a first game.

Now, go make some games. I’m sure you can do better than this. Good luck. Have fun.

Here’s the final code for our example game.

http://dl.dropboxusercontent.com/s/0ks1vwtzbfmlpyd/spacy3.zip

21. Credits and Thanks

December 24, 2015, 3:45 pm

I would like to thank everyone who helped me learn NES programming, especially the people at forum.nesdev.com

I learned a lot from the example code for cc65 written by Shiru. I’ve used a few bits and pieces of code from his example files. Also the Famitone2 code and NES Screen Tool were written by Shiru. Check out his games on his website…

https://shiru.untergrund.net/software.shtml

(scroll down to NES/Famicom) or check out the cc65 examples yourself…

http://shiru.untergrund.net/articles/programming_nes_games_in_c.htm

(click on “these small example programs”).

And, two of his games are available (or soon will be) for sale from GreetingCarts (Retroscribe) here…

http://www.greetingcarts.com/

Or here, maybe…

http://preview.greetingcarts.com/

Also, I want to thank THEFOX for his help when I was getting started using cc65. And also for the cc65 example code I found on his website (which is down right now :( )…

https://www.fauxgame.com/

is where it was…the example platformer was called Seamen Chronicles, but I can’t seem to find a link to it. Google just shrugs its shoulders and asks me if I mean ‘Semen’. No. No I don’t.

You can still get a copy of Streemerz, programmed by THEFOX here…

http://www.romhacking.net/homebrew/13/

I also want to thank Rainwarrior for his Coltrane demo, which was written for cc65. You can find the example code on his website…

http://www.rainwarrior.ca/music/coltrane_src.zip

Also, be sure to check out his Lizard Game (soon to be finished, I believe).

http://lizardnes.com/

Thanks to everyone.

Now, if I can only figure out how to program a SNES game…

22. More

December 26, 2015, 9:21 pm

I wanted to make sure I covered everything. I barely mentioned mappers. If you want to make a game bigger than $8000 bytes, you would have to use a better mapper than NROM256. That would allow you to switch PRG-ROM banks in-game, and/or CHR-ROM banks. CC65/CA65 is the perfect tool for setting up the different banks. I wish I could provide source code for this, but I mostly work with small games. There are dozens of mappers. I think MMC3 is a good one. It allows both CHR and PRG swapping, and has a scanline counter.

How to get a game on a cart: you can get a USB / flash cart, such as the Everdrive or the PowerPak. You could also burn your own EPROMs and solder them into an actual cartridge, but that’s a little bit above my skill level at the moment.

I skipped over the Color Emphasis bits of register $2001. The 3 high bits are B, G, R…and they emphasize those colors by slightly darkening the other colors. Setting all three bits will darken the whole screen. Here’s what the palette looks like under various color emphasis bits (using an emulator)…

ColorEmphasis
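
As a small sketch, the three emphasis bits can be given names and combined with whatever value is already going to $2001. The macro and function names are placeholders, and the red/green order shown is the NTSC one.

```c
#include <stdint.h>

/* Upper three bits of the PPU mask register ($2001), NTSC meaning. */
#define EMPHASIZE_RED   0x20
#define EMPHASIZE_GREEN 0x40
#define EMPHASIZE_BLUE  0x80

/* Returns 'mask' with its emphasis bits replaced by 'emphasis'.
   The lower five bits (rendering on/off etc.) are left untouched. */
uint8_t set_emphasis(uint8_t mask, uint8_t emphasis)
{
    return (uint8_t)((mask & 0x1F) | (emphasis & 0xE0));
}
```

For example, set_emphasis(0x1e, EMPHASIZE_BLUE) gives the value to write to $2001 for normal rendering with a blue tint.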

Additional cc65 features.

Ullrich von Bassewitz has apparently added the entire C standard library, and some other functions. It seems you can multiply and divide numbers, it’s just very slow (perhaps a table of presolved answers would be faster).

If you #include "..\include\stdlib.h"

It looks like you can use calloc, malloc, free, and realloc. (I had a hard time testing these…I had to doubly define __STACK_SIZE__ and __STACKSIZE__ in the .cfg file. It seems to me like one of those is a typo, but I’ve seen it both ways.) You will also have to define the HEAP to use these functions…here’s a link that talks about setting up the HEAP…

http://www.cc65.org/mailarchive/2009-05/6581.html

Also interesting are rand (random numbers), srand (to seed the random number generator), and qsort (QuickSort).
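
These are the standard C functions, so a sketch of using them looks the same as on any other platform. The example below fills an array with random screen rows and sorts it. The function names are invented, and I have only run this on a desktop compiler, so treat the cc65 side as untested.

```c
#include <stdlib.h>

/* qsort comparison callback for unsigned bytes, ascending order. */
static int compare_bytes(const void *a, const void *b)
{
    const unsigned char x = *(const unsigned char *)a;
    const unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Fills 'out' with 'n' pseudo-random values in 0-239, sorted ascending. */
void random_sorted_rows(unsigned char *out, unsigned n, unsigned seed)
{
    unsigned i;

    srand(seed);                       /* same seed gives the same sequence */
    for (i = 0; i < n; ++i)
        out[i] = (unsigned char)(rand() % 240);
    qsort(out, n, sizeof out[0], compare_bytes);
}
```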

If you #include "..\include\cc65.h"

You can use cc65_sin and cc65_cos (Sine and Cosine).

And if you #include "..\include\zlib.h"

You can use the built in decompression library. (I haven’t tested it).

——————————————–

I also never covered the IRQ. It works like NMI. It’s an interrupt that stops the normal code and jumps to the IRQ handler (the address stored in the IRQ vector, at $FFFE-$FFFF). There are 3 main ways to trigger an IRQ…

if the code encounters a BRK instruction (opcode #00)
music channel IRQ is turned on, and DMC sample ends
mapper generated IRQ, like MMC3’s scanline counter

Only the mapper generated ones seem useful to me, as this could reliably change many PPU settings midframe with great precision…or swap tiles midframe, so you can have different tiles at the top of the screen from the bottom. Etc.

———————-

One last note. Some emulators cut off the top and bottom 8 pixels. The NES produces the full 240 pixel high image, but due to the way old TVs projected their image, they would tend to cut off 8-16 pixels off the top and bottom of the screen. Newer TVs might not cut any pixels off the top or bottom. Long story short, don’t put anything important on the top or bottom 16 pixels of the screen.

And that’s all I can think of right now. Go make some games. I bet you can make something better than I just did.

Contact, My Stuff

December 29, 2015, 7:19 pm

If anyone really, really needs to reach me, I have a hotmail account that starts like dougfraker2

Also, here’s a game that I’ve been working on for the past 6 months…well, here’s the level 1 + level 2 demos of Vigilante Ninja 2.

vig27title

vig28c

vig27play

Here’s the link…

http://dl.dropboxusercontent.com/s/o42mpk8x0gv1hck/VN2L1.zip

http://dl.dropboxusercontent.com/s/sfnovd85mbb02iz/VN2L2.zip

And, here’s a speech synthesizer program for the NES I made…

talkNES3

talkNES3M

http://dl.dropboxusercontent.com/s/58d3igz64p5x7md/TalkNES4.zip

Check back to the blog in the future, because I will be adding more example code, from time to time.

Thanks for visiting!

Oh, here’s some more stuff…some paintings I did about 2008-9. They are for sale if anyone is interested.

20160107_135831
Girl with the Pixel Earring, 28″x22″, Acrylic on Canvas, no frame, $300

20160107_140052
Data Lisa, 28″x22″, Acrylic on Canvas, no frame, $250

20160107_140219
8-bit Soup, 18″x24″, Acrylic on Canvas, no frame, $50

20160107_140654
Untitled, 24″x20″, Acrylic on Canvas, no frame, $150

23. Using DMC Sounds

January 13, 2016, 11:47 am

I’m going to explain how to use Famitone2 and Famitracker to add DMC sound samples to your game…first let’s review how the DMC channel works…here’s some code…

*((unsigned char*)0x4015) = 0x0f; //turn off DMC

//this controls which channels are on/off, DMC is the 0x10 bit

*((unsigned char*)0x4010) = 0x0f; //set the sample rate, f = highest

ADDRESS = 0xf000; //or whatever is the location of the DMC sample in the ROM

//the DMC samples MUST be located between c000 and ffff in the ROM

*((unsigned char*)0x4012) = (ADDRESS & 0x3fff) >> 6; // 0xf000 => 0xc0

LENGTH = 0x0101; //Actual length of a sample (in hex)

*((unsigned char*)0x4013) = LENGTH >> 4; // 0x0101 => 0x10

*((unsigned char*)0x4015) = 0x1f; //turn back on DMC channel

//any time DMC channel is turned on, it triggers the DMC sample to play
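
The two conversions in that sequence can be pulled out into helper functions. The sketch below follows the documented register meanings (sample address = $C000 + value * 64, sample length = value * 16 + 1 bytes). The function names are hypothetical, and it only calculates values; it doesn’t touch any hardware register.

```c
#include <stdint.h>

/* Value for $4012. The sample address is $C000 + value * 64,
   so the sample must start on a 64 byte boundary at or above $C000. */
uint8_t dmc_address_reg(uint16_t cpu_address)
{
    return (uint8_t)((cpu_address - 0xC000u) >> 6);
}

/* Value for $4013. The sample length in bytes is value * 16 + 1. */
uint8_t dmc_length_reg(uint16_t length_bytes)
{
    return (uint8_t)((length_bytes - 1u) >> 4);
}
```

For the numbers used above ($f000 and $0101) this gives the same results, 0xc0 and 0x10. It differs from a plain LENGTH >> 4 only when the length is an exact multiple of 16.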

Ok, well, Famitone2 does all this for you, but if you ever wanted to add a DMC sample without Famitone2, this would be the sequence of code to make it work. Here are some more thoughts about DMC usage. Keep the samples very short. I use 0.1-0.5 second samples. You can use more/longer samples with a slower sample rate, but they will all have an annoying buzzing sound in them and poor quality. Also, DMC samples are only about half as loud as the other music channels…so if you want to balance the mix, you should cut the volume of the other channels by half.

Anyway. We are going to use the simple platformer code again, and add a Famitracker song that uses DMC samples for drums. I’m using some royalty free sound samples that I got on a free CD many years ago. I edited them for length (shorter = better). Then I imported them into Famitracker. (You have to click on instrument #0, and then the DPCM tab).

DMC2

After importing the files, save them as DMC files. Assign each sample to a Key (on the left) and now you can enter them into the song under the DPCM column. Save the finished song, and export it as a .txt file. Bring that over to Famitone2/tools, and run the text2data program to convert it into a .s and .dmc file. (It’s a command prompt .exe)

text2data DMCmusic.txt -ca65

Now, I had to do a few things to get it to play…I enabled DMC (and SoundFx) for Famitone2 (.define in my reset.s file). And, I defined the start location “FT_DPCM_OFF” to be at $f000. Then, I included the files at the bottom of the reset.s file. I made sure the DMC samples were in the .segment “SAMPLES” which I had to add to the .CFG file (and define it to be at $f000).

At this point, Famitone2 takes care of the rest. I’m just playing the song the same as we did on the “Adding Music” page, and it will trigger the DMC samples for you in the music code of famitone.s.

I also added a non-DMC sound effect by making a Famitracker file and saving as NSF file, and using the famitone2/tools/nsf2data program (see “Adding Sound Effects”), triggered when the character jumps. There, done.

http://dl.dropboxusercontent.com/s/k00shsgd63f9a09/lesson17.zip

Now, I want to use some DMC sound effects. I decided to use the same song, but I changed the drums to Noise Channel (so it won’t cut in and out when a sound effect is played).

Famitone2 makes you add DMC sound effects into the song. They must be on instrument #0. I exported a .txt from Famitracker, and used text2data again. It created a .s and .dmc file. Basically the same as above, except I didn’t use the DMC samples in the song.

DMC3

Now, to call the sound effect from the C code, we need to do a fastcall to the famitone2.s label FamiToneSamplePlay but we need to add an underscore, I just added this above it…_DMC_PLAY (and added the line .export _DMC_PLAY). Now, I added a ‘fastcall’ function to the C code…

void __fastcall__ DMC_PLAY(unsigned char effect);

And, we need to give it a 1 byte value, equal to the DMC samples value. Now, I thought that their values would be #1 and #2. When I tried that, I got nothing. So, I looked closely at the DMCmusic2.s file (which defines each sample), and for some reason, it put the samples at #25 and #27. Here, look…

DMC

I don’t know why they’re at 25 and 27, but…whatever. You call them in the C code like this…

DMC_PLAY(27);

I have it set to play sample 1 when you jump, and play sample 2 when you press ‘START’.

Here’s the link…

http://dl.dropboxusercontent.com/s/eo92hyp4ms5mqhs/lesson18.zip

24. MMC3, Bank-switching, IRQs

January 14, 2016, 10:20 pm

(Thanks to thefox for pointing out an error in my example code, see note at the very bottom of the page)

So far, all I’ve been using is small NROM sized .nes files. I’m going to show how to set up a much larger .nes file using the MMC3 mapper. I don’t know every mapper, but I know the MMC3, so I’ll use that for my example. NROM was the first cartridge design, but they soon used up the 32k bytes of PRG ROM (code), and especially the 8k of CHR ROM (graphics). A mapper is a way to trick the NES into accessing more ROM, by swapping (mapping) banks in and out of the CPU memory.

I’m going to pretend that I’m designing for a real actual NES cartridge, and use the actual size of an actual MMC3 board. According to this website…

http://kevtris.org/mappers/mmc3/

The choices we have, are…

Max. 64K PRG, 64K CHR
Max. 512K PRG, 64K CHR
Max. 512K PRG, VRAM
Max. 512K PRG, 256K CHR
Max. 128K PRG, 64K CHR, 8K CHR RAM

(and some others that I omitted). I’m going to use the smallest example, 64k and 64k. First, we have to set this up correctly in the header. The header is a few bytes of metadata that is used by emulators, so it knows which mapper we’re using, and how many banks, etc. Here’s how it should look for our 64/64 MMC3…

.byte $4e,$45,$53,$1a
.byte $04 ; = 4 x $4000 bytes of PRG ROM
.byte $08 ; = 8 x $2000 bytes of CHR ROM
.byte $40 ; = mapper # 4 = MMC3
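
To show where those bytes come from, here is a sketch that builds the 16 byte iNES header from bank counts and a mapper number. The function name is made up, and it leaves every flag bit (mirroring, battery, etc.) at zero.

```c
#include <stdint.h>
#include <string.h>

/* Fills a 16-byte iNES header. prg_16k and chr_8k are bank counts. */
void make_ines_header(uint8_t *h, uint8_t prg_16k, uint8_t chr_8k, uint8_t mapper)
{
    memset(h, 0, 16);
    h[0] = 'N'; h[1] = 'E'; h[2] = 'S'; h[3] = 0x1A;
    h[4] = prg_16k;                          /* number of $4000 byte PRG banks */
    h[5] = chr_8k;                           /* number of $2000 byte CHR banks */
    h[6] = (uint8_t)((mapper & 0x0F) << 4);  /* low nibble of mapper number */
    h[7] = (uint8_t)(mapper & 0xF0);         /* high nibble of mapper number */
}
```

Calling make_ines_header(h, 4, 8, 4) gives $04, $08, $40 in bytes 4 to 6, matching the header above.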

Now, we need to set up the .cfg file for multiple banks…

#ROM Addresses:
#they are all at $8000, because I will be swapping them into that bank
PRG0: start = $8000, size = $2000, file = %O ,fill = yes, define = yes;
PRG1: start = $8000, size = $2000, file = %O ,fill = yes, define = yes;
PRG2: start = $8000, size = $2000, file = %O ,fill = yes, define = yes;
PRG3: start = $8000, size = $2000, file = %O ,fill = yes, define = yes;
PRG4: start = $8000, size = $2000, file = %O ,fill = yes, define = yes;
PRG5: start = $a000, size = $2000, file = %O ,fill = yes, define = yes;
PRG6: start = $c000, size = $2000, file = %O ,fill = yes, define = yes;
PRG7: start = $e000, size = $1ffa, file = %O ,fill = yes, define = yes;

# Hardware Vectors at end of the ROM
VECTORS: start = $fffa, size = $6, file = %O, fill = yes;

Actually, what the $8000 does here, is every label inside that bank will be given an address indexing from the start of that bank + $8000. I actually only have code in the last bank, so I could put the start address anywhere, and it would work the same, but you will probably have code in some of these banks, and so you need to give the code the correct addresses for where it will actually be going when it’s swapped in.

And, I defined these segments in the .cfg file…

SEGMENTS {
HEADER:   load = HEADER,         type = ro;
CODE0:    load = PRG0,           type = ro, define = yes;
CODE1:    load = PRG1,           type = ro, define = yes;
CODE2:    load = PRG2,           type = ro, define = yes;
CODE3:    load = PRG3,           type = ro, define = yes;
CODE4:    load = PRG4,           type = ro, define = yes;
CODE5:    load = PRG5,           type = ro, define = yes;
CODE6:    load = PRG6,           type = ro, define = yes;
STARTUP: load = PRG7,           type = ro, define = yes;
CODE:     load = PRG7,           type = ro, define = yes;
VECTORS: load = VECTORS,        type = ro;
CHARS:    load = CHR,            type = rw;

BSS:      load = RAM,            type = bss, define = yes;
HEAP:     load = RAM,            type = bss, optional = yes;
ZEROPAGE: load = ZP,             type = zp;
#OAM:   load = OAM1,    type = bss, define = yes;
}

(the OAM segment is not used in this example).

Ok, now I’m going to write something in each bank, so we can see how it loads into the ROM file. I’m writing the words “Bank0”, “Bank1”, etc. in every bank. And, I’m going to load those words onto the screen, so we can see the switch visually. (I set it to be triggered by pressing ‘Start’).

I had to write a bunch of PRAGMAs so that each bank will be compiled into the correct bank. Like this…

#pragma rodata-name ("CODE0")
#pragma code-name ("CODE0")
const unsigned char TEXT1[]={
"Bank0"};

#pragma rodata-name ("CODE1")
#pragma code-name ("CODE1")
const unsigned char TEXT2[]={
"Bank1"};

#pragma rodata-name ("CODE2")
#pragma code-name ("CODE2")
const unsigned char TEXT3[]={
"Bank2"};

etc. And when START is pressed, it will switch banks into CPU addresses $8000-9fff, and then load the first 5 bytes of that bank and write it to the screen, with this code…

void Draw_Bank_Num(void){ //this draws some text to the screen
PPU_ADDRESS = 0x20;
PPU_ADDRESS = 0xa6;
for (index = 0;index < 5;++index){
PPU_DATA = TEXT1[index];
}
PPU_ADDRESS = 0;
PPU_ADDRESS = 0;
}

When this compiles, it will write the address of TEXT1 into the code. It’s the only thing in the first bank (bank #0), and in the .cfg file, I defined that bank to start at $8000. So, it will be fetching the first 5 bytes from $8000-8004. That is the bank that I keep switching, so every time it goes here, it will be pulling those 5 bytes from whatever bank is mapped to $8000. Here’s the code that switches which bank will be mapped to addresses $8000-9fff…

if (((joypad1old & START) == 0)&&((joypad1 & START) != 0)){ //Start newly pressed
++PRGbank;
if (PRGbank > 7) PRGbank = 0; //wrap around after the last bank
*((unsigned char*)0x8000) = 6; //bankswitch a PRG bank into $8000
*((unsigned char*)0x8001) = PRGbank;
Draw_Bank_Num(); //re-writes the text on the screen
}

I know, it looks like we’re storing 6 at address $8000. But, you can’t do that, because that’s a ROM address. What this does is send a signal to the MMC3 mapper that we want to switch a PRG bank into $8000. The next line defines which bank will be swapped in. You can get some more detailed info on the wiki…

http://wiki.nesdev.com/w/index.php/MMC3

I feel like this may be confusing. It’s an unfortunate coincidence, that the bank we’re swapping and the MMC3 register are both called $8000. If you wanted to instead swap a bank into the CPU address $a000-bfff, you would do this…

*((unsigned char*)0x8000) = 7; //bankswitch a PRG bank into $a000
*((unsigned char*)0x8001) = which_PRG_bank;

Is that clearer? Bank swapping is done with a $8000 / $8001 write combination.
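
If it helps, here is a toy model of that write pair that runs on a desktop compiler. It is only a sketch: it handles just the two PRG registers (6 and 7) in the default PRG mode with 8 banks, ignores CHR registers and register mirroring, and every name in it is invented.

```c
#include <stdint.h>

#define PRG_BANKS 8   /* 64K of PRG ROM in 8K banks */

typedef struct {
    uint8_t select;   /* last value written to $8000 */
    uint8_t r6, r7;   /* bank numbers for $8000-$9FFF and $A000-$BFFF */
} Mmc3Sim;

/* Mimics a CPU write to the $8000/$8001 register pair. */
void mmc3_write(Mmc3Sim *m, uint16_t addr, uint8_t value)
{
    if (addr == 0x8000) {
        m->select = value;                    /* which register comes next */
    } else if (addr == 0x8001) {
        if ((m->select & 7) == 6) m->r6 = (uint8_t)(value % PRG_BANKS);
        if ((m->select & 7) == 7) m->r7 = (uint8_t)(value % PRG_BANKS);
    }
}

/* Returns the ROM bank that answers a CPU read at 'addr' ($8000-$FFFF). */
uint8_t mmc3_bank_at(const Mmc3Sim *m, uint16_t addr)
{
    if (addr < 0xA000) return m->r6;
    if (addr < 0xC000) return m->r7;
    if (addr < 0xE000) return PRG_BANKS - 2;  /* fixed to second-to-last bank */
    return PRG_BANKS - 1;                     /* fixed to last bank */
}
```

The point of the model is that the write to $8000 never changes what the CPU reads at $8000. It only picks which internal register the following $8001 write will fill.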

I also added a few lines at the start of the main() function, which sets the initial state how everything should be mapped at the start. I don’t know for sure, but I think that the only bank that is certain at RESET is…that last PRG bank will definitely be at $e000-ffff. All our startup code should (ie. MUST) be located in that bank.

Now, that’s cool and all, but you will not be using bank swapping like this (having every bank have things located at fixed positions, like $8000). In reality, you will probably have an array of addresses at the start of every bank, which point to the location of data within the bank (that way, the data can be anywhere within the bank). If you have code in that bank, you will perhaps put a ‘JUMP TABLE’ at the start of the bank…that’s an array of addresses of the start of every function inside the bank. Essentially, the code in a fixed bank will read the address, and then jump (indirect) to that address. Or maybe, use the ‘push address to the stack and RTS’ Trick…

http://wiki.nesdev.com/w/index.php/RTS_Trick

That’s kind of complicated ASM stuff, but it might be worthwhile learning it, if you’re going to make the most of using multiple banks.
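
In C, the closest thing to a jump table is an array of function pointers. The sketch below shows only the idea, with two made-up functions. On real hardware you would also have to place the table at a known address in the bank and switch the bank in before calling, and none of that is shown here.

```c
#include <stdint.h>

typedef uint8_t (*BankFunc)(uint8_t);

/* Two stand-ins for functions that live in a swappable bank. */
uint8_t add_one(uint8_t v)   { return (uint8_t)(v + 1); }
uint8_t times_two(uint8_t v) { return (uint8_t)(v * 2); }

/* On hardware this table would sit at the start of the bank. */
const BankFunc bank_table[] = { add_one, times_two };

/* Fixed-bank code calls by index, so it never needs the real addresses. */
uint8_t call_banked(uint8_t index, uint8_t arg)
{
    return bank_table[index](arg);
}
```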

Anyway, I wanted to add a cool ‘moving background’ effect. This effect can be done several ways, but I think bank swapping CHR ROM is the easiest. This code waits 4 frames, and then switches which CHR banks will be mapped to $0000-$03ff of the PPU. When the PPU goes to draw tile #23 to the screen, the mapper will direct it to tile #23 of a specific CHR bank in the ROM.

MMC3 actually breaks the CHR ROM into $400 byte chunks (64 tiles), so bank 0 = the first $400 bytes, 1 = the next $400 bytes, etc. It takes multiple MMC map writes to fill the PPU full of new tiles. I’m just changing that first $400 bytes (PPU addresses 0-$3ff). I made a little waterfall. There are 4 nearly identical CHR ROM banks, with the water tiles shifted downward 1 pixel each bank down.
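
The timing part of the waterfall is simple enough to sketch on its own: given a frame counter, pick which of the 4 water banks to map. The function name and constants are placeholders, and the actual mapper writes are left out.

```c
#include <stdint.h>

#define FRAMES_PER_STEP 4   /* hold each bank for 4 frames */
#define WATER_BANKS     4   /* 4 nearly identical CHR banks */

/* Returns which waterfall CHR bank (0-3) to map for a given frame count. */
uint8_t water_bank_for_frame(uint8_t frame)
{
    return (uint8_t)((frame / FRAMES_PER_STEP) % WATER_BANKS);
}
```

Because 256 is a multiple of 16, the cycle stays smooth when an 8 bit frame counter wraps around.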

lesson19

Here’s the link to the source code. Press ‘START’ to see the PRG ROM bank switch.

http://dl.dropboxusercontent.com/s/gl73mp2nr1rpdzl/lesson19.zip

Now, that’s not all MMC3 can do…it can also count scanlines. Normally, you would need to set up a Sprite Zero hit to time something to happen midframe, and you can only do that once. With MMC3, you can time multiple things to happen midframe, like changing the Scroll, or swapping CHR ROM, or other cool tricks. I’m going to change the scroll, about every 20 lines. You don’t have to sit and wait for those 20 lines, the MMC3 mapper will count for you, and you can go on to do other things (game logic). It will generate an IRQ, and jump to the IRQ code. I have the IRQ code changing the Horizontal Scroll (several times a frame).

Things we need to do…The first line of every NES game (startup code) is to turn off interrupts. But, that’s what an IRQ is, so in our main() function, I turned interrupts back on.

asm ("cli"); //turns ON IRQ interrupts

Also, I have to make sure that the vectors at the end of the reset.s code are pointing to the address of the IRQ code.

Now, during V-blank (you can turn this on any time, I just wanted to start it at the top of the screen) I set the MMC3 to start counting scanlines with this code…

*((unsigned char*)0xe000) = 1; //turn off MMC3 IRQ
*((unsigned char*)0xc000) = 20; //count 20 scanlines, then IRQ
*((unsigned char*)0xc001) = 20;
*((unsigned char*)0xe001) = 1; //turn on MMC3 IRQ

Note: I think it skips the first scanline, so it actually doesn’t generate an IRQ until the end of scanline 21. Now, I want to change the horizontal scroll, which isn’t a problem, but if I immediately tried to change the scroll, there would be a slight glitch (misalignment) of the screen at that point. I’m always amazed at how many professional games have these kinds of glitches. Anyway, to avoid the glitch, you have to change the scroll during the very very very short H-blank period. What’s an H-blank? When the PPU is drawing the picture to your TV, it goes left to right, then it jumps from right to left (not drawing) very quickly. That’s the H-blank period.

Well, the MMC3 IRQ triggers right at the H-blank period, but by the time it jumps to the IRQ code, and you load a number to the scroll, it’s already drawing the next scanline. So, in order to get it to change the scroll during H-blank, we have to wait for the next one. I wrote a tiny little loop, to wait about 100 CPU cycles, and then switch the scroll position. I think I’m timing it just right, but each emulator seems to be just a tiny bit different (ie. inaccurate), so I can’t be sure.

After the H-scroll is changed, I set up another ‘wait 20 scanlines and IRQ’ bit. It’s a bit tricky to get it to split exactly where you want. I’ve noticed that actual games don’t do the wait loop like I’m doing here. What they do is have nothing but a flat single color at the scroll split point across the entire scanline, so the glitch isn’t visible. Or, they just have a big glitch and don’t worry about it.

If you want to see the glitch (so you know what I’m talking about), edit that tiny wait loop (in the IRQ code in reset.s) to be just 1 number bigger, or 1 number smaller. Recompile it. Glitches at every split! The H-blank is really that small.

lesson20

I still have ‘START’ change the bank number on screen…if you can read it. Here’s the link to the source code…

http://dl.dropboxusercontent.com/s/1435iwsn62kixvg/lesson20.zip

EDIT:

It occurred to me that, maybe people won’t want to recompile this, but still want to see the glitch thing I’ve been yapping about. Here’s an animated gif, with the wait loop off by just 1 loop. Look at the right side of the screen, the last scanline of each segment is misaligned, and that misalignment changes each frame, so it kind of dances around oddly. That’s just being a few pixels off on the scroll split…imagine if the Scroll split was done halfway into the scanline. You’d have one scanline of each segment be as much as 80-100 pixels out of alignment. That would look terrible, and be distracting.

Glitch

You might wonder why we would split the screen like this anyway? For parallax scrolling. Go to YouTube, and search for NES and Parallax Scrolling. You’ll notice also, what I mentioned earlier, that most games do their scroll splits at a flat mono-color portion of the screen so glitches aren’t noticeable. That would have been better (and easier) than trying to carefully time an H-blank split.

NOTE about the error I made (and fixed). I named the first bank “CODE” in the .cfg file and defined it to be at $8000, and it was a swappable bank. Apparently “CODE” is the default name that the C compiler uses to put all the C library functions. We can’t have that in a swappable bank, because if you go to call one of those C library functions (really, most code in C uses them) the bank that contains them might not be in place, and the game would crash. So, I called the last bank “CODE”. That’s the fixed bank. It will always be in place. Now, our C library functions will always be in the right position.

The other error that I fixed, was the IRQ code in “lesson20/reset.s”. If the C code was doing anything that required the C stack or the C variables when the IRQ was called…doing any more C functions inside the IRQ handler will screw up the stack/variables, and when the IRQ is finished, and it jumps back to the MAIN() function, the game will promptly crash. So, I rewrote the entire IRQ code in ASM, to make sure none of the C stack / C variables are affected. Here’s a link about that…

http://www.cc65.org/faq.php#IntHandlers

25. Importing a MIDI to Famitracker

January 23, 2016, 10:56 pm

I’ve read every webpage on this subject, and not one explains how to do this. It’s not easy, because they never got the Import MIDI feature working right in Famitracker…In fact, they removed that feature. You have to download version 0.4.2 to use that feature.

http://famitracker.com/downloads.php

I create MIDI files by connecting a MIDI keyboard to my computer with a MIDI to USB cable. I record with a program called REAPER. It costs $60, and you can use it for free for 60 days… (I’m using version 5.12 for this example). I highly recommend it for all music production, except sound editing…for that I use Audacity. Here’s the REAPER website.

http://www.reaper.fm/

First thing…you’re going to want to use a virtual instrument VST or VSTi. This has no effect on the output file, but it will help you hear the MIDI track. I’m pretty sure that Reaper has some default VST instruments that would work fine. But, I also found this one, which simulates the 2A03 chip (NES sound).

http://www.mattmontag.com/projects-page/nintendo-vst

So. Open Reaper, change the project BPM to 12. (by clicking the little box in the middle that says BPM). That’s not a typo. I mean 12.

Click INSERT / Virtual Instrument. Find the VST-NES. (I set it to polyphonic). Listen to a metronome set to 120 BPM. Try to hit every note in beat with the metronome. By default the VST track you just added should be ‘Armed for record’ (the red circle by the name of the track). That’s what we want. Now, hit record (it’s sort of in the middle on the left – by the play/pause buttons.)

Reaper1

I prefer to play on a MIDI keyboard (while listening to the headphone output of the keyboard — and a metronome — rather than listening to the computer…because there will always be some annoying latency). You can also play using the virtual keyboard (you have to click on it for it to work). It is mapped to letters on your computer keyboard. You can change the mapping by right clicking on the virtual keyboard.

When done recording, hit stop.

Now, double-click on the MIDI track. If you did a fair job of playing on the beat, the notes will be mostly lined up to the grid. Now, click EDIT/QUANTIZE…’use the grid’, ‘all events’, and ‘position and note end’. Now all the notes should be perfectly lined up. If some of them are out of position, move them around as needed.

You might want to only record a single note at a time (ie, no chords). In my example here there will be up to 2 notes at the same time. To get it to load correctly in Famitracker, you have to put the second note on a different channel. By default all the notes will be on channel 1. Select all the notes you want to switch channels. (I like to put bass notes in the triangle channel, because it can go an octave lower than the square channel.) To select notes (from the piano roll MIDI editor) right-click and drag while holding down CTRL. Once you’ve selected all the notes, right-click one of them, and Note Channel, 3 (for example).

Reaper2

OK, now we’re almost done. Famitracker still won’t load the MIDI file exactly. So, here’s a little adjustment to get it to work better… Close the MIDI editor. Change the project BPM to 100. Now right click on the MIDI track, and choose SOURCE PROPERTIES, and click the little box ‘Ignore project tempo’ and set a tempo of 125 BPM. Press ‘OK’. The whole MIDI track will probably seem all wrong at this point. But, bear with me…

Reaper3

Double-click on the MIDI track one more time to open the MIDI editor. Click FILE/EXPORT TO NEW MIDI FILE, and save it.

Now, Open Famitracker 0.4.2. (the last version to have a MIDI import feature).

Click FILE/IMPORT MIDI, and I have the lower notes (Channel 3) going to the Triangle Channel, and the rest (Channel 1) going to square one. I prefer 256 for pattern length, but I think 128 would work fine, too.

Set the speed to about 5 – 10, tempo 150. Add 3 new instruments (with volume and duty sequences). Now save, and close Famitracker, and open the file with the newest (more stable) version of Famitracker.

All the Triangle channel notes will be an Octave too low. Click somewhere on the Triangle track. CTRL-A (select all) and then Right-Click on the Triangle track. TRANSPOSE/INCREASE OCTAVE. If you have multiple frames, you will probably have to repeat this for every frame.

And, we’re done. Sounds right to me. That was easy, right?

FT3

If you need to bump things up or down… INSERT moves everything down from the highlighted point. BACKSPACE will move everything up from the highlighted point (deleting the one above it).

If anyone has an easier way of doing this, please let me know. Thanks.

I’ve tried this with some freeware MIDI editors, but they just don’t have the features of REAPER. I’m sure many of the better DAWs out there will do the same thing, but you just can’t beat the $60 price tag.

26. ASM Basics

March 9, 2016, 10:21 pm

Intro to 6502 ASM, with ca65.

I’ve had a lot of questions about 6502 ASM. One of the features of cc65 is the ca65 assembler, which is a very good one. You can write any, or all, of your functions in assembly. But, it would help if you knew how 6502 ASM works…so I’m going to write a few tutorials. All my examples will be ca65 specific. It would also help if you have a strong understanding of binary and hexadecimal numbers.

The NES has 256 zeropage RAM addresses (that is addresses 0 – 0xff) and 1792 non-zeropage RAM addresses (addresses 0x100 – 0x7ff). Some games also have an additional RAM chip on the cartridge, usually mapped to 0x6000-0x7fff. If powered by a battery, it SAVES the game.

The hardware stack goes from 0x100-0x1ff, so let’s not put any variables here. Also, most games put a OAM (sprite) buffer from 0x200-0x2ff. And, furthermore, if you are using cc65, it needs to put its stack and (optionally) heap somewhere…usually at the top (around 0x7ff)…but you can change this in the .cfg file.

Anyway, with these assumptions…I’m going to use the zeropage (0 – 0xff), and 0x300-0x3ff in all the examples. I will usually use $ to indicate hex numbers, rather than 0x. This is required syntax by ca65.

Declaring constants.

zip = 0
FOO = $3f
FOO2 = $03e3

When used in assembly code, the assembler will replace the symbol with the value you define.

Examples:

-> here means ‘assembles into’
the left-most part is what you will be typing

LDA #zip -> LDA #0   load A with value 0, the '#' means value, not address
LDA zip  -> LDA 0  load A from the 8-bit address $00
LDA #FOO -> LDA #$3f load A with the value $3f
LDA FOO  -> LDA $3f  load A from the 8-bit address $3f
LDA FOO2 -> LDA $03e3 load A from the 16-bit address $03e3
LDA #<FOO2 -> LDA #$e3 load A with the value (lower byte of $03e3 = $e3)
LDA #>FOO2 -> LDA #$03 load A with the value (upper byte of $03e3 = $03)

< gets the lower byte of a 2 byte expression
> gets the upper byte of a 2 byte expression

LDA #FOO2 -> produces an error. The assembler was expecting an 8-bit number.
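If it helps to see it in C terms, the < and > operators are just a mask and a shift. This is only an illustration, and the function names lo_byte() and hi_byte() are mine, not anything from ca65.

```c
#include <stdint.h>

/* What '<' does: keep only the lower 8 bits of a 16-bit value. */
uint8_t lo_byte(uint16_t value)
{
    return (uint8_t)(value & 0xFF);
}

/* What '>' does: shift the upper 8 bits down into a single byte. */
uint8_t hi_byte(uint16_t value)
{
    return (uint8_t)(value >> 8);
}
```

With FOO2 = $03e3, lo_byte(0x03e3) gives $e3 and hi_byte(0x03e3) gives $03, the same as the two LDA lines above.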

Declaring variables.

.segment "ZEROPAGE"
foo: .res 1
bar: .res 2

Assuming this is the first thing the assembler sees… foo will be reserved 1 byte at address $00, and bar will be reserved 2 bytes at addresses $01 and $02.

LDA foo  -> LDA $00 load A from the 8-bit address $00
LDA bar  -> LDA $01 load A from the 8-bit address $01
LDA bar+1  -> LDA $02 load A from the 8-bit address ($01 + 1 = $02)

.segment  "BSS"
fooz: .res 1
baz: .res 2

As I described above, I’m defining the BSS section in the .cfg file as being from $300-$3ff. Therefore, the assembler will reserve 1 byte for fooz at $0300 and 2 bytes for baz at $0301 and $0302.

LDA fooz  -> LDA $0300 load A from the 16-bit address $0300
LDA baz  -> LDA $0301 load A from the 16-bit address $0301
LDA baz+1  -> LDA $0302 load A from the 16-bit address ($0301 + 1 = $0302)

Importantly, we don’t need to know what value is reserved when writing code. The assembler will keep track of the values and addresses of every label, you just have to reference them in your code using the labels/variable name.
* constants and variables should be defined at the very top of the ASM page.

Referencing ROM addresses in code, using labels.

.segment  "CODE"
LDA Table1
LDA Table1+1
...
Table1:
.byte $01, $02

(note, you could also put Table1 in the “RODATA” segment, if you like)

Let’s say that the assembler has calculated that Table1 will be at address $8050. This will assemble into…

LDA $8050 load A from the 16-bit address $8050
LDA $8051 load A from the 16-bit address $8051

When the program is RUNNING, the first line will load A with the value $01 and the second line will load A with the value $02…because those are the values in the ROM at $8050 and $8051.

OK, now that we understand how the labels work, let’s do some code…using C examples, and how to do it in ASM. There are 3 registers in the 6502. A, X, and Y.

foo = 3;

LDA #3  	load A with value 3
STA foo  	store A at address foo

or we could have used the other registers…

LDX #3  	load X with value 3
STX foo  	store X at address foo

or

LDY #3  	load Y with value 3
STY foo  	store Y at address foo

There is no difference which register you use for this kind of thing.

bar = $31f; //a 16-bit value

From working with cc65, I now have a habit of using A for low bytes and X for high bytes (as the cc65 compiler tends to do)…

LDA #$1f 	load A with the value $1f
LDX #$03 	load X with the value 3
STA bar  	store A to address bar
STX bar+1 	store X to the address bar+1

we could also have done...
LDA #$1f 	load A with the value $1f
STA bar  	store A to address bar
LDA #$03 	load A with the value 3
sta bar+1 	store A to the address bar+1

baz = bar; //a 16-bit value

LDA bar  	load A from the address bar
LDX bar+1 	load X from the address bar+1
STA baz  	store A to the address baz
STX baz+1 	store X to the address baz+1

again, we could have done...
LDA bar  	load A from the address bar
STA baz  	store A to the address baz
LDA bar+1 	load A from the address bar+1
STA baz+1 	store A to the address baz+1

Next thing…increment / decrement

++foo;

INC foo  add 1 to the value stored at foo

--foo;

DEC foo  subtract 1 from the value stored at foo

you can also increment and decrement the X and Y registers

INX  add 1 to the X register

INY  add 1 to the Y register

DEX  subtract 1 from the X register

DEY  subtract 1 from the Y register

You will have to use addition/subtraction to ++ or -- the A register.

Which brings us to simple math…in fact very very simple math. The 6502 can only do addition and subtraction, and bit-shift multiplication. And, of the registers, ONLY the A register can do math or bit-shifting.

Adding is always done ‘with carry’. The 6502 has certain FLAGS to assist math, and for doing comparisons. If the result of addition is > 255, then it sets the carry flag – in case you are doing 16-bit math (or more). If the result of addition is <= 255, the carry flag is reset to zero. But, it always adds A + value + carry flag. Therefore, we must ‘clear the carry flag’ before addition. Here’s an example…

A reg. + value + carry flag = 
result now in A // carry flag
4+4+0  = A = 8, carry = 0
4+4+1  = A = 9, carry = 0
255+4+0  = A = 3, carry = 1
255+4+1 = A = 4, carry = 1
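To double check that table, here's a small C model of what ADC does. It's only a sketch of the arithmetic (it ignores the other flags), and the function names are made up for the example. The result is packed into 9 bits: the low 8 bits are the new A, and bit 8 is the new carry flag.

```c
#include <stdint.h>

/* Add with carry: A + value + carry flag.
   Returns a 9-bit result: bits 0-7 = new A, bit 8 = new carry. */
uint16_t adc(uint8_t a, uint8_t value, uint8_t carry_in)
{
    return (uint16_t)(a + value + (carry_in ? 1 : 0));
}

/* Helpers to pull the two parts back out of the packed result. */
uint8_t result_a(uint16_t result)     { return (uint8_t)(result & 0xFF); }
uint8_t result_carry(uint16_t result) { return (uint8_t)(result >> 8); }
```

For example, adc(255, 4, 0) gives 259, which is A = 3 with the carry set, matching the third row of the table.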

foo = fooz + 1; //8-bit only

LDA fooz 	load A from address fooz
CLC  		clear the carry flag
ADC #1  	add w carry A + value 1, the result is now in A
STA foo  	store A to the address foo

or, reverse them, get the same result…
foo = 1 + fooz; //8-bit only

LDA #1  	load A with value 1
CLC  		clear the carry flag
ADC fooz 	add w carry A + value at address fooz, result now in A
STA foo  	store A to the address foo

Let’s do a 16-bit example.

bar = baz + $315; //16-bit values

LDA baz  	load A from address baz
CLC  		clear the carry flag
ADC #$15 	add w carry A + value $15 (the low byte)
  if the result is > 255, the carry flag will be set, else reset to zero
STA bar  	store A to the address bar
LDA baz+1 	load A from the address baz+1
  ...notice, we don't clear the carry flag, we are using the
  carry flag result of the previous addition as part of this addition
ADC #$03 	add w carry A + value $03 (the high byte)
STA bar+1 	store A to the address bar+1
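Here is the same two-step addition modeled in C, one byte at a time, so you can see the carry being handed from the low byte to the high byte. This is just a sketch to show the idea, and add16() is a name I made up.

```c
#include <stdint.h>

/* 16-bit add done the 6502 way: two 8-bit adds linked by the carry flag. */
uint16_t add16(uint16_t baz, uint16_t value)
{
    uint8_t carry = 0;                       /* CLC */
    uint16_t sum;
    uint8_t lo, hi;

    /* low bytes: LDA baz / ADC #<value / STA bar */
    sum = (uint16_t)((baz & 0xFF) + (value & 0xFF) + carry);
    lo = (uint8_t)(sum & 0xFF);
    carry = (uint8_t)(sum >> 8);             /* set if the low byte overflowed */

    /* high bytes: LDA baz+1 / ADC #>value / STA bar+1 (no CLC here) */
    sum = (uint16_t)((baz >> 8) + (value >> 8) + carry);
    hi = (uint8_t)(sum & 0xFF);

    return (uint16_t)((hi << 8) | lo);
}
```

If baz is $00f0, the low byte add ($f0 + $15) overflows and sets the carry, so the high byte becomes 0 + 3 + 1 = 4, and the result is $0405.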

And, some subtraction. Like the ADC, subtraction always uses the carry flag, but in reverse. It’s called Subtract with Carry. You need to SET the carry flag before a SBC operation. If the result of subtraction underflows below 0, it will reset the carry flag to zero. Else, it will set the carry flag. Again, this is in case you want to do 16-bit (or more) math.

Here’s some examples…
! = NOT…ie, the opposite

A reg. - value - !carry flag = 
result now in A // carry flag
8-4-!1  = A = 4,  carry = 1
8-4-!0  = A = 3, carry = 1
4-5-!1  = A = 255, carry = 0
4-5-!0  = A = 254, carry = 0
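And the matching C model for SBC, again only a sketch of the arithmetic with made-up names. Like before, the result is packed: low 8 bits = new A, bit 8 = new carry flag.

```c
#include <stdint.h>

/* Subtract with carry: A - value - (NOT carry flag).
   Returns a packed result: bits 0-7 = new A, bit 8 = new carry.
   Carry comes out 1 if there was no underflow, 0 if there was. */
uint16_t sbc(uint8_t a, uint8_t value, uint8_t carry_in)
{
    int diff = (int)a - (int)value - (carry_in ? 0 : 1);
    uint8_t carry_out = (diff >= 0) ? 1 : 0;
    uint8_t new_a = (uint8_t)diff;           /* wraps around below zero */
    return (uint16_t)((carry_out << 8) | new_a);
}
```

So sbc(4, 5, 1) gives A = 255 with the carry cleared, which is the third row of the table.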

foo = fooz - 1; //8-bit only

LDA fooz 	load A from address fooz
SEC  		set the carry flag
SBC #1  	subtract value 1 from A, result is now in A
STA foo  	store A to address foo

And the reverse, which is a different thing altogether...
foo = 1 - fooz; //8-bit only

LDA #1  	load A with value 1
SEC  		set the carry flag
SBC fooz 	subtract value at address fooz from A, result is now in A
STA foo  	store A to address foo

And a 16-bit example...
bar = baz - $315; //16-bit numbers

LDA baz  	load A from address baz
SEC  		set the carry flag
SBC #$15 	subtract value $15 from A, result is now in A
STA bar  	store A to address bar
LDA baz+1 	load A from address baz+1
  ...notice, we DON'T set the carry flag. We are using the result
  of the last math to set/reset the carry flag.
SBC #$03 	subtract value 3 from A (and subtract !carry), result now in A
STA bar+1 	store A to the address bar+1
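Here's that 16-bit subtraction modeled in C, one byte at a time. It's only meant to show how the carry (really a 'no borrow' flag) links the two halves. sub16() is a name I made up for the example.

```c
#include <stdint.h>

/* 16-bit subtract done the 6502 way: two 8-bit subtracts linked by carry. */
uint16_t sub16(uint16_t baz, uint16_t value)
{
    uint8_t carry = 1;                       /* SEC */
    int diff;
    uint8_t lo, hi;

    /* low bytes: LDA baz / SBC #<value / STA bar */
    diff = (int)(baz & 0xFF) - (int)(value & 0xFF) - (1 - carry);
    lo = (uint8_t)diff;
    carry = (diff >= 0) ? 1 : 0;             /* cleared if we had to borrow */

    /* high bytes: LDA baz+1 / SBC #>value / STA bar+1 (no SEC here) */
    diff = (int)(baz >> 8) - (int)(value >> 8) - (1 - carry);
    hi = (uint8_t)diff;

    return (uint16_t)((hi << 8) | lo);
}
```

If baz is $0405, the low byte ($05 - $15) underflows to $f0 and clears the carry, so the high byte becomes 4 - 3 - 1 = 0, and the result is $00f0.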


Stay tuned for many more ASM lessons to come.

27. ASM part 2

March 13, 2016, 10:36 am

Bit Shifting

The 6502 has 2 ways of shifting bits left and right. In these examples, I will number each bit…there are 8 bits, numbered 0-7. Of the registers, only the accumulator (A register) can do bit shifts and bitwise operations. You can also bit-shift a RAM address directly, without affecting A.

LSR – shift right

zero -> 76543210 -> carry flag

ROR – roll right

old carry flag -> 76543210 -> new carry flag

ASL – shift left

carry flag <- 76543210 <- zero

ROL – roll left

new carry flag <- 76543210 <- old carry flag

LSR shifts all the bits right one position, and a zero goes into the highest bit. Effectively, this is the same as dividing by 2, with some rounding error.

zero -> 00010000
        00001000, carry = 0

zero -> 00001111
        00000111, carry = 1

ROR works the same as LSR, except the old carry flag goes in rather than zero.

ASL shifts all the bits left one position, and a zero goes into the lowest bit. Effectively, this is the same as multiplying by 2. Right to left here…

   00010000 <- zero
<- 00100000
carry = 0

   11110000 <- zero
<- 11100000
carry = 1

ROL works the same as ASL, except the old carry flag goes in rather than zero.
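Here are all four shifts modeled in C, in case that makes the carry flag behavior easier to follow. This is a sketch with made-up function names. Each one returns a packed result: the low 8 bits are the shifted value, and bit 8 is the new carry flag.

```c
#include <stdint.h>

/* LSR: zero -> bit 7, bit 0 -> carry */
uint16_t lsr(uint8_t v)
{
    return (uint16_t)(((v & 1) << 8) | (v >> 1));
}

/* ROR: old carry -> bit 7, bit 0 -> new carry */
uint16_t ror(uint8_t v, uint8_t carry_in)
{
    return (uint16_t)(((v & 1) << 8) | (v >> 1) | (carry_in ? 0x80 : 0));
}

/* ASL: zero -> bit 0, bit 7 -> carry (bit 7 lands in bit 8 of the result) */
uint16_t asl(uint8_t v)
{
    return (uint16_t)((v << 1) & 0x1FF);
}

/* ROL: old carry -> bit 0, bit 7 -> new carry */
uint16_t rol(uint8_t v, uint8_t carry_in)
{
    return (uint16_t)(((v << 1) & 0x1FF) | (carry_in ? 1 : 0));
}
```

For example, lsr(0x0F) returns 0x107: the value 00000111 with the carry set, same as the second LSR diagram above.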

Now, some examples, using C programming examples to start with.

foo = bar << 2; // 8-bit numbers

LDA bar  load A from address bar
ASL A    bit-shift A left
ASL A    bit-shift A left
STA foo  store A to address foo

foo = bar >> 3; //8-bit numbers

LDA bar  load A from address bar
LSR A  bit-shift A right
LSR A  bit-shift A right
LSR A  bit-shift A right
STA foo  store A to address foo

And, here's some 16-bit examples.

foo = bar << 2; // 16-bit numbers

LDA bar+1  load A from address bar+1 (the high byte)
STA foo+1  store A to address foo+1 (the high byte)
LDA bar    load A from address bar (the low byte)
ASL A      bit-shift A left (high bit shifted into carry flag)
ROL foo+1  bit-shift left address 'foo+1', rolling that carry flag in
ASL A      bit-shift A left (high bit shifted into carry flag)
ROL foo+1  bit-shift left address 'foo+1', rolling that carry flag in
STA foo    store A to the address foo (the low byte)

foo = bar >> 3; //16-bit numbers

LDA bar   load A from address bar (the low byte)
STA foo   store A to the address foo (the low byte)
LDA bar+1 load A from the address bar+1 (the high byte)
LSR A     bit-shift A right (low bit shifted into carry flag)
ROR foo   bit shift right address 'foo', rolling that carry flag in
LSR A     ...
ROR foo
LSR A
ROR foo   ...3 times
STA foo+1 store A to the address foo+1 (the high byte).
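The 16-bit shifts can be modeled in C the same way, one byte at a time, with the carry passing a bit from one byte to the other. This is just a sketch, and shift_left16() and shift_right16() are names I made up.

```c
#include <stdint.h>

/* 16-bit shift left: ASL the low byte, then ROL the carry into the high byte. */
uint16_t shift_left16(uint16_t bar, uint8_t count)
{
    uint8_t lo = (uint8_t)(bar & 0xFF);
    uint8_t hi = (uint8_t)(bar >> 8);
    uint8_t carry;

    while (count-- > 0) {
        carry = (uint8_t)(lo >> 7);          /* ASL A: bit 7 goes to carry */
        lo = (uint8_t)(lo << 1);
        hi = (uint8_t)((hi << 1) | carry);   /* ROL foo+1: carry comes in */
    }
    return (uint16_t)((hi << 8) | lo);
}

/* 16-bit shift right: LSR the high byte, then ROR the carry into the low byte. */
uint16_t shift_right16(uint16_t bar, uint8_t count)
{
    uint8_t lo = (uint8_t)(bar & 0xFF);
    uint8_t hi = (uint8_t)(bar >> 8);
    uint8_t carry;

    while (count-- > 0) {
        carry = (uint8_t)(hi & 1);           /* LSR A: bit 0 goes to carry */
        hi = (uint8_t)(hi >> 1);
        lo = (uint8_t)((lo >> 1) | (carry << 7)); /* ROR foo: carry comes in */
    }
    return (uint16_t)((hi << 8) | lo);
}
```

Notice the order: shifting left starts with the low byte, shifting right starts with the high byte, so the carry always flows in the direction of the shift.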

Bitwise Operations.

AND, OR, and XOR…called here AND, ORA, and EOR. These things only work with the A register.

Here’s what AND does. bit by bit.

AND #value
A, AND value= result

0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1

AND only sets a bit if both bits in A and value are 1.

Example:

A      00010001
value  00000101
result 00000001

AND is used to isolate a single bit. The way I handle button presses, I roll them into joypad1. If I want to find out if the Left button is being pressed…I know that it is this bit of joypad1 00000010 ($02). Here’s how the code would usually go…

LEFT = $02 ;defined at the top of the page

LDA joypad1  load A from address joypad1
AND #LEFT    AND A with value 2, result now in A
BEQ :+       branch if the result is zero (to unnamed label)
             skipping this next line of code
JSR Left_Pressed jump to sub-routine handling left button presses
:            just a label

Another use for AND is to ‘mask’ out certain bits. Let’s say I have a piece of data, where the upper bit is a special flag, and the lower 7 bits are the data. If I want just the data, I would AND #$7f (01111111) to remove the upper bit…

LDA data
AND #$7f
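Both uses of AND look like this in C. It's only an illustration of the bit logic, and the function names are made up (joypad1 and LEFT are the same as in the ASM above).

```c
#include <stdint.h>

#define LEFT 0x02   /* same bit as in the ASM example */

/* Isolate a single bit: nonzero result means the Left button is down. */
int left_pressed(uint8_t joypad1)
{
    return (joypad1 & LEFT) != 0;
}

/* Mask out the upper flag bit, keeping the lower 7 bits of data. */
uint8_t strip_flag(uint8_t data)
{
    return (uint8_t)(data & 0x7F);
}
```

So left_pressed(0x81) is false no matter what other buttons are held, and strip_flag(0xC5) gives $45.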

ORA (bitwise OR operation)

Here’s what ORA does. bit by bit.

ORA #value
A, ORA value= result
0 ORA 0 = 0
0 ORA 1 = 1
1 ORA 0 = 1
1 ORA 1 = 1

If either A or the value has a bit set, the result will have that bit set.

Example:
A      00010001
value  00000101
result 00010101

ORA is a way to ensure that certain bits are set, without affecting the other bits (as math would do).

Music code is a good example. The left 4 bits control the sound. The right 4 bits control the volume. So, if you want to keep the ‘instrument’ the same, the first 4 bits would always be $C (for example), while the volume may change. So, you might store $c0 in variable ‘instrument’ and store the volume in variable ‘volume’. When you need to combine them, you would use ORA.

LDA instrument  instrument is $c0
ORA volume      volume is 0 - $0f
STA $4000       result stored to music register, address $4000.

EOR (exclusive OR operation), means one or the other, but not both.

EOR #value
A, EOR value= result
0 EOR 0 = 0
0 EOR 1 = 1
1 EOR 0 = 1
1 EOR 1 = 0

Example:
A      00010001
value  00000101
result 00010100

EOR is usually used to get the negative value of a number. Say you have -5 ($fb) and you want to turn it into 5, you EOR #$ff and add 1. The same for converting back to -5.

LDA foo   Let's say foo = $fb (-5)
EOR #$ff  A = 4 now
CLC
ADC #1    A = 5 now

and reverse works too…

LDA foo   Let's say foo = 5
EOR #$ff  A = $fa now
CLC
ADC #1    A = $fb now (-5)
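Here's the same negate trick in C, as a quick sanity check. negate() is just a name for this example.

```c
#include <stdint.h>

/* Two's complement negate: flip every bit (EOR #$ff), then add 1. */
uint8_t negate(uint8_t value)
{
    return (uint8_t)((value ^ 0xFF) + 1);
}
```

negate(0xFB) gives 5 and negate(5) gives $fb, so the same steps work in both directions. Two special cases: 0 stays 0, and $80 (-128) stays $80, because +128 doesn't fit in a signed byte.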

TRANSFERRING registers

TAX A transfers to X
TXA X transfers to A
TAY A transfers to Y
TYA Y transfers to A

TXS X transferred to the stack pointer
TSX stack pointer transferred to X

This is the only way to access the stack pointer. Usually, the stack pointer is set to $ff at the start of the program and never thought of again.

LDX #$ff load X with value $ff
TXS      transfer to stack pointer

(the stack grows down from $ff)

More Stack Operations
PHA push A to the stack (and adjust the stack pointer -1)
PLA pull A (pop A) from the stack (and adjust the stack pointer +1)

PHP push the processor status to the stack (and stack pointer -1)
PLP pull the processor status from the stack (and stack pointer +1)

Unfortunately, you can’t push a few arguments to the stack, jump to a sub-routine and then use those numbers…not easily, at least. Because, the jump to sub-routine also pushes the return address to the stack, on top of your numbers. PHA and PLA can be used as a cheap local variable. But, be careful. If you’re inside a sub-routine and you PHA, and forget to PLA, your program will crash when it tries to pull the return address, and gets your PHA number instead.
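To show why a forgotten PLA is fatal, here's a tiny C model of the stack. It's a sketch, not an emulator. The one detail I'm adding that isn't mentioned above is that JSR actually pushes the return address minus 1 (high byte first), and RTS adds the 1 back. The function names and the $8003 address are made up for the example.

```c
#include <stdint.h>

static uint8_t ram[0x800];   /* NES internal RAM; the stack is $0100-$01ff */
static uint8_t sp;           /* stack pointer */

static void push(uint8_t value) { ram[0x100 + sp] = value; sp--; }
static uint8_t pull(void)       { sp++; return ram[0x100 + sp]; }

/* JSR pushes (return address - 1), high byte first. */
static void jsr(uint16_t return_addr)
{
    uint16_t pushed = (uint16_t)(return_addr - 1);
    push((uint8_t)(pushed >> 8));
    push((uint8_t)(pushed & 0xFF));
}

/* RTS pulls the low byte, then the high byte, and adds 1. */
static uint16_t rts(void)
{
    uint8_t lo = pull();
    uint8_t hi = pull();
    return (uint16_t)(((hi << 8) | lo) + 1);
}

/* PHA matched by PLA inside the sub-routine: returns to the right place. */
uint16_t balanced_call(void)
{
    sp = 0xFF;
    jsr(0x8003);
    push(0x42);      /* PHA */
    (void)pull();    /* PLA */
    return rts();
}

/* PHA without a PLA: RTS pulls the $42 as part of the return address. */
uint16_t unbalanced_call(void)
{
    sp = 0xFF;
    jsr(0x8003);
    push(0x42);      /* PHA, and then we forget the PLA */
    return rts();
}
```

balanced_call() comes back to $8003 like it should. unbalanced_call() 'returns' to $0243, which is in the middle of RAM, and the program goes off the rails.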

A few more things…

NOP does nothing but wastes 2 cycles of CPU time
BRK a software interrupt (not to be confused with NMI)…it will jump the program to wherever the IRQ/BRK vector tells it…this usually only happens if a big error has occurred, as the machine code for BRK is $00…which often indicates that the program has branched to an area of the ROM with nothing there.

And, next time we will go over jumping, branching, and comparison.

28. ASM part 3

March 13, 2016, 11:30 am

Welcome to part 3 of my 6502 ASM lessons.

Jumping moves the execution of the program somewhere else.

Examples:

  LDA #5
  JMP Skip_Next_Line jump to label 'Skip_Next_Line'
  LDA #7             never does this
Skip_Next_Line:
  STA foo            A = 5, store A at address foo

Infinite_Loop:
  LDA #5
  JMP Infinite_Loop     jump to the label 'Infinite_Loop'
  STA foo               never does this

Indirect jumping

One way to control flow of a program is to have an array of addresses of various parts of the program, and jump to them indirectly.

LDA program_state load A from address program_state
ASL A             multiply by 2, since each address is 2 bytes long
TAX               transfer A to X
LDA ADDRESSES, X  load A from ADDRESSES + X (low byte of an address)
STA jump_address  store A at jump_address
LDA ADDRESSES+1, X load A from ADDRESSES+1 + X (high byte of an address)
STA jump_address+1 store A at jump_address+1
JMP (jump_address) jump to the address pointed to by jump_address

ADDRESSES:
.word FUNCTION1, FUNCTION2 
the assembler will replace these with the address of each label

FUNCTION1:
 ...

FUNCTION2:
 ...

*Warning, there is a bug related to indirect jumps. The first byte of the indirect jump address can’t be on the last byte of a page (such as $3ff). Rather than fetching the second byte from the next page ($400), it will fetch the second byte from the start of the same page ($300). In the example above, jump_address can’t be located at $xxff. The addresses of FUNCTION1 and FUNCTION2 can be anywhere.
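Here's the bug written out as a little C calculation, so you can see exactly which address gets read. This is only a model of the address math, and both function names are made up.

```c
#include <stdint.h>

/* Where JMP (ptr) reads the HIGH byte of the target from.
   Only the low 8 bits of the pointer are incremented, so the page
   (the upper 8 bits) never changes. */
uint16_t hi_byte_fetch_address(uint16_t ptr)
{
    return (uint16_t)((ptr & 0xFF00) | ((ptr + 1) & 0x00FF));
}

/* A pointer is safe for JMP (ptr) if it isn't the last byte of a page. */
int pointer_is_safe(uint16_t ptr)
{
    return (ptr & 0x00FF) != 0x00FF;
}
```

For a pointer at $3ff, the high byte is fetched from $300 instead of $400. For any other address, like $310, it's fetched from the next byte ($311) as you'd expect.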

Sub-Routines

When you use JSR, you jump to the label, and save the return address on the stack. Once that sub-routine is complete, use RTS to return to where we were before. (the program will pull the return address from the stack)

LDA #2
JSR Multiply_16
STA foo         A=32, store A at address foo

...
Multiply_16:
  ASL A
  ASL A
  ASL A
  ASL A
  RTS

Branching

First let’s review the 6502 processor status flags.

c = carry flag
z = zero flag
i = interrupt flag
d = decimal mode (a removed feature, not functioning on the NES)
v = overflow flag
n = negative flag

Most of these flags are useful for comparisons and flow control (branching). Here are some instructions that will set/clear various flags.

CLC = clear the carry flag
SEC = set the carry flag
CLI = clear the interrupt disable flag (allows IRQ interrupts to work)
SEI = set the interrupt disable flag (prevents IRQ interrupts)
CLD = clear decimal flag (normal binary math)
SED = set decimal flag (set decimal math) (does not work on the NES)
CLV = clear the overflow flag

It’s important to know why and when each flag is set (and which operations won’t set flags).

ADC, SBC sets the z, n, c, v flags
AND, ORA, EOR sets the z,n flags
ASL, LSR, ROR, ROL sets the z, n, c flags
BIT sets the z, n, v flags
CMP, CPX, CPY sets the z, n, c flags
DEC, DEX, DEY sets the z, n flags
INC, INX, INY sets the z, n flags
LDA, LDX, LDY sets the z, n flags
TAX, TXA, TAY, TYA sets the z, n flags
PLA sets z, n flags
JMP, JSR, RTS, and BRANCHES do not set any flags
STA, STX, STY do not set any flags
PHA and PHP do not set any flags
PLP changes ALL the flags…that’s what it’s supposed to do

Between the event that set a flag, and the logic that handles the flag, it is safe to store the value somewhere, and safe to branch/jump to another location.

*note: RTI will wipe all your flags, and replace them from a value stored in the stack. When an NMI or IRQ occurs, it pushes the processor status and return address to the stack. RTI will return from these interrupts, and it will restore processor status flags (but not the A,X,Y register values).

COMPARISONS (using flags to branch)

CMP compares A to a value
CPX compares X to a value
CPY compares Y to a value

Comparisons work as if a subtraction happened, but without changing the value of A. So think of CMP #5 as SEC,SBC #5. (Also, you don’t need to SEC before CMP.)
If the result is zero, z = 1, else z = 0.
If the result is negative, n = 1, else n = 0.
If A < value, c = 0. If A >= value, c = 1.
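Those three rules can be written out in C like this. It's a sketch of the flag logic only, and cmp_flags() is a made-up name. I've used the same bit positions the real status register uses (c = bit 0, z = bit 1, n = bit 7), but that's just for convenience here.

```c
#include <stdint.h>

#define FLAG_C 0x01   /* carry */
#define FLAG_Z 0x02   /* zero */
#define FLAG_N 0x80   /* negative */

/* Returns the c, z, n flags that CMP would produce for A compared to value.
   A itself is not changed. */
uint8_t cmp_flags(uint8_t a, uint8_t value)
{
    uint8_t diff = (uint8_t)(a - value);     /* the 'as if' subtraction */
    uint8_t flags = 0;

    if (a >= value)  flags |= FLAG_C;        /* no borrow needed */
    if (diff == 0)   flags |= FLAG_Z;        /* they were equal */
    if (diff & 0x80) flags |= FLAG_N;        /* bit 7 of the result */
    return flags;
}
```

One thing worth noticing: cmp_flags(200, 40) sets both c and n. 200 is clearly bigger than 40, but the 'result' (160) has bit 7 set, so the n flag can't be trusted for unsigned greater/less tests. The carry flag can.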
OK, now some branching examples.

LDA foo
CMP bar   does foo = bar ?
BEQ They_are_equal 
 if zero flag set, branch to They_are_equal
BNE They_are_not_equal 
 if zero flag not set, branch to They_are_not_equal

They_are_equal:
  ...
  JMP Next_code

They_are_not_equal:
  ...

Next_code:

LDA foo
CMP #1   does foo = 1 ?
BEQ Foo_Is_One  
 if zero flag set, branch to Foo_Is_One
BNE Foo_Is_Not_One 
 if zero flag not set, branch to Foo_Is_Not_One

Foo_Is_One:
  ...
  JMP Next_code

Foo_Is_Not_One:
  ...

Next_code:

Also, we can use BEQ/BNE to test if a value is zero, because LDA/LDX/LDY sets the zero flag if the value being loaded is zero.

LDA foo     if foo = 0, zero flag set
BEQ Foo_is_zero
BNE Foo_is_not_zero

Foo_is_zero:
  ...
  JMP Next_code

Foo_is_not_zero:
  ...

Next_code:

I discourage the use of CMP with BMI and BPL. You should use BCC and BCS for > < comparisons. Here’s an example without CMP.

*note $80-ff are considered negative. $0-$7f are considered positive. Look at them in binary…
$80 = 10000000
$7f = 01111111
So, if the upper bit = 1, it’s considered negative. If 0, positive.

LDA foo   if foo = negative, n flag set
BMI Foo_is_negative  branch if n flag set
BPL Foo_is_positive  branch if n flag not set

Foo_is_negative:
  ...
  JMP Next_code

Foo_is_positive:
  ...

Next_code:

Comparisons. BCC is equivalent to ‘Branch if Less Than’. BCS is equivalent to ‘Branch if Greater Than or Equal’.

(if foo < 40)…branch

LDA foo
CMP #40
BCC Somewhere branch if foo < 40

(if foo <= 40)…branch

LDA foo
CMP #40
BCC Somewhere branch if foo < 40
BEQ Somewhere branch if foo = 40

or…

LDA foo
CMP #41
BCC Somewhere branch if foo < 41

or…reverse them

LDA #40
CMP foo
BCS Somewhere branch if 40 >= foo

(if foo >= 40)…branch

LDA foo
CMP #40
BCS Somewhere branch if foo >= 40

(if foo > 40)…branch

LDA foo
CMP #41
BCS Somewhere branch if foo >= 41

or…reverse them…

LDA #40
CMP foo
BCC Somewhere branch if 40 < foo

And, CPX for X register. CPY for Y register. They work the same as CMP…
LDX foo
CPX #41
BCS Somewhere branch if foo >= 41

More about BCC and BCS

There are many, many more uses for BCC and BCS.

Let’s say, you want to add numbers, but if result > 255, you want to force it to stay at 255. This works because, if the result of ADC is over 255, the carry flag is set.

LDA foo
CLC
ADC #5   ;carry will only be set if result > 255
BCC Still_Under_256 ;branch if carry clear
  LDA #255  
Still_Under_256:

Similarly, say you want to subtract, but if the result < 0, you want to keep it at zero.

LDA foo
SEC
SBC #5
BCS Still_Zero_Or_More
LDA #0
Still_Zero_Or_More:
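
For comparison, here is roughly what those two clamping tricks look like in C. This is just a sketch with made-up function names; the wider integer type plays the role of the carry flag.

```c
#include <stdint.h>

/* Add, but stay at 255 on overflow (the ADC + BCC pattern). */
uint8_t add_clamped(uint8_t foo, uint8_t amount)
{
    unsigned int sum = (unsigned int)foo + amount; /* anything above 255 means "carry set" */

    if (sum > 255u) {
        return 255;
    }
    return (uint8_t)sum;
}

/* Subtract, but stay at 0 on underflow (the SBC + BCS pattern). */
uint8_t sub_clamped(uint8_t foo, uint8_t amount)
{
    if (foo < amount) { /* this is the "carry clear after SBC" case */
        return 0;
    }
    return (uint8_t)(foo - amount);
}
```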

You can also use BCC and BCS with ASL/LSR, ROL/ROR. Maybe, you want to use LSR as a modulo 2…to see if a number is even or odd.

LDA foo
LSR A   ;bit-shift right, rightmost bit goes into carry flag
BCC Foo_is_Even
BCS Foo_is_Odd

Foo_is_Even:
  ...
  JMP Next_Code
Foo_is_Odd:
  ...
Next_Code:

One more kind of comparison…

BIT tests a memory location without changing any register (A, X, Y).
The 8 bits are numbered…(76543210)
bit 7 goes to n (negative flag)
bit 6 goes to v (overflow flag)
the z (zero) flag is set if (A AND memory) = 0

example:
BIT foo
BMI foo_is_negative
*final note: you can only branch +127 or -128 bytes (relative to the byte after the branch instruction). Any more and the assembler will give you branch-out-of-range errors. The standard solution is to replace those long branches with the opposite branch, and a jump to the label.

BEQ label
…over 127 bytes of code…error, too far
label:
…replace it with…

BNE :+
JMP label
:
…
label:
: is an unnamed label. :+ means branch forward to the next unnamed label. :- means branch backwards to the previous unnamed label.

↧

29. ASM part 4

March 13, 2016, 12:10 pm

Yet another 6502 ASM lesson.

Arrays

The way to access arrays in 6502 ASM is to use indexed addresses. The X (or Y) register is used as the indexer. As usual, X=0 will get the first byte of the array.

LDX #0         ;load X with value 0
LDA Array1, X  ;will load A from the address Array1 + X
STA foo        ;A = $3f, store A at foo
...

Array1:
.byte $3f, $4f, $5f, $6f

*Warning, if your array address is in the zero page, and your index would put the address in the next page, it won’t fetch from the $100 page, but rather from zero-page.

This is a quirk of zero-page indexed addressing on the 6502: the indexed address wraps around within the zero-page. If you absolutely must put an array half in the zero-page and half out (I don’t know why you would), you can force the assembler to use an ‘absolute address’…i.e. a 16-bit address, like this…

LDA a:Array2, X ;this will correctly get the byte from the $100 page
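
Here is a tiny C sketch of the difference between the two addressing modes. The function names are made up; the only point is that the zero-page version keeps the sum to 8 bits, and the absolute version does not.

```c
#include <stdint.h>

/* Effective address of "LDA base,X" in zero-page,X mode:
   the sum is kept to 8 bits, so it wraps around inside page zero. */
uint16_t zp_indexed(uint8_t base, uint8_t x)
{
    return (uint8_t)(base + x);
}

/* Effective address in absolute,X mode (what the a: prefix asks for). */
uint16_t abs_indexed(uint16_t base, uint8_t x)
{
    return (uint16_t)(base + x);
}
```

So an array at $f0 indexed with X = $20 reads from $10 in zero-page mode, but from $110 in absolute mode.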

Here’s another array example using the Y register.

LDY #1
LDA Array1, Y

And, you can use STA the same way…to fill an array.
LDA #1
LDX #5
STA Array1, X store the value 1, at the address ‘Array1’ + 5

Loops

Loops are fairly easy…

for (X = 0;X < 50; X++)

  LDX #0
Loop:
  ...some code...
  INX 
  CPX #50   compare X to 50
  BNE Loop  not 50, branch back to Loop

It can also be done like this…

  LDX #50
Loop:
  ...some code...
  DEX         X--, if result = 0, sets zero flag
  BNE Loop    if no zero flag, branch back to Loop

Bigger Loop, if you need a loop bigger than 256

  LDY #4
  LDX #0
Loop:
  ...some code...
  DEX
  BNE Loop
  DEY
  BNE Loop  Will loop 1024 times
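
If you want to convince yourself of that count, here is a small C simulation of the loop above, using 8-bit counters that wrap around the way the X and Y registers do. The function name is made up for this sketch.

```c
#include <stdint.h>

/* Counts how many times the loop body runs for given start values of Y and X,
   following the DEX / BNE / DEY / BNE pattern with 8-bit wraparound. */
unsigned long count_iterations(uint8_t y_start, uint8_t x_start)
{
    uint8_t x = x_start;
    uint8_t y = y_start;
    unsigned long count = 0;

    do {
        do {
            ++count;      /* ...some code... */
            --x;          /* DEX (0 wraps to 255) */
        } while (x != 0); /* BNE Loop */
        --y;              /* DEY */
    } while (y != 0);     /* BNE Loop */

    return count;
}
```

With Y = 4 and X = 0 it reports 1024 passes. Note that X is never reloaded between the outer passes; the count only comes out even because X is already 0 whenever the inner loop finishes. Starting with Y = 2 and X = 50 gives 50 + 256 = 306 passes, not 100.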

Way bigger than anyone will ever need Loop, just for fun…

  LDA #5
  STA counter
  LDY #0
  LDX #0
Loop:
  ...some code...
  DEX
  BNE Loop
  DEY
  BNE Loop
  DEC counter
  BNE Loop  256*256*5 = 327680 times

Indirect Indexing

LDA (ZP_address,X)
LDA (ZP_address),Y
The first…(ZP_address, X)…I never use, and I don’t like it, so I’m going to skip it altogether. Sorry. I’ve never seen any code that uses it.

The second…(ZP_address), Y…is very useful. It’s the 6502 equivalent of a pointer. You store an address in the zero-page, and you can access the data at the address that it points to…or index from that address with the Y register.

pointer = 2 zero-page addresses reserved

LDA #<SOME_ARRAY
STA pointer
LDA #>SOME_ARRAY
STA pointer+1
LDY #0
LDA (pointer), y ;load A from the address pointer is pointing to...SOME_ARRAY, A = $5e
LDY #1
LDA (pointer), y ;load A from the address pointer is pointing to, plus Y...SOME_ARRAY + 1, A = $7f
SOME_ARRAY:
.byte $5e, $7f

Let’s say you have multiple rooms in the game, and you want to load the graphics for room #3. So, you index to a list of addresses of each room’s data, and store the address in the zero-page, and now you can indirect index from that address using the Y register as the indexer. In this example, pointer and pointer+1 are zero-page addresses.

  LDA room  room = 3
  ASL A     we multiply by 2, because each address is 2 bytes long
  TAX       transfer A to X
  LDA ADDRESSES, X load A with the low byte of the room address
  STA pointer  store A in the zero-page RAM
  LDA ADDRESSES+1, X load A with the high byte of the room address
  STA pointer+1  store A in the zero-page RAM
  LDY #0
LOOP:
  LDA (pointer), Y load A with the first byte of the array Room3
  STA somewhere, Y Maybe we store this data to another array, for parsing later
  CMP #$ff         let's say, the data set is terminated with $ff
  BEQ EXIT_LOOP    if = $ff, leave this loop
  INY
  BNE LOOP         it will keep looping for 256 bytes, 
                   when Y wraps around to zero

EXIT_LOOP:

ADDRESSES:
.word Room0, Room1, Room2, Room3 
the assembler will replace these with the addresses of each label.

Room0:
...data for room0
Room1:
...data for room1
Room2:
...data for room2
Room3:
...data for room3
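
If you think better in C, the same idea is a table of pointers plus a copy loop that stops at the terminator. This is only a sketch: the room contents are invented, and load_room() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented room data, each list terminated with 0xff. */
static const uint8_t Room0[] = { 0x00, 0x01, 0xff };
static const uint8_t Room1[] = { 0x10, 0xff };
static const uint8_t Room2[] = { 0x20, 0x21, 0xff };
static const uint8_t Room3[] = { 0x30, 0x31, 0x32, 0xff };

/* Same job as the .word table: one address per room. */
static const uint8_t *const ADDRESSES[] = { Room0, Room1, Room2, Room3 };

/* Copies one room's data (terminator included) into `somewhere`.
   Returns the number of bytes copied, at most 256 like the Y register loop. */
size_t load_room(uint8_t room, uint8_t *somewhere)
{
    const uint8_t *pointer;
    size_t y;

    assert(room < sizeof(ADDRESSES) / sizeof(ADDRESSES[0]));
    pointer = ADDRESSES[room];          /* like filling pointer and pointer+1 */

    for (y = 0; y < 256; ++y) {         /* INY / BNE LOOP */
        somewhere[y] = pointer[y];      /* LDA (pointer),Y / STA somewhere,Y */
        if (pointer[y] == 0xff) {       /* CMP #$ff / BEQ EXIT_LOOP */
            return y + 1;
        }
    }
    return y;
}
```

The C compiler hides the "multiply by 2" step: indexing ADDRESSES[room] already accounts for the size of each pointer.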

Multiple-condition If/then statements…some more examples.

if ((foo == 0)&&(bar < 20))…do code if both true

LDA foo   load A from address foo, sets a few flags, zero flag if = 0
BNE Skip_Ahead  skip the code if foo != 0
LDA bar   load A from address, bar
CMP #20   compare to value 20
BCS Skip_Ahead  skip the code if bar >= 20
...   some code here

Skip_Ahead:

if ((foo == 0) || (bar < 20))…do code if either true

  LDA foo
  BNE Check_Bar  skip if foo != 0, but also check bar
Do_Code:
  ...
  JMP Ahead
Check_Bar:
  LDA bar
  CMP #20
  BCC Do_Code  branch to Do_Code if bar < 20

Ahead:

↧

30. ASM part 5

March 13, 2016, 1:18 pm

Probably the final 6502 ASM lesson. I’m going to try to cover everything I forgot.

switch (foo) {
case 0:
…
break;
case 1:
…
break;
case 2:
…
}

Let’s say we have a variable ‘foo’ that holds a state. If foo = 0, we do one thing. If foo = 1, we do another thing. Etc. Generally, it would be handled like this…

  LDA foo      load A from address foo, will set a zero flag if foo = 0
  BNE Check_1  branch ahead if foo != 0
  ...
  JMP Done     break

Check_1:       A is still loaded with value from foo
  CMP #1       compare to value 1, sets zero flag if foo = 1
  BNE Check_2  branch ahead if foo != 1
  ...
  JMP Done     break

Check_2:       A is still loaded with value from foo
  CMP #2       compare to value 2, sets zero flag if foo = 2
  BNE Done     branch ahead if foo != 2
  ...

Done:

Comments are done with a ; in ca65

;this is a comment

You add additional ASM source code like this…

.include "Second_ASM_File.asm"

You add binary files like this…

.incbin "Something.bin"

Often, you put a binary file just below a label, so you can index it from the label.

Label_Name:
.incbin "Something.bin"

You might wonder why I never add the line .P02 (to set 6502 mode) or the command-line option -t nes (to set the target as ‘NES’), and the answer is…it makes no difference. The default mode of ca65 assembles exactly how I want, so I don’t bother.

I skipped over the V flag (overflow). The V flag is only set by ADC and SBC (and BIT). This is a way to treat the numbers as signed -128 to +127.

For ADC, the V flag is only set if 2 positive numbers add together to get a negative, or if 2 negative numbers add together to get a positive. Any other ADC operation will clear the V flag.

For SBC, the V flag is only set if Pos-Neg = Neg…or if Neg-Pos = Pos. All other SBC operations will clear the V flag.

What you do with the result is up to you, but you have BVC (branch if V clear) and BVS (branch if V set) to help you decide.
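
Here is a short C sketch of those V flag rules, if you want to test a few cases on a PC. The function names are made up. The SBC version relies on the usual trick that SBC behaves like ADC with the operand's bits inverted.

```c
#include <stdint.h>

/* Returns 1 if A + M + carry_in leaves the signed range -128..+127. */
uint8_t adc_overflow(uint8_t a, uint8_t m, uint8_t carry_in)
{
    uint8_t result = (uint8_t)(a + m + (carry_in ? 1 : 0));

    /* Overflow: both inputs have the same sign bit, and the result's sign differs. */
    return (uint8_t)(((a ^ result) & (m ^ result) & 0x80) != 0);
}

/* Same test for SBC (carry_in = 1 means "no borrow", as after SEC). */
uint8_t sbc_overflow(uint8_t a, uint8_t m, uint8_t carry_in)
{
    return adc_overflow(a, (uint8_t)~m, carry_in);
}
```

For example, $50 + $50 = $a0 sets V (two positives gave a negative), while $ff + $01 does not (-1 + 1 = 0 is fine as a signed result).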

MULTIPLICATION/DIVISION

I went and wrote several routines that would do these as efficiently as I could think…and then I found these webpages, which do the same thing about 10x faster.

http://6502org.wikidot.com/software-math-intmul

http://6502org.wikidot.com/software-math-intdiv

I’ve tried them out. They work great.

OH, and before I forget, I found another webpage with an online assembler, that you can test out code.

https://skilldrick.github.io/easy6502/

I can’t think of anything else at this time, but you can look at these resources for more information…

http://www.6502.org/

http://wiki.nesdev.com/w/index.php/Programming_guide

http://wiki.nesdev.com/w/images/7/76/Programmanual.pdf

The last one has lots of info about 65C02 and 65816 processors too, but if you scroll down to Chapter 18 (p.326) it will describe how all the instructions work. Most of them are relevant to 6502 programming also.

↧

Update – Games I’m Working On

May 5, 2016, 3:50 pm

Hello,

One day, I hope to rewrite most of the code on this blog, but I always seem to be too busy working on various projects, and life.

I have been working on my Vigilante Ninja 2 game, off and on, and a few other games.

I thought I’d post the latest…level 3 demo.

http://dl.dropboxusercontent.com/s/qrcr6wnllzz8yji/VN2_L3.zip

Here are some screenshots…

[Screenshots: vig32-2, vig32-3]

Hopefully, I can get all my current NES projects done this year (2016).

↧

Update – Feb 2017

February 9, 2017, 1:38 pm

I edited every .cfg file to include a “ONCE” segment, so that they will compile with the latest version of cc65.

I also included a makefile in every lesson folder, for Linux users, or people that prefer to use GNU Make over .bat files. Actually, Linux users will have to edit the makefile slightly. Just uncomment the rm *.o lines and comment out the del *.o lines.

In other news, Shiru has updated the NES Screen Tool, to allow non standard nametable sizes (ie, larger ones for scrolling games). I’m not sure if he fixed the RLE encoder bug, recently discussed on the nesdev forum.

I submitted 2 entries into the 2016 nesdev game competition. Here they are…

[Screenshot: flappy14]

Flappy Jack

http://dl.dropboxusercontent.com/s/k3tpdfr5rj11e7s/flappy14.nes

[Screenshot: rock1]

Rock Paper Scissors (and Rock Paper Scissors Lizard Spock…I mean Sbock).

http://dl.dropboxusercontent.com/s/gfu7usgpxdr6xlg/Rock3.nes

Interesting side note…neither of these games was programmed in C. Both were written in ASM for asm6.

And, I’m working on a top secret project right now, that will probably be sold in stores. Hopefully. No, not the ninja game…I’ve shoved that on the back burner for a bit.

Also, for more cc65 source code, visit the Mojon Twins website (caution, NSFW content). Their games are usually open source.

http://www.mojontwins.com/

↧

My Neslib Notes

April 12, 2017, 5:48 pm

Shiru wrote the neslib code, for NES development. These are all my detailed notes on how everything works. I will be adding example code, a little later. I mostly use a slightly modified version of the neslib from

http://shiru.untergrund.net/files/nes/chase.zip

And, here again is the example code

http://shiru.untergrund.net/files/src/cc65_nes_examples.zip

And this link has a version of neslib that works with the most recent version of cc65 (as of 2016) version 2.15

http://forums.nesdev.com/viewtopic.php?p=154078#p154078

pal_all(const char *data);

const unsigned char game_palette[]={…}; // define a 32 byte array of chars
pal_all(game_palette);

-pass a pointer to a 32 byte full palette
-it will copy 32 bytes from there to a buffer
-can be done any time, this only updates during v-blank

pal_bg(bg_palette); // 16 bytes only, background

pal_spr(sprite_palette); // 16 bytes only, sprites
-same as pal_all, but 16 bytes

pal_col(unsigned char index,unsigned char color);
-sets only 1 color in any palette, BG or Sprite
-can be done any time, this only updates during v-blank
-index = 0 – 31 (0-15 bg, 16-31 sprite)

#define RED 0x16
pal_col(0, RED); // would set the background color red
pal_col(0, 0x30); // would set the background color white = 0x30

pal_col() might be useful for rotating colors (SMB coins), or blinking a sprite
NOTE: palette buffer is set at 0x1c0-0x1df in example code
PAL_BUF =$01c0, defined somewhere in crt0.s
-this is in the hardware stack. If subroutine calls are more than 16 deep, it will start to overwrite the buffer, possibly causing wrong colors or game crashing

pal_clear(void); // just sets all colors to black, can be done any time

pal_bright(unsigned char bright); // brightens or darkens all the colors
– 0-8, 4 = normal, 3 2 1 darker, 5 6 7 lighter
– 0 is black, 4 is normal, 8 is white
pal_bright(4); // normal

NOTE: pal_bright() must be called at least once during init (and it is, in crt0.s). It sets a pointer to colors that needs to be set for the palette update to work.

Shiru has a fading function in the Chase source code game.c

void pal_fade_to(unsigned to)
{
  if(!to) music_stop();
  while(bright!=to)
  {
    delay(4);
    if(bright<to) ++bright;
    else --bright;
    pal_bright(bright);
  }
  if(!bright)
  {
    ppu_off();
    set_vram_update(NULL);
    scroll(0,0);
  }
}

pal_spr_bright(unsigned char bright);
-sets sprite brightness only

pal_bg_bright(unsigned char bright);
-sets BG brightness only, use 0-8, same as pal_bright()

ppu_wait_nmi(void);
-wait for next frame

ppu_wait_frame(void);
-it waits an extra frame every 5 frames, for NTSC TVs
-do not use this, I removed it
-potentially buggy with split screens

ppu_off(void); // turns off screen

ppu_on_all(void); // turns sprites and BG back on

ppu_on_bg(void); // only turns BG on, doesn’t affect sprites
ppu_on_spr(void); // only turns sprites on, doesn’t affect bg

ppu_mask(unsigned char mask); // sets the 2001 register manually, see nesdev wiki
-could be used to set color emphasis or grayscale modes

ppu_mask(0x1e); // normal, screen on
ppu_mask(0x1f); // grayscale mode, screen on
ppu_mask(0xfe); // screen on, all color emphasis bits set, darkening the screen
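
Those magic numbers are easier to read if you build them from named bits. Here is a sketch of how the three values above break down; the constant names are made up for this example (neslib may or may not define its own), and the bit positions are the ones documented for the 2001 register on the nesdev wiki.

```c
#include <stdint.h>

/* Made-up names for the bits of the 2001 register. */
enum {
    MASK_GRAYSCALE = 0x01, /* grayscale mode */
    MASK_BG_LEFT8  = 0x02, /* show BG in the leftmost 8 pixels */
    MASK_SPR_LEFT8 = 0x04, /* show sprites in the leftmost 8 pixels */
    MASK_SHOW_BG   = 0x08, /* BG on */
    MASK_SHOW_SPR  = 0x10, /* sprites on */
    MASK_EMPH_ALL  = 0xe0  /* all three color emphasis bits */
};

/* "Normal, screen on": BG and sprites visible, including the left column. */
uint8_t mask_normal(void)
{
    return (uint8_t)(MASK_BG_LEFT8 | MASK_SPR_LEFT8 | MASK_SHOW_BG | MASK_SHOW_SPR);
}
```

mask_normal() gives 0x1e, adding MASK_GRAYSCALE gives 0x1f, and adding MASK_EMPH_ALL gives 0xfe.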

ppu_system(void); // returns 0 for PAL, !0 for NTSC

-during init, it does some timed code, and it figures out what kind of TV system is running. This is a way to access that information, if you want to have it programmed differently for each type of TV
-use like…
a = ppu_system();

oam_clear(void); // clears the OAM buffer, making all sprites disappear

OAM_BUF =$0200, defined somewhere in crt0.s

oam_size(unsigned char size); // sets sprite size to 8×8 or 8×16 mode

oam_size(0); // 8×8 mode
oam_size(1); // 8×16 mode

NOTE: at the start of each loop, set sprid to 0 (sprid = 0;). Then, every time you push a sprite to the OAM buffer, the function returns the next index value (sprid).

oam_spr(unsigned char x,unsigned char y,unsigned char chrnum,unsigned char attr,unsigned char sprid);
-returns sprid (the current index to the OAM buffer)
-sprid is the number of sprites in the buffer times 4 (4 bytes per sprite)

sprid = oam_spr(1,2,3,0,sprid);
-this will put a sprite at X=1,Y=2, use tile #3, palette #0, and we’re using sprid to keep track of the index into the buffer

sprid = oam_spr (1,2,3,0|OAM_FLIP_H,sprid); // the same, but flip the sprite horizontally
sprid = oam_spr (1,2,3,0|OAM_FLIP_V,sprid); // the same, but flip the sprite vertically
sprid = oam_spr (1,2,3,0|OAM_FLIP_H|OAM_FLIP_V,sprid); // the same, but flip the sprite horizontally and vertically
sprid = oam_spr (1,2,3,0|OAM_BEHIND,sprid); // the sprite will be behind the background, but in front of the universal background color (the very first bg palette entry)
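
To show how sprid moves through the buffer, here is a PC-side imitation of oam_spr(). It is not the neslib code, just a sketch: sim_oam_spr() and sim_fill() are made-up names, and oam_buf stands in for the 256-byte OAM buffer. The byte order (Y, tile, attributes, X) is the order the hardware uses.

```c
#include <stdint.h>

uint8_t oam_buf[256]; /* stand-in for the OAM buffer: 64 sprites x 4 bytes */

/* Imitation of oam_spr(): writes one 4-byte entry, returns the next index. */
uint8_t sim_oam_spr(uint8_t x, uint8_t y, uint8_t chrnum, uint8_t attr, uint8_t sprid)
{
    oam_buf[sprid]                = y;
    oam_buf[(uint8_t)(sprid + 1)] = chrnum;
    oam_buf[(uint8_t)(sprid + 2)] = attr;
    oam_buf[(uint8_t)(sprid + 3)] = x;
    return (uint8_t)(sprid + 4); /* 8-bit index, so it wraps to 0 after sprite 64 */
}

/* Pushes `count` sprites in a row, starting from sprid = 0, and returns the final sprid. */
uint8_t sim_fill(uint8_t count)
{
    uint8_t sprid = 0;
    uint8_t i;

    for (i = 0; i < count; ++i) {
        sprid = sim_oam_spr((uint8_t)(i * 4), 100, 3, 0, sprid);
    }
    return sprid;
}
```

After 63 sprites sprid is 252; after the 64th it has wrapped back to 0, which matters for anything that uses sprid to decide where "the rest" of the buffer starts.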

oam_meta_spr(unsigned char x,unsigned char y,unsigned char sprid,const unsigned char *data);
-returns sprid (the current index to the OAM buffer)
-sprid is the number of sprites in the buffer times 4 (4 bytes per sprite)

sprid = oam_meta_spr(1,2,sprid, metasprite1);

metasprite1[] = …; // definition of the metasprite, array of chars

A metasprite is a collection of sprites
-you can’t flip it so easily
-you can make a metasprite with nes screen tool
-it’s an array of 4 bytes per tile =
-x offset, y offset, tile, attribute (per tile palette/flip)
-you have to pass a pointer to this data array
-the data set needs to terminate in 128 (0x80)
-during each loop (frame) you will be pushing sprites to the OAM buffer
-they will automatically go to the OAM during v-blank (part of nmi code)
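
Here is what a metasprite definition might look like for a 16×16 character made of four 8×8 tiles, plus a helper that walks it the way described above. The tile numbers are invented and metasprite_tile_count() is a made-up name; I am assuming the 128 terminator sits where the next x offset would be.

```c
#include <stdint.h>

/* x offset, y offset, tile, attribute ... terminated with 128 */
const uint8_t metasprite1[] = {
    0, 0, 0x10, 0, /* top left */
    8, 0, 0x11, 0, /* top right */
    0, 8, 0x20, 0, /* bottom left */
    8, 8, 0x21, 0, /* bottom right */
    128
};

/* Counts the tiles by stepping 4 bytes at a time until the terminator. */
unsigned int metasprite_tile_count(const uint8_t *data)
{
    unsigned int count = 0;

    while (data[0] != 128) {
        data += 4;
        ++count;
    }
    return count;
}
```

Each tile in the list costs one hardware sprite, so this metasprite advances sprid by 16.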

oam_hide_rest(unsigned char sprid);
-pushes the rest of the sprites off screen
-do at the end of each loop

-necessary, if you don’t clear the sprites at the beginning of each loop
-if # of sprites on screen is exactly 64, the sprid value would wrap around to 0, and this function would accidentally push all your sprites off screen (passing 0 will push all sprites off screen)
-if for some reason you pass a value not divisible by 4 (like 3), this function would crash the game in an infinite loop
-it might be safer, then, to just use oam_clear() at the start of each loop, and never call oam_hide_rest()

music_play(unsigned char song); // send it a song number, it sets a pointer to the start of the song, will play automatically, updated during v-blank
music_play(0); // plays song #0

music_stop(void); // stops the song, must do music_play() to start again, which will start the beginning of the song

music_pause(unsigned char pause); // pauses a song, and unpauses a song at the point you paused it

music_pause(1); // pause
music_pause(0); // unpause

sfx_play(unsigned char sound,unsigned char channel); // sets a pointer to the start of a sound fx, which will auto-play

sfx_play(0, 0); // plays sound effect #0, priority #0

Channel 3 has priority over channel 2, and so on: 3 > 2 > 1 > 0. If 2 sound effects conflict, the higher priority will play.

sample_play(unsigned char sample); // play a DMC sound effect

sample_play(0); // play DMC sample #0

pad_poll(unsigned char pad);
-reads a controller
-have to send it a 0 or 1, one for each controller
-do this once per frame
pad1 = pad_poll(0); // reads controller #1, store in pad1
pad2 = pad_poll(1); // reads controller #2, store in pad2

pad_trigger(unsigned char pad); // only gets new button presses, not if held

a = pad_trigger(0); // read controller #1, return only if new press this frame
b = pad_trigger(1); // read controller #2, return only if new press this frame

-this actually calls pad_poll(), but returns only new presses, not buttons held
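
The "new presses only" idea boils down to one line of bit logic. This is a sketch of the idea, not the neslib source (I am assuming the result is equivalent), and new_presses() is a made-up name.

```c
#include <stdint.h>

/* Buttons that are down this frame but were up on the previous frame. */
uint8_t new_presses(uint8_t pad_now, uint8_t pad_old)
{
    return (uint8_t)(pad_now & (uint8_t)~pad_old);
}
```

A button held for two frames shows up in pad_now both times, but only the first frame produces a bit in the result.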

pad_state(unsigned char pad);
-get last poll without polling again
-do pad_poll() first, every frame
-this is so you have a consistent value all frame
-can do this multiple times per frame and will still get the same info

pad1 = pad_state(0); // controller #1, get last poll
pad2 = pad_state(1); // controller #2, get last poll

NOTE: button definitions are opposite of the ones I’ve used, because they are stored with a shift right rather than shift left

// scrolling //
It is expected that you have 2 int’s defined (2 bytes each), ScrollX and ScrollY.
You need to manually keep them from 0 to 0x01ff (0x01df for y, there are only 240 scanlines, not 256)
In example code 9, shiru does this

--y;

if(y<0) y=240*2-1; // keep Y within the total height of two nametables

scroll(unsigned int x,unsigned int y);
-sets the x and y scroll. can do any time, the numbers don’t go to the 2005 registers till next v-blank
-the upper bit changes the base nametable, register 2000 (during the next v-blank)
-assuming you have mirroring set correctly, it will scroll into the next nametable.

scroll(scroll_X,scroll_Y);

split(unsigned int x,unsigned int y);
-waits for sprite zero hit, then changes the x scroll
-will only work if you have a sprite currently in the OAM at the zero position, and it’s somewhere on-screen with a non-transparent portion overlapping the non-transparent portion of a BG tile.

-I’m not sure why it asks for y, since it doesn’t change the y scroll
-it’s actually very hard to do a mid-screen y scroll change, so this is probably for the best
-warning: all CPU time between the function call and the actual split point will be wasted!
-don’t use ppu_wait_frame() with this, you might have glitches

Tile banks

-there are 2 sets of 256 tiles loaded to the ppu, ppu addresses 0-0x1fff
-sprites and bg can freely choose which tileset to use, or even both use the same set

bank_spr(unsigned char n); // which set of tiles for sprites

bank_spr(0); // use the first set of tiles
bank_spr(1); // use the second set of tiles

bank_bg(unsigned char n); // which set of tiles for background

bank_bg(0); // use the first set of tiles
bank_bg(1); // use the second set of tiles

rand8(void); // get a random number 0-255
a = rand8(); // a is char

rand16(void); // get a random number 0-65535
a = rand16(); // a is int

set_rand(unsigned int seed); // send an int (2 bytes) to seed the rng

-note, crt0 init code auto sets the seed to 0xfdfd
-you might want to use another seeding method, if randomness is important, like checking FRAME_CNT1 at the time of START pressed on title screen

set_vram_update(unsigned char *buf);
-sets a pointer to an array (a VRAM update buffer, somewhere in the RAM)
-when rendering is ON, this is how BG updates are made

usage…
set_vram_update(Some_ROM_Array); // sets a pointer to the data in ROM

(or)

memcpy(update_list,updateListData,sizeof(updateListData));
– copies data from ROM to a buffer, the buffer is called ‘update_list’
set_vram_update(update_list); // sets a pointer, and a flag to auto-update during the next v-blank

also…
set_vram_update(NULL);
-to disable updates, call this function with NULL pointer

The vram buffer should be filled like this…

Non-sequential:
-non-sequential means it will set a PPU address, then write 1 byte
-MSB, LSB, 1 byte data, repeat
-sequence terminated in 0xff (NT_UPD_EOF)

MSB = high byte of PPU address
LSB = low byte of PPU address

Sequential:
-sequential means it will set a PPU address, then write more than 1 byte to the ppu
-left to right (or) top to bottom
-MSB|NT_UPD_HORZ, LSB, # of bytes, a list of the bytes, repeat
or
-MSB|NT_UPD_VERT, LSB, # of bytes, a list of the bytes, repeat
-NT_UPD_HORZ, means it will write left to right, wrapping around to the next line
-NT_UPD_VERT, means it will write top to bottom, but a new address needs to be set after it reaches the bottom of the screen, as it will never wrap to the next column over
-sequence terminated in 0xff (NT_UPD_EOF)

#define NT_UPD_HORZ 0x40 // sequential
#define NT_UPD_VERT 0x80 // sequential
#define NT_UPD_EOF 0xff

Example of 4 sequential writes, left to right, starting at screen position x=1,y=2
tile #’s are 5,6,7,8
{
MSB(NTADR_A(1,2))|NT_UPD_HORZ,
LSB(NTADR_A(1,2)),
4, // 4 writes
5,6,7,8, // tile #’s
NT_UPD_EOF
};
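
To check that you have built a list correctly, it can help to walk it on a PC before trying it on the NES. The sketch below applies an update list to a fake VRAM array, following the format described above. apply_update_list() and fake_vram are made-up names; the macros are written the way I believe neslib.h defines them, but treat that as an assumption. The real nmi code is ASM and may differ in detail.

```c
#include <stdint.h>

#define NT_UPD_HORZ 0x40
#define NT_UPD_VERT 0x80
#define NT_UPD_EOF  0xff

#define NAMETABLE_A  0x2000
#define NTADR_A(x,y) (NAMETABLE_A | (((y) << 5) | (x)))
#define MSB(x)       ((x) >> 8)
#define LSB(x)       ((x) & 0xff)

uint8_t fake_vram[0x4000]; /* stand-in for PPU memory */

/* Walks an update list and applies it to fake_vram.
   Returns the number of bytes written. */
unsigned int apply_update_list(const uint8_t *buf)
{
    unsigned int written = 0;

    while (*buf != NT_UPD_EOF) {
        uint8_t hi = *buf++;
        uint8_t lo = *buf++;
        uint16_t adr = (uint16_t)(((hi & 0x3f) << 8) | lo); /* strip the flag bits */

        if (hi & (NT_UPD_HORZ | NT_UPD_VERT)) {
            /* sequential: a length byte, then that many data bytes */
            uint16_t step = (hi & NT_UPD_VERT) ? 32 : 1;     /* +32 = one row down */
            uint8_t len = *buf++;

            while (len--) {
                fake_vram[adr & 0x3fff] = *buf++;
                adr = (uint16_t)(adr + step);
                ++written;
            }
        } else {
            /* non-sequential: a single data byte */
            fake_vram[adr] = *buf++;
            ++written;
        }
    }
    return written;
}
```

Feeding it the 4-tile example above should put tiles 5,6,7,8 at x=1..4 on row 2, and return 4.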

Interestingly, it will continually write the same data, every v-blank, unless you send a NULL pointer like this…
set_vram_update(NULL);
…though, it may not make much difference.
The data set (aka vram buffer) must not be larger than 256 bytes, including the 0xff at the end of the data. It also should not push more than a certain number of bytes (see * below) to the PPU per frame: since this happens during v-blank with rendering on, not with rendering off, time is very, very limited.

* Max v-ram changes per frame, with rendering on, before BAD THINGS start to happen…

sequential max = 97 (no palette change this frame),
74 (w palette change this frame)

non-sequential max = 40 (no palette change this frame),
31 (w palette change this frame)

the buffer only needs to be…
3 * 40 + 1 = 121 bytes in size
…as it can’t push more bytes than that, during v-blank.

(this hasn’t been tested on hardware, only FCEUX)

// all following vram functions only work when display is disabled

vram_adr(unsigned int adr);
-sets a PPU address
(sets a start point in the background for writing tiles)
vram_adr(NAMETABLE_A); // start at the top left of the screen
vram_adr(NTADR_A(x,y));
vram_adr(NTADR_A(5,6)); // sets a start position x=5,y=6

vram_put(unsigned char n); // puts 1 byte there
-use vram_adr(); first
vram_put(6); // push tile # 6 to screen

vram_fill(unsigned char n,unsigned int len); // repeat same tile * LEN
-use vram_adr(); first
-might have to use vram_inc(); first (see below)
vram_fill(1, 0x200); // tile # 1 pushed 512 times

vram_inc(unsigned char n); // mode of ppu
vram_inc(0); // data gets pushed into vram left to right (wrapping to next line)
vram_inc(1); // data gets pushed into vram top to bottom (only works for 1 column (30 bytes), then you have to set another address).
-do this BEFORE writing to the screen, if you need to change directions

vram_read(unsigned char *dst,unsigned int size);
-reads bytes from vram
-use vram_adr(); first
-dst is where in RAM you will be storing this data from the ppu, size is how many bytes

vram_read((unsigned char*)0x300, 2); // read 2 bytes from vram, write to RAM 0x300

NOTE, don’t read from the palette, just use the palette buffer at 0x1c0

vram_write(unsigned char *src,unsigned int size);
-write some bytes to the vram
-use vram_adr(); first
-src is a pointer to the data you are writing to the ppu
-size is how many bytes to write

vram_write((unsigned char*)0x300, 2); // write 2 bytes to vram, from RAM 0x300
vram_write(TEXT,sizeof(TEXT)); // TEXT[] is an array of bytes to write to vram.
(For some reason this gave me an error, passing just an array name, had to cast to char * pointer)
vram_write((unsigned char*)TEXT,sizeof(TEXT));

vram_unrle(const unsigned char *data);
-pass it a pointer to the RLE data, and it will push it all to the PPU.
-this unpacks compressed data to the vram
-this is what you should actually use…this is what NES screen tool outputs best.
vram_unrle(titleRLE);

usage:
-first, disable rendering, ppu_off();
-set vram_inc(0) and vram_adr()
-wait for start of frame, with ppu_wait_nmi();
-call vram_unrle();
-then turn rendering back on, ppu_on_all()
-only load 1 nametable worth of data, per frame

NOTE:
-nmi is turned on in init, and never comes off

memcpy(void *dst,void *src,unsigned int len);
-moves data from one place to another…usually from ROM to RAM

memcpy(update_list,updateListData,sizeof(updateListData));

memfill(void *dst,unsigned char value,unsigned int len);
-fill memory with a value

memfill((void*)0x200, 0, 0x100);
-to fill 0x200-0x2ff with zero…that is 0x100 bytes worth of filling

delay(unsigned char frames); // waits a # of frames

delay(5); // wait 5 frames

TECHNICAL NOTES, ON ASM BITS IN NESLIB.S:
-vram (besides the palette) is only updated if VRAM_UPDATE + NAME_UPD_ENABLE are set…
-ppu_wait_frame (or) ppu_wait_nmi, sets ‘UPDATE’
-set_vram_update, sets ‘ENABLE’
-set_vram_update(0); clears the vram ‘ENABLE’, which disables updates
-I guess you can’t set a pointer to the zero page address 0x0000, or it will never update.
-music only plays if FT_SONG_SPEED is set, play sets it, stop resets it, pause sets it to negative (ORA #$80), unpause clears that bit

↧

Neslib Example Code

April 12, 2017, 6:20 pm

I thought this would take me 5 minutes. Boy was I wrong. Here’s some examples on neslib use for NES development. I’ve made some changes, that will probably annoy everyone. Sorry.

I changed the cfg, moving the symbols to crt0.s, adding ONCE segment
I changed PAD_STATE to _PAD_STATE in crt0.s (etc), so I can access it from c code as
extern unsigned char PAD_STATE;
I made slight changes to neslib.s (notably removing _ppu_wait_frame as potentially buggy with split screen effects)

The first example is a simple “Hello World”. Note, I have arrays of chars called “PALETTE” and “TEXT”. As stated in an earlier blog post, I have created a tile set that positions the letters and numbers exactly in their ASCII position, so I can reference them with actual letters in the code “Hello World!”.

This is how to write to the screen with rendering OFF. Set an address, vram_adr(), then write, vram_write().

void main (void) {

 // rendering is disabled at the startup
 // the init code set the palette brightness to
 // pal_bright(4); // normal

 // load the palette
 pal_bg(PALETTE);

 // load the text
 // vram_adr(NTADR_A(x,y));
 vram_adr(NTADR_A(10,14)); // screen is 32 x 30 tiles
 // this sets a start position on the BG, where to draw the text, left to right

 vram_write((unsigned char*)TEXT,sizeof(TEXT));
 // this draws the array to the screen
 // this function only works with rendering off, and should come after vram_adr()

 // normally, I would reset the scroll position
 // but the next function waits till v-blank and scroll is set automatically in the nmi routine
 // since the RAM was blanked to 0 in init code, scroll variables will be x = 0, y = 0

 // turn on screen
 ppu_on_all();

 // infinite loop
 while (1){

 // game code will go here.

 }
};

Pretty straightforward way to write to the screen with rendering off. Here’s the source code.

http://dl.dropboxusercontent.com/s/5p8o0umed5k10r5/lesson21.zip

[Screenshot: lesson1]

Part 2

This “Hello World” writes 1 letter at a time, then blanks it, and starts over. This is how to write to the screen when rendering is ON.

In this example, I am writing to the screen in 2 different ways, with rendering on. With rendering on, you will write data to a buffer, and set a pointer to that data. The nmi code will automatically push the data to the screen during v-blank.

Both of them are examples of set_vram_update(). The first, non-sequential data. This will write the letter A and the letter B at different screen locations.

// example of non-sequential vram data
const unsigned char TWOLETTERS[]={
MSB(NTADR_A(10,17)),
LSB(NTADR_A(10,17)),
'A',
MSB(NTADR_A(18,5)),
LSB(NTADR_A(18,5)),
'B',
NT_UPD_EOF}; // data must end in EOF

The second, using the CLEAR array, is a sequential data set. It will write 12 zeros to the screen, covering over the “Hello World!” when the loop ends. NT_UPD_HORZ or NT_UPD_VERT is required to tell the vram update to go sequentially.

// example of sequential vram data
const unsigned char CLEAR[]={
MSB(NTADR_A(10,14))|NT_UPD_HORZ, // where to write, repeat horizontally
LSB(NTADR_A(10,14)),
12, // length of write
0,0,0,0, // what to write there
0,0,0,0, // data needs to be exactly the size of length
0,0,0,0,
NT_UPD_EOF}; // data must end in EOF

And, I’m actually constructing a data set on the fly, when pushing letters of “Hello World!” one at a time to the screen.

v_ram_buffer[0] = high;
v_ram_buffer[1] = low;
data = TEXT[text_Position]; // get 1 letter of the text
v_ram_buffer[2] = data;
v_ram_buffer[3] = NT_UPD_EOF;

This is also an example of delay(), which waits a certain number of frames, before moving to the next line. Here’s the main code (with some comments edited out).

void main (void) {

 // load the palette
 pal_bg(PALETTE);

 // set some initial values
 text_Position = 0;

 // turn on screen
 ppu_on_all();

 // load some non-sequential vram data, during rendering
 memcpy(v_ram_buffer,TWOLETTERS,sizeof(TWOLETTERS)); // copy from the ROM to the RAM
 set_vram_update(v_ram_buffer); // this just sets a pointer to the data, and sets a flag to draw it next v-blank
 // works only when NMI is on

 // infinite loop
 while (1){

    delay(30); // wait 30 frames = 0.5 seconds

    address = NTADR_A(10,14) + text_Position; // 2 bytes wide
    high = (char)(address >> 8); // get just the upper byte
    low = (char)(address & 0xff); // get just the lower byte

    v_ram_buffer[0] = high;
    v_ram_buffer[1] = low;

    data = TEXT[text_Position]; // get 1 letter of the text
    v_ram_buffer[2] = data;

    v_ram_buffer[3] = NT_UPD_EOF;

    ++text_Position;

    if (text_Position >= sizeof(TEXT)){
      text_Position = 0;
      ppu_wait_nmi();
      memcpy(v_ram_buffer,CLEAR,sizeof(CLEAR)); // if at end, clear screen
      // by overwriting zeros over the text
    }

    set_vram_update(v_ram_buffer); // set a pointer to the buffer
    // it will auto-update during v-blank

  }
};

So, step 1, fill the v_ram_buffer. Step 2, set_vram_update(v_ram_buffer); will set a pointer to the data and set a flag to push the data to the PPU during the next v-blank.

Note: set_vram_update(NULL); will disable v-ram updates.

And, here is the source code…

http://dl.dropboxusercontent.com/s/cupgyz9bg8ibjny/lesson22.zip

[Screenshot: lesson22]

↧