This tutorial is intended to teach every aspect of using HDMA, from the basic gradient to changing screen modes to parallax scrolling, while also covering SMW-specific details. If you notice I made a mistake, or especially if you have a question or think something isn't clear, please post in this thread rather than PMing me.
Note that in most cases you can use a tool such as Effect Tool to create the HDMA code for you.
This guide explains the nitty-gritty of HDMA: its inner workings.
Table 'o Contents:
A) What is HDMA?
B) Necessary resources
C) My first gradient & other basic HDMA
D) Using multiple HDMA channels
D.5) Notes on background color and color math
E) Transfer modes + Continuous mode
F) HDMA tables in RAM
H) Common other registers used with HDMA
I) Indirect HDMA
J) Appendix of HDMA registers
A) What is HDMA?
HDMA is a nifty feature of the SNES. It is used to produce many graphical effects, from the expanding circle you see in SMW's title screen to color gradients to Neo-Exdeath's wavy background in FF5.
What HDMA does is modify graphical settings in the middle of the screen being drawn on the TV. A TV works by drawing horizontal lines of color. It draws one line, then moves down a bit and draws the next. These lines are called "scanlines," and the SNES normally draws 224 of them per frame (the standard 256x224 picture that SMW uses).
HDMA counts the scanlines, and on ones that you specify, it changes settings to produce the desired effect. You may change the background color and produce a gradient. You might change a layer's x position to make a wavy effect. You can do a lot of things, including twisting the display beyond recognition (which may or may not be a good thing).
More on "what can be done" later.
B) Necessary Resources:
First and foremost, you need to know basic ASM. If BRA only means women's underwear to you, then you need to go look up an ASM tutorial. This tutorial is for people wanting to do more than copy/paste color gradient code.
You must know what a table is and how indexing works. You should know the difference between 8-bit and 16-bit register modes (what REP and SEP do). You should have basic knowledge of how RAM works. If you know those things, or at least know what they are, you'll be fine.
The one document that's indispensable when using HDMA is regs.txt by Anomie. If a description of a register isn't clear in this tutorial, it probably will be in regs.txt. With in-depth descriptions of every register, you simply can't go without it.
Of note, it's a reference document. If you try to read the entire thing in one sitting, you'll go insane. Use it to look up specifics of registers you're working with.
Next, you'll need a method to run your HDMA code. LevelASM works great, as do generators. Sticking it in a block won't work (unless Mario's standing on it). Preferably, you should use something that's run every frame.
Last and most obvious would be a text editor, like Notepad.
C) My First Gradient & Other Basic HDMA:
Now, you've got all that stuff and want to jump right into color gradients. Sorry, we'll get to that in a bit.
The easiest HDMA to do is a brightness gradient. The following code will make everything on the screen darker as it goes further down the screen:
```
REP #$20            ; 16 bit A
LDA #$0000          ; $43X0 = 00
STA $4330           ; $43x1 = 00
LDA #LVL1BRIGHT     ; get pointer to brightness table
STA $4332           ; store it to low and high byte pointer
PHK                 ; get bank
PLY                 ;
STY $4334           ; store to bank pointer byte
SEP #$20            ; 8 bit A
LDA #$08            ; Enable HDMA on channel 3
TSB $0D9F           ;
RTS                 ; return

LVL1BRIGHT:
db $0C,$0F
db $0C,$0E
db $0C,$0D
db $0C,$0C
db $0C,$0B
db $0C,$0A
db $0C,$09
db $0C,$08
db $0C,$07
db $0C,$06
db $0C,$05
db $0C,$04
db $0C,$03
db $0C,$02
db $0C,$01
db $0C,$00
db $00
```
Using this code, your screen should look something like this:

[Screenshot: the screen fading to black toward the bottom. "The world is so dark, Mario!"]
Alright, so you have the code, and it works, but you don't understand it. Allow me to explain everything here.
- REP #$20
You should know how this works. A becomes 16 bits wide, so a single store can write two consecutive registers at once.
- LDA #$0000
This is more complex than it might seem. You might notice in the comments $43x0 = 00, or whatever. The *low byte* of this number (#$0000) is written to register $4330. For right now, you should leave this byte 00. The *high byte* of this number (#$0000) is written to $4331, which should be 00 for a brightness gradient. The significance of this will be obvious when we get to color gradients, as this number will change.
- STA $4330
As we're in 16 bit mode, STA'ing to this register will write to both $4330 and $4331.
- LDA #LVL1BRIGHT
- STA $4332
This tells xkas to take the location of LVL1BRIGHT and put that number after the LDA. This is the pointer to your table, so the HDMA knows where to pull the values from. If you were to open this part of the code in a hex editor, you'd see that the two-byte number after LDA matches the low and high bytes of your table's SNES address. The bank byte is set separately.
$43x2-4 needs to hold the address of your table. $43x2 = low byte, $43x3 = high byte, $43x4 = bank byte.
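If the byte splitting is confusing, here's a tiny Python sketch that shows the same little-endian split. It runs on your PC, not the SNES, and the function names are made up. It shows how a 16-bit STA to $43x0 divides between $43x0 and $43x1, and how a table's 24-bit address breaks into the $43x2, $43x3 and $43x4 bytes. The table address used here is hypothetical.

```python
def split_16bit_store(value):
    """A 16-bit STA $43x0 puts the low byte in $43x0 and the high byte in $43x1."""
    return value & 0xFF, (value >> 8) & 0xFF


def split_table_pointer(snes_address):
    """Return the (low, high, bank) bytes that belong in $43x2, $43x3 and $43x4."""
    low = snes_address & 0xFF            # $43x2
    high = (snes_address >> 8) & 0xFF    # $43x3
    bank = (snes_address >> 16) & 0xFF   # $43x4
    return low, high, bank


print([hex(b) for b in split_16bit_store(0x3200)])      # ['0x0', '0x32']
print([hex(b) for b in split_table_pointer(0x0DA123)])  # ['0x23', '0xa1', '0xd'] (hypothetical address)
```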
- PHK
- PLY
- STY $4334
You don't really need to know the details of PHK and PLY. PHK pushes the number of the bank the code is currently running in, and PLY pulls that byte into Y. Since the table is assembled in the same bank as the code, that's the bank number we want. This could be replaced with LDY #$xx STY $4334 if you knew beforehand which bank your table is in.
- SEP #$20
Back to 8 bit A, you should know this.
- LDA #$08
- TSB $0D9F
$0D9F is SMW's RAM copy of the HDMA enable register ($420C), and it holds which HDMA channels are active. You should never STA to this address, or you can screw up things like the goal tape. TSB takes the bits set in A and sets them in the target address, leaving the other bits alone. In this case it sets bit 3 (the fourth bit, value $08), which enables HDMA channel 3.
```
LVL1BRIGHT:
db $0C,$0F
db $0C,$0E
...
db $0C,$00
db $00
```
This is the HDMA table. It is very important. You'll notice the table is split up into sections of two bytes each. This is for readability.
The first byte on each line is the scanline count ($0C). This byte controls how many scanlines the HDMA will wait before making another change. Don't set this number above $80 (I'll explain why later).
The second byte is the value that is written. In a brightness gradient, $0F means the screen is at full brightness, and $00 means it is perfectly black. Setting it higher than $0F won't make it any brighter. If you set the highest bit (%10000000), it turns on forced blank and kills the screen.
Finally, the very last byte in the table is important. You should always end your HDMA tables with $00, which ends the HDMA transfer on that channel. If you don't do this, it will start pulling garbage values and your screen will go funky after the final scanline count.
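To see how the hardware walks through a table like LVL1BRIGHT, here's a rough Python model. It's a sketch for understanding with a made-up function name, not an emulator. It assumes a direct, transfer mode 000 table with no counts above $80, and SMW's 224 visible scanlines. Each value is written at the top of its block and stays in the register until the next write. After the $00, the register just keeps its last value.

```python
def expand_hdma_table(table, total_lines=224):
    """Value held by the target register on each scanline (direct, mode 000 only)."""
    lines = []
    current = None                  # register value before the first write is unknown
    i = 0
    while len(lines) < total_lines:
        count = table[i]
        if count == 0x00:           # terminator: this channel stops for the frame
            break
        if count > 0x80:
            raise ValueError("counts $81-$FF are continuous mode (section E)")
        current = table[i + 1]      # written at the top of this block...
        lines.extend([current] * count)  # ...and held for `count` scanlines
        i += 2
    lines.extend([current] * (total_lines - len(lines)))  # last value sticks
    return lines[:total_lines]


LVL1BRIGHT = []
for brightness in range(0x0F, -1, -1):  # $0F down to $00
    LVL1BRIGHT += [0x0C, brightness]
LVL1BRIGHT.append(0x00)

screen = expand_hdma_table(LVL1BRIGHT)
print(screen[0], screen[11], screen[12], screen[191], screen[223])  # 15 15 14 0 0
```

Sixteen entries of 12 lines cover the first 192 scanlines. The last 32 lines stay at $00 because that was the last value written.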
What if you want the lines of brightness to be thicker or thinner? What if you want it to start out dark, go light, then go back to dark again? You modify the table.
The scanline count ($0C in the table above) can be any number between $01 and $80 ($81+ will glitch). If you want the lines thicker, raise this number; for thinner lines, lower it.
The value byte can be $00-$0F for brightness, as I mentioned before. You could, if you wanted, start out with 0F, and go to 00. Or, you could alternate, or have any pattern or gradient you desire.
You can have as many scanline counts and write values as you like. If you make the HDMA go so long that the next frame starts before it's done, the remaining parts of the HDMA will be canceled and it will start over again.
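Typing long tables by hand gets old fast. If you have Python handy, a short script like this one can spit out the db lines for you. It's only a convenience sketch: the function names and the dark-light-dark ramp are my own examples. It builds a plain count/value table and always tacks on the $00 terminator.

```python
def gradient_table(values, lines_per_step):
    """Build a mode 000 table: one (count, value) pair per value, then $00."""
    if not 0x01 <= lines_per_step <= 0x80:
        raise ValueError("scanline counts must be $01-$80")
    table = []
    for value in values:
        table += [lines_per_step, value]
    table.append(0x00)              # always end the table with $00
    return table


def to_db_lines(table):
    """Format a count/value table as db lines, two bytes per line like above."""
    out = []
    for i in range(0, len(table) - 1, 2):
        out.append("db ${:02X},${:02X}".format(table[i], table[i + 1]))
    out.append("db ${:02X}".format(table[-1]))  # the $00 terminator
    return "\n".join(out)


# Example: dark -> light -> dark brightness, 7 scanlines per step (31 steps, 217 lines).
ramp = list(range(0x00, 0x10)) + list(range(0x0E, -1, -1))
print(to_db_lines(gradient_table(ramp, 0x07)))
```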
With that knowledge in hand, see if you can produce the following effect (or something similar):

Cheat sheet if you can't figure it out.
D) Using Multiple HDMA Channels:
Alright, you've got the basics down. Now you want to do something a little harder... time for a color gradient.
```
; HDMA stuff
REP #$20            ; 16 bit A
LDA #$3200          ; $43X0 = 00
STA $4330           ; $43x1 = 32
LDA #LVL1RED        ; get pointer to red color table
STA $4332           ; store it to low and high byte pointer
PHK                 ; get bank
PLY                 ;
STY $4334           ; store to bank pointer byte
LDA #$3200          ; $43X0 = 00
STA $4340           ; $43x1 = 32
LDA #LVL1GREEN      ; get pointer to green color table
STA $4342           ; store it to low and high byte pointer
STY $4344           ; store to bank pointer byte (Y still holds the bank)
LDA #$3200          ; $43X0 = 00
STA $4350           ; $43x1 = 32
LDA #LVL1BLUE       ; get pointer to blue color table
STA $4352           ; store it to low and high byte pointer
STY $4354           ; store to bank pointer byte
SEP #$20            ; 8 bit A
LDA #$38            ; Enable HDMA on channels 3 4 and 5
TSB $0D9F           ;
RTS                 ; return

LVL1RED:
db $4F,$21
db $04,$22
db $04,$24
db $04,$25
db $04,$27
db $04,$28
db $04,$2A
db $04,$2B
db $04,$2D
db $04,$2E
db $04,$30
db $00

LVL1GREEN:
db $4F,$50
db $04,$51
db $04,$52
db $04,$53
db $04,$54
db $04,$55
db $04,$56
db $04,$57
db $04,$58
db $04,$59
db $04,$5A
db $00

LVL1BLUE:
db $4F,$99
db $08,$9A
db $08,$9B
db $08,$9D
db $08,$9E
db $08,$9F
db $00
```
If you were to use this code, you'd end up with something like this:

[Screenshot: a goomba in front of the color gradient. "Does the goomba like the color gradient? His expression is unreadable."]
So, what did we do differently to make a color gradient instead of a brightness one? There are a couple of big things.
- LDA #$3200 ; $43X0 = 00
- STA $4330 ; $43x1 = 32
It was LDA #$0000 with the brightness gradient, but is #$3200 here. Changing the high byte of this to 32 is what changed this from a brightness gradient to a color one. Why 32? We'll get to that MUCH later. Just know that 32 = background color.
Another thing you'll notice is that the code is bigger and we have 3 tables instead of just 1. That's because this uses 3 HDMA channels. The SNES supports up to 8 HDMA channels, and while SMW uses most of these, there are 3 free ones we can use: 3, 4 and 5. The channel number is the x in the register address ($43x0). Don't use any of the other ones, because again, you can screw up things like the goal tape or keyholes.
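Both the register addresses and the $0D9F bit come straight from the channel number. This Python sketch (illustrative names only) computes them. It also models TSB as a bitwise OR, so you can see why it's safer than STA: bits that are already set stay set. It ignores TSB's effect on the processor flags.

```python
def channel_registers(channel):
    """Addresses of the HDMA registers used so far, for channel 0-7."""
    if not 0 <= channel <= 7:
        raise ValueError("the SNES has channels 0-7")
    base = 0x4300 + (channel << 4)  # $43x0, where x is the channel number
    return {
        "control": base,            # $43x0
        "target": base + 1,         # $43x1
        "table_low": base + 2,      # $43x2 (high byte follows at $43x3)
        "table_bank": base + 4,     # $43x4
    }


def enable_mask(channels):
    """Bits to TSB into $0D9F: bit n turns on channel n."""
    mask = 0
    for channel in channels:
        mask |= 1 << channel
    return mask


def tsb(memory_value, a):
    """Memory effect of TSB: set A's bits, keep the rest (flags not modelled)."""
    return memory_value | a


hdma_enable = 0x80  # hypothetical: some other code already set bit 7
hdma_enable = tsb(hdma_enable, enable_mask([3, 4, 5]))
print(hex(channel_registers(4)["control"]), hex(enable_mask([3, 4, 5])), hex(hdma_enable))
# 0x4340 0x38 0xb8
```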
Using 3 channels means we can make 3 entirely different changes. Why a color gradient needs 3 of them is explained in the next section. For now, just understand that we need to set the Red, Green, and Blue values of the background color separately.
But wait, why is the Blue table shorter? That's because we don't need to set every value of the color every HDMA transfer. Some scanlines only change the red and the green, and that's perfectly fine.
If all you wanted to do was mess with the blue (which can still produce a nice sky gradient), you could certainly delete the red and green tables/code and just use the blue one. That would free up two channels for use in other stuff.
D.5) Notes on Background Color & Color Math:
The following section is not HDMA-specific. However, color gradients are probably the most common thing HDMA is used for in SMW hacking, and the topic is rather complex, so it deserves its own section.
As was mentioned in the previous section, each component of the background color has to be written separately. You may also have noticed that all the Red values were in the $20-$3F range, the Green values in $40-$5F, and the Blue values in $80-$9F. Here's why.
The color byte is formatted like this: BGRCCCCC
B = Set this bit to change the Blue value to CCCCC
G = Set this bit to change the Green value to CCCCC
R = Set this bit to change the Red value to CCCCC
CCCCC = Color value (0-31)
To change the value of Red only, for example, you'd set the R bit, and then set the CCCCC bits to the value you want Red to be. Therefore, Red's valid hex values would be $20-3F. 25, for example, would set Red to 5 (equivalent to 40 in LM's palette editor).
You may also set two or three colors at once by setting their corresponding bits. However, this will set them to the same value.
In your HDMA table, these are the appropriate value ranges.
$20-3F = Set Red
$40-5F = Set Green
$80-9F = Set Blue
$60-7F = Set Red and Green
$A0-BF = Set Red and Blue
$C0-DF = Set Green and Blue
$E0-FF = Set Red, Green, and Blue
$00-1F = Does nothing
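If you want to double-check a value in your table, this small Python sketch splits a $2132 byte into the colors it sets and its intensity, and it also goes the other way. The function names are made up.

```python
def decode_color_byte(byte):
    """Split a $2132 value (BGRCCCCC) into the colors it sets and the intensity."""
    intensity = byte & 0x1F         # CCCCC: 0-31
    colors = []
    if byte & 0x20:
        colors.append("red")
    if byte & 0x40:
        colors.append("green")
    if byte & 0x80:
        colors.append("blue")
    return colors, intensity


def encode_color_byte(intensity, red=False, green=False, blue=False):
    """Build a $2132 value from an intensity and the colors it should set."""
    if not 0 <= intensity <= 31:
        raise ValueError("intensity must be 0-31")
    return (0x20 if red else 0) | (0x40 if green else 0) | (0x80 if blue else 0) | intensity


print(decode_color_byte(0x25))  # (['red'], 5)
print(decode_color_byte(0xE0))  # (['red', 'green', 'blue'], 0)
print(hex(encode_color_byte(0x1A, red=True, green=True)))  # 0x7a
```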
But what if you made the color in LM's palette editor? How do you convert those numbers to ones for this?
Here's the formula:
(Value seen in LM's palette editor)/8, convert to hex, add in 20/40/80 depending on color.
Let's say your color was 64 red, 128 green, 248 blue:
Red: 64/8 = 8. 8 in hex is 8. 8 + 20 = 28. Red's value would be 28
Green: 128/8 = 16. 16 in hex is 10. 10 + 40 = 50. Green's value would be 50
Blue: 248/8 = 31. 31 in hex is 1F. 1F + 80 = 9F. Blue's value would be 9F
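Here's the same conversion in Python, as a sketch with a hypothetical function name. Since value/8 never goes above $1F, ORing in $20/$40/$80 gives the same result as adding it.

```python
def lm_to_hdma(red, green, blue):
    """Turn Lunar Magic palette values (0-248) into $2132 bytes for R, G and B."""
    def component(value, select_bit):
        if not 0 <= value <= 255:
            raise ValueError("palette values are 0-255")
        # value // 8 is at most $1F, so OR-ing the select bit equals adding it
        return select_bit | (value // 8)
    return component(red, 0x20), component(green, 0x40), component(blue, 0x80)


print([hex(b) for b in lm_to_hdma(64, 128, 248)])  # ['0x28', '0x50', '0x9f']
```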
Or, if you're exceptionally lazy, you can use this tool.
Great, now you know how to make a gradient. But what if you want to do something crazy; what if you want the gradient to affect your FG/BG? We use color math. And it's pretty easy.
Now, going into great detail about color math is beyond the scope of this tutorial. Here's how to make your background gradient affect everything else, too. Add this to your HDMA code (above or below it, doesn't matter):
Combined with the color gradient from before, this will make your screen look something like this:

[Screenshot: the level washed out toward white. "Gah! Everything's so white!"]
A word of warning: If you make your background color bright, your foreground/background will look like this. Use small values for your background color and it will look better. Remember, this is color ADDITION. It ADDS two colors together, and with normal color values, this very often produces white.
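Why does everything go white? Roughly, color addition adds each 5-bit component of the two pixels and caps the result at 31. This Python sketch models only that basic case, with no half-color option and no subtraction. It uses component values of 0-31 rather than LM's 0-248 scale, and the pixel values are made up.

```python
def add_colors(main_pixel, sub_pixel):
    """Basic color addition: add each 5-bit component and cap it at 31."""
    return tuple(min(31, m + s) for m, s in zip(main_pixel, sub_pixel))


grass = (8, 20, 6)          # made-up foreground pixel (R, G, B), each 0-31
bright_sky = (24, 26, 31)   # a bright background color
dim_sky = (3, 4, 8)         # a dim one

print(add_colors(grass, bright_sky))  # (31, 31, 31): washed out to white
print(add_colors(grass, dim_sky))     # (11, 24, 14): just a slight tint
```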
Side note: messing with the main/subscreen designations can screw up priority, so some things might appear in front of objects they normally shouldn't. However, if you leave a layer on the subscreen, it will be added to the layers on the main screen (or look normal if you clear the bit for it in $40, SMW's RAM mirror of $2131), so it's sort of required for the background color to affect every layer.
Another side note: Sprites that use palettes 8-B never participate in color math. That's why Mario looks normal and the koopas don't. A goomba, for example, wouldn't be affected either.
Further side note: Normally you shouldn't write to registers like $212C outside of V-blank. In an accurate emulator, this will produce a slight glitch for a single frame. But since SMW doesn't have a mirror for these that's applied every frame, you either have to do this or make your own mirror.
Continuing on...
E) Transfer Modes + Continuous Mode:
Now we're getting into the semi-advanced stuff. It's time you knew what the crap $43x0 does, or at least part of it.
$43x0: DA-IFTTT
D, A, I, F = To be explained later
- = unused
TTT = Transfer mode.
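Putting the pieces together, the number you LDA before STA $43x0 has the target register's low byte in its high byte and the $43x0 bits in its low byte. Here's a Python sketch (hypothetical helper name) that builds it from the transfer mode, plus the indirect flag covered in section I. The D, I and F bits are left at 0, as in all the code here.

```python
def hdma_control_word(target_register, transfer_mode=0, indirect=False):
    """16-bit value to LDA and then STA $43x0: low byte -> $43x0, high byte -> $43x1."""
    if not 0 <= transfer_mode <= 7:
        raise ValueError("TTT is 3 bits: modes 0-7")
    low = transfer_mode             # TTT in bits 0-2; D, I and F left at 0
    if indirect:
        low |= 0x40                 # A bit: indirect HDMA (section I)
    high = target_register & 0xFF   # $2132 and $32 both give $32
    return (high << 8) | low


print(hex(hdma_control_word(0x2100)))     # 0x0    (brightness, mode 000)
print(hex(hdma_control_word(0x2132)))     # 0x3200 (background color, mode 000)
print(hex(hdma_control_word(0x2132, 2)))  # 0x3202 (background color, mode 010)
```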
This is important. Why? Because only having 3 HDMA channels available is very, very limiting if you want to do anything complex. The transfer mode allows you to do the job of 2-4 channels at once, assuming the conditions are correct.
Mode 000 is what we've been using so far. Let's make our color gradient from before only use two channels. Let's use mode 010 for one of them.
This is the exact same as our earlier color gradient, but it uses 1 channel for red and green instead of 2.
LDA #$3200 changed to #$3202, obviously, but the other big thing is the table itself. The format changed quite significantly!
Instead of the bytes alternating between scanline counts and values, each scanline count is now followed by 2 values. This means that 1 channel does two writes at a time, both to the same register ($2132). The first byte is written first, and it sets red. The second byte is then written, setting green.
The other transfer modes let you set different combinations of registers at once.
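For reference, here's which registers each transfer mode hits per entry, written as a Python table. A small helper (made-up name) lists the writes one entry produces. The patterns follow the transfer mode descriptions in regs.txt, but treat this as a sketch and check regs.txt if anything looks off. The example entry reuses the first red and green values from LVL1RED and LVL1GREEN.

```python
# Register offsets, relative to the one in $43x1, written for each table entry.
WRITE_PATTERN = {
    0: (0,),            # 1 byte:  p
    1: (0, 1),          # 2 bytes: p, p+1
    2: (0, 0),          # 2 bytes: p, p      (write-twice registers, or two colors)
    3: (0, 0, 1, 1),    # 4 bytes: p, p, p+1, p+1
    4: (0, 1, 2, 3),    # 4 bytes: p, p+1, p+2, p+3
    5: (0, 1, 0, 1),    # 4 bytes: p, p+1, p, p+1
    6: (0, 0),          # same as mode 2
    7: (0, 0, 1, 1),    # same as mode 3
}


def entry_writes(target_low_byte, mode, values):
    """List the (register, value) writes one table entry produces."""
    pattern = WRITE_PATTERN[mode]
    if len(values) != len(pattern):
        raise ValueError("mode %d needs %d value bytes per entry" % (mode, len(pattern)))
    return [(0x2100 + ((target_low_byte + offset) & 0xFF), value)
            for offset, value in zip(pattern, values)]


for register, value in entry_writes(0x32, 2, [0x21, 0x50]):
    print(hex(register), hex(value))
# 0x2132 0x21
# 0x2132 0x50
```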
On to continuous mode. Did you wonder why you couldn't use scanline counts over $80? This is why.
If the count is $81-$FF, it uses continuous mode. This means that instead of waiting a specified number of scanlines before making another write, it makes a write EVERY scanline. If the scanline count was $86, it would make a write every scanline for 6 scanlines ($86 - $80), and then check for another scanline count. Your table's format will change significantly if you use this.
This will write 99 and hold it for $4F scanlines, then write 9A and hold it for 8 scanlines. When it hits the scanline count of $86, it switches to continuous mode and writes each of the following values on its own scanline. After 6 values (since I used $86), it checks for a scanline count again and continues as normal.
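Here's the earlier table model extended to handle continuous mode. It's again a rough Python sketch for a direct, mode 000 table, with a made-up function name. The first two entries match the ones described above ($4F/$99 and $08/$9A). The six continuous values and the final entry are invented for illustration.

```python
def expand_with_continuous(table, total_lines=224):
    """Per-scanline values for a direct, mode 000 table that may use continuous mode.

    $01-$80: one value, held for that many scanlines ($80 = 128).
    $81-$FF: continuous mode, a new value on each of the next (count - $80) scanlines.
    $00:     end of table; the last value sticks.
    """
    lines = []
    current = None
    i = 0
    while len(lines) < total_lines:
        count = table[i]
        i += 1
        if count == 0x00:
            break
        if count > 0x80:                      # continuous mode
            n = count - 0x80
            lines.extend(table[i:i + n])      # one value per scanline
            current = table[i + n - 1]
            i += n
        else:                                 # normal mode
            current = table[i]
            lines.extend([current] * count)
            i += 1
    lines.extend([current] * (total_lines - len(lines)))
    return lines[:total_lines]


example = [0x4F, 0x99,
           0x08, 0x9A,
           0x86, 0x9B, 0x9C, 0x9D, 0x9E, 0x9F, 0x9E,  # made-up continuous values
           0x20, 0x9D,                                 # made-up final entry
           0x00]
frame = expand_with_continuous(example)
print([hex(v) for v in frame[86:94]])
# ['0x9a', '0x9b', '0x9c', '0x9d', '0x9e', '0x9f', '0x9e', '0x9d']
```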
What's the point of this? Why not use scanline counts of $01? Well, if your HDMA is something that needs to be run every scanline, it can really shorten your table. Other than that though...
F) HDMA Tables in RAM:
What if you want your HDMA to change? If your table is in the ROM, you can't change it based on anything unless you use a separate table. That's not reasonable if you want to do something like parallax scrolling that can have literally thousands of possible states.
So let's stick it in RAM.
You should stick the following code in some sort of initialization routine that is run once. Running it every frame WILL cause slowdown if your table is remotely large.
First we defined some free RAM. There's plenty of it if you look in the RAM map. If you're still not sure it's free, just run your ROM in a debugger and check that RAM; if it's all 55s, it's likely free.
Next we used a loop to grab the values from the table and stick them in RAM. Now, the values in the table are the *initial* values. After we upload it, we needn't touch that table again, at least until the level is loaded once more.
Now, our normal code has changed too.
Since we know exactly where our table is, we don't have to have the assembler get the pointer to it.
So, our table is in RAM, and the HDMA pointer is... pointing at it. How do we change it?
Pretty simple, actually. However, you must be careful that you don't change the scanline counts (unless you want to). Your table is interleaved, meaning scanline counts and values alternate.
Now, this code is pretty ridiculous to use for a color gradient. We're basing the color off the layer 1 x position? Ha! However, this is quite common if you're using a parallax-type effect. More on how to do that in the next section.
You'll notice the +$number after !FREERAM. !FREERAM+$1 means the second byte in the table, or the first value. From there it increments by 2, effectively skipping the scanline counts and only affecting the values.
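The +$1, +$3, +$5 pattern is easy to get wrong, so here's the idea in a Python sketch (hypothetical names and table contents). Value bytes sit at odd offsets in a mode 000 table. With mode 010, each entry is 3 bytes long, so the stride becomes 3.

```python
def value_offsets(entry_count, first=1, stride=2):
    """Offsets from !FREERAM of each entry's first value byte."""
    return [first + stride * k for k in range(entry_count)]


def set_values(ram_table, new_values, stride=2):
    """Overwrite only the value bytes of an interleaved table; counts stay put."""
    for offset, value in zip(value_offsets(len(new_values), stride=stride), new_values):
        ram_table[offset] = value


ram = bytearray([0x4F, 0x99, 0x08, 0x9A, 0x08, 0x9B, 0x00])  # hypothetical RAM copy
set_values(ram, [0x90, 0x91, 0x92])
print(ram.hex(" "))  # 4f 90 08 91 08 92 00
```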
A NOTE IF YOU CHANGE SCANLINE COUNTS:
If you're purposefully changing scanline counts every frame, there's something you should know. HDMA doesn't take the table and store it somewhere else at the start of the frame. It reads it when it needs to.
Levelasm and sprites are run while the screen is being drawn. So, if your code changes the table, a glitch can occur where HDMA uses the previous frame's values for part of the screen and this frame's values for the rest. This is most noticeable with scanline count changes, because the effect can "flicker" up and down. The fix is to "double buffer" your table. Keep two copies of it in RAM, and each frame modify the copy that HDMA isn't currently reading, then point $43x2-4 at it. Because changes to $43x2-4 don't take effect until the next frame, HDMA never reads a half-updated table, so you avoid the glitch entirely.
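Here's a Python model of the double buffering idea. It's a sketch: the class name and the two RAM addresses are made up, so pick real free RAM yourself. Game code only ever edits the copy HDMA isn't reading, then returns the address to store in $43x2-4.

```python
class DoubleBufferedTable:
    """Two RAM copies of one HDMA table; HDMA reads one while code edits the other."""

    def __init__(self, initial, address_a, address_b):
        if address_a == address_b:
            raise ValueError("the two buffers need different addresses")
        self.buffers = {address_a: bytearray(initial), address_b: bytearray(initial)}
        self.displayed = address_a   # the copy $43x2-4 currently points at

    def back_buffer(self):
        """The copy HDMA is not reading this frame."""
        return next(a for a in self.buffers if a != self.displayed)

    def update(self, new_table):
        """Write a freshly computed table into the back buffer and return its address."""
        target = self.back_buffer()
        if len(new_table) != len(self.buffers[target]):
            raise ValueError("table size must not change")
        self.buffers[target][:] = bytes(new_table)
        # On the SNES you'd now store `target` to $43x2-4; HDMA picks it up next frame.
        self.displayed = target
        return target


tables = DoubleBufferedTable([0x20, 0x99, 0x00], 0x7FB000, 0x7FB100)  # hypothetical RAM
for value in (0x90, 0x91, 0x92):
    print(hex(tables.update([0x20, value, 0x00])))
# 0x7fb100
# 0x7fb000
# 0x7fb100
```

The sketch rewrites the whole table on each update. That's deliberate: the back buffer still holds data from two frames ago, so every value you change has to be recomputed each frame.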
Now, if you actually assembled the earlier code, you'd see your background gradient going funky depending on the level scrolling. That's how it should work in this case.
But how about using this for something useful?
H) Common Other Registers Used With HDMA:
Let's go back to this part of the code:
To do things besides color and brightness gradients, you need to change the high byte (the $32 in #$3200) here. This byte selects the register HDMA writes to: a value of $xx means register $21xx. $2132 is the background color register, and it's what we've been writing to so far. But what else is there?
Layer scrolling:
With HDMA, you should *only* modify the horizontal scrolls unless you know exactly what you're doing. Because of the way HDMA works, modifying the vertical ones will look very glitchy, doubling some pixels and hiding others. It's possible to use the vertical scroll for a neat effect (see: FF5's desert battle background), but it's very advanced.
Of very important note is that these are "write twice" registers. You MUST write two values to these with your HDMA or it will look horrible. So, you might want to use transfer mode 010 and make your HDMA write two values at once. Remember that this changes the format of your table.
Also, if you don't want a static BG, you should probably stick the table in RAM and update it. Explaining the math to be done for every effect possible with this (Like a wavy bg effect) is beyond the scope of this tutorial, but if you know a decent amount of math, you should be able to figure it out. HINT: Base it on the layer 1 x ($1A-B) and calculate the values from there.
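Following that hint, here's a Python sketch of how a simple banded parallax table could be computed. The function name, bands and speeds are all hypothetical. It produces a mode 010 table, so each entry is a count followed by the scroll value's low and high bytes. Both bytes go to the same write-twice register, for example layer 2's horizontal scroll at $210F. On the SNES you'd do the same math in ASM (for example with LSR for the 1/2 and 1/4 speeds) and store the results into your RAM table each frame.

```python
def parallax_table(layer1_x, bands):
    """Mode 010 table for a write-twice horizontal scroll register.

    bands: (scanline_count, speed) pairs from top to bottom; speed is a
    fraction of layer 1's X position ($1A-$1B).
    """
    table = []
    for count, speed in bands:
        if not 0x01 <= count <= 0x80:
            raise ValueError("use counts $01-$80")
        scroll = int(layer1_x * speed) & 0xFFFF
        table += [count, scroll & 0xFF, scroll >> 8]  # low byte, then high byte
    table.append(0x00)
    return table


# Hypothetical bands: slow clouds, medium hills, full-speed foreground strip (224 lines total).
bands = [(0x40, 0.25), (0x30, 0.5), (0x70, 1.0)]
print([hex(b) for b in parallax_table(0x0140, bands)])
# ['0x40', '0x50', '0x0', '0x30', '0xa0', '0x0', '0x70', '0x40', '0x1', '0x0']
```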
Main and Subscreen Designations:
I briefly explained this in the color math part, but if you skipped that, this is how these work.
$212C is the "main screen" register. $212D is the "sub screen" designation. The bytes are formatted like this:
---S4321
Each bit represents one layer (or the sprites). If you put a layer on the subscreen only, things on the main screen will appear in front of it regardless of priority. Conversely, if something's on the main screen and not the subscreen, things on the subscreen will appear behind it.
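Since it's just one bit per layer, building the byte is simple. Here's a Python sketch with made-up names for the ---S4321 layout:

```python
LAYER_BITS = {"1": 0x01, "2": 0x02, "3": 0x04, "4": 0x08, "S": 0x10}  # ---S4321


def designation(*layers):
    """Byte for $212C or $212D from layer names "1"-"4" and "S" (sprites)."""
    value = 0
    for layer in layers:
        value |= LAYER_BITS[layer]
    return value


main_screen = designation("1", "3", "S")  # example: layers 1, 3 and sprites on main
sub_screen = designation("2")             # layer 2 on the subscreen only
print(hex(main_screen), hex(sub_screen))  # 0x15 0x2
```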
This is useful for having a pseudo multi layer effect. You can stick layer 2 behind layer 3 for a certain part of the screen, then switch them, making the illusion of 3+ background layers.
If you're crazy, you can also abuse this for funky color math effects. Things on the subscreen can be added/subtracted to things on the main screen, but not vice versa. So, you can make your BG be added to the foreground for a certain part of the screen but not the rest of it. However, to do that, it's probably better to use...
Color math registers:
There are a LOT of things you can do with color math. So many that I'm not going to go into detail. The registers are $2130-1.
If you want to do the aforementioned "color math above/below this line" effect, you'll want to change $2131.
Look these up in regs.txt if you want to know more.
Windowing Registers:
Windowing HDMA is hell.
Your tables are probably huge, even more so since you have two registers to update. Plus, you'll need 2 tables (or transfer mode 100) if you're using both windows, and you'll use twice the RAM if you double buffer them. Yeah....
I'm not going to explain the basics of windowing here. Just know that the registers are $2126-9 for the window positions, $212A-B for the logic, $212E-F to determine which layer(s) they affect.
Oh, and you can window color math with them. Yay.
Look them up in regs.txt if you want to know more about them.
There are many other registers HDMA can affect, some useful, some not. Anything in $2100-$21FF is technically fair game. Look them up in regs.txt if you want to know what else you can affect.
I) Indirect HDMA:
Let's say your table in RAM repeats. A lot. Like, a wavy BG effect that is about 20 wavelengths from the top to the bottom of the screen. Do you really want to update the table in 20 places with the exact same values? No. That's when you use indirect HDMA.
What is it, exactly? Indirect HDMA essentially splits your HDMA table into two tables. One contains the scanline counts and pointers to the values. The other table contains the values. The benefit of this is that you can point to the same value multiple times, so you only need to update one RAM address.
First we upload the value table into RAM. You may notice it's only 3 bytes long. But we want it to alternate between the first two values a bunch of times, then use the final value, so we're using indirect HDMA.
First off, $43x0 needs to have bit 6 set (the 7th bit, %01000000) to use indirect HDMA.
Next, you'll notice $43x2-4 point to the table in ROM, not the one at $7FA100. It should point to the table with scanline counts + pointers.
Also, there's an added line: "LDY #$7F STY $4337" $43x7 holds the bank byte that the indirect HDMA uses to point to values. Since our value table is in RAM in bank 7F, we set this to 7F.
As for the table in ROM (it could be in RAM if we so desired), the format is always the same using indirect HDMA, regardless of the transfer mode. 1 scanline count byte, 2 pointer bytes. First byte is the low byte of the pointer, second is the high byte. The bank byte was already set in the main code and cannot be set in the table.
Using different transfer modes will still modify the format of your value table. If it does 4 writes, it will pull 4 bytes from wherever the value pointer points to.
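To make the two-table layout concrete, here's a Python sketch (hypothetical function name and values) that follows an indirect table the way HDMA does. It reads a count, reads the pointer low byte and then the high byte, adds the bank from $43x7, and fetches the value bytes from that address. It doesn't model continuous mode. The value table lives at $7FA100, the address mentioned above, but the three values and the entry layout are made up. Changing one byte of the value table changes every entry that points at it.

```python
def expand_indirect(table, memory, bank, bytes_per_entry=1):
    """Follow an indirect HDMA table: list (count, value bytes) for each entry.

    table:  (count, pointer low, pointer high) triples ending with $00
    memory: dict of 24-bit address -> byte, standing in for RAM
    bank:   the byte stored in $43x7
    bytes_per_entry: bytes the transfer mode reads per entry (1 for mode 000)
    """
    entries = []
    i = 0
    while table[i] != 0x00:
        count = table[i]
        if count > 0x80:
            raise ValueError("continuous mode isn't modelled here")
        pointer = (bank << 16) | (table[i + 2] << 8) | table[i + 1]
        values = [memory[pointer + k] for k in range(bytes_per_entry)]
        entries.append((count, values))
        i += 3
    return entries


values = {0x7FA100: 0x02, 0x7FA101: 0x06, 0x7FA102: 0x00}  # made-up value table
table = [0x10, 0x00, 0xA1,   # 16 lines -> $7FA100
         0x10, 0x01, 0xA1,   # 16 lines -> $7FA101
         0x10, 0x00, 0xA1,
         0x10, 0x01, 0xA1,
         0x20, 0x02, 0xA1,   # 32 lines -> $7FA102
         0x00]
print(expand_indirect(table, values, 0x7F))
# [(16, [2]), (16, [6]), (16, [2]), (16, [6]), (32, [0])]
values[0x7FA100] = 0x04      # one RAM write...
print(expand_indirect(table, values, 0x7F))  # ...changes both entries that point there
```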
So, we used the same two values multiple times and the last value once. If this were repetitive HDMA that needed to be updated every frame, this would significantly reduce the amount of code and processing time needed to update your table.
J) Appendix of HDMA Registers:
Brief descriptions here. If you want in-depth ones, look at regs.txt.
$43x0: da-ifttt
d: Not useful for HDMA. Regs.txt lists the single use of this bit for HDMA, as it is normally used for DMA.
a: Indirect mode flag
-: unused
i: Does not affect HDMA (used by DMA)
f: Does not affect HDMA (used by DMA)
ttt: Transfer mode.
$43x1:
The number placed here corresponds to the register that HDMA writes to in $2100-FF
$43x2: Low byte of pointer to HDMA table
$43x3: High byte of pointer to HDMA table
$43x4: Bank byte of pointer to HDMA table
$43x5-6: Low and high byte of pointer to value table of indirect HDMA. Does not need to be written to by the programmer.
$43x7: Bank byte of pointer to value table of indirect HDMA. Normally needs to be written to.
$43x8-9: HDMA table address. This is what points to the current location in your table. Can be changed if desired in an IRQ, though indirect HDMA is generally better for that.
$43xA: Scanline counter. Counts down by 1 until it hits 0 (or $80 if in continuous mode), and then does a transfer. Can be changed in an IRQ if desired, but... why?
$43xB-F: Unused/unknown/doesn't exist
Phew. That's all folks. Post questions/comments/concerns/random exclamations/errors I made/life stories/etc here.
Note that for most cases, you can use tools to create the HDMA code for you, such as Effect Tool.
This guide explains the nitty gritty of HDMA, its inner workings.
Table 'o Contents:
A) What is HDMA?
B) Necessary resources
C) My first gradient & other basic HDMA
D) Using multiple HDMA channels
D.5) Notes on background color and color math
E) Transfer modes + Continuous mode
F) HDMA tables in RAM
H) Common other registers used with HDMA
I) Indirect HDMA
J) Appendix of HDMA registers
A) What is HDMA?
HDMA is a nifty feature of the SNES. It is used to produce many graphical effects, from the expanding circle you see in SMW's title screen to color gradients to Neo-Exdeath's wavy background in FF5.
What HDMA does is modify graphical settings in the middle of the screen being drawn on the TV. A TV works by drawing horizontal lines of color. It draws one line, the moves down a bit and draws the next. These lines are called "scanlines," and the standard SNES mode 1 resolution draws a little over 200 of them per frame.
HDMA counts the scanlines, and on ones that you specify, it changes settings to produce the desired effect. You may change the background color and produce a gradient. You might change a layer's x position to make a wavy effect. You can do a lot of things, including twisting the display beyond recognition (which may or may not be a good thing).
More on "what can be done" later.
B) Necessary Resources:
First and foremost, you need to know basic ASM. If BRA only means woman's underwear to you, then you need to go look up an ASM utorial. This tutorial is for people wanting to do more than copy/paste color gradient code.
You must know what a table is and how indexing works. You should know the difference between 8 and 16 bit addressing. You should have basic knowledge of how RAM works. If you know those things, or at least know what they are, you'll be fine.
The one document that's indispensable when using HDMA is regs.txt by Anomie. If a description of a register isn't clear in this tutorial, it probably will be in regs.txt. With in-depth descriptions of every register, you simply can't go without it.
Of note, it's a reference document. If you try to read the entire thing in one sitting, you'll go insane. Use it to look up specifics of registers you're working with.
Next, you'll need a method to run your HDMA code. LevelASM works great, as do generators. Sticking it in a block won't work (unless mario's standing on it). Preferably, you should use something that's run every frame.
Last and most obvious would be a text editor, like notepad.
C) My First Gradient & Other Basic HDMA:
Now, you've got all that stuff and want to jump right into color gradients. Sorry, we'll get to that in a bit

The easiest HDMA to do is a brightness gradient. The following code will make everything on the screen darker as it goes further down the screen:
Code
REP #$20 ; 16 bit A LDA #$0000 ; $43X0 = 00 STA $4330 ; $43x1 = 00 LDA #LVL1BRIGHT ; get pointer to brightness table STA $4332 ; store it to low and high byte pointer PHK ; get bank PLY ; STY $4334 ; store to bank pointer byte SEP #$20 ; 8 bit A LDA #$08 ; Enable HDMA on channel 3 TSB $0D9F ; RTS ; return LVL1BRIGHT: db $0C,$0F db $0C,$0E db $0C,$0D db $0C,$0C db $0C,$0B db $0C,$0A db $0C,$09 db $0C,$08 db $0C,$07 db $0C,$06 db $0C,$05 db $0C,$04 db $0C,$03 db $0C,$02 db $0C,$01 db $0C,$00 db $00
Using this code, your screen should look something like this:

The world is so dark mario!

Alright, so you have the code, and it works, but you don't understand it. Allow me to explain everything here.
Code
REP #$20 ; 16 bit A LDA #$0000 ; $43X0 = 00 STA $4330 ; $43x1 = 00 LDA #LVL1BRIGHT ; get pointer to brightness table STA $4332 ; store it to low and high byte pointer PHK ; get bank PLY ; STY $4334 ; store to bank pointer byte SEP #$20 ; 8 bit A LDA #$08 ; Enable HDMA on channel 3 TSB $0D9F ;
- REP #$20
you should know how this works. A becomes 16 bits in width, allowing us to set two registers at once.
- LDA #$0000
This is more complex than it might seem. You might notice in the comments $43x0 = 00, or whatever. The *low byte* of this number (#$0000) is set to register $4330. For right now, you should leave this byte 00. The *high byte* of this number (#$0000) is set to $4331, which should be 00 for a brightness gradient. The significance of this will be obvious when we get to color gradients, as this number will change.
- STA $4330
As we're in 16 bit mode, STA'ing to this register will write to both $4330 and $4331.
- LDA #LVL1BRIGHT
- STA $4332
This tells xkas to take the location of LVL1BRIGHT and put that number after the LDA. This is the pointer to your table, so the HDMA knows where to pull the values from. If you were to open this part of the code in a hex editor, you'd see that the two byte number after LDA would correspond to the SNES address of your table.
$43x2-4 needs to hold the address of your table. $43x2 = low byte, $43x3 = high byte, $43x4 = bank byte.
- PHK
- PLY
- STY $4334
You don't really need to know what PHK PLY does, just understand that it gets the bank number of where your table is and puts it in Y. This could be replaced with LDY #$xx STY $4334 if you knew beforehand which bank your table is in.
- SEP #$20
Back to 8 bit A, you should know this.
- LDA #$08
- TSB $0D9F
$0D9F holds which HDMA channels are active. You should never STA to this number, or you can screw up things like the goal tape. TSB takes the bits set in A, and sets them in the target address. In this case it would set the fourth bit, and enable HDMA channel 3.
Code
LVL1BRIGHT: db $0C,$0F db $0C,$0E ... db $0C,$00 db $00
This is the HDMA table. It is very important. You'll notice the table is split up into sections of two bytes each. This is for readability.
The first byte on each line is the scanline count ($0C). This byte controls how many scanlines the HDMA will wait before making another change. Don't set this number above 80 (will explain why later).
The second byte is the value that is written. In a brightness gradient, $0F means the screen is at full brightness, and $00 means it is perfectly black. Setting it to a higher number than F won't do anything, unless you set the highest bit (%10000000), which will kill the screen.
Finally, the very last byte in the table is important. You should always end your HDMA tables with 00. This will end the HDMA transfer. If you don't do this, it will start pulling garbage values and your screen will go funky after the final scanline count.
What if you want the lines of brightness to be thicker or thinner? What if you want it to start out dark, go light, then go back to dark again? You modify the table.
The scanline count ($0C in the table above) can be any number between $01 and $80 ($81+ will glitch). If you want the lines thicker, you raise this number. Thinner, lower the number. The number here corresponds to the number of scanlines the HDMA will wait before making another change.
The value byte can be $00-$0F for brightness, as I mentioned before. You could, if you wanted, start out with 0F, and go to 00. Or, you could alternate, or have any pattern or gradient you desire.
You can have as many scanline counts and write values as you like. If you make the HDMA go so long that the next frame starts before it's done, the remaining parts of the HDMA will be canceled and it will start over again.
With that knowledge in hand, see if you can produce the following effect (or something similar):

Cheat sheet if you can't figure it out.
D) Using Multiple HDMA Channels:
Alright, you've got the basics down. Now you want to do something a little harder... time for a color gradient.
Code
; HDMA stuff REP #$20 ; 16 bit A LDA #$3200 ; $43X0 = 00 STA $4330 ; $43x1 = 32 LDA #LVL1RED ; get pointer to red color table STA $4332 ; store it to low and high byte pointer PHK ; get bank PLY ; STY $4334 ; store to bank pointer byte LDA #$3200 ; $43X0 = 00 STA $4340 ; $43x1 = 32 LDA #LVL1GREEN ; get pointer to red color table STA $4342 ; store it to low and high byte pointer STY $4344 ; store to bank pointer byte LDA #$3200 ; $43X0 = 00 STA $4350 ; $43x1 = 32 LDA #LVL1BLUE ; get pointer to red color table STA $4352 ; store it to low and high byte pointer STY $4354 ; store to bank pointer byte SEP #$20 ; 8 bit A LDA #$38 ; Enable HDMA on channels 3 4 and 5 TSB $0D9F ; RTS ; return LVL1RED: db $4F,$21 db $04,$22 db $04,$24 db $04,$25 db $04,$27 db $04,$28 db $04,$2A db $04,$2B db $04,$2D db $04,$2E db $04,$30 db $00 LVL1GREEN: db $4F,$50 db $04,$51 db $04,$52 db $04,$53 db $04,$54 db $04,$55 db $04,$56 db $04,$57 db $04,$58 db $04,$59 db $04,$5A db $00 LVL1BLUE: db $4F,$99 db $08,$9A db $08,$9B db $08,$9D db $08,$9E db $08,$9F db $00
If you were to use this code, you'd end up with something like this:

Does the goomba like the color gradient? His expression is unreadable.
So, what did we do differently to make a color gradient instead of a brightness one? There are a couple of big things.
- LDA #$3200 ; $43X0 = 00
- STA $4330 ; $43x1 = 32
It was LDA #$0000 with the brightness gradient, but is #$3200 here. Changing the high byte of this to 32 is what changed this from a brightness gradient to a color one. Why 32? We'll get to that MUCH later. Just know that 32 = background color.
Another thing you'll notice is that the code is bigger and we have 3 tables instead of just 1. That's because this uses 3 HDMA channels. The SNES supports up to 8 HDMA channels (0-7), and while SMW uses most of these, there are 3 free ones we can use: 3, 4 and 5. The channel number is the third digit of the register address ($43x0, where x is the channel number). Don't use any of the other ones, because again, you can screw things up like the goal tape or keyholes.
Using 3 channels means we can do 3 entirely different changes. Why a color gradient needs 3 of them is explained in the next section; for now, just understand that we need to set the red, green, and blue values of the background color separately.
But wait, why is the Blue table shorter? That's because we don't need to set every value of the color every HDMA transfer. Some scanlines only change the red and the green, and that's perfectly fine.
If all you wanted to do was mess with the blue (which can still produce a nice sky gradient), you could certainly delete the red and green tables/code and just use the blue one. That would free up two channels for use in other stuff.
D.5) Notes on Background Color & Color Math:
The following section is not HDMA-specific. However, since color gradients are probably the most common thing HDMA is used for in SMW hacking, and because the topic is rather complex, it deserves its own section.
As was mentioned in the previous section, the background color has to have each color written to separately. You may have also noticed that all the Red values were between $20 and $3F, Green between $40 and $5F, and Blue between $80 and $9F. Here's why.
The color byte is formatted like this: BGRCCCCC
B = Set this bit to change the Blue value to CCCCC
G = Set this bit to change the Green value to CCCCC
R = Set this bit to change the Red value to CCCCC
CCCCC = Color value
To change the value of Red only, for example, you'd set the R bit, and then set the CCCCC bits to the value you want Red to be. Therefore, Red's valid hex values would be $20-3F. 25, for example, would set Red to 5 (equivalent to 40 in LM's palette editor).
You may also set two or three colors at once by setting their corresponding bits. However, this will set them to the same value.
In your HDMA table, these are the appropriate value ranges.
$20-3F = Set Red
$40-5F = Set Green
$80-9F = Set Blue
$60-7F = Set Red and Green
$A0-BF = Set Red and Blue
$C0-DF = Set Green and Blue
$E0-FF = Set Red, Green, and Blue
$00-1F = Does nothing
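To double-check a table, it helps to simulate what a single write does. This Python sketch (the function name is hypothetical) applies one $2132 byte to the current red/green/blue state using the BGRCCCCC layout above. Components whose bit is clear keep their old value.

```python
def apply_color_byte(rgb, byte):
    """Return the new (r, g, b) after writing one byte to $2132 (format BGRCCCCC)."""
    r, g, b = rgb
    value = byte & 0x1F   # CCCCC: the 5-bit intensity
    if byte & 0x20:       # R bit set: change red
        r = value
    if byte & 0x40:       # G bit set: change green
        g = value
    if byte & 0x80:       # B bit set: change blue
        b = value
    return (r, g, b)


print(apply_color_byte((0, 0, 0), 0x25))
```

Feeding a whole table through this, one byte at a time, shows you the color each band ends up with.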
But what if you made the color in LM's palette editor? How do you convert those numbers to ones for this?
Here's the formula:
(Value seen in LM's palette editor)/8, convert to hex, add in 20/40/80 depending on color.
Let's say your color was 64 red, 128 green, 248 blue:
Red: 64/8 = 8. 8 in hex is 8. 8 + 20 = 28. Red's value would be 28
Green: 128/8 = 16. 16 in hex is 10. 10 + 40 = 50. Green's value would be 50
Blue: 248/8 = 31. 31 in hex is 1F. 1F + 80 = 9F. Blue's value would be 9F
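Here is the same formula as a small Python helper. The name `lm_to_hdma` is hypothetical, and the helper assumes LM values in the 0-255 range shown in the palette editor. Because value/8 is at most 31 ($1F), adding $20/$40/$80 gives the same result as setting the R/G/B bit.

```python
def lm_to_hdma(red, green, blue):
    """Convert LM palette editor values (0-255) into $2132 bytes for R, G and B."""
    result = []
    for value, flag in ((red, 0x20), (green, 0x40), (blue, 0x80)):
        if not 0 <= value <= 255:
            raise ValueError("LM color values must be 0-255")
        result.append(flag | (value // 8))  # flag picks the color, value // 8 is CCCCC
    return result


print(["${:02X}".format(b) for b in lm_to_hdma(64, 128, 248)])
```

For the example above, this gives $28, $50 and $9F, the same as the hand calculation.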
Or, if you're exceptionally lazy, you can use this tool.
Great, now you know how to make a gradient. But what if you want to do something crazy; what if you want the gradient to affect your FG/BG? We use color math. And it's pretty easy.
Now, going into great detail about color math is beyond the scope of this tutorial. Here's how to make your background gradient affect everything else, too. Add this to your HDMA code (above or below it, doesn't matter):
Code
LDA #$3F    ; enable color math on everything
STA $40
LDA #$00    ; use background color instead of subscreen layers
STA $44
LDA #$1F    ; Put all layers on the main screen
STA $212C
LDA #$00    ; No layers on subscreen
STA $212D
Combined with the color gradient from before, this will make your screen look something like this:

Gah! Everything's so white!
A word of warning: If you make your background color bright, your foreground/background will look like this. Use small values for your background color and it will look better. Remember, this is color ADDITION. It ADDS two colors together, and with normal color values, this very often produces white.
Side note: messing with the main/subscreen designations can screw up priority. Some things might appear in front of objects they normally shouldn't. However, if you leave a layer on the subscreen, it will be added to the layers on the main (or look normal if you un-set the bit for it in $40), so it's sorta required for the background color to affect every layer.
Another side note: Sprites that use palettes 8-B never participate in color math. That's why Mario looks normal, and the koopas don't. A goomba, for example, wouldn't be affected either.
Further side note: Normally you shouldn't write to registers like $212C outside the v-blank. In an accurate emulator, this will produce a slight glitch for a single frame. But since SMW doesn't have a mirror for these that's applied every frame, you either have to do this or make your own mirror.
Continuing on...
E) Transfer Modes + Continuous Mode:
Now we're getting into the semi-advanced stuff. It's time you knew what the crap $43x0 does. Or at least part of it.
$43x0: DA-IFTTT
D, A, I, F = To be explained later
- = unused
TTT = Transfer mode.
Code
ttt  = Transfer Mode.
    000 => 1 register write once             (1 byte:  p               )
    001 => 2 registers write once            (2 bytes: p, p+1          )
    010 => 1 register write twice            (2 bytes: p, p            )
    011 => 2 registers write twice each      (4 bytes: p, p, p+1, p+1  )
    100 => 4 registers write once            (4 bytes: p, p+1, p+2, p+3)
    101 => 2 registers write twice alternate (4 bytes: p, p+1, p, p+1  )
    110 => 1 register write twice            (2 bytes: p, p            )
    111 => 2 registers write twice each      (4 bytes: p, p, p+1, p+1  )
This is important. Why? Because only having 3 HDMA channels available is very, very limiting if you want to do anything complex. The transfer mode allows you to do the job of 2-4 channels at once, assuming the conditions are correct.
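Here is the table above as a Python lookup, which makes it easy to see which registers one transfer touches. `registers_written` is a hypothetical helper. It only looks at the ttt bits of $43x0, adds the $43x1 value to $2100, and doesn't try to handle a $43x1 value near $FF.

```python
# Register offsets written per transfer, indexed by the ttt bits of $43x0.
PATTERNS = {
    0b000: (0,),
    0b001: (0, 1),
    0b010: (0, 0),
    0b011: (0, 0, 1, 1),
    0b100: (0, 1, 2, 3),
    0b101: (0, 1, 0, 1),
    0b110: (0, 0),
    0b111: (0, 0, 1, 1),
}


def registers_written(control, dest):
    """List the $21xx registers one HDMA transfer writes, given $43x0 and $43x1."""
    mode = control & 0x07  # only the ttt bits matter here
    return [0x2100 + dest + offset for offset in PATTERNS[mode]]


print([hex(r) for r in registers_written(0x02, 0x32)])
```

With $43x0 = 02 and $43x1 = 32, both writes go to $2132, which is what the next example relies on.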
Mode 000 is what we've been using so far. Let's make our color gradient from before only use two channels. Let's use mode 010 for one of them.
Code
; earlier code
    LDA #$3202          ; $43X0 = 02
    STA $4330           ; $43x1 = 32
    LDA #LVL1REDGREEN   ; get pointer to red/green color table
    STA $4332           ; store it to low and high byte pointer
; later code
LVL1REDGREEN:
    db $4F,$21,$50
    db $04,$22,$51
    db $04,$24,$52
    db $04,$25,$53
    db $04,$27,$54
    db $04,$28,$55
    db $04,$2A,$56
    db $04,$2B,$57
    db $04,$2D,$58
    db $04,$2E,$59
    db $04,$30,$5A
    db $00
This is the exact same as our earlier color gradient, but it uses 1 channel for red and green instead of 2.
LDA #$3200 changed to #$3202, obviously, but the other big thing is the table itself. The format changed quite significantly!
Instead of the bytes alternating between values and scanline counts, it now has 2 values per scanline count. This means that 1 channel does two writes at a time. The first byte is written first, and it sets red. The second byte is then written, setting green.
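Converting a pair of mode 000 tables into one mode 010 table is mechanical, as long as both tables use exactly the same scanline counts (like LVL1RED and LVL1GREEN do). The Python sketch below uses a hypothetical `merge_to_mode_010` helper. It assumes no continuous-mode counts ($81+) and raises an error if the counts don't line up.

```python
def merge_to_mode_010(first, second):
    """Interleave two mode-000 tables with identical counts into one mode-010 table."""
    out = []
    i = 0
    while first[i] != 0x00:
        if second[i] != first[i]:
            raise ValueError("scanline counts differ at offset %d" % i)
        out += [first[i], first[i + 1], second[i + 1]]  # count, first value, second value
        i += 2
    if second[i] != 0x00:
        raise ValueError("tables have different lengths")
    out.append(0x00)
    return out


red = [0x4F, 0x21, 0x04, 0x22, 0x00]
green = [0x4F, 0x50, 0x04, 0x51, 0x00]
print(merge_to_mode_010(red, green))
```

The first value of each entry is written first, so passing red first matches the LVL1REDGREEN order.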
The other transfer modes let you set different combinations of registers at once.
On to continuous mode. Did you wonder why you couldn't use scanline counts over 80? This is why.
If the count is 81-FF, it uses continuous mode. This means that instead of waiting a specified number of scanlines before making another write, it makes a write EVERY scanline. If the scanline count was 86, it would make a write every scanline for 6 scanlines, and then check for another scanline count. Your table's format will change significantly if you use this.
Code
LVL1BLUE:
    db $4F,$99
    db $08,$9A
    db $86,$9B,$9C,$9D,$9E,$9F,$9D
    db $08,$9C
    db $08,$9B
    db $08,$9A
    db $00
This will write 99 and keep it for 4F scanlines, then write 9A and keep it for 8 scanlines. When it hits the scanline count of 86, it switches to continuous mode and writes the following values, one per scanline. After 6 values (since I used 86), it checks for a scanline count again and continues as normal.
What's the point of this? Why not use scanline counts of $01? Well, if your HDMA is something that needs to be run every scanline, it can really shorten your table. Other than that though...
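A small Python model of how HDMA walks a direct table makes this concrete. `expand` is a hypothetical helper that returns one entry per scanline: the bytes written on that line, or None when the channel just holds its previous value. It follows this tutorial's model, in which the first value is written on the first scanline, counts $01-$80 write once and hold, and $81-$FF write on every scanline for (count - $80) lines.

```python
def expand(table, bytes_per_write=1):
    """Expand a direct HDMA table into one entry per scanline.

    Each entry is a tuple of the bytes written on that scanline, or None
    when the channel writes nothing and the register keeps its old value.
    """
    lines = []
    i = 0
    while table[i] != 0x00:          # $00 ends the table
        count = table[i]
        i += 1
        if count <= 0x80:
            # Normal mode: one write, then hold it for `count` scanlines total.
            lines.append(tuple(table[i:i + bytes_per_write]))
            i += bytes_per_write
            lines.extend([None] * (count - 1))
        else:
            # Continuous mode: a fresh write on each of (count - $80) scanlines.
            for _ in range(count - 0x80):
                lines.append(tuple(table[i:i + bytes_per_write]))
                i += bytes_per_write
    return lines


ones = [0x01, 0x9A, 0x01, 0x9B, 0x01, 0x9C, 0x00]   # 7 bytes
continuous = [0x83, 0x9A, 0x9B, 0x9C, 0x00]         # 5 bytes
print(expand(ones) == expand(continuous))
```

Both tables produce the same three writes, but the continuous one saves a byte per scanline. That saving is the whole point when you need a new value on every line.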

F) HDMA Tables in RAM:
What if you want your HDMA to change? If your table is in the ROM, you can't change it based on anything unless you use a separate table. That's not reasonable if you want to do something like parallax scrolling that can have literally thousands of possible states.
So let's stick it in RAM.
You should stick the following code in some sort of initialization routine that is run once. Running it every frame WILL cause slowdown if your table is remotely large.
Code
!FREERAM = $7FA100

    LDX #$0C            ; the table is 13 bytes ($00-$0C), including the $00 terminator
LVL1_HDMAINIT:
    LDA LVL1BLUE,x
    STA !FREERAM,x
    DEX
    BPL LVL1_HDMAINIT
    RTS

LVL1BLUE:
    db $4F,$99
    db $08,$9A
    db $08,$9B
    db $08,$9D
    db $08,$9E
    db $08,$9F
    db $00
First we defined some free RAM. There's plenty of it if you look in the RAM map. If you're still not sure it's free, just open your ROM in a debugger and check; if it's all 55s, it's likely free.
Next we used a loop to grab the values from the table and stick them in RAM. Now, the values in the table are the *initial* values. After we upload it, we needn't touch that table again, at least until the level is loaded once more.
Now, our normal code has changed too.
Code
LDA #$A100      ; get pointer to blue color table
STA $4352       ; store it to low and high byte pointer
LDY #$7F
STY $4354       ; store to bank pointer byte
Since we know exactly where our table is, we don't have to have the assembler get the pointer to it.
So, our table is in RAM, and the HDMA pointer is... pointing at it. How do we change it?
Pretty simple, actually. However, you must be careful that you don't change the scanline counts (unless you want to). Your table is interleaved, meaning scanline counts and values alternate within it.
Code
SEP #$20
LDA $1A
STA !FREERAM+$1
LSR
STA !FREERAM+$3
LSR
STA !FREERAM+$5
Now, this code is pretty ridiculous to use for a color gradient. We're basing the color off the layer 1 x position? Ha! However, this is quite common if you're using a parallax type effect. More on how to do that in the next section.
You'll notice the +$number after !FREERAM. !FREERAM+$1 means the second byte in the table, or the first value. From there it increments by 2, effectively skipping the scanline counts and only affecting the values.
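If your table is longer, working out the +$n offsets by hand gets error-prone, especially once you mix in transfer mode 010 or continuous mode. This Python sketch uses a hypothetical `value_offsets` helper. It lists the offset of every value byte in a direct table (the numbers you'd put after !FREERAM+) and skips the scanline counts.

```python
def value_offsets(table, bytes_per_write=1):
    """Return the offsets of the value bytes in a direct HDMA table."""
    offsets = []
    i = 0
    while table[i] != 0x00:          # $00 ends the table
        count = table[i]
        writes = count - 0x80 if count > 0x80 else 1  # continuous mode has several
        i += 1                       # skip the scanline count byte
        for _ in range(writes * bytes_per_write):
            offsets.append(i)
            i += 1
    return offsets


blue = [0x4F, 0x99, 0x08, 0x9A, 0x08, 0x9B, 0x08, 0x9D, 0x08, 0x9E, 0x08, 0x9F, 0x00]
print(["!FREERAM+${:X}".format(o) for o in value_offsets(blue)])
```

For the blue table this gives +$1, +$3, +$5 and so on, matching the code above. For a mode 010 table, pass `bytes_per_write=2`.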
A NOTE IF YOU CHANGE SCANLINE COUNTS:
If you're purposefully changing scanline counts every frame, there's something you should know. HDMA doesn't take the table and store it somewhere else at the start of the frame. It reads it when it needs to.
Levelasm and sprites are run while the screen is being drawn. So, if your code changes the table, a glitch can occur where it is using the values set the previous frame on part of the screen, and the values set this frame on the rest. This is most noticeable with scanline count changes because the effect can "flicker" up and down. What you need to do to fix this is "double buffer" your table. That means upload it in two places and alternate each frame which you change. Because changes to $43x2-4 don't take effect until the next frame, you can effectively avoid the glitch entirely.
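Here is the double-buffering bookkeeping as a Python model, just to show the logic. The class name and the second address ($7FA200) are hypothetical; make sure whatever RAM you pick is actually free. HDMA reads whatever $43x2-4 pointed to at the start of the frame. So you always edit the *other* copy and then repoint $43x2-4 at it, and the change takes effect next frame. Any value you don't touch in the edited copy is one frame behind, so this model assumes you rewrite every changing value each frame, like the $1A example above.

```python
class DoubleBufferedTable:
    """Bookkeeping model for a double-buffered HDMA table (hypothetical addresses)."""

    def __init__(self, initial, buffer_a=0x7FA100, buffer_b=0x7FA200):
        self.addresses = (buffer_a, buffer_b)
        # Both copies start out identical (the "upload it in two places" step).
        self.buffers = {buffer_a: list(initial), buffer_b: list(initial)}
        self.pointer = buffer_a  # what $43x2-4 holds, i.e. what HDMA reads this frame

    def back_buffer(self):
        """The copy HDMA is not reading this frame, which is safe to edit."""
        a, b = self.addresses
        return b if self.pointer == a else a

    def write(self, offset, value):
        """Change one byte of the back buffer (e.g. the value at +offset)."""
        self.buffers[self.back_buffer()][offset] = value

    def end_frame(self):
        """Repoint $43x2-4 at the edited copy; HDMA picks it up next frame."""
        self.pointer = self.back_buffer()


table = DoubleBufferedTable([0x4F, 0x99, 0x00])
table.write(1, 0x9A)   # edit the copy HDMA isn't reading
table.end_frame()      # now point HDMA at it
print(hex(table.pointer))
```

In ASM, the same idea boils down to flipping a free RAM flag each frame and choosing both the write address and the $43x2-4 pointer from it.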
Now, if you actually assembled the earlier code, you'd see your background gradient going funky depending on the level scrolling. That's how it should work in this case.

H) Common Other Registers Used With HDMA:
Let's go back to this part of the code:
Code
LDA #$3200      ; $43X0 = 00
STA $4330       ; $43x1 = 32
To do things besides color and brightness gradients, you need to change the high byte (#$3200) here. This number corresponds to registers $2100-FF (32 means $2132). $2132 is the fixed color register, which is what we've been using as the background color so far. But what else is there?
Layer scrolling:
Code
210f  ww+++- BG2HOFS - BG2 Horizontal Scroll
2110  ww+++- BG2VOFS - BG2 Vertical Scroll
2111  ww+++- BG3HOFS - BG3 Horizontal Scroll
2112  ww+++- BG3VOFS - BG3 Vertical Scroll
2113  ww+++- BG4HOFS - BG4 Horizontal Scroll
2114  ww+++- BG4VOFS - BG4 Vertical Scroll
        ------xx xxxxxxxx

        Note that these are "write twice" registers, first the low byte is
        written then the high.
When using HDMA, you should stick to the horizontal scroll registers unless you know exactly what you're doing. Because of the way HDMA works, modifying the vertical scroll will look very glitchy, doubling some pixels and hiding others. It's possible to use the vertical scroll for a neat effect (see: FF5's desert battle background), but that's very advanced.
Of very important note is that these are "write twice" registers. You MUST write two values to these with your HDMA or it will look horrible. So, you might want to use transfer mode 010 and make your HDMA write two values at once. Remember that this changes the format of your table.
Also, if you don't want a static BG, you should probably stick the table in RAM and update it. Explaining the math to be done for every effect possible with this (Like a wavy bg effect) is beyond the scope of this tutorial, but if you know a decent amount of math, you should be able to figure it out. HINT: Base it on the layer 1 x ($1A-B) and calculate the values from there.
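As a starting point for the math, here's a Python sketch that builds a mode 010 table (count, low byte, high byte) for a BG horizontal scroll register. Each horizontal band scrolls at a fraction of layer 1's X position. The band layout and divisors are made-up examples, and `parallax_table` is a hypothetical helper. In your ASM you'd do the same divisions (LSRs) on $1A-B and store the results into your RAM table at the value offsets.

```python
def parallax_table(layer1_x, bands):
    """Build a mode-010 table for a BG horizontal scroll register ($43x0 = $02).

    bands: (scanline_count, divisor) pairs from top to bottom; each band
    scrolls at layer1_x // divisor. Both are made-up example parameters.
    """
    table = []
    for count, divisor in bands:
        if not 0x01 <= count <= 0x80:
            raise ValueError("scanline count must be $01-$80")
        scroll = (layer1_x // divisor) & 0x3FF         # scroll registers take 10 bits
        table += [count, scroll & 0xFF, scroll >> 8]   # low byte first, then high
    table.append(0x00)                                 # $00 ends the table
    return table


# Example layout: top band at 1/4 speed, middle at 1/2, bottom at full speed.
print(parallax_table(0x0300, [(0x40, 4), (0x40, 2), (0x60, 1)]))
```

Those three bands add up to 224 scanlines. Because the low byte comes before the high byte in each entry, the table matches the write twice order of the scroll registers.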
Main and Subscreen Designations:
I briefly explained this in the color math part, but if you skipped that, this is how these work.
$212C is the "main screen" register. $212D is the "sub screen" designation. The bytes are formatted like this:
---S4321
Each bit represents each layer (and sprites). By putting a layer on the subscreen only, things on the main screen will appear in front of it regardless of priority. Alternatively, if something's on the main screen and not the subscreen, things on the subscreen will appear behind it.
This is useful for having a pseudo multi layer effect. You can stick layer 2 behind layer 3 for a certain part of the screen, then switch them, making the illusion of 3+ background layers.
If you're crazy, you can also abuse this for funky color math effects. Things on the subscreen can be added/subtracted to things on the main screen, but not vice versa. So, you can make your BG be added to the foreground for a certain part of the screen but not the rest of it. However, to do that, it's probably better to use...
Color math registers:
There are a LOT of things you can do with color math. So many that I'm not going to go into detail. The registers are $2130-1.
If you want to do the aforementioned "color math above or below this line" effect, you'll want to change $2131.
Code
shbo4321
    s    = Add/subtract select
            0 => Add the colors
            1 => Subtract the colors
    h    = Half color math. When set, the result of the color math is
           divided by 2 (except when $2130 bit 1 is set and the fixed color
           is used, or when color is clipped).
    1/2/3/4/o/b = Enable color math on BG1/BG2/BG3/BG4/OBJ/Backdrop
Look these up in regs.txt if you want to know more.
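If packing the shbo4321 bits by hand feels error-prone, this Python sketch builds a $2131 value from named flags. The `cgadsub` helper is hypothetical, and its bits run from bit 0 = BG1 up to bit 7 = subtract. The $3F written to $40 (SMW's mirror of $2131) in the color math section is just all six layer bits set.

```python
def cgadsub(bg1=False, bg2=False, bg3=False, bg4=False, obj=False,
            backdrop=False, half=False, subtract=False):
    """Pack the shbo4321 bits of $2131 (bit 0 = BG1 ... bit 7 = subtract)."""
    flags = (bg1, bg2, bg3, bg4, obj, backdrop, half, subtract)
    value = 0
    for bit, enabled in enumerate(flags):
        if enabled:
            value |= 1 << bit
    return value


print("${:02X}".format(cgadsub(bg1=True, bg2=True, bg3=True, bg4=True,
                               obj=True, backdrop=True)))
```

An HDMA table for $2131 (high byte $31) would then alternate between values like these on the scanlines where you want color math to switch on or off.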
Windowing Registers:
Windowing HDMA is hell.
Your tables are probably huge, even more so since you have two registers to update. Plus, you'll need 2 tables (or transfer mode 100) if you're using both windows, and you'll use twice the RAM if you double buffer them. Yeah....
I'm not going to explain the basics of windowing here. Just know that the registers are $2126-9 for the window positions, $212A-B for the logic, $212E-F to determine which layer(s) they affect.
Oh, and you can window color math with them. Yay.
Look them up in regs.txt if you want to know more about them.
There are many other registers HDMA can affect, some useful, some not. Anything $2100-FF is technically fair game. Look them up in regs.txt if you want to know what else you can affect.
I) Indirect HDMA:
Let's say your table in RAM repeats. A lot. Like, a wavy BG effect that is about 20 wavelengths from the top to the bottom of the screen. Do you really want to update the table in 20 places with the exact same values? No. That's when you use indirect HDMA.
What is it, exactly? Indirect HDMA essentially splits your HDMA table into two tables. One contains the scanline count and a pointer to the value. The other table contains the values. The benefit of this is you can point to the same value multiple times, and as such only need to update 1 RAM address.
Code
; init
!FREERAM = $7FA100

    LDX #$02
LVL1_HDMAINIT:
    LDA LVL1RED,x
    STA !FREERAM,x
    DEX
    BPL LVL1_HDMAINIT
    RTS

LVL1RED:
    db $21
    db $22
    db $23

; main code
    LDA #$3240          ; $43X0 = 40
    STA $4330           ; $43x1 = 32
    LDA #LVL1REDPOINTER ; get pointer to red color table
    STA $4332           ; store it to low and high byte pointer
    PHK                 ; get bank
    PLY                 ; pull it into Y
    STY $4334           ; store to bank pointer byte
    LDY #$7F
    STY $4337           ; set indirect bank byte

; other code
LVL1REDPOINTER:         ; count, pointer low, pointer high (values live at $7FA100-$7FA102)
    db $0C,$00,$A1
    db $0C,$01,$A1
    db $0C,$00,$A1
    db $0C,$01,$A1
    db $0C,$00,$A1
    db $0C,$01,$A1
    db $0C,$00,$A1
    db $0C,$01,$A1
    db $0C,$00,$A1
    db $0C,$01,$A1
    db $0C,$00,$A1
    db $0C,$01,$A1
    db $0C,$00,$A1
    db $0C,$01,$A1
    db $0C,$02,$A1
    db $00
First we upload the value table into RAM. You may notice it's only 3 bytes long. But we want it to alternate between the first two values a bunch of times, then use the final value, so we're using indirect HDMA.
First off, $43x0 needs to have bit 6 set (the seventh bit from the right, %01000000 = $40) to use indirect HDMA.
Next, you'll notice $43x2-4 point to the table in ROM, not the one at $7FA100. It should point to the table with scanline counts + pointers.
Also, there's an added line: "LDY #$7F STY $4337" $43x7 holds the bank byte that the indirect HDMA uses to point to values. Since our value table is in RAM in bank 7F, we set this to 7F.
As for the table in ROM (it could be in RAM if we so desired), the format is always the same using indirect HDMA, regardless of the transfer mode. 1 scanline count byte, 2 pointer bytes. First byte is the low byte of the pointer, second is the high byte. The bank byte was already set in the main code and cannot be set in the table.
Using different transfer modes will still modify the format of your value table. If it does 4 writes, it will pull 4 bytes from wherever the value pointer points to.
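To see how the two tables fit together, here's a Python model with hypothetical helper names. It builds the count/pointer table from the example above and then resolves each entry through a fake RAM dictionary. It assumes counts of $01-$80 and one byte per write by default. The bank comes from $43x7, so the model passes it in separately instead of storing it in the table.

```python
def build_indirect_table(entries, value_base):
    """Build an indirect HDMA table of (count, pointer low, pointer high) entries.

    entries: list of (scanline_count, value_index) pairs.
    value_base: 16-bit address of the value table; its bank goes in $43x7.
    """
    table = []
    for count, index in entries:
        address = (value_base + index) & 0xFFFF
        table += [count, address & 0xFF, address >> 8]
    table.append(0x00)  # $00 ends the table
    return table


def resolve(table, ram, bank, bytes_per_write=1):
    """Return (count, bytes written) for each entry, reading values from ram."""
    result = []
    i = 0
    while table[i] != 0x00:
        count = table[i]
        address = (bank << 16) | table[i + 1] | (table[i + 2] << 8)
        data = tuple(ram[address + k] for k in range(bytes_per_write))
        result.append((count, data))
        i += 3
    return result


# The example above: values at $7FA100-$7FA102, alternating $00/$01, then $02.
entries = [(0x0C, 0), (0x0C, 1)] * 7 + [(0x0C, 2)]
table = build_indirect_table(entries, 0xA100)
ram = {0x7FA100: 0x21, 0x7FA101: 0x22, 0x7FA102: 0x23}
print(resolve(table, ram, 0x7F)[:3])
```

Changing the single byte at $7FA100 in `ram` changes all seven entries that point at it, which is exactly the saving indirect HDMA is for.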
So, we used the same two values multiple times and the last value once. If this were repetitive HDMA that needed to be updated every frame, this would significantly reduce the amount of code and processing time you need to update your table.
J) Appendix of HDMA Registers:
Brief descriptions here. If you want in-depth ones, look at regs.txt.
$43x0: da-ifttt
d: Rarely useful for HDMA. Regs.txt lists the single use of this bit for HDMA, as it is normally used for DMA.
a: Indirect mode flag
-: unused
i, f: Do not affect HDMA (used by DMA)
ttt: Transfer mode.
Code
000 => 1 register write once             (1 byte:  p               )
001 => 2 registers write once            (2 bytes: p, p+1          )
010 => 1 register write twice            (2 bytes: p, p            )
011 => 2 registers write twice each      (4 bytes: p, p, p+1, p+1  )
100 => 4 registers write once            (4 bytes: p, p+1, p+2, p+3)
101 => 2 registers write twice alternate (4 bytes: p, p+1, p, p+1  )
110 => 1 register write twice            (2 bytes: p, p            )
111 => 2 registers write twice each      (4 bytes: p, p, p+1, p+1  )
$43x1:
The number placed here corresponds to the register that HDMA writes to in $2100-FF
$43x2: Low byte of pointer to HDMA table
$43x3: High byte of pointer to HDMA table
$43x4: Bank byte of pointer to HDMA table
$43x5-6: Low and high byte of pointer to value table of indirect HDMA. Does not need to be written to by the programmer.
$43x7: Bank byte of pointer to value table of indirect HDMA. Normally needs to be written to.
$43x8-9: HDMA table address. This is what points to the current location in your table. Can be changed if desired in an IRQ, though indirect HDMA is generally better for that.
$43xA: Scanline counter. Counts down by 1 until it hits 0 (or 80 if in continuous mode), and then does a transfer. Can be changed in an IRQ if desired, but... why?
$43xB-F: Unused/unknown/doesn't exist
Phew. That's all folks. Post questions/comments/concerns/random exclamations/errors I made/life stories/etc here.