$431A - DMA and HDMA Information

DMA, or "direct memory access", is found in a number of computer systems, not
just the Super Nintendo. It's basically a way for a peripheral or
coprocessor to read data directly from memory, instead of requiring the main
CPU to do a number of reads and writes. This is typically faster, if only
because it lets the system skip the opcode fetch-and-decode. In the SNES, the
CPU is paused during DMA since the address buses are in use for the transfer.

HDMA is similar in concept, though rather different in execution: instead of
transferring a block of memory all at once, it transfers a few bytes during
the H-Blank period of each scanline. This is extremely helpful, as this narrow
window is the only time most PPU registers may be changed mid-frame (at least
without glitching).

The SNES has 8 channels (numbered 0-7) that can be used for either DMA or HDMA.
HDMA takes priority over DMA if both are to occur at once, pausing all DMA and
terminating a conflicting DMA immediately. Lower-numbered channels take
priority over higher-numbered channels.


DMA
---

A DMA transfer has three main variables, and a number of setting bits. These
are: (those marked '*' must be set up before starting DMA)
* Direction (bit 7 of $43x0): Read from PPU or write to PPU?
* Fixed (bit 3 of $43x0): Adjust Address?
* Increment (bit 4 of $43x0): Direction to adjust Address?
* Mode (bits 0-2 of $43x0): See below...
* Port (register $43x1): If this is 'xx', the register accessed will be $21xx.
* AAddress (registers $43x2-4): Any CPU address, just like you'd use with
the Absolute Long addressing mode.
* Count (registers $43x5-6): The number of bytes to transfer.
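
As a rough illustration of how these fields pack into the channel registers,
here is a minimal Python sketch. The function name dma_registers and its
parameter names are made up for this example. The Direction polarity (bit 7
set = read from the PPU into CPU memory) is not spelled out in the list above
and is assumed here from the $43x0 register description.

```python
def dma_registers(channel, *, ppu_to_cpu, fixed, decrement, mode,
                  port, a_address, count):
    """Return {register address: byte} for one DMA channel's setup.

    Hypothetical helper: it only computes the values, it does not talk
    to any hardware or emulator.
    """
    if not 0 <= channel <= 7:
        raise ValueError("channel must be 0-7")
    if not 0 <= mode <= 7:
        raise ValueError("mode must fit in bits 0-2")
    base = 0x4300 + (channel << 4)          # $43x0 for channel x
    control = ((int(ppu_to_cpu) << 7)       # Direction (assumed polarity)
               | (int(decrement) << 4)      # Increment bit: set = subtract
               | (int(fixed) << 3)          # Fixed: set = leave AAddress alone
               | mode)                      # Mode, bits 0-2
    return {
        base + 0: control,
        base + 1: port & 0xFF,              # Port 'xx' -> register $21xx
        base + 2: a_address & 0xFF,         # AAddress low byte
        base + 3: (a_address >> 8) & 0xFF,  # AAddress high byte
        base + 4: (a_address >> 16) & 0xFF, # AAddress bank byte
        base + 5: count & 0xFF,             # Count low byte
        base + 6: (count >> 8) & 0xFF,      # Count high byte
    }
```

For instance, dma_registers(1, ppu_to_cpu=False, fixed=False, decrement=False,
mode=1, port=0x18, a_address=0x7E2000, count=0x800) describes a $800-byte
transfer on channel 1 from $7E:2000 to registers $2118/$2119.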

See register $43x0 for the correspondence between the Mode bits and the
transfer mode. Note that One Register Write Once and One Register Write
Twice end up being the exact same thing, and Two Registers Write Once
and Two Registers Write Twice Alternate are the same, but that Two Registers
Write Once and Two Registers Write Twice Each are different.
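
To see why some modes coincide for DMA, it may help to write out the order in
which each mode cycles through Port, Port+1, and so on. The sketch below is
only an illustration: the mode numbers and the offset patterns are my reading
of the $43x0 register description (they are not listed in this section), and
the names PORT_OFFSETS and dma_port_sequence are invented for the example.

```python
# Offset from Port for each successive byte, per transfer mode (assumed
# from the $43x0 register description).
PORT_OFFSETS = {
    0: (0,),          # One Register Write Once
    1: (0, 1),        # Two Registers Write Once
    2: (0, 0),        # One Register Write Twice
    3: (0, 0, 1, 1),  # Two Registers Write Twice Each
    4: (0, 1, 2, 3),  # Four Registers Write Once
    5: (0, 1, 0, 1),  # Two Registers Write Twice Alternate
}


def dma_port_sequence(mode, count):
    """Port offsets hit by a DMA of `count` bytes.

    DMA keeps cycling through the pattern until Count runs out, so only
    the repeating sequence matters, not the pattern length.
    """
    pattern = PORT_OFFSETS[mode]
    return [pattern[i % len(pattern)] for i in range(count)]


# Same repeating sequence, so the same DMA behaviour:
assert dma_port_sequence(0, 8) == dma_port_sequence(2, 8)
assert dma_port_sequence(1, 8) == dma_port_sequence(5, 8)
# Different sequence, so different behaviour:
assert dma_port_sequence(1, 8) != dma_port_sequence(3, 8)
```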

DMA transfers take 8 master cycles per byte transferred, no matter the
FastROM setting. There is also an overhead of 8 master cycles per channel, and
an overhead of 12-24 cycles for the whole transfer.
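
Those figures can be turned into a quick estimate. The following is a minimal
sketch using only the numbers quoted above; dma_cycle_bounds is a hypothetical
name, and since the whole-transfer overhead is only given as a 12-24 range,
the function returns a lower and an upper bound rather than one value.

```python
def dma_cycle_bounds(bytes_per_channel):
    """Return (min, max) master cycles for one DMA run.

    bytes_per_channel: one byte count per channel taking part.
    Uses 8 cycles per byte, 8 cycles per channel, and 12-24 cycles of
    overhead for the whole transfer.
    """
    per_channel = sum(8 + 8 * n for n in bytes_per_channel)
    return 12 + per_channel, 24 + per_channel


# One channel moving $800 bytes:
low, high = dma_cycle_bounds([0x800])
assert (low, high) == (16404, 16416)
```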

The basic process seems to be:
1. Get byte and write it to the destination.
- The DMA seems to take advantage of the SNES's two address buses with one
shared data bus. AAddress is pushed out Bus A, Port is pushed out bus B,
and the read/write signals are sent according to Direction. The bus
marked read obligingly puts data on the bus, while the bus marked write
obligingly writes that value.
- Thus, since the PPU/APU/WRAM registers are only accessible via Bus B,
attempts to access them via AAddress will result in Open Bus accesses.
- Attempts to access WRAM via both Bus A and Bus B (registers 2180-3) will
fail, with the 2180-3 access being Open Bussed.
- Also, DMA cannot access the $4300-$437f registers nor $420b nor $420c.
Writes will have no effect, and reads will return Open Bus.
2. Adjust AAddress.
- If Fixed is set, do nothing. Else if Increment is set, subtract one,
else add one.
- Note that the bank byte is not modified.
3. Decrement Count. If count is not zero, then go to step 1.
- Thus, if Count is initially zero, it wraps to 65535 before being
tested. So you end up transferring 65536 bytes.

Note that Count ($43x5-6) ends up always 0, unless a conflicting HDMA
terminates the transfer early.
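
The address and count handling in steps 2 and 3 can be sketched as a small
loop. This is a simplified model, not a description of the hardware: it
ignores the actual data transfer and any HDMA interruption, and the name
simulate_dma is invented for the example.

```python
def simulate_dma(a_address, count, *, fixed=False, decrement=False):
    """Model the AAddress/Count bookkeeping of one DMA channel.

    Returns (list of Bus A addresses accessed, final AAddress, final Count).
    """
    bank = a_address & 0xFF0000      # the bank byte is never modified
    offset = a_address & 0xFFFF
    accessed = []
    while True:
        accessed.append(bank | offset)           # step 1: transfer a byte
        if not fixed:                            # step 2: adjust AAddress
            step = -1 if decrement else 1
            offset = (offset + step) & 0xFFFF    # wraps within the bank
        count = (count - 1) & 0xFFFF             # step 3: decrement Count
        if count == 0:                           # tested after decrement
            break
    return accessed, bank | offset, count


# Crossing the end of a bank wraps to the start of the same bank.
accessed, end, left = simulate_dma(0x7EFFFE, 4)
assert accessed == [0x7EFFFE, 0x7EFFFF, 0x7E0000, 0x7E0001]
assert (end, left) == (0x7E0002, 0)

# A Count of zero wraps before being tested, giving 65536 bytes.
assert len(simulate_dma(0x000000, 0)[0]) == 65536
```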


HDMA
----

HDMA has 4 flags and 5 variables. Again, those marked '*' are required
before starting HDMA. In addition, those marked '+' are required if HDMA is
to be started mid-frame.
* Addressing Mode (bit 6 of $43x0): If clear, Direct, else Indirect.
* Transfer Mode (bits 0-2 of $43x0): See below...
* Port ($43x1): As for DMA.
* AAddress ($43x2-4): Pointer to the HDMA Table. Not really 'required' for
starting mid-frame, but unless you're going to stop it before the next
init...
- Indirect Address ($43x5-6): Used with Indirect Bank. See below...
* Indirect Bank ($43x7): Used with Indirect Address. See below...
+ Address ($43x8-9): See below...
+ Repeat (bit 7 of $43xA): Whether to write every scanline or not
+ Line Counter (bits 0-6 of $43xA): See below...
- DoTransfer: Used internally.

Modes are the same as for DMA. However, note that only one cycle through the
mode is done per scanline, so One Register Write Once will write 1 byte per
scanline, while One Register Write Twice will write two.

For each scanline during which HDMA is active (i.e. at least one channel is not
paused and has not terminated yet for the frame), there are ~18 master cycles
overhead. Each active channel incurs another 8 master cycles overhead (during
which time $43xA is presumably loaded if necessary) for every scanline, whether
or not a transfer actually occurs. If a new indirect address is required, 16
master cycles are taken to load it. Then 8 cycles per byte transferred are
used. Thus, HDMA takes a maximum of 466 master cycles per scanline (if all 8
channels are active, require an indirect address load, and transfer 4 bytes).
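
The 466 figure can be checked by adding up the costs listed above. The sketch
below does just that; hdma_scanline_cycles is a hypothetical name, and the
18-cycle base is the approximate value quoted above, so the results are
estimates.

```python
def hdma_scanline_cycles(channels):
    """Estimate HDMA master cycles for one scanline.

    channels: one (bytes_transferred, loads_indirect_address) pair per
    active channel. Returns 0 if no channel is active.
    """
    channels = list(channels)
    if not channels:
        return 0
    total = 18                         # approximate per-scanline overhead
    for nbytes, indirect_load in channels:
        total += 8                     # per-channel overhead
        if indirect_load:
            total += 16                # loading a new indirect address
        total += 8 * nbytes            # 8 cycles per byte transferred
    return total


# Worst case: 8 channels, each loading an indirect address and
# transferring 4 bytes: 18 + 8 * (8 + 16 + 32) = 466.
assert hdma_scanline_cycles([(4, True)] * 8) == 466
```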

The basic process has two sections. First, at the beginning of the frame (V=0
H=approx 6), for all active HDMA channels (see register $420c):
1. Copy AAddress into Address.
2. Load $43xA (Line Counter and Repeat) from the table. I believe $00 will
terminate this channel immediately.
3. Load Indirect Address, if necessary.
4. Set DoTransfer to true.

The CPU is paused during this time. Overhead is ~18 master cycles, plus 8
master cycles for each channel set for direct HDMA and 24 master cycles for
each channel set for indirect HDMA.

If you are starting HDMA mid-frame, you must basically do the init process
manually by setting $43x8-A, and $43x5-6 for indirect channels. Note though
that there is no way to perform step 4, so no transfer will be done the first
transfer period. Also, note that a channel that has already terminated for the
frame cannot be restarted.
XXX: Or does it automatically do Step 4 when you enable the channel?

Then, for each scanline from V=0 to V=$e0 (or V=$ef if overscan is enabled) at
about H=$116:
1. If DoTransfer is false, skip to step 3.
2. For the number of bytes (1, 2, or 4) required for this Transfer Mode...
a. Read a byte from Address or Indirect Address, and increment.
b. Write the byte to Port, Port+1, Port+2, or Port+3, depending on the
Transfer Mode and which byte we're on.
- The same notes regarding DMA from PPU to PPU or RAM to RAM via $2180
apply here as well.
3. Decrement $43xA.
4. Set DoTransfer to the value of Repeat.
5. If Line Counter is zero...
a. Read the next byte from Address into $43xA (thus, into both Line
Counter and Repeat).
b. If Addressing Mode is Indirect, read two bytes from Address into
Indirect Address (and increment Address by two bytes).
- One oddity: if $43xA is 0 and this is the last active HDMA channel for
this scanline, only load one byte for Address, and use the
$00 for the low byte. So Address ends up incremented one less than
otherwise expected, and one less CPU Cycle is used.
c. If $43xA is zero, terminate this HDMA channel for this frame. The bit in
$420c is not cleared, though, so it may be automatically restarted next
frame.
d. Set DoTransfer to true.
6. Continue with Step 1 next scanline.
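
The init and per-scanline steps above can be sketched as a small interpreter
for a single channel. This is a simplified model under several assumptions:
memory is a flat Python byte sequence with no banks, the "last active channel"
oddity in step 5b is not modelled, and the indirect address is not loaded once
the terminating $00 has been read (so the sketch never reads past the table).
The names run_hdma_frame and PORT_OFFSETS are invented, and the mode patterns
are assumed from the $43x0 register description.

```python
# Port offsets for one pass through each transfer mode (assumed).
PORT_OFFSETS = {0: (0,), 1: (0, 1), 2: (0, 0),
                3: (0, 0, 1, 1), 4: (0, 1, 2, 3)}


def run_hdma_frame(memory, table_address, port, mode,
                   indirect=False, scanlines=0xE1):
    """Return [(scanline, register, value), ...] written by one channel."""
    writes = []
    address = table_address                  # init 1: Address = AAddress
    line = memory[address]                   # init 2: load $43xA
    address += 1
    if line == 0:                            # $00 terminates immediately
        return writes
    indirect_address = 0
    if indirect:                             # init 3: load Indirect Address
        indirect_address = memory[address] | (memory[address + 1] << 8)
        address += 2
    do_transfer = True                       # init 4

    for v in range(scanlines):               # V=0 through V=$e0
        if do_transfer:                      # steps 1-2
            for offset in PORT_OFFSETS[mode]:
                if indirect:
                    value = memory[indirect_address]
                    indirect_address += 1
                else:
                    value = memory[address]
                    address += 1
                register = 0x2100 | ((port + offset) & 0xFF)
                writes.append((v, register, value))
        line = (line - 1) & 0xFF             # step 3: decrement $43xA
        do_transfer = bool(line & 0x80)      # step 4: DoTransfer = Repeat
        if line & 0x7F == 0:                 # step 5: Line Counter is zero
            line = memory[address]           # 5a: next $43xA value
            address += 1
            if line == 0:                    # 5c: done for this frame
                break
            if indirect:                     # 5b (simplified, see above)
                indirect_address = memory[address] | (memory[address + 1] << 8)
                address += 2
            do_transfer = True               # 5d
    return writes
```

Feeding it the direct-mode table $02 $AA, $82 $BB $CC, $01 $DD, $00 with mode 0
gives one write on scanline 0, nothing on scanline 1, then one write on each
of scanlines 2, 3 and 4.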

HDMA does not occur during V-Blank, as any writes it might perform are
likely to have no visible effect anyway. The start-of-frame processing then resets
all active channels at the end of V-Blank. This allows updating of the HDMA
registers during V-Blank without worrying about the transfer beginning
immediately and scribbling on the PPU state.

Note how the above implicitly defines the format of the HDMA table.
Explicitly, the format is a series of entries. Each entry begins with a line
count and repeat flag. If repeat is false, there is one scanline worth of
data following and the count is the number of scanlines to wait before
processing the next entry. If it's true, the line count is the number of
scanlines worth of data following. The data following is either a pointer to
the data (for Indirect HDMA), or the data itself (for Direct HDMA).
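
As an illustration of that layout, here is a minimal sketch that assembles a
Direct HDMA table from a list of entries. The helper name build_direct_table
and the (count, repeat, data) tuple shape are made up for the example, and it
conservatively limits counts to 1-127 since the Line Counter is described as
bits 0-6 of $43xA.

```python
def build_direct_table(entries, bytes_per_line):
    """Assemble a Direct HDMA table.

    entries: (count, repeat, data) tuples, where data is a bytes object.
    bytes_per_line: bytes written per scanline by the transfer mode
    (1, 2 or 4).
    """
    table = bytearray()
    for count, repeat, data in entries:
        if not 1 <= count <= 0x7F:
            raise ValueError("line count must be 1-127")
        # Repeat entries carry one line of data per scanline counted;
        # non-repeat entries carry a single line of data.
        expected = bytes_per_line * (count if repeat else 1)
        if len(data) != expected:
            raise ValueError(f"entry needs {expected} data bytes")
        table.append((0x80 if repeat else 0x00) | count)
        table += data
    table.append(0x00)                 # a zero count ends the table
    return bytes(table)


# Write $AA then wait a line, write $BB and $CC on consecutive lines,
# then write $DD once.
table = build_direct_table(
    [(2, False, b"\xAA"), (2, True, b"\xBB\xCC"), (1, False, b"\xDD")],
    bytes_per_line=1)
assert table == bytes([0x02, 0xAA, 0x82, 0xBB, 0xCC, 0x01, 0xDD, 0x00])
```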

Looking at the above, it's clear why Address and Repeat/Line Counter must
be initialized by hand when starting HDMA mid-frame: they're only
automatically initialized at the start of the frame. Note how AAddress is
not affected by HDMA, though Address and Repeat/Line Counter are.