Note: This is a really long tutorial, so make sure you are comfortable and read it carefully. Take pauses and remember to go steady.
This tutorial is intended to teach the basic aspects of Super FX assembly, from the basic registers to sorta complex codes. If you have a question and/or think something isn't clear/right, please post in this thread rather than PMing me.
What is the Super FX? It is a general-purpose co-processor with a RISC architecture, clocked at up to 21.4 MHz (reportedly it can be overclocked to a maximum of 60 MHz). The Super FX has 16 registers, each 16 bits in size, ranging from R0 to R15. This CPU is also pipelined: it loads the next instruction while the current one is executed. There are other registers that only the SNES can access; they are covered in this tutorial as well.
B)Getting used to the registers
As noted, the Super FX has 16 registers, but not all of them are purely general purpose. This table shows the registers and their role in the Super FX. Registers marked with * can only be accessed by the SNES, not by the GSU:
|Register|Super NES CPU Address|Description
|R0|$3000|Default source/destination register
|R1|$3002|PLOT instruction, X coordinate
|R2|$3004|PLOT instruction, Y coordinate
|R3|$3006|None
|R4|$3008|LMULT instruction, lower 16 bits of the result
|R5|$300A|None
|R6|$300C|FMULT and LMULT instructions, multiplier
|R7|$300E|MERGE instruction, source 1
|R8|$3010|MERGE instruction, source 2
|R9|$3012|None
|R10|$3014|None, but best used as stack pointer
|R11|$3016|LINK instruction destination register
|R12|$3018|LOOP instruction counter
|R13|$301A|LOOP instruction branch address
|R14|$301C|ROM address pointer
|R15|$301E|Program counter
|Status Flag Register|$3030|Indicates the status of the GSU.
|Program Bank Register|$3034|Specifies the memory bank that program code is read from.
|ROM Bank Register|$3036|Specifies the ROM bank when loading data from ROM using the ROM buffering system.
|RAM Bank Register|$303C|Specifies the RAM bank when loading/writing data from/to RAM.
|Cache Base Register|$303E|Specifies the starting address when data is loaded from ROM or RAM into the cache RAM.
|Screen Base Register *|$3038|Specifies the start address of the character data storage area.
|Screen Mode Register *|$303A|Specifies the colour depth and screen height during PLOT processing and controls ROM and RAM bus assignments.
|Colour Register|-|Contains the colour to be plotted when PLOT processing is performed.
|Plot Option Register|-|Contains flags which specify the mode to be used when a COLOR, GETC, or PLOT instruction is executed.
|Backup RAM Register *|$3033|Enables or disables write protection for the backup RAM at banks $78-$79.
|Version Code Register *|$303B|Holds the version of the Super FX chip.
|Config Register *|$3037|Selects the operating speed of the multiplier in the GSU and sets up a mask for the interrupt signal.
|Clock Select Register *|$3039|Assigns the Super FX operating frequency.
Now that you are aware of the registers, you will learn the basic codes. This section explains what each operation does so we can apply them later in this tutorial.
The Super FX is a 16-bit CPU, so almost every 8-bit operation sign-extends the byte to a word by copying bit 7 into bits 8 through 15. For example, doing IBT R10,#$83 makes R10 = $FF83. Why? #$83 in binary is 1000 0011. Count the bits: bit 7 (the leftmost one) is set, and it is always copied into the upper bits. Word operations set the value as is.
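As a quick illustration of the sign-extension rule described above, here is a minimal sketch. The register choices (R8 to R10) are arbitrary and only for demonstration:

```asm
IBT R10,#$83    ;Bit 7 of $83 is set, so R10 = $FF83
IBT R9,#$12     ;Bit 7 of $12 is clear, so R9 = $0012
IWT R8,#$0083   ;Word load, value is stored as is: R8 = $0083
```

If you need the byte $83 without the sign extension, loading it as a word (as in the last line) is one way around it.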
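ROMB and RAMB copy the value of the source register into the ROM Bank Register and the RAM Bank Register. Setting the ROM bank to $10 means that ROM operations will be done with bank $10 in mind; likewise, setting the RAM bank to $71 means that RAM operations are done in bank $71.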
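A minimal sketch of setting both banks could look like this. It assumes that the RAM bank register counts banks from $70 upwards, so a value of 1 selects bank $71 (the example further down in this tutorial writes 0 to it):

```asm
IBT R0,#$10     ;R0 = $0010
ROMB            ;ROM Bank Register = $10 (taken from the source register, R0 by default)
IBT R0,#$01     ;R0 = $0001
RAMB            ;RAM Bank Register = 1, assumed here to mean bank $71 (0 would be $70)
```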
Source and Destination:
TO sets the destination register, FROM sets the source register and WITH sets both at once. For example, TO R10 followed by GETB sets R10 as the destination of the Get Byte from ROM, so the Super FX gets data from ROM and puts it in R10. FROM R1 followed by COLOR sets R1 as source and puts the value of R1 in the colour register. Finally, WITH R3 followed by SUB R3 sets R3 as both source and destination and subtracts R3 from it.
These operations can be read like this: Data -> R10 ; R1 -> COLOR ; R3-R3=R3 (0)
Be aware that R0 is the default source and destination register: any operation that isn't a branch or a MOVE/MOVES/ALT operation resets the source and destination to R0.
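Written out as code, the sequence described above might look like the sketch below. It assumes the ROM bank and R14 were already set up so that GETB has something to read:

```asm
TO R10          ;Destination = R10
GETB            ;Byte from ROM -> R10, then source/destination fall back to R0
FROM R1         ;Source = R1
COLOR           ;R1 -> Colour Register
WITH R3         ;Source and destination = R3
SUB R3          ;R3 - R3 = R3 (0)
```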
Store and Loading:
GETB loads a byte from ROM, from the address at ROMB:R14. STW stores a 16-bit (word) value from the source register to the address held in the given register. For example, with STW (R4), if R4 = $3232 the data will be stored at address RAMB:$3232. LDB does the reverse operation, loading a value (in this case a byte) into the destination register. SM and LM do the same, except that they load/store 16-bit values only and you specify the address directly. NOTE: If you load 16-bit values, take care with even/odd addresses. If the address is even, the high byte will be located at Address+1; if the address is odd, the high byte will be located at Address-1.
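Here is a short sketch that strings the load/store opcodes together. The addresses ($8000 in ROM, $3232 and $3234 in RAM) and the registers are made up for the example, and the ROM/RAM banks are assumed to be set already:

```asm
IWT R14,#$8000  ;ROM pointer: the next GETB reads ROMB:$8000
IWT R4,#$3232   ;R4 holds the RAM address $3232 (even, so the high byte goes to $3233)
GETB            ;R0 = byte at ROMB:R14
STW (R4)        ;Store the word in R0 at RAMB:$3232
TO R5           ;Destination = R5
LDB (R4)        ;R5 = byte at RAMB:$3232
SM ($3234),R0   ;Store the word in R0 at RAMB:$3234, address given directly
LM R6,($3234)   ;R6 = word at RAMB:$3234
```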
Jumping and Comparing:
Suppose that R1 = $8000 and R3 = $2FFF
A compare works the same way as on the SNES: set the source register, compare it against another register with CMP and, once the flags are set, use the branches.
A simple "subroutine" is done with LINK. LINK (which takes a value from 1 to 4) stores the return address by doing R15 + (1 through 4) = R11. R15 is the Program Counter, that is, where the processor is currently executing code, so changing R15 makes the Super FX jump to the desired location. NOTE: Due to the pipeline, the instruction right after a jump has already been fetched, and the Super FX will only read its first byte. Be careful when an instruction of two or more bytes follows a jump.
JMP works like the SNES version, except that the jump address comes from a register. LJMP does a long jump: the register named in the instruction supplies the bank, while the source register supplies the address to jump to.
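To tie the compare and the LINK/JMP "subroutine" together, here is a minimal sketch. The labels Routine and Done are made up for this example, and the NOPs are there as one-byte fillers after each branch/jump because of the pipeline:

```asm
IWT R1,#$8000   ;R1 = $8000
IWT R3,#$2FFF   ;R3 = $2FFF
FROM R1         ;Source = R1
CMP R3          ;Set flags from R1 - R3, no register is changed
BEQ Done        ;Taken only if R1 = R3 (not the case here)
NOP             ;Runs whether or not the branch is taken
LINK #4         ;R11 = R15 + 4 (IWT is 3 bytes + NOP is 1 byte), so R11 points at Done
IWT R15,#Routine ;Jump by writing the Program Counter
NOP             ;Already fetched, runs before Routine
Done:
STOP            ;The subroutine returns here
NOP
Routine:
INC R3          ;Subroutine body: R3 = $3000
JMP R11         ;Return to the address saved by LINK
NOP             ;One-byte filler after the jump
```

The LINK value has to match the size of whatever sits between LINK and the return point, so count the bytes if you put something other than a NOP after the jump.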
Bitshift, Addition and Subtraction operations:
These operations are pretty much self-explanatory. INC increases a register by 1 and DEC decreases it by 1. ADC adds with carry from source to destination (Source + Rn + Carry = Destination), while ADD adds without carry. SBC subtracts with carry and SUB subtracts without carry. ASR is an arithmetic shift right and LSR a logical shift right. ROR and ROL rotate right and left through carry.
The difference between the shifts is that ASR copies bit 15 back into itself, keeping the sign, while LSR doesn't and shifts normally, leaving bit 15 clear.
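A small sketch of the carry-free operations, with values picked so the difference between ASR and LSR is visible. The registers and values are arbitrary; ADC, SBC, ROL and ROR are left out because their result depends on the carry flag at that moment:

```asm
IWT R1,#$8004   ;R1 = $8004
IBT R2,#$03     ;R2 = $0003
INC R2          ;R2 = $0004
DEC R2          ;R2 = $0003
FROM R1         ;Source = R1
TO R3           ;Destination = R3
ADD R2          ;R3 = R1 + R2 = $8007
FROM R1
TO R4
SUB R2          ;R4 = R1 - R2 = $8001
FROM R1
TO R5
ASR             ;R5 = $C002, bit 15 is kept
FROM R1
TO R6
LSR             ;R6 = $4002, bit 15 is cleared
```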
NOT is a simple operation: it inverts every bit. AND keeps a bit set only where it is set in both values; everything else is cleared. OR works the opposite way: a bit is set if it is set in either value. XOR is similar to OR, but it only sets a bit where the two values differ; where both bits are set, the result is cleared. Last but not least, BIC performs a logical AND between the source register and the 1's complement of the register given in the instruction. This means the given value is inverted first, THEN the AND operation is done.
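The logic operations are easier to follow with concrete numbers. The sketch below runs each of them on the same two values; the registers and values are arbitrary picks for the example:

```asm
IWT R1,#$0F0F   ;R1 = $0F0F
IWT R2,#$00FF   ;R2 = $00FF
FROM R1
TO R3
AND R2          ;R3 = $0F0F & $00FF = $000F
FROM R1
TO R4
OR R2           ;R4 = $0F0F | $00FF = $0FFF
FROM R1
TO R5
XOR R2          ;R5 = $0F0F ^ $00FF = $0FF0
FROM R1
TO R6
BIC R2          ;R6 = $0F0F & ~$00FF = $0F00
FROM R1
TO R7
NOT             ;R7 = ~$0F0F = $F0F0
```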
Multiplication is simple on the Super FX. MULT and UMULT do 8-bit multiplication only, while LMULT and FMULT do 16-bit calculations.
The difference between MULT and UMULT is that MULT does signed operations (it checks bit 7) while UMULT doesn't. They also differ from LMULT and FMULT in that you can choose the register to multiply by, whereas FMULT and LMULT always use R6 as the multiplier, and LMULT uses R4 as the destination for the low word of the 32-bit result.
For example, take MULT R1 with source R5 = $52CF and R1 = $63CF.
The result would be R0 = $0961. Why? The operation is 8-bit but the result is 16-bit, and it takes the sign bit into account, so each low byte $CF counts as $FFCF. You can check it yourself in Windows calculator: $FFCF*$FFCF=$0961 (keeping the low 16 bits). As for an unsigned multiplication, take UMULT R1 with source R5 = $364F and R1 = $B2CF.
The result would be R0 = $3FE1. Why? Same reason as above, HOWEVER the operation isn't signed, so in Windows calculator you'd do $004F*$00CF=$3FE1.
Long multiplications are a tad harder, but they do well in complex operations. FMULT omits the R4 destination while LMULT sets the whole result. Take LMULT with source R5 = $B556 and R6 = $DAAB as an example.
The result would be: R0 = $0AE3 and R4 = $5C72. To check the result in Windows calculator, do $FFFFB556*$FFFFDAAB = $0AE35C72 (keeping the low 32 bits). Remember, only UMULT ignores the most significant bit (bit 7 in 8-bit operations, bit 15 in 16-bit operations).
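The three multiplications worked out above could be written like this. It is only a sketch: the values are the ones from the text, and using MULT R1 / UMULT R1 with R5 as source is an assumption based on the registers mentioned there:

```asm
IWT R5,#$52CF   ;R5 = $52CF
IWT R1,#$63CF   ;R1 = $63CF
FROM R5         ;Source = R5
MULT R1         ;Signed 8-bit: $CF * $CF = (-49) * (-49), R0 = $0961
IWT R5,#$364F   ;R5 = $364F
IWT R1,#$B2CF   ;R1 = $B2CF
FROM R5
UMULT R1        ;Unsigned 8-bit: $4F * $CF, R0 = $3FE1
IWT R5,#$B556   ;R5 = $B556
IWT R6,#$DAAB   ;R6 is the fixed multiplier for LMULT/FMULT
FROM R5
LMULT           ;Signed 16-bit: R0 = $0AE3 (high word), R4 = $5C72 (low word)
```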
Loop and Cache:
Looping is simple: R12 sets the number of times a routine should be looped. MOVE R13,R15 copies the address from R15 (PC) to the loopback address register, R13. The CACHE opcode should be used before the loop so that, once a LOOP command is executed, the code is run from cache RAM the next time around rather than from ROM/RAM. LOOP decrements R12 and checks whether it is zero: if it is, it doesn't loop again; otherwise, it jumps to the address in R13.
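As a minimal sketch of a complete loop, the code below adds up the loop counter on each pass. The loop body (ADD R12) and the count of 8 are made up for the example:

```asm
IBT R12,#$08    ;Loop counter: run the body 8 times
SUB R0          ;R0 - R0 = R0 (0), used as running total
CACHE           ;The code from here on gets cached as it runs
MOVE R13,R15    ;R13 = address of the next instruction (loop start)
ADD R12         ;Loop body: R0 = R0 + R12
LOOP            ;R12 = R12 - 1, jump to R13 while R12 is not zero
NOP             ;Runs on every pass, right after LOOP
```

After the last pass R0 holds 8+7+...+1 = $0024 and R12 is zero.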
Well, starting with SWAP. SWAP exchanges the high byte and the low byte. For example, if R0 is $1234, after a SWAP it'd be $3412. MOVE copies the value from source to destination; the syntax is MOVE Destination,Source. MOVES does the same as MOVE, except that it sets flags that can be useful for testing values. HIB takes the high byte of the source and places it in the low byte of the destination. LOB does an AND #$00FF and keeps the low byte only.
MERGE is a slightly more complicated operation, but it works like this: MERGE takes the high byte of R7 and places it in the high byte of the destination register, and takes the high byte of R8 and places it in the low byte of the destination register, effectively merging them.
STOP does what it says: it stops the Super FX's clock so the SNES can read the output.
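The byte operations described above can be tried out with a sketch like this one (registers and values are arbitrary):

```asm
IWT R0,#$1234   ;R0 = $1234
SWAP            ;R0 = $3412
IWT R1,#$ABCD   ;R1 = $ABCD
MOVE R2,R1      ;R2 = $ABCD (MOVE Destination,Source)
FROM R1
TO R3
HIB             ;R3 = $00AB
FROM R1
TO R4
LOB             ;R4 = $00CD
IWT R7,#$1200   ;High byte of R7 goes to the high byte of the result
IWT R8,#$3400   ;High byte of R8 goes to the low byte of the result
TO R5
MERGE           ;R5 = $1234
```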
The ALTn instructions deal with alternate codes: by putting an ALTn in front of an instruction, you replace certain operations with others. The assembler does this automatically, but you can use them by hand to save a few bytes or even cycles.
For example, without any ALTn, ADD R3 stays as is. With ALT1, ADD R3 turns into ADC R3. With ALT2, ADD R3 turns into ADD #3. With ALT3, ADD R3 turns into ADC #3. Beware of them!
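The four cases above, written out by hand. This is only a sketch to show the effect of the prefixes; in normal code you would just write the final mnemonic and let the assembler add the prefix:

```asm
ADD R3          ;No prefix: ADD R3
ALT1
ADD R3          ;With ALT1 in front, this acts as ADC R3
ALT2
ADD R3          ;With ALT2 in front, this acts as ADD #3
ALT3
ADD R3          ;With ALT3 in front, this acts as ADC #3
ADC #3          ;Same thing as the previous pair, prefix added by the assembler
```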
The bitmap codes are easy to understand, but they require attention when working with them. CMODE sets the flags for the PLOT operation, such as transparency, dither, sprite mode and 256-colour handling. COLOR takes the value of the source register as the palette index for plotting.
The PLOT opcode works like a printer: it reads the X and Y coordinates (R1 and R2 respectively), the palette index set by COLOR and the Screen Base Register. Keep in mind that PLOT increments X by itself, so you don't have to do it.
There are also extra codes for bitmap processing. RPIX is the alternate code for PLOT: it reads the pixel at the current coordinates and puts its colour information in the destination register. GETC works like GETB, except that it places the data straight into the Colour Register.
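A minimal sketch using the bitmap opcodes together. It assumes the SNES side has already set the Screen Base and Screen Mode registers and the ROM bank; the coordinates, palette index and ROM address are made up:

```asm
IBT R0,#$00     ;Plot option value with every flag cleared
CMODE           ;R0 -> Plot Option Register
IBT R0,#$05     ;Palette index 5
COLOR           ;R0 -> Colour Register
IBT R1,#$10     ;X = 16
IBT R2,#$20     ;Y = 32
PLOT            ;Draw the pixel at (16,32), R1 becomes 17
DEC R1          ;Step X back to the pixel just drawn
TO R3           ;Destination = R3
RPIX            ;R3 = colour read back from (16,32)
IWT R14,#$8000  ;ROM pointer
GETC            ;Byte at ROMB:$8000 -> Colour Register
```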
A reminder that this section deals only with the basics of the code. Below, I will do simple examples with commented code for easier understanding.
The codes below are just examples. They are here to show you how to interpret and understand the usage of the opcodes and how to put them together as you need.
SUB R0 ;Do R0-R0=R0 (0)
RAMB ;Store bank value from R0 to RAM Bank
IBT R1,#$44 ;R1 = $0044
IWT R2,#$8000 ;R2 = $8000
FROM R1 ;Source is R1
STB (R2) ;Store byte value from R1 on address at R2. High byte is ignored.
IBT R0,#$01 ;R0 = $0001
ROMB ;Store bank value from R0 to ROM Bank
IWT R2,#$1DFB ;R2 = $1DFB
IWT R14,#$8000 ;R14 = $8000 - Also start ROM buffering (ROM pointer)
TO R6 ;Set R6 as destination
GETB ;Get data from ROM to destination - ROMB:R14 - In this case $01:8000
TO R4 ;Set R4 as destination
LDW (R2) ;Load word value from address in R2 to destination in R4. R0 turns destination again.
LINK #4 ;Get return address by doing R15+4 = R11
IWT R15,#JumpHere ;Jump to Label
WITH R5 ;Meanwhile load this opcode and make R5 source and destination
FROM R5 ;When return, get R5 as source
ADD R1 ;Do R5+R1=R2
SM ($1AAF),R2 ;Store the result (16-bit) from R2 to address $1AAF
STOP ;Stop the CPU
JumpHere:
UMULT #5 ;Do R5*5=R5
JMP R11 ;Return
TO R2 ;Set R2 as destination
IBT R1,#$80 ;R1 = $FF80
FROM R1 ;Set R1 as source
TO R2 ;Set R2 as destination
XOR #15 ;Do R1^F = R2 ($FF8F)
FROM R2 ;Set R2 as source
AND R1 ;Do R2 & R1 = R0 ($FF80)
BIC R2 ;Do R0 & (~ R2) = R0 ($0000) - It Inverts the register THEN it ANDs it.
NOT ;Invert all bits = R0 ($FFFF)
IBT R0,#$02 ;R0 = $0002
CMODE ;Set Transparency and Dithering Mode
IBT R1,#$00 ;\ Clear X and Y positions
IBT R2,#$00 ;/
IBT R12,#$15 ;Loop 21 ($15) times
IWT R7,#$8000 ;Set RAM area
CACHE ;The subsequent code will be read on cache
MOVE R13,R15 ;Set loopback address
LDB (R7) ;Load byte to R0 (If not specified, source/destination will be ALWAYS R0)
INC R7 ;After loading byte, increment the address
COLOR ;Store data from R0 to Colour Register
PLOT ;Plot the colour on specified coordinates, increasing X (This will draw a line)
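LOOP ;Decrement R12 and jump back to R13 until R12 reaches zero
NOP ;One-byte filler for the instruction fetched after LOOP
STOP ;After looping enough, stop CPU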
Obviously, this covers only the basics of Super FX ASM. To get the most out of this tutorial, you should practice writing the codes and seeing the results. There is no secret, it is trial and error.
In order to use the Super FX, a few steps have to be taken:
1. Move the NMI/IRQ routines to RAM and repoint the vectors
2. Move the Super FX invoke routine to WRAM ($7E or $7F, preferably the former)
3. Setup the initial hardware configuration for GSU
org $FFE0 ;Repoint Vector Info - Native
org $FFF0 ;Repoint Vector Info - Emulation
Basically you are repointing the vector info for the IRQ and NMI as well as the BRK routines, since the SNES cannot run code from ROM while the Super FX has access to it. Alternatively, you can set up checks so you don't have to move the whole NMI/IRQ routines to WRAM: check whether the GSU is active and, if it is, wait until processing is done.
By the way, to make use of the Super FX chip, you need to upload the invoke routine to WRAM. The routine looks something like this:
LDX #$3D ;\ Give Super FX Game Pak ROM and RAM access
STX $303A ;/
STY $3034 ; Set Super FX bank
STA $301E ; (PC)
LDA #$0020 ;\ Check for G (Go) Flag
- BIT $3030 ; | If routine isn't finished yet
BNE - ;/ Loop until it is...
SEP #$20 ; Clear 16-bit mode
STZ $303A ; Give back ROM and RAM access to the SNES
RTS ; Return to the caller
After uploading the code to WRAM and setting up the IRQ/NMI routines, you have to write your own routines for the Super FX. Once you have done that, you invoke the code like this:
LDY.b #Label>>16 ;\ Put address in the proper place...
LDA.w #Label ;/
JSR $xxxx ; Call Super FX and wait.
[...] ; *other code*
arch superfx ; REMEMBER! Use asar to easily create routines
; using Super FX's ASM language
[...] ; Code goes here
STOP ; Finish processing data.
arch 65816 ; Return to SNES ASM mode
LDY holds the bank of the GSU routine while LDA holds the address within the bank; then you jump to the invoke routine you uploaded. By the way, REMEMBER to finish GSU routines with a STOP, or else you have a serious chance of making things crash.
Also, read the comments well: they cover the operations of each example code, so you can see what each one does. Don't be afraid to ask questions if you have any doubts.
For more information, I have made available some reading material that contains the opcodes and overall info on the Super FX; you can get it here
Also, you can consult this amazing Super FX guide, it contains tables and information that this tutorial doesn't cover for more info: Link Here!
- Work with SA-1
- Work with DSP-1
Update: - Added information about the External Site.
- Added information about how to activate Super FX routines.