2013 ASM Workshop Summaries
These are the summaries of all of the 2013 ASM workshop lessons. The following posts will be updated as I finish summaries of each session.

SESSION 1:

Table of Contents:
  1. ROM, RAM, and SRAM
  2. The SNES Header
  3. 65c816 ASM Syntax
  4. The Accumulator
  5. Control Flow
  6. Math
  7. Assignments



ROM, RAM, and SRAM:

Definitions:
  • Bank - a reference to a 64 KB range of memory ($0000-$FFFF)
  • RAM - memory that can be read from or written to
  • ROM - memory that can only be read from
  • SRAM - memory that can be read from or written to, but is also saved to the cartridge/computer
  • WRAM - work RAM, or "normal" system RAM

There are three different types of memory that SNES uses: ROM, RAM, and SRAM.

ROM stands for Read-Only Memory, meaning that it can be read from but not written to (during the course of the game, that is). The ROM data are what you see when you open your ROM in a hex editor. In SMW, the ROM data ranges from addresses $8000-$FFFF in banks $00-$0F. In the ROM, we can find things such as background tilemap data, graphics, and coordinate tables, as well as code for level loading, enemy interaction with players, and so on.

RAM stands for Random-Access Memory, meaning that it can be read from and written to at any point during the game. The SNES has 128 KB of "regular" RAM (also called WRAM, or work RAM), which fills up banks $7E and $7F. This is where variables are found, such as the player's position, the player's current powerup, the level data, and so on.

SRAM stands for Static RAM, meaning that its values are stored to the cartridge, or computer. Because of this, it's commonly used for save game data -- SMW uses it to store how many exits have been found and where the player is in the overworld. In SMW, the SRAM data ranges from $700000-$7007FF, but only $700000-$700359 is actually used. While SMW uses only 2 KB of SRAM, games can have more than that -- SMAS uses 8 KB of SRAM, and the maximum amount of SRAM possible is 128 KB. The SRAM data ranges from addresses $0000-$7FFF in banks $70-$73.



The SNES Header:

The SNES header controls various aspects of the ROM; for example, it contains the (ASCII) ROM name, the ROM size, and the SRAM size.

The ROM name is stored at SNES address $00FFC0, and it is a 21-character long string encoded in ASCII that identifies the ROM. In SMW, the string is "SUPER MARIOWORLD ". To edit the ROM name, the right panel of most hex editors can be used.

The ROM size is stored at SNES address $00FFD7. To calculate the ROM size, in bytes, this formula can be used: 0x800 * 2^(X-1), where X is the value of $00FFD7. The most common values for this address range from $08 to $0C, or 256 KB to 4 MB. SMW uses $09 as its value, which is a 512 KB ROM size.

The SRAM size is stored at SNES address $00FFD8. To calculate the SRAM size, in bytes, this formula can be used: 0x800 * 2^(X-1), where X is the value of $00FFD8. If X = $00, then there is simply no SRAM at all. SMW uses $01 as its value, which is a 2 KB SRAM size.
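
As a worked check of the two formulas above, reading each one as 0x800 multiplied by 2 to the power of (X-1), the sizes quoted in this section come out as follows:
	X = $08 (ROM):   0x800 * 2^7  = 0x40000 bytes  = 256 KB
	X = $09 (ROM):   0x800 * 2^8  = 0x80000 bytes  = 512 KB
	X = $0C (ROM):   0x800 * 2^11 = 0x400000 bytes = 4 MB
	X = $01 (SRAM):  0x800 * 2^0  = 0x800 bytes    = 2 KB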



65c816 ASM Syntax:

Definitions:
  • Comment - text that is not interpreted by the assembler
  • Opcode - an operation that the CPU performs
  • Operand - a value that the CPU uses to perform an operation
  • S-CPU - the SNES's main CPU

Text preceded by a semicolon is considered to be a comment:
Code
	LDA --			; This is a comment -- it is never interpreted by the assembler

	Whereas this is not a comment, meaning the assembler will consider this line to have an error...


Opcodes tell the SNES CPU (S-CPU, for short) what kind of operation needs to be performed. For example, the CPU might need to read an address, store a value to memory, or jump to another part of the ROM. Opcodes are specified via three-character mnemonics -- LDA, STA, RTS -- and must be put on separate lines unless colons are used to delimit them:
Code
	LDA operand
	STA operand

	LDA op : STA op 	; Multiple opcodes in one line


Operands tell the S-CPU what values are needed with an operation -- you can't load a value without specifying what value needs to be loaded, nor can you read an address without specifying what address needs to be used. Numerical operands are usually written with an even number of digits for consistency (however, this is not required):
Code
	LDA $01
	LDA $0DBF
	LDA $0184A5


Certain symbols are used to distinguish what type a numerical operand is. # indicates that the operand is a constant value, and a lack of it means that the operand is an address. $ indicates that the operand is in hex, % indicates that the operand is in binary, and a lack of either of these two symbols indicates that the operand is in decimal:
Code
	LDA $19			; Address 0x19
	LDA #$19		; Constant value 0x19
	LDA #%00001101		; Constant value 0b00001101
	LDA #242		; Constant value 242


Each opcode and operand takes up an amount of space in the ROM. Opcodes always take up one byte, whereas numerical operands take up 1, 2, or 3 bytes if they are 2, 4, or 6 digits long (in hex):
Code
	LDA $19			; Two bytes
	LDA $0DBF		; Three bytes
	LDA $0184A5		; Four bytes




The Accumulator:

Definitions:
  • Absolute addressing - four-digit hex memory addressing
  • Accumulator - a general-purpose register that holds a value
  • Direct page addressing - two-digit hex memory addressing
  • LDA - Loads a value into the accumulator
  • Long addressing - six-digit hex memory addressing
  • Register - built-in variables for the CPU
  • STA - Stores the accumulator to an address
  • STZ - Stores zero to an address

The accumulator (A, for short) is a general-purpose register that holds a value.

To load a value into A, you use the LDA opcode (we will refer to the mnemonics as opcodes from now on):
Code
	LDA #$01		; Loads the constant value 0x01 into A
	LDA #$03		; Loads the constant value 0x03 into A, overwrites previous value

	LDA $018000		; Loads the value of address $018000 into A


One of the most common uses of A is storing values to RAM addresses, which is done with the STA opcode (if you want to store zero, then STZ is a better option):
Code
	STA $00			; Stores A into address $00
	STA $0DBF		; Stores A into address $0DBF

	STZ $00			; Stores zero to address $00


It's important to know that with each store, the value of A is unchanged, meaning that you can continue storing the same value into different RAM addresses:
Code
	LDA #$01		; Loads the constant value 0x01 into A
	STA $00			; Stores A into address $00
	STA $01			; Stores A into address $01


If an address is used as the operand, then we refer to it as direct page addressing, absolute addressing, or long addressing, depending on whether the operand has two hex digits, four hex digits, or six hex digits, respectively. Also, .b, .w, and .l at the end of the opcode can indicate direct page, absolute, and long addressing, as well:
Code
	LDA $01			; Direct page addressing
	LDA $0001		; Absolute addressing
	LDA $000001		; Long addressing

	LDA.w $01		; Absolute addressing
	LDA.l $01		; Long addressing


Most RAM addresses can be shortened. If the long address is $7E0000-$7E1FFF, then the 7E can be cut off. If the absolute address is $0000-$00FF, then the first 00 can be cut off. If this is possible, then you should definitely do it, as it increases the speed of the code and decreases the number of bytes the code takes up in the ROM.
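
As a rough illustration of the shortening rule, the three loads below should all read the same byte of RAM (Mario's powerup). This is only a sketch: it assumes the usual SMW setup, in which the direct page starts at $0000 and the current data bank maps $0000-$1FFF to the start of bank $7E. Neither assumption has been covered above.
Code
	LDA $7E0019		; Long addressing, four bytes
	LDA $0019		; Absolute addressing, three bytes
	LDA $19			; Direct page addressing, two bytes (shortest and fastest)
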



Control Flow:

Definitions:
  • BCC - Branches if less than
  • BCS - Branches if greater than or equal to
  • BEQ - Branches if equal
  • BMI - Branches if negative
  • BNE - Branches if not equal
  • BPL - Branches if positive
  • BRA - Branches always
  • CMP - Compares a value with A
  • label - Marks a part of the code
  • JML - Jumps (long)
  • JMP - Jumps (short)
  • JSL - Jumps to subroutine (long)
  • JSR - Jumps to subroutine (short)
  • RTL - Returns (long)
  • RTS - Returns (short)

The control flow is the order in which the CPU executes operations. To use an analogy, we can think of an "SNES car" driving on a piece of code, which is the road. Throughout the road, you may encounter labels, which are kilometer or mile marks. These labels are alphanumeric (A-Z + 0-9 + _), unique, and defined with a colon afterwards:
Code
label:
label_2:
2nd_label:


There are some opcodes that can change the control flow, effectively moving the SNES car onto another road. The so-called branch commands allow conditional control flow changes within a short area (128 bytes backwards, 127 bytes forward), and the jump commands allow control flow changes in a larger area (the entire bank for short jumps, and the entire ROM for long jumps).

To always branch, you use the BRA opcode, with the operand being the label (with no colon):
Code
	LDA $00
	BRA change_powerup	; Branch always to change_powerup label
	STA $01

change_powerup:
	STA $19


If the range of BRA is not enough, then you need to use JMP to target any address within the same bank, or JML for cross-bank targets:
Code
	LDA $00
	JMP change_powerup	; Jump to change_powerup label

	* lots of code *

change_powerup:
	STA $01
	JMP $8000		; Jump to $8000 in the current bank


When you want the SNES car to return to the main road after driving down another path, then you'll need to use the subroutine opcodes. To execute a subroutine within the same bank, you use JSR and end the subroutine with RTS. To execute a subroutine cross-bank, you use JSL and end the subroutine with RTL:
Code
	LDA $00
	JSR change_powerup	; Execute change_powerup subroutine

	LDA $01			; After execution, we return here

change_powerup:
	STA $19
	RTS			; Return from the subroutine


The main purpose of subroutines is to decrease code size (for example, instead of Nintendo writing the "Hurt Mario" subroutine every time, they used a JSL $00F5B7) and to have better organization.

If you only want the SNES car to sometimes veer off onto another road, then you need to use a conditional branch opcode (with a comparison before, if needed). To compare with A, you use CMP:
Code
	CMP #$01		; Compares A with constant value $01
	CMP $FF			; Compares A with the value of address $FF


Then, a conditional branch opcode is needed. To branch if the comparison was equal, BEQ is used:
Code
	LDA $19
	CMP #$02		; If Mario is caped (if the value of address $19 is $02),
	BEQ is_caped		; Branch to is_caped label

	RTL			; Otherwise, continue

is_caped:
	LDA #$01
	STA $19			; Make Mario big
	RTL


As you can see, if the comparison was equal, the SNES car moves to the code under is_caped. Otherwise, the car continues driving until it reaches the RTL right afterward, ending the subroutine.

To branch if the comparison was not equal, BNE is used in the same way.
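
For instance, here is a minimal sketch of the earlier caped-Mario check written with BNE instead (the label name not_caped is made up for this example):
Code
	LDA $19
	CMP #$02		; If Mario is not caped (if the value of address $19 is not $02),
	BNE not_caped		; Branch to not_caped

	LDA #$01
	STA $19			; Otherwise, make Mario big

not_caped:
	RTL
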

To branch if A is larger than or equal to the compared value, or less than the compared value, BCS and BCC are used, respectively:
Code
	LDA $0DBF
	CMP #20			; If Mario has at least 20 coins,
	BCS enough_money	; Branch to enough_money.

	CMP #10			; If Mario has less than 10 coins,
	BCC too_low_money	; Branch to too_low_money.
	RTL

enough_money:
	LDA #$01
	STA $19			; Make Mario big
	RTL

too_low_money:
	STZ $19			; Make Mario small
	RTL


Values can be interpreted as negative or positive -- BMI branches if the value is negative (that is, $80-$FF, where $FF = -$01, $FE = -$02, and so on), and BPL branches if the value is positive (that is, $00-$7F). In this case, there are no compares required:
Code
	LDA $7B			; If Mario's X speed is positive,
	BPL moving_right	; Branch to moving_right

	LDA $7D			; If Mario's Y speed is negative,
	BMI moving_up		; Branch to moving_up

moving_right:
moving_up:
	RTS


Important things to note about branches: if you are able to optimize the branch order, then do so. For example, if I wanted to kill Mario if he had 66 coins, this:
Code
	LDA $0DBF
	CMP #66			; If Mario does not have 66 coins,
	BNE return		; Branch to return

	JSL $00F606		; Kill Mario

return:
	RTS

is better than this:
Code
	LDA $0DBF
	CMP #66			; If Mario has 66 coins,
	BEQ kill_mario		; Branch to kill_mario
	RTS

kill_mario:
	JSL $00F606		; Kill Mario
	RTS


Also, if no compare is used, then there is an "implicit" compare to $00 (dropping the compare in this case is an optimization, as well):
Code
	LDA $19			; If Mario is small,
				; (implicit CMP #$00 happens here)
	BEQ small		; Branch to small

small:
	RTS




Math:

Definitions:
  • ADC - adds a value to A
  • ASL - multiplies A by two
  • CLC - used before addition, will be explained later on
  • DEC - decrements A (or an address) by one
  • INC - increments A (or an address) by one
  • LSR - divides A by two
  • SBC - subtracts a value from A
  • SEC - used before subtraction, will be explained later on

The S-CPU is, of course, capable of performing basic addition, subtraction and multiplication/division by two.

To add a value, CLC must first be used, followed by ADC:
Code
	LDA $0DBF		; Load Mario's coins
	CLC
	ADC #10			; Add 10 to A


To subtract a value, SEC must first be used, followed by SBC:
Code
	LDA $0DBF		; Load Mario's coins
	SEC
	SBC #10			; Subtract 10 from A


Now, there are special opcodes for addition and subtraction by one: INC and DEC, respectively. Aside from the fact that using these is shorter and faster than using the corresponding code with CLC ADC or SEC SBC, these opcodes can also modify RAM addresses (except for long addresses):
Code
	LDA $0DBF		; Load Mario's coins
	INC			; Increment A by 1
	STA $0DBF		; Store A to Mario's coins

	DEC $0DBF		; Decrement Mario's coins by 1

	INC $7E0DBF		; This doesn't exist!


Multiplication and division by two can be achieved using ASL and LSR, respectively. Note that these opcodes can also modify RAM addresses (except for long addresses):
Code
	LDA $0DBE		; Load Mario's lives
	ASL			; Double A
	STA $0DBE		; Store A to Mario's lives

	LSR $0DBE		; Halve Mario's lives

	LSR $7E0DBE		; This doesn't exist!


If the result of any operation overflows (or underflows) what a byte can represent, then the extra information will be cut off. For example:
Code
	LDA #$80		; Load $80 into A
	CLC
	ADC #$80		; Add $80 to A

				; Instead of A being $0100, it is now $00.

	LDA #$C0		; Load $C0 into A
	ASL			; Double A

				; Instead of A being $0180, it is now $80.
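

The same wrap-around happens in the other direction. As a small sketch of the underflow case, with A assumed to be 8-bit as everywhere in this session:
Code
	LDA #$05		; Load $05 into A
	SEC
	SBC #$0A		; Subtract $0A from A

				; Instead of A being -$05, it is now $FB.

	LDA #$00		; Load $00 into A
	DEC			; Decrement A by 1

				; A is now $FF.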




Assignments:

If you want, you may PM me your results to some of these assignments. Don't post them in this thread.
  1. Write code which makes Mario caped and have twenty coins.
  2. Write code which makes Mario caped if he has more than twenty coins.
  3. Write code which sells Mario a cape for twenty coins.
  4. Write code which multiplies A by twenty.



SESSION 2:

Table of Contents:
  1. Blocks
  2. X and Y
  3. Processor Flags
  4. The Stack
  5. Hijacking
  6. Debugging
  7. Assignments



Blocks:

Blocks, in the newest version of BTSD, always follow this format:
Code
db $42

JMP MarioBelow : JMP MarioAbove : JMP MarioSide
JMP SpriteV : JMP SpriteH : JMP MarioCape : JMP MarioFireball
JMP TopCorner : JMP BodyInside : JMP HeadInside

MarioBelow:
MarioAbove:
MarioSide:

TopCorner:
BodyInside:
HeadInside:

SpriteV:
SpriteH:

MarioCape:
MarioFireball:
	RTL


The db $42 indicates that we're using the newest block version, with the added offsets. Each JMP tells the game exactly what code runs whenever Mario hits the block from below, above, or if a fireball hits the block, and so on. RTL always ends the blocks.

The first JMP tells the game what code runs when Mario hits the block from below, hence the label name "MarioBelow." The second JMP tells the game what code runs when Mario hits the block from above, leading to the label name "MarioAbove," and so on. The important thing to know is that the label name does not actually matter -- only the order of the JMPs matters!

So, for example, consider this block:
Code
db $42

JMP Main : JMP Main : JMP Main2
JMP Return : JMP Return : JMP Return : JMP Return
JMP Main : JMP Main : JMP Main

Main:
	INC $0DBF		; Increase Mario's coins	
	RTL

Main2:
	INC $0DBE		; Increase Mario's lives
Return:
	RTL


When Mario hits the block from below, the game executes the first JMP, which means that Mario's coins will be increased. When Mario hits the block from above, it executes the second JMP, also increasing Mario's coins. However, when Mario hits the block from a side, it executes the third JMP, increasing Mario's lives instead. The same logic applies to every other JMP.



X and Y:

Definitions:
  • CPX - Compares a value with X
  • CPY - Compares a value with Y
  • db - Inserts a direct byte
  • dd - Inserts a direct double word
  • DEX - Decrements X by one
  • DEY - Decrements Y by one
  • dl - Inserts a direct long
  • dw - Inserts a direct word
  • Indexing - Addressing an address with an offset of either X or Y
  • INX - Increments X by one
  • INY - Increments Y by one
  • Loop - Executing code more than once
  • LDX - Loads a value into X
  • LDY - Loads a value into Y
  • STX - Stores X to an address
  • STY - Stores Y to an address
  • TAX - Transfers A to X
  • TAY - Transfers A to Y
  • TXA - Transfers X to A
  • TXY - Transfers X to Y
  • TYA - Transfers Y to A
  • TYX - Transfers Y to X

X and Y are two more registers that the S-CPU has -- they're like less functional accumulators in some cases (none of these opcodes can use long addressing, and addition/subtraction can't directly be done on X and Y), but they are able to do things that the accumulator cannot.

Loading and storing values with X and Y is the same as loading and storing values with A. LDX and LDY load a value into X and Y, respectively, and STX and STY store X and Y into an address, respectively:
Code
	LDX #$01		; Load $01 into X
	STY $19			; Store Y into $19


You can also compare values with X and Y, like the accumulator, with CPX and CPY, respectively:
Code
	CPX #$01		; If X is $01,
	BEQ label		; Branch to label

	CPY $19			; If Y is the value of address $19,
	BEQ label		; Branch to label

label:


The X and Y registers can also be incremented and decremented by one using INX, INY, DEX, and DEY:
Code
	DEX			; Decrement X by one
	INY : INY		; Increment Y by two
	DEY : DEY : DEY : DEY	; Decrement Y by four


However, math is not directly possible on the X and Y registers, and you might see that repeating the decrementing or incrementing opcodes is highly inefficient. We can use some of the transfer opcodes (TAX, TAY, TXA, TXY, TYA, and TYX) to achieve our goal:
Code
	TXA			; Transfer X to A
	CLC
	ADC #20			; Add 20 to A
	TAX			; Transfer A back to X


With all of the transfer opcodes, the register denoted by the second letter is transferred to the register denoted by the third letter (TAX => Transfer A to X). Also, the first register's value is not modified:
Code
	LDX #$20		; Load $20 into X
	TXA			; Transfer X to A
				
				; X is still $20 here, but A is also $20


So, what exactly is the point of these registers? First of all, Y is used as the high byte of the acts-like setting in custom blocks. Since $1693 is the RAM address for the low byte of the acts-like setting, we can use this code to make our block solid (i.e., act like a cement block, which is $0130):
Code
	LDY #$01		; Set high byte to $01
	LDA #$30
	STA $1693		; Set low byte to $30, so our block acts like $0130 in total


Second, the X and Y registers can be used for indexing, and they are called the index registers for this reason. Indexing is where the values of X and Y are added to the address that you use. To better explain, here is an example:
Code
	LDX #$02		; Load $02 into X

	LDA $157C,x		; LDA $157C+x = LDA $157C+$02 = LDA $157E


Most of the opcodes involving A can be indexed by X and Y, although addressing coverage varies (for example, LDA $7E0000,x exists, but LDA $7E0000,y doesn't). Some of the opcodes involving X can be indexed by Y, and vice versa -- simply try and see what does and doesn't exist.

The main use for indexing is in tables or loops. Tables contain direct bytes, and we load different bytes from them using indexing by X or Y. To specify direct bytes, we use db, as in this example:
Code
table:
	db $1A,$2B,$3C


Bigger values can be inserted using dw, dl, and dd, which are two bytes, three bytes, and four bytes, respectively:
Code
table:
	dw $0100,$0300,$0F00
	dl $017AB0,$B103A8,$FFEE11


Here is an example of the use of tables -- suppose I wanted to give Mario coins depending on what powerup he has, and then clear his powerup. Without tables, one would have to do this:
Code
	LDA $19			; If Mario is small,
	BEQ small		; Branch to small
	CMP #$01		; If Mario is big,
	BEQ big			; Branch to big
	
	LDA #$20
	CLC
	ADC $13CC
	STA $13CC		; Give $20 coins here (Mario is caped or fiery)
	RTL

big:
	LDA #$10
	CLC
	ADC $13CC
	STA $13CC		; Give $10 coins
	RTL

small:
	RTL			; Give no coins


However, tables make this task easy:
Code
	LDX $19			; Load Mario's powerup into X
	LDA table,x
	CLC
	ADC $13CC
	STA $13CC		; Set coins to give based on Mario's powerup
	STZ $19			; Clear Mario's powerup
	RTL

table:
	db $00,$10,$20,$20


If Mario has no powerup ($19 = $00), then LDA table,x would be equivalent to LDA table+$00 = LDA #$00. If Mario is big ($19 = $01), then LDA table,x would be equivalent to LDA table+$01 = LDA #$10, and so on.

The other main use of the index registers is looping (executing code more than once), especially when combined with indexing. Take this example:
Code
	LDA #$20
	LDX #$00
loop:
	STA $00,x		; Store A into address $00+x
	INX			; Increment X by one
	CPX #$20		; If X is not $20,
	BNE loop		; Branch to loop


The code from the loop label through BNE loop is executed exactly $20 times -- why? Each time, at the end of the loop, it increments X. If X hasn't reached $20, then it'll keep running that code. The overall effect of this code is storing $20 to addresses $00-$1F, in effect, replicating this code:
Code
	LDA #$20
	STA $00			; Store A into $00
	STA $01			; Store A into $01
	STA $02			; ...
	
	* ... *

	STA $1F			; Store A into $1F


As you can see, looping is a major space optimization. However, it decreases speed -- "unrolling" the loop would mean going from the first code to the second code, increasing speed but severely increasing space used. This is the drawback of using a loop.
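
A common variation, sketched here on the assumption that X is 8-bit, is to count downwards instead. Because DEX already sets the processor's negative condition when X wraps from $00 to $FF, BPL can end the loop without any CPX:
Code
	LDA #$20
	LDX #$1F		; Start at the last index
loop:
	STA $00,x		; Store A into address $00+x
	DEX			; Decrement X by one
	BPL loop		; While X is still positive ($00-$7F), branch to loop

This should also run the loop body $20 times (X = $1F down to $00) and fill the same addresses, just in reverse order.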



Processor Flags:

Definitions:
  • ADC - Add with carry
  • BVC - Branch if overflow clear
  • BVS - Branch if overflow set
  • c - Carry flag
  • CLC - Clears the carry flag
  • CLV - Clears the overflow flag
  • d - Decimal flag
  • e - Emulation flag
  • i - Disable-IRQ (or interrupt) flag
  • n - Negative flag
  • m - Accumulator width flag
  • REP - Resets processor flags
  • ROL - Rotate bits left with carry
  • ROR - Rotate bits right with carry
  • SBC - Subtract with borrow (opposite of carry)
  • SEC - Sets the carry flag
  • SEP - Sets processor flags
  • v - Overflow flag
  • x - Index width flag
  • XCE - Exchanges carry flag with emulation flag
  • z - Zero flag

The SNES has another register called the processor flags register. It's a group of flags which are altered by most opcodes, and which also influence what some opcodes do. The register can be represented in this form: e nvmxdizc, where each letter is an individual bit. Here is what each letter represents:
  • e - Emulation flag. If this flag is set, then the S-CPU will act like its predecessor, the 6502. It is very rarely useful, but it is important to note that when the SNES first starts up, it will be in emulation mode. One can use XCE to exchange this flag with the carry flag, which will be discussed a bit later.
  • n - Negative flag. If any operation results in a negative number ($80-$FF), then this flag is set. Otherwise, it's cleared. For example:
    Code
    	LDA #$80		; Sets n flag
    	
    	LDA #$00		; Clears n flag
    	CLC
    	ADC #$90		; Sets n flag, since A is now $90, which is negative
    
    	LDX #$FF		; Sets n flag
    
  • v - Overflow flag. If a non-decrement/increment arithmetic with both numbers being of the same sign results in a new number of a different sign, then this flag will be set. Otherwise, it's cleared. For example:
    Code
    	LDA #$7F
    	CLC
    	ADC #$20		; Sets v flag, since A went from $7F to $9F, which is negative
    
    	CLV			; Clears v flag
    
    	LDA #$20
    	CLC
    	ADC #$10		; Clears v flag
    
    	LDA #$7F
    	INC			; Does not set v flag
    


    CLV can be used to clear the v flag manually.
  • m - Accumulator width flag. If it's set, then A will be 8-bit. Otherwise, it will be 16-bit. If A is 16-bit, then the requirement for being negative is now $8000-$FFFF, and all operations involving A will be 16-bit. For example:
    Code
    	LDA #$2000
    	STA $00
    				; $00 = $00, $01 = $20
    


    Note that the order is reversed because of the little-endian order! Little-endian is where the most significant bytes are stored at the end (i.e., $2000 is stored as 00 20).

    Also, be careful to use operands with the right number of bytes for the current register width. A mismatch may cause crashes, as in this case:
    Code
    				; A is 16-bit
    	LDA #$20		; Wrong: the CPU now expects a two-byte constant (LDA #$0020)
    	STA $00
    

  • x - Index width flag. If it's set, then X/Y will be one byte. Otherwise, it will be two bytes. This acts roughly the same way as the m flag.
  • d - Decimal flag. If it's set, then arithmetic operations will become decimal-like. This should never be used since calculations can be done entirely in hex. To set and clear decimal mode, SED and CLD are used, respectively.
  • i - IRQ (interrupt) disable flag. If it's set, then it'll disable IRQs -- not important for now.
  • z - Zero flag. If an operation results in a zero, then this flag will be set. Otherwise, it's cleared. For example:
    Code
    	LDA #$00		; Sets z flag
    	LDA #$01		; Clears z flag
    	DEC			; Sets z flag, since A is now $00
    
  • c - Carry flag. If addition results in a byte overflow, then this flag will be set. Otherwise, it's cleared. If subtraction results in a byte underflow, then this flag will be cleared. Otherwise, it's set. For example:
    Code
    	LDA #$FE
    	CLC
    	ADC #$20		; Sets c flag
    	CLC
    	ADC #$10		; Clears c flag, since the addition did not byte overflow
    
    	LDA #$20
    	SEC
    	SBC #$30		; Clears c flag
    	SEC
    	SBC #$10		; Sets c flag, since the subtraction did not byte underflow
    


    CLC and SEC can be used to clear or set the carry flag, respectively.

    ADC and SBC actually add or subtract depending on the carry flag. ADC adds the carry flag as well as the operand, and SBC subtracts the opposite of the carry flag as well as the operand:
    Code
    	SEC
    	ADC #$20		; Add #$20 + 1 to A
    
    	CLC
    	SBC #$20		; Subtract (#$20 + 1) from A
    


    Using this fact, one can simulate 16-bit addition or subtraction. 16-bit addition is simulated this way:
    Code
    	LDA $94
    	CLC
    	ADC #$20
    	STA $94
    	LDA $95
    	ADC #$00
    	STA $95
    


    If the low byte addition overflows, then this sets the carry flag. Thus, the ADC #$00 afterwards will actually add an extra one, leading to the correct 16-bit value. If the low byte addition doesn't overflow, then the carry flag is cleared, meaning the ADC #$00 afterwards just does nothing. The same sort of logic applies to the simulation of 16-bit subtraction.

    ASL and LSR modify the carry bit, as well. They actually shift the bits (e.g., LSR turns #%00110001 into #%00011000). The seventh bit of the operand is shifted into the carry bit with ASL, and the zeroth bit of the operand is shifted into the carry bit with LSR:
    Code
    	LDA #$80
    	ASL			; Sets c flag
    
    	LDA #$01
    	LSR			; Sets c flag
    


    ROL and ROR are full bit shifts -- they shift carry into the zeroth and seventh bit, respectively:
    Code
    	LDA #$22
    	SEC			; (#%00100010 1)
    	ROL			; (#%01000101 0)
    


    where the last number is the carry flag.
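
    The 16-bit subtraction mentioned under the carry flag can be sketched the same way as the addition, with SEC and SBC in place of CLC and ADC (the address $94 is reused from the addition example purely for illustration):
    Code
    	LDA $94
    	SEC
    	SBC #$20		; Subtract $20 from the low byte
    	STA $94
    	LDA $95
    	SBC #$00		; Subtracts an extra one only if the low byte underflowed
    	STA $95
    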

Now, how does one exactly modify, for example, the m or x flags? REP resets all of the processor flags corresponding to each bit in the operand. That is, consider this:
Code
				; Processor flags: xxxxxxxx
	REP #$30		; REP #%00110000
				; Processor flags: xx00xxxx


SEP sets all of the processor flags corresponding to each bit in the operand:
Code
				; Processor flags: xxxxxxxx
	SEP #$30		; SEP #%00110000
				; Processor flags: xx11xxxx
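

Putting REP and SEP to use, here is a minimal sketch of switching A to 16-bit for a single addition and then switching back. It assumes A starts out 8-bit, and the address $94 is only an example:
Code
	REP #$20		; Clear the m flag: A is now 16-bit
	LDA $94			; Load $94 (low byte) and $95 (high byte) together
	CLC
	ADC #$0020		; 16-bit addition, so the carry between bytes is handled for us
	STA $94			; Store both bytes back
	SEP #$20		; Set the m flag: A is 8-bit again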


The CMP, CPX, and CPY opcodes actually do subtraction, modifying the processor flags but not the actual registers' values. The branch opcodes actually branch based on processor flags. BNE branches if the z flag is cleared, BEQ branches if the z flag is set, BCC branches if the carry flag is cleared, BCS branches if the carry flag is set, BPL branches if the n flag is cleared, and BMI branches if the n flag is set. There are two extra opcodes, BVC and BVS, which branch if the v flag is cleared or set, respectively:
Code
	LDA #$20
	CMP #$1F		; $20-$1F => set c flag
	BCS greater_than	; Since carry is set, branch to greater_than

greater_than:

	LDA $19			; Clear/set z flag depending on $19
	BEQ zero		; Branch to zero if z flag is set

zero:




The Stack:

Definitions:
  • Data bank - The bank used in absolute addressing ($xxxx)
  • PEA - Pushes the 16-bit operand onto the stack
  • PEI - Pushes the value of the address onto the stack
  • PHA - Pushes A onto the stack
  • PHB - Pushes the data bank onto the stack
  • PHK - Pushes the program bank onto the stack
  • PHP - Pushes processor flags onto the stack
  • PHX - Pushes X onto the stack
  • PHY - Pushes Y onto the stack
  • PLA - Pulls from the stack into A
  • PLB - Pulls from the stack into the data bank
  • PLP - Pulls from the stack into the processor flags
  • PLX - Pulls from the stack into X
  • PLY - Pulls from the stack into Y
  • Program bank - The bank that the current code is in
  • Program counter - The lower bytes of the location of the current code
  • Stack pointer - Where the next pushed value onto the stack will go
  • Stack relative - Addressing a part of the stack via the stack pointer

The stack is a sort of temporary storage solution for the S-CPU. We can use a stack of papers on a secretary's desk as an analogy. You can only (easily) add papers onto the top of the stack, which is called "pushing." You can only (easily) take papers off from the top of the stack, which is called "pulling."

To push A, X, Y, or the processor flags onto the stack, PHA, PHX, PHY, and PHP are used. To pull from the stack into A, X, Y, or the processor flags, PLA, PLX, PLY, and PLP are used. When a register is pushed onto the stack, its value is not changed:
Code
	LDA #$20
	PHA			; Pushed A, topmost value is now $20, A is still $20

	PLX			; Pulled topmost value into X, X is now $20.


These opcodes push two bytes if their corresponding registers are 16-bit.

The most important use of the stack is short-term preservation of the registers. As you may recall, the Y register is used as the high byte of the acts-like setting in custom blocks. This means that modifying the register is quite hazardous; indeed, many blocks make this mistake and act like, for example, tile 125 instead of tile 25 in some cases.
Code
	PHY			; Push Y -- now, Y can be freely changed
	LDY #$20
	LDY #$10
	DEY
	TAY
	TXY
	PLY			; Pull Y, now the acts-like setting is the same


The number of bytes pushed and the number of bytes pulled must match. Attempting to pull too many bytes will cause the stack to run out of paper, leading to bad behavior as there is no more stack to pull from. Likewise, attempting to push too many bytes will cause the stack of paper to overflow, causing more bad behavior:
Code
	PHA
	PHX
	PHA
	PHP
	RTL			; This is bad!

	PLA
	PLX
	RTL			; This is also bad!
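

By contrast, a balanced version might look like the sketch below. Since the last value pushed is the first one pulled, the pulls go in the reverse order of the pushes:
Code
	PHA			; Push A
	PHX			; Push X

	* code that changes A and X *

	PLX			; Pull X first, since it was pushed last
	PLA			; Then pull A
	RTL			; Everything pushed has been pulled, so this is fine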


The stack is also important for absolute addressing. Absolute addressing uses the so-called data bank register (DBR) in order to generate the full long address to use. The program bank register (PBR) is the current bank that the code is in, and the program counter (PC) is the lower bytes of the location that the code is in. It is not required that DBR = PBR. One uses PHB and PLB to push or pull DBR. The PBR can be pushed via PHK, but there is no meaningful "PLK." Thus, to correctly use absolute addressing, one needs to have DBR = PBR:
Code
	PHK			; Push program bank
	PLB			; Pull into data bank


However, one should also preserve the data bank (otherwise, there could be some odd glitches):
Code
	PHB			; Preserve data bank
	PHK			; Push program bank
	PLB			; Pull into data bank

	* code *

	PLB			; Restore previous data bank


You'll see the above formation of code a lot in sprites and if you dig around in quite a few patches.
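
As a sketch of why this matters, the hypothetical routine below reads from a table that sits right next to the code. The labels Wrapper, Main, and Table are made up for this example, and X is assumed to hold an index from 0 to 3. The LDA is absolute,x addressing, so it only reads from the right bank because DBR was set to PBR first:
Code
Wrapper:
	PHB			; Preserve data bank
	PHK			; Push program bank
	PLB			; Pull into data bank
	JSR Main
	PLB			; Restore previous data bank
	RTL

Main:
	LDA.w Table,x		; Absolute,x (forced by .w) -- the bank comes from DBR
	STA $19			; Use the table value as Mario's powerup
	RTS

Table:
	db $00,$01,$02,$03
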

Suppose you wanted to quickly push a 16-bit value onto the stack. PEA is ideal for this. The syntax is strange, though -- it appears that it would push the value of a 16-bit address, but it does not:
Code
	PEA $7F7F		; Pushes the constant $7F7F onto the stack
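

To make the difference concrete, here is a small sketch (the constant $1234 is arbitrary). PEA always pushes two bytes, no matter what size the registers are, and the value can be pulled back like any other 16-bit value:
Code
	PEA $1234		; Pushes the constant $1234, not the value at address $1234
	REP #$20		; A is now 16-bit
	PLA			; A is now $1234
	SEP #$20		; A is back to 8-bit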


If you wanted to push the 16-bit value of a direct page address onto the stack, then you'd use PEI:
Code
	REP #$20
	LDA #$1337
	STA $00			; Set $00 to $1337.
	SEP #$20

	PEI ($00)		; This will push the value of $00, which is $1337, onto the stack.


Now, for how the stack works. The stack is actually stored in WRAM, and there is a stack pointer register (SP) which points to where the next pushed value will go. If SP = $01FE, then pushing an 8-bit A will write A to $01FE. After each push, the SP is decreased by how many bytes were pushed. After each pull, the SP is increased by how many bytes were pulled:
Code
				; SP = $01FF, A = 8-bit
	PHA			; SP = $01FE
	PHB			; SP = $01FD
	PLA			; SP = $01FE
	PLA			; SP = $01FF


With this comes a new addressing mode, stack relative addressing. It's essentially the same as indexing with X and Y, except ,s is used instead:
Code
	LDA #$20		; SP = $01FF
	PHA			; SP = $01FE

	LDA $01,s		; LDA $01+s = LDA $01+$01FE = LDA $01FF = previous pushed value
	PLA			; SP = $01FF

	LDA $00,s		; LDA $00+s = LDA $00+$01FF = LDA $01FF = what was just pulled


LDA $00,s should never be used -- it can be highly unpredictable, but the reason for it being so will be explained much later.
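
Here is a slightly longer sketch, assuming that A is 8-bit and that SP starts at $01FF as in the examples above. With two values on the stack, $01,s is the most recently pushed byte and $02,s is the one pushed before it. Reading them this way pulls nothing, so the stack stays as it is until the two PLAs at the end:
Code
	LDA #$20		; SP = $01FF
	PHA			; SP = $01FE, $01FF holds $20
	LDA #$30
	PHA			; SP = $01FD, $01FE holds $30

	LDA $01,s		; LDA $01+$01FD = LDA $01FE = $30
	LDA $02,s		; LDA $02+$01FD = LDA $01FF = $20

	PLA			; SP = $01FE
	PLA			; SP = $01FF
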



Hijacking:

Definitions:
  • Autoclean - Frees the freespace (and its RATS tag) that an earlier application of the patch inserted via freecode/freedata
  • Freecode - Automatically finds freespace suitable for code and inserts a RATS tag, as well
  • Freedata - Automatically finds freespace suitable for data and inserts a RATS tag, as well
  • Freespace - Parts of the ROM which are not used at all
  • Hijack - A modification of the ROM to make it execute custom code
  • Org - Changes the current SNES address in the assembler
  • NOP - Does nothing, wastes a byte
  • RATS tag - Protects an area of the ROM

Hijacking is pretty much essential if you want to make patches. It's basically where parts of the (SMW) ROM are changed to execute custom code instead of the normal SMW code. To begin the process of hijacking, you use the org assembler directive, which changes the assembler's current SNES address:
Code
org $00F606			; SNES address for the assembler is now $00F606

org $008000			; But now, it's $008000


Any ASM after an org will overwrite the stuff at the current SNES address -- LDA $1337 would overwrite the three bytes at $008000 if appended to the bottom of the code above. This is why knowing byte counts is so important -- one needs to know exactly how much to overwrite, and how much the code inserted overwrites.
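
To illustrate (this is only a sketch, and applying it would corrupt the game's code at $008000, so don't actually patch it in), LDA $1337 assembles to the three bytes AD 37 13. Placed after an org, those bytes replace whatever three bytes the ROM had at that address:
Code
org $008000
	LDA $1337		; Writes AD 37 13 to $008000-$008002
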

To make hijacking actually possible, we will use all.log, which can be found here. This is a disassembly of a clean SMW ROM; essentially, it's the source code for the game. Now, suppose we want to make a patch which resets Mario's coins and score when he dies. You may know that $00F606 is the address for the "kill Mario" subroutine. However, you may not know that when Mario dies from falling into a pit, the game jumps to $00F60A. So, search for "00F60A" in the document -- you'll find this:
Code
CODE_00F60A:        A9 09         LDA.B #$09                ; \ 
CODE_00F60C:        8D FB 1D      STA.W $1DFB               ; / Change music 


With org $00F60A, you'll start overwriting the code here. The longer the code, the more that's overwritten. Thus, the solution is to use freespace -- you'll overwrite unused ROM data with your code instead of ROM data that is actually used:
Code
org $00F60A
	autoclean JSL main	; Automatically clean the freespace that our patch uses

freecode			; Search for freespace
main:
	RTL


The JSL is still overwriting stuff, though! How big is a JSL? The opcode takes up one byte, and the operand takes up three bytes, as long addresses are three bytes long. Thus, our JSL takes four bytes. The JSL partially overwrites the two instructions that were already there:
Code
CODE_00F60A:        22 XX XX XX   JSL $XXXXXX
CODE_00F60E:        1D            ORA $XXXX,x


As you can see, we have problems. The S-CPU will interpret the four bytes as the JSL, but then it'll interpret the next byte as an ORA absolute,x. So, what can we do about this? We can overwrite that extra byte with a NOP, which literally does nothing -- it only takes up space:
Code
CODE_00F60A:        22 XX XX XX   JSL $XXXXXX
CODE_00F60E:        EA            NOP


That portion is now much less hazardous, and we are now free to write the code which clears Mario's coins and score:
Code
org $00F60A
	autoclean JSL main
	NOP

freecode
main:
	STZ $0DBF		; Zero Mario's coins
	STZ $0F34
	STZ $0F35
	STZ $0F36		; Zero Mario's score
	RTL


But wait! We're still not correct! The code that we overwrote definitely did something -- it set the death music. But now, we're not doing that, so if we still want that function, we have to replicate it, which we can usually do by just sticking the original code at the end of our code:
Code
org $00F60A
	autoclean JSL main
	NOP

freecode
main:
	STZ $0DBF		; Zero Mario's coins
	STZ $0F34
	STZ $0F35
	STZ $0F36		; Zero Mario's score

	LDA #$09
	STA $1DFB		; Replicate the overwritten code!
	RTL


The process is similar for other types of patches. If you decide to hijack a branch, then it will be a little bit tricky to restore its effects -- you'll have to resort to JMLs instead of using JSL. If you hijack a comparison, then the comparison has to be at the end of your custom code.
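
As a rough sketch of the JML approach, here is the coin/score patch from above rewritten to use JML. A JML is also four bytes, but it does not push a return address, so the custom code cannot end in RTL. Instead, it jumps back to the first instruction that was not overwritten, which is $00F60F here (the hijack replaces $00F60A-$00F60E):
Code
org $00F60A
	autoclean JML main
	NOP

freecode
main:
	STZ $0DBF		; Zero Mario's coins
	STZ $0F34
	STZ $0F35
	STZ $0F36		; Zero Mario's score

	LDA #$09
	STA $1DFB		; Replicate the overwritten code!
	JML $00F60F		; Return to the code right after the hijack


With this layout, a hijacked branch can be restored by branching inside main and ending each path with a JML to the appropriate address in the original code.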



Debugging:

Definitions:
  • Breakpoint - An address which triggers the debugger when it is executed, read from, or written to

Programmers run into lots of problems when they code. Assembly programmers run into even more, as it is a very low-level language -- that's why we have debuggers. For the purposes of this summary, we'll be using this debugger.

You load a ROM just as you would in any ordinary version of snes9x. When you first load a ROM, the debug console will pop up:


Start the ROM by clicking "Run." To see the hex data, click "Hex Data," and a window like this will show:


There are several options for the memory type, but we will only be concerned about RAM. If you switch to show RAM and play through SMW, you'll see that the values update in real time. You can set the range from 7E0019 to 7E0019 to see Mario's powerup in real time:


You can also freeze values by selecting the RAM region you want to freeze and clicking the "Freeze" button:


To undo this, simply click "Unfreeze."

Finally, you can modify the RAM addresses by simply typing values into them.

One of the other important features of the debugger is stepping through opcodes/instructions one by one. To do this effectively, we're going to need breakpoints. Breakpoints are addresses that stop (normal) execution of the game when they are either executed, written to, or loaded from, allowing the debugger to step through the individual opcodes.

To add a breakpoint, click the "Breakpoints" button on the debug console. This window should pop up:


Add an entry for 7E0DBF, and check the "Write" checkbox. Then, in the game, collect a coin. Execution should immediately halt, and some text will be printed on the debug console:
Code
$00/8F25 EE BF 0D    INC $0DBF  [$00:0DBF]   A:FF01 X:0003 Y:0018 P:envMXdiZc

$00/8F25 is the SNES address of the current code. The hex values afterwards are the hex values for the instruction. The actual instruction is printed afterwards. The number after A is the value of A, the number after X is the value of X, the number after Y is the value of Y, and the letters after P indicate the processor flag status. If the letter is capitalized, that means the corresponding processor flag is set; otherwise, it's cleared.

This text is printed before the instruction is executed:
Code
$00/9045 A2 00       LDX #$00                A:FF05 X:0003 Y:0018 P:envMXdizc

As you can see, X hasn't been updated to be $0000 quite yet.

Anyway, to step through instructions one-by-one, click the "Step Into" button. If you're stepping through instructions and you come across a JSR or a JSL and you want to skip it, then click the "Step Over" button. If you want to get out of the current subroutine, then click the "Step Out" button. If you want to skip the current opcode, then click the "Skip Op" button -- this can have a lot of side effects. If you want more information, then you can uncheck "Squelch" under the "CPU Trace Options" menu. The new output will be in this form:
Code
$00/8F25 EE BF 0D    INC $0DBF  [$00:0DBF]   A:FF01 X:0003 Y:0018 D:0000 DB:00 S:01F9 P:envMXdiZc HC:1024 VC:022 FC:40 I:00

Chances are, though, that the only new values you'll recognize are the ones after DB and S, which are the DBR and SP, respectively.

You can also dump RAM addresses using the "Dump RAM" button on the debug console. You can edit the values of the registers using the "Edit Registers" button.

One more important feature is the S-CPU tracing feature, meaning that all of the instructions that the S-CPU executes will be logged to a file, specifically (ROM name)#####.log in the Logs folder. To begin logging, just click the "CPU" checkbox under the "Logging" panel in the debug console.



Assignments:

If you want, you may PM me your results to some of these assignments. Don't post them in this thread.
  1. Write code which gives Mario an appropriate number of lives depending on his powerup and then removes his powerup.
  2. Write code which moves every sprite to the left one pixel in the X direction.
  3. Write a patch which hijacks the hurt routine, making it give Mario coins instead.
  4. Find out where the code is that gives Mario a 1-up.
SESSION 3:

Table of Contents:
  1. LoROM Memory Mapping
  2. Pointers
  3. Block Moves
  4. Bitwise Operations
  5. SNES Hardware Registers
  6. Assignments



LoROM Memory Mapping:

Definitions:
  • Memory mapping - How SNES addresses are mapped to memory (e.g., RAM, ROM, SRAM, hardware register, or nothing at all).

Memory mapping basically dictates what SNES addresses "are" -- is $004000 RAM? Is it ROM? Is it SRAM? Or is it actually nothing at all? The SNES has several memory mapping formats, but the one we're going to discuss is LoROM, which is what SMW uses.

In LoROM, ROM data are mapped to $8000-$FFFF of banks $00-$7D and banks $80-$FF, with the latter half mirroring the first half. Data in the latter half can actually be accessed more quickly than the data in the first half, but only if a specific bit in the SNES is set -- this is called FastROM, and you are probably using it right now.

WRAM is mapped to $0000-$FFFF of banks $7E and $7F. $7E0000-$7E1FFF are actually mirrored in banks $00-$3F (and banks $80-$BF); this is why you can, in most cases, omit the 7E when referencing data within that range. This is also why you should insert code into those banks whenever possible, which freecode in Asar does for you. To sum it up, these operations are equivalent:
Code
	LDA $7E0100
	LDA $000100
	LDA $800100

freecode
	LDA $0100


SRAM is mapped to $0000-$7FFF of banks $70-$7D. Not all of this is actually unique SRAM -- if you're using 32KB of SRAM, $710000-$717FFF will be a mirror of $700000-$707FFF.

The SNES's hardware registers (we'll discuss them later on in this session) are mapped to $2100-$21FF of banks $00-$3F and banks $80-$BF. $4000-$40FF are another set of hardware registers, as are $4200-$42FF and $4300-$43FF (these control (H)DMA, which we'll discuss in the next session).

The remaining addresses are either unmapped (nothing meaningful corresponds to them) or used in expansion chips (MSU-1, for example, uses $2000-$20FF).



Pointers:

Definitions:
  • Pointer - A reference to another object via its memory address.

A pointer is essentially a reference to another object via its memory address. For example, if there are layer 1 data at SNES address $408000, then the value $408000 is a pointer to layer 1 data.

The SNES has two types (or rather, two sizes) of pointers: 16-bit and 24-bit. 16-bit pointers reference data in the current data bank, but 24-bit pointers specify the bank. To reference the values of RAM addresses as if they were pointers, () and [] are used. ($00) is the 16-bit pointer formed from $01 and $00, and [$00] is the 24-bit pointer formed from $02, $01, and $00. The RAM address used for the pointer can be indexed with X, and the pointer's address can be indexed with Y:
Code
	LDA ($00)
	LDA [$00]
	
	LDA ($00,x)
	LDA [$00],y


Suppose that the values of $00, $01, and $02 are $00, $20, and $7E, respectively. The first two instructions translate into:
Code
	LDA $2000
	LDA $7E2000


So what exactly is the point in using pointers? For one, you need them if you want to reference "variable addresses," or addresses that can change at run-time. For example, consider SMW's LC-LZ2 decompression algorithm. Obviously, it can't know what it has to decompress at assembly time, so this means that it must use pointers.
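
As a minimal sketch of such a variable address, the code below builds a 24-bit pointer in $00-$02 at run-time and then reads through it. Table is a made-up label, using $00-$02 as scratch RAM is an assumption, and A and Y are assumed to be 8-bit on entry:
Code
	REP #$20
	LDA.w #Table		; Lower two bytes of the address
	STA $00
	SEP #$20
	LDA.b #Table>>16	; Bank byte of the address
	STA $02

	LDY #$02
	LDA [$00],y		; Reads Table+2, so A is now $30
	RTL			; (or RTS, depending on how this code is called)

Table:
	db $10,$20,$30,$40


Pointing $00-$02 at a different table before the LDA [$00],y would make the very same instruction read different data.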

You can also use them to control sprite states. Without pointers, you'd have to do something like this:
Code
	LDA $C2,x
	BEQ .state_0
	CMP #$01
	BEQ .state_1
	CMP #$02
	BEQ .state_2
	CMP #$03
	BEQ .state_3


With pointers, though, you could do this:
Code
	LDA $C2,x
	ASL				; Double the state index, because 16-bit pointers are two bytes each!
	TAX				; X is now the table index -- restore the sprite index later on
	JMP (ptrs,x)

ptrs:	dw .state_0
	dw .state_1
	dw .state_2
	dw .state_3


24-bit pointers can be used similarly, but you will have to multiply the index by three instead of two because the pointers now take three bytes to represent instead of two.
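
A rough sketch of the 24-bit version follows. JMP (ptrs,x) has no 24-bit counterpart, so this version copies the pointer to RAM and jumps through it with JML [$0000]. The labels and the use of $00-$02 as scratch RAM are assumptions, state_0 through state_3 stand for routines defined elsewhere, A, X, and Y are assumed to be 8-bit, and DBR = PBR is assumed so that ptrs can be read with absolute addressing:
Code
	LDA $C2,x
	ASL			; State * 2
	CLC
	ADC $C2,x		; State * 3, because 24-bit pointers are three bytes each
	TAY			; Index with Y so that X keeps the sprite index
	LDA.w ptrs,y
	STA $00
	LDA.w ptrs+1,y
	STA $01
	LDA.w ptrs+2,y
	STA $02
	JML [$0000]		; Jump to the 24-bit address stored in $00-$02

ptrs:	dl state_0
	dl state_1
	dl state_2
	dl state_3
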



Block Moves:

Definitions:
  • MVN - Transfers data.
  • MVP - Transfers data.

Transferring data is a slow process with loops. Block moves are designed for mass transfers, and are up to twice as fast as a naive loop.

To use them, A, X, and Y must be 16-bit. A must contain the number of bytes to transfer, minus one, X must contain the lower two bytes of the source address, and Y must contain the lower two bytes of the destination address. Then, MVN/MVP $source bank,$destination bank is used.

MVN and MVP transfer data in opposite directions. Basically, MVN works by first transferring a byte from X to Y, then increasing X and Y, then transferring a byte from X to Y, and so on. MVP works by first transferring a byte from X to Y, then decreasing X and Y, and so on, which means that for MVP, X and Y must start at the last byte of the source and destination instead of the first:
Code
	LDA #$0001
	LDX #$0000
	LDY #$0000
	MVN $7E,$7F		; $7E0000 -> $7F0000
				; $7E0001 -> $7F0001

	LDA #$0001
	LDX #$0001
	LDY #$0001
	MVP $7E,$7F		; $7E0001 -> $7F0001
				; $7E0000 -> $7F0000
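

One thing that the examples above do not show is what the registers look like afterwards. As a sketch (the destination $7F1000 is just a hypothetical stretch of free RAM): once the move finishes, A is $FFFF, X and Y point just past the copied data, and the data bank register has been set to the destination bank. Because of that last part, it's a good idea to preserve DBR around a block move:
Code
	PHB			; MVN/MVP change the data bank, so preserve it
	REP #$30
	LDA #$000F		; $10 bytes, minus one
	LDX #$0000		; Source is $7E0000
	LDY #$1000		; Destination is $7F1000
	MVN $7E,$7F		; $7E0000-$7E000F -> $7F1000-$7F100F
				; A = $FFFF, X = $0010, Y = $1010, DBR = $7F
	SEP #$30
	PLB			; Restore previous data bank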


There are two opcodes because the source and destination regions may overlap. Suppose I'm writing some code which keeps track of Mario's X positions. Each frame, the X positions will be shifted over by two bytes (up to a limit, of course), and then the current X position will be written. Now suppose I'm using $7F0000-$7F00FF. The code could look like this:
Code
	REP #$30
	LDA #$00FD
	LDX #$0000
	LDY #$0002
	MVN $7F,$7F		; Both banks are $7F, as the buffer is in $7F0000-$7F00FF

	LDA $94
	STA $7F0000
	SEP #$30


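It's supposed to transfer $00FE bytes from $7F0000 to $7F0002. However, there is a major flaw. $7F0000 will transfer to $7F0002, but this destroys the original value of $7F0002 before it has been copied to $7F0004, so the first two bytes end up repeated through the whole buffer. Thus, you would need to use MVP in this case, with X and Y pointing at the last byte of each region:
Code
	REP #$30
	LDA #$00FD
	LDX #$00FD		; Last byte of the source ($7F0000-$7F00FD)
	LDY #$00FF		; Last byte of the destination ($7F0002-$7F00FF)
	MVP $7F,$7F

	LDA $94
	STA $7F0000
	SEP #$30


This transfers $7F00FD to $7F00FF, then $7F00FC to $7F00FE, and so on, which works correctly.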

- TO BE FINISHED LATER -
Reserved for fourth session summary.
Reserved for future fifth session summary.
The purpose of this site is not to distribute copyrighted material, but to honor one of our favourite games.

Copyright © 2005 - 2019 - SMW Central