Banner
Views: 98,301,631
Time: 2010-09-08 10:42:18 PM
Users: 12,262 (1,889 active)
Latest: IRecordYourHacks
Tip: Don't overwrite the status bar colors with the background's palette.
Find 40th Fibonacci number (SNES)
Forum Index - Creation - Other ROM Hacking - Find 40th Fibonacci number (SNES)
Pages: « 1 »
(30 August 2009) Status of this ROM hack...

* wiki page for Find 40th Fibonacci Number
* 65816 version for SNES: complete at 6 June 2009.
* SPC-700 version for SNES: complete at 30 August 2009.



(23 February 2009) I want to learn ROM hacking and to become a ROM hacker, so that I can hack ROM images.

I wanted a simple idea for my first ROM hack for the Super NES. I would use ASM to cause the ROM to perform a simple calculation. Then I would need to write the answer to save-RAM, and extract the answer from the save-RAM file. This is simpler than learning graphics to draw the answer on the screen, and probably easier than fitting a custom block into Super Mario World. So I created an idea and designed a program:

Calculation: Find the 40th number in the Fibonacci sequence (1, 1, 2, 3, 5, 8, 13, 21, ...).
Answer: Write to save-RAM, an ASCII text file inside a cpio archive.
Extraction: Read the cpio archive in the save-RAM image with the existing Unix command pax(1).

I chose the cpio archive format, so that I can use pax(1) instead of writing a new extraction program. I am new to 65816 assembly language, so I needed to try my design in a different language so that I can learn the cpio archive format.

So I planned three versions of the program:
1. in C language, for OpenBSD or other Unix systems. Finished, with one source file: find40th.c
2. in PowerPC assembly language, for OpenBSD systems. Finished, with four source files: cpio-powerpc.s, find40th-powerpc.s, format-powerpc.s, system-powerpc.i
-----> and README.powerpc (655 lines) to explain PowerPC assembly language
3. in 65816 asssembly language, for Super NES emulators. (6 June 2009) Complete and working, with four source files: cpio-65816.s, find40th-65816.s, format-65816.s, system-65816.i (28 August 2009: I changed the links from HTTPS to HTTP.)

The first two versions use a memory-mapped file in place of save-RAM.



(Screenshot of February 24: At the Unix command line, I run the PowerPC assembly version of the program and extracted the result. The 40th Fibonacci number is 102334155.)

I wrote in C language, to learn cpio archive format and to verify my design. Then I ported the program to PowerPC assembly language, because I already knew some PowerPC instructions and I already had the correct assembler and tools. To port this program, I needed to learn new instructions, especially for comparisons and conditional branches.

The calculation to find the 40th Fibonacci number was simple for C language:

Code
int_fast32_t a, b; a = 1; /* the first two numbers of the sequence */ b = 1; /* * Compute two more numbers per iteration. * * An optimizing compiler might run this loop at compile * time, and store the answer within the object file. */ for (count = 4; count <= 40; count += 2) { a = a + b; b = a + b; }


The same calculation was moderately simple for PowerPC assembly language:

Code
## the first two numbers in the sequence li %r29, 1 li %r30, 1 ## compute two more numbers per iteration li %r0, (40 - 2) / 2 mtctr %r0 # number of loop iterations 5: add %r29, %r29, %r30 add %r30, %r29, %r30 bdnz 5b # loop ctr times


PowerPC has 32 registers (%r0 through %r31); programs can use %r14 through %r31 for local variables. When this loop finishes, %r29 is the 39th Fibonacci number and %r39 is the 40th Fibonacci number. Addition is easy; add %target, %op1, %op2 does %target = %op1 + %op2 with 32-bit arithmetic.

The same calculation would be more difficult for 65816 assembly language. It seems that the 65816 does not have so many registers, and has 16-bit addition but not 32-bit addition. The answer would overflow 16 bits. I might need to put variables in the direct page, instead of registers; and use the carry bit with addition. It seems well that I wrote the program in PowerPC assembly language before I would try 65816 assembly language.

To write the answer, I need to convert an integer to ASCII decimal. Also, the cpio archive headers have some fields in ASCII hexadecimal. My C program uses snprintf(3) for the conversion. My PowerPC assembly program uses my custom format routine for the conversion. The routine uses a loop that needs to divide by 10 and take the remainder. I use a formula that requires multiplication, subtraction, and right shift.

Code
## Formula (suggested by gcc) to divide by 10, is to ## 0xcccccccd ## multiply by ----------- = 0.100000000005821 ## 0x800000000 ## which is to multiply by 0xcccccccd, then shift right 35. ## ... mulhwu %r9, %r5, %r4 # %r9 <= ((%r5 * %r4) >> 32) ... srwi %r9, %r9, 3 # %r9 <= (%r9 >> 3) mulli %r6, %r9, 10 sub %r6, %r5, %r6 # %r6 digit <= remainder


I plan to port this routine to 65816 assembly language, but I understand that the 65816 cannot multiply. Some time ago, I read some documents about the Super NES, so I remember various options that I might have:
1. to use the hardware registers for multiplication in the Super NES. I would have to write the factors, then wait, then read the product; but I know not how to wait.
2. to use the multiply instruction of the SPC700. This would be very difficult. I would have to learn SPC700 assembly language, upload a program to the SPC700, and use hardware registers to communicate between the 65816 and the SPC700; but I would use the registers to indicate when the product becomes available.
3. to write some custom code for multiplication. I might be able to code multiplication with repeated addition and left shift. I would need code to multiply by 0xcccccccd and code to multiply by 10. I predict that I will choose this option.
4. The 65816 has a decimal flag, but I do not know what this flag does or whether it might help me. (February 24: I learned that the decimal flag affects ADC and SBC, and treats hexadecimal digits as decimal digits, so $1234 would be 1234, and $08 + $02 would be $10 instead of $0a. To divide by ten, I would shift right by four. There is no conversion between decimal and binary, so I would have to compute the Fibonacci sequence in decimal, using 48-bit arithmetic. I predict that I will choose this option.)

I wonder how games like Super Mario World do display the score in decimal.

Meanwhile, I need an assembler. I used GNU as (from GNU binutils) for PowerPC, but I need something for 65816. I read a rumour that XKAS cannot relocate sections or resolve undefined symbols, so I might use WLA.
Last edited on 2009-08-30 04:42:12 PM by Kernigh.
Using the SPC700 as a math coprocessor is an interesting idea, but I think it'd be much slower than the multiply registers. The typical way to multiply is bit shifting and addition. e.g. to multiply by 10:
y = x << 3 ;multiply by 8
x = x << 1 ;multiply by 2
x = x + y

So something like:
LDA $00
ASL
ASL
ASL
STA $01
LDA $00
ASL
STA $00
ADC $01
You do need to watch for overflow though.
Last edited on 2009-02-26 12:27:18 AM by HyperHacker.
Quote
1. to use the hardware registers for multiplication in the Super NES. I would have to write the factors, then wait, then read the product; but I know not how to wait.


just use NOPs to burn the time if you don't have something to do in between. the worst case scenarios are 8 cycles for mult and 16 for div. 3 NOPs and one LDA $4xxx would mean 9 cycles would have elapsed and the product is ready to be read.

also SMW is fully disassembled and all.log can be downloaded in the documents section.

Quote
To write the answer, I need to convert an integer to ASCII decimal


unless you want to save it to SRAM and read the .srm the emulator outputs you just put in a tileset with numbers in vram and upload the appropriate tile indexes to the tilemap. bunch of steps you have to do to get a working display, it may just be convenient to look at the .srm file if you don't know how it already works.

also I am not familiar with PPC although it's leagues ahead of any 65xx processor and you'll definitely spend alot more time working a solution for it.
Originally posted by Vanilla Lake 2
also SMW is fully disassembled and all.log can be downloaded in the documents section.

(offtopic) I dunno why but all.log is gone from the documents section. I don't know WHO removed it or WHY. It was rude not to inform us about the removal or I overlooked something. Whatever. Any clarification would be appreciated.

Meanwhile I put up all.log here:
http://ersan00b.googlepages.com/disassembly.rar
Originally posted by HyperHacker
The typical way to multiply is bit shifting and addition. e.g. to multiply by 10:
y = x << 3 ;multiply by 8
x = x << 1 ;multiply by 2
x = x + y

So something like:
LDA $00
ASL
ASL
ASL
STA $01
LDA $00
ASL
STA $00
ADC $01


Thank you for the example. I might have missed that 'ASL' exists. Now that I check the instruction set (in my copy of w65c816s.pdf), 'ASL' does shift left.

The instruction set does not have 'ASR'. If I wanted to do shift right, then I would have to use 'AND' to clear the low bits, then use 'ROR' to do rotate right?

Originally posted by Vanilla Lake 2
(while the hardware registers do multiplication) just use NOPs to burn the time if you don't have something to do in between. the worst case scenarios are 8 cycles for mult and 16 for div. 3 NOPs and one LDA $4xxx would mean 9 cycles would have elapsed and the product is ready to be read.


I know that 'NOP' does nothing but wait, but I did not learn about cycles yet. From your example, I guess that one 'NOP' waits 3 cycles.

My primary references to the Super NES are anything by Anomie at romhacking.net. Anomie's Register Doc has:

Code
4202 wb++++ WRMPYA - Multiplicand A 4203 wb++++ WRMPYB - Multiplicand B mmmmmmmm Write $4202, then $4203. 8 "machine cycles" (probably 48 master cycles) after $4203 is set, the product may be read from $4216/7. ...


I will not soon try multiplication. First I need to try addition in my ROM.

Originally posted by Ersanio
(offtopic) I dunno why but all.log is gone from the documents section. I don't know WHO removed it or WHY. It was rude not to inform us about the removal or I overlooked something. Whatever. Any clarification would be appreciated.

Meanwhile I put up all.log here:
http://ersan00b.googlepages.com/disassembly.rar


(offtopic reply) I also wondered about all.log, after I read about its disappearance in the ASM help thread. I downloaded your copy, Ersanio. I might use it later if I would hack SMW. Maybe you should post your URL in the ASM help thread, also?

--------------------
To start my 65816 hack, I had two choices:
1. Place my code and a 'JSL' in the Super Mario World ROM.
2. Place my code in a new ROM.

I chose a new ROM. I started my project, using WLA as my assembler. WLA can assemble a complete, new ROM with the Super NES header, including the checksum and the pointer to the reset handler. I configured the header for two banks of loROM and $800 bytes of save-RAM.

Inside the Super NES, my memory map seems to be:

$700000 through $7007ff: save-RAM
$7e0000 through $7effff: work-RAM
$808000 through $80ffff: first ROM bank
$818000 through $81ffff: second ROM bank

This should be exactly like Super Mario World, except that SMW uses more RAM and ROM, and I added $80 to my ROM banks.

The Super NES accesses its stack, its direct page and its reset handler through bank $00, so I use the mirrors:

$000000 through $001fff: mirror of $7e0000 through $7e1fff
$008000 through $00ffff: mirror of $808000 through $80ffff

Given this memory map, I put the reset handler in bank $80, and I split bank $7e between the stack and the direct page:

$7e0000 through $7e1eff: stack
$7e1f00 through $7e1fff: direct page

I wrote some WLA assembler directives for this memory map, and then I began to program my project. I made some early mistakes like this:

Code
;; nvmxdizc .define P_MSELECT %00100000 ... sep #P_MSELECT lda #$1eff.w ; this code is WRONG


I needed some time to find why my ROM was broken. The cause was my backward use of P_MSELECT. The correct way is to set the flag for 8-bit A, and reset the flag for 16-bit A, so rep #P_MSELECT before lda #$1eff.

WLA also came with a list of mnemonics, which I use as a reference. I use the list to code like so:

Code
lda #0 sta 9,s ; no stz 9,s ... txa sta 13,s ; no stx 13,s


I needed some way to check if my ROM reached some code. In the current version of my ROM, I have:

Code
;; cpio points to save-RAM at $700000 ;; (test) put '$61 $73' in the save-RAM lda #$7361 sta cpio + $400.l ... (assign arguments for cpio_header) ... jsl cpio_header ;; (test) put '$62 $74' in the save-RAM lda #$7462 sta cpio + $402.l


After I run my ROM, I look at the .srm file:

Code
$ hexdump -C find40th.srm 00000000 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 |````````````````| * 00000400 61 73 60 60 60 60 60 60 60 60 60 60 60 60 60 60 |as``````````````| 00000410 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 |````````````````| * 00000800 $


(This hex viewer uses '*' to skip identical lines.) I see '61 73' but not '62 74', so I know that my ROM became stuck. If I comment out the 'jsl cpio_header', then I also see '62 74'. Thus I know that my ROM becomes stuck somewhere in 'cpio_header'.

To use this check, I must delete my .srm file before I run my ROM, so that my .srm file does not contain leftover bytes from earlier runs.

I have some code in 'cpio_header' that accesses a 16-bit address, but I forgot to set the data bank register, so that must be one of my problems.

'cpio_header' is unusual because I try to allocate a stack frame for local variables:

Code
cpio_header: ;; allocate stack frame tsc sec sbc #16 tcs ... ;; free stack frame tsc clc adc #16 tcs rtl


So after I fix the data bank register, then if I would have more problems, then I might check for problems with the stack.
Quote
The instruction set does not have 'ASR'. If I wanted to do shift right, then I would have to use 'AND' to clear the low bits, then use 'ROR' to do rotate right?


arithmetic shift right can be done with just:

Code
CMP #$80 ROR A


that's all, carry flag will be set based on sign bit and you'll have the bit rotated in conveniently.

Quote
I know that 'NOP' does nothing but wait, but I did not learn about cycles yet. From your example, I guess that one 'NOP' waits 3 cycles.


no NOP is just 2 cycle, 3 NOPS amount to 6 cycles and by the time the opcode and operand for the LDA $4xxx is read more than 8 cycles have passed.

Quote
Inside the Super NES, my memory map seems to be:

$700000 through $7007ff: save-RAM yes IF 2kb is allocated in the internal header
$7e0000 through $7effff: work-RAM 7e0000-7fffff actually, 128kb
$808000 through $80ffff: first ROM bank yes for map mode 20 (LoROM)
$818000 through $81ffff: second ROM bank

Originally posted by Vanilla Lake 2
arithmetic shift right can be done with just:

Code
CMP #$80 ROR A


that's all, carry flag will be set based on sign bit and you'll have the bit rotated in conveniently.


I forgot that ROR uses the carry bit, but I understand now. If I remember about CMP from the recent ASM Workshop, then CMP #$80 tests (unsigned) A >= #$80 to set the carry flag.

During the Workshop, I learned that 'LSR' exists. I guess that 'LSR' is for unsigned right shift. Previously, I believed that there was no right shift because I did not find 'SRA' or 'ASR'.

Originally posted by Vanilla Lake 2
NOP is just 2 cycle, 3 NOPS amount to 6 cycles and by the time the opcode and operand for the LDA $4xxx is read more than 8 cycles have passed.


So cycles are more difficult to understand than I guessed.

Meanwhile, I fix mistakes in my code. For example, I had this:

Code
;; set stack pointer to #$1eff lda #$1eff txa ; this is so WRONG


I wrote 'TXA' (transfer X to A) but I wanted 'TCS' (transfer 16-bit A to S). My code might have other mistakes like this.



(March 2) I found my worst mistake yet. I searched for a problem in my ROM. The accumulator had $1efc. Subtraction of 16 gave $1eec. The accumulator had $1eec. Addition of 16 gave $2562. How does $1eec + 16 = $2562?

I simplified my code, but the problem remained.

Code
lda #$1eec ; a9 ec 1e clc ; 18 adc #16 ; 69 10 00 sta cpio + $3dc.l ; 8f d0 03 70


This computes $1eec + 16 and writes the answer to $7003d0. I ran my ROM, then looked at x03d0 in my .SRM file, and saw '62 25' = $2562. I read the hex in my ROM, and the assembly was correct. Then I recalled that the decimal flag affects 'ADC'.

So I checked my hardware reset handler.

Code
;; nvmxdizc .define P_MSELECT %00100000 .define P_XSELECT %00010000 .define P_DECIMAL %00001000 sep #P_DECIMAL ; another MISTAKE rep #P_MSELECT | P_XSELECT


This code does set the decimal flag but I meant to reset this flag. I am changing my code to

Code
rep #P_MSELECT | P_XSELECT | P_DECIMAL




(April 15) I now have working code that actually computes the 40th Fibonacci number. The num1 and num2 are labels in RAM. I still have the direct page in range $7e1f00..$7e1ffff, so I need to use the < sign to grab the low byte ($00..$ff) to use the direct page.

Code
;; num1, num2 = 1 in decimal mode lda #1 sta <num1 sta <num2 stz <num1 + 2 stz <num2 + 2 stz <num1 + 4 stz <num2 + 4 ;; compute 40th Fibonacci number ldx #(40 - 2) / 2 ; number of loop iterations sed ; set decimal mode _fib_loop: ;; num1 = num1 + num2 lda <num1 clc adc <num2 sta <num1 lda <num1 + 2 adc <num2 + 2 ; carry sta <num1 + 2 lda <num1 + 4 adc <num2 + 4 ; carry sta <num1 + 4 ;; num2 = num1 + num2 lda <num2 clc adc <num1 sta <num2 lda <num2 + 2 adc <num1 + 2 ; carry sta <num2 + 2 lda <num2 + 4 adc <num1 + 4 ; carry sta <num2 + 4 dex bne _fib_loop ; loop X times cld ; clear decimal mode


I also learned to use mvn. I had some trouble with wla-65816. My mvn did not work until I reversed the operands.

Code
;; Copy answer from format_buffer to cpio_pos. ;; ;; Attention! WLA-65816 reverses the operands of "mvn", which ;; moves a string of bytes from bank $ss to bank $dd: ;; mvn $ss, $dd ; standard syntax ;; mvn $dd, $ss ; wla-65816 syntax ;; ;; Our banks are ;; $ss == dp[format_buffer + 2] == answer >> 16 ;; $dd == dp[cpio_pos + 2] == cpio >> 16 ;; ldx <format_buffer ; x = source address ldy <cpio_pos ; y = destination address lda <format_bufsiz dec a ; a = length - 1 mvn cpio >> 16, answer >> 16


I checked my assemblers:

xkas-v0.12 accepts the standard syntax, mvn $ss, $dd
wla-65816, version 9.5, requires backward syntax, mvn $dd, $ss
xkas-v0.06 requires mvn $ssdd



(June 6) My 65816 hack, Find 40th Fibonacci Number, is complete.

Four source files: cpio-65816.s, find40th-65816.s, format-65816.s, system-65816.i (28 August 2009: I changed the links from HTTPS to HTTP.)

The generated find40th.srm provides the correct number, 102334155:

Code
$ make find40th.sfc wla-65816 -o cpio-65816.s cpio-65816.o wla-65816 -o find40th-65816.s find40th-65816.o wla-65816 -o format-65816.s format-65816.o wlalink -r find40th-65816.link find40th.sfc $ snes9x-gtk find40th.sfc $ pax -rvf find40th.srm result pax: sv4crc vol 1, 1 files, 2048 bytes read, 0 bytes written. $ cat result 102334155 $

Last edited on 2009-08-28 01:33:42 PM by Kernigh.
(28 August 2009) I bump this thread by almost six months because this is my hack thread.

I wanted to learn ASM for SPC-700. So I am porting my hack, Find 40th Fibonacci Number, from 65816 assembly to SPC-700 assembly.

The SNES has two processors: the 5A22 (which runs 65816 code) and the SPC-700. The 65816 version of my hack uses the 5A22 and ignores the SPC-700. The SPC-700 version of my hack will use the SPC-700 for almost everything. This slows the hack (because the SPC-700 runs slower than the 5A22) but provides my first chance to play with SPC-700 opcodes.

The SPC-700 version of my hack has two sides. The 5A22 side in sram-spc700.s runs some 65816 code to upload a program from ROM to the SPC-700, then to download some data from the SPC-700 to save-RAM. The 5A22 side is complete.

The SPC-700 side will compute the 40th Fibonacci number and write the answer to save-RAM. The SPC-700 side will be a port of the 65816 (or PowerPC, or whatever) version of my hack. The SPC-700 side is started but not complete.

The SPC-700 processor resembles a 65xx processor, but the opcodes have different numbers, and someone changed the mnemonics to a different pattern. For example, LDA, STA and TAX became MOV.

Code
65xx => spc700 lda #$23 => mov a, #$23 sta $45 => mov $45, a tax => mov x, a pha lda $67 sta $89 => mov $89, $67 pla


I like the new form of ADC. The SPC-700 can use any direct page address like an 8-bit accumulator.

Code
65xx => spc700 clc clrc adc #$03 => adc a, #$03 pha lda $04 clc clrc adc $05 => adc $04, $05 sta $04 pla


The SPC-700 does 16-bit operations, but not as well as the 65816. The SPC-700 has no instructions to load an immediate 16-bit value or use 16-bit values outside of the direct page.

Code
65816 => spc700 rep #$20 lda $06 clc movw ya, $06 adc $08 => addw ya, $08 sta $06 movw $06, ya lda #$abcd => mov a, #$cd mov y, #$ab push x mov x, !$1357 mov $00, x mov x, !$1538 clc mov $01, x adc $1357 => addw ya, $00 pop x sta $0200 => mov !$0200, a mov !$0201, y


I think that I prefer the 65816 to the SPC-700. The 65816 has more and better addressing modes than the SPC-700.

Code
;; I want to use this code, ;; but the SPC-700 misses these addressing modes! ;; -- mov a, format_integer + y ; NO and a, #$0f ; yes mov x, a ; yes mov a, !digit_table + x ; yes mov x, !hex_low_placement + y ; NO mov [format_buffer] + x, a ; NO

Code
;; This would be the code for the 65816. ;; -- lda.b format_integer, y ; NO, so use TYX and #$0f ; yes tax ; yes lda.w digit_table, x ; yes ldx.w hex_low_placement, y ; yes sta.b (format_buffer), x ; NO, so use PHY, TXY, PLY

Code
;; This is the actual SPC-700 code in ;; 'hexformat' subroutine in my hack. ;; -- mov x, format_integer + y mov a, x ; a = byte from integer and a, #$0f mov x, a ; x = low digit mov a, !digit_table + x mov $00, a ; RAM $00 = ASCII digit mov a, !hex_low_placement + y push y mov y, a ; y = placement of digit mov a, $00 mov [format_buffer] + y, a ; place digit in buffer pop y ; restore y = loop counter



(30 August 2009) The SPC-700 version is complete! Source code is at block-spc700.s and sram-spc700.s
Last edited on 2009-08-30 04:47:12 PM by Kernigh.
Pages: « 1 »
Forum Index - Creation - Other ROM Hacking - Find 40th Fibonacci number (SNES)

The purpose of this site is not to distribute copyrighted material, but to honor one of our favourite games.

Copyright © 2005 - 2010 - SMW Central
Legal Information - Link To Us


Total queries: 24
Menu