
Super Accelerated SMW?

This is just theory, I haven't actually put it into action.

The use of the SuperFX in that HDMA topic made me think about offloading some routines from the S-CPU to an SA-1. The nauseating slowdown when too many sprites are on screen might go away if the sprite routines ran on the SA-1 and the results were sent back to SNES RAM on completion.

I chose the SA-1 because its CPU is almost identical to the S-CPU, so most of the code can be reused (some extra code of course has to tie it all together). And unlike the SuperFX, which hoards the ROM for itself, the SA-1 can access RAM/ROM at the same time as the S-CPU, with the former running at 5.37 MHz and the latter at SlowROM speed. That pales next to the SuperFX's 21 MHz, but it makes the implementation so much smoother.
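To put those clock figures side by side, here is a rough back-of-the-envelope sketch in Python. The NTSC master clock value, the divisors, and the rounded 60 frames per second are assumptions based on the commonly cited figures, and the result ignores memory wait states and refresh, so treat the numbers as ballpark only.

```python
# Rough cycle budget per frame for the clocks mentioned above.
# Assumptions: NTSC master clock of about 21.477 MHz, 60 frames per second.
MASTER_HZ = 21_477_272
FPS = 60

def cycles_per_frame(divisor):
    """Cycles available in one frame for a clock of MASTER_HZ / divisor."""
    return MASTER_HZ // divisor // FPS

budgets = {
    "S-CPU at SlowROM speed (2.68 MHz)": cycles_per_frame(8),
    "SA-1 at 5.37 MHz": cycles_per_frame(4),
    "SuperFX at 21 MHz": cycles_per_frame(1),
}

for name, cycles in budgets.items():
    print(f"{name}: about {cycles} cycles per frame")
```

Even at the lower SA-1 figure, that is roughly twice the per-frame budget of the S-CPU at SlowROM speed.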

My initial idea of how this would work:

-Start of frame: S-CPU IRQs the SA-1 and signals a new frame. S-CPU DMAs sprite data to 'BW-RAM'; sources would be 7E:14C8 and such. Might just do a big block copy (7E:0000-7E:1FFF) to include controller data and whatnot. S-RAM is a big issue, since the data for, say, OAM ($0300) shares ranges with it. Might just scrap the 'integrity check'(?) and load/store backup data elsewhere.

Edit: Forgot... I can just set the DB reg to 71f / 41h and forget about backup data altogether.

Probably wouldn't be too hard; I've stepped through most of it. It seems the other data can be accessed with the data bank set to BW-RAM (for 16-bit addresses). 24-bit addresses and SNES register writes would cause some problems... At any rate, routines would most likely have to be written with the coprocessor in mind.

-For each run of a sprite routine, S-CPU writes PC location of sprite routine to somewhere in I-RAM and IRQ's SA-1 for action. S-CPU just returns and moves onto the next, while SA-1 works on sprite routine.

-Upon completion of the routine, SA-1 writes to a certain part of I-RAM, which the SNES checks when it finishes processing for the frame.

-SNES DMA's the BW-RAM initially used in the sprite routines back to the SNES and the S-CPU allows NMI to progress / move onto the next frame.
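
The steps above can be sketched as a toy model in Python. This is only an illustration of the data flow, not SNES code: the RAM areas and the I-RAM mailbox are plain Python objects, `run_frame` and `bump_sprite_status` are made-up names, and the sprite routine is a hypothetical stand-in.

```python
# Toy model of the proposed per-frame handshake. None of this is SNES code:
# WRAM, BW-RAM and the I-RAM mailbox are plain Python objects, and the sprite
# routine is a hypothetical stand-in.

LOW_RAM_SIZE = 0x2000        # the 7E:0000-7E:1FFF block copy from the post
SPRITE_STATUS = 0x14C8       # table address mentioned in the post

def run_frame(wram, routines):
    """Copy low RAM out, let the 'SA-1' run each routine, then copy it back."""
    bwram = bytearray(wram[:LOW_RAM_SIZE])              # S-CPU -> BW-RAM copy
    mailbox = {"queue": list(routines), "done": False}  # stands in for I-RAM

    # SA-1 side: run each queued routine against its BW-RAM copy.
    while mailbox["queue"]:
        routine = mailbox["queue"].pop(0)
        routine(bwram)
    mailbox["done"] = True                              # completion flag

    # S-CPU side: copy back only after the completion flag is set.
    if mailbox["done"]:
        wram[:LOW_RAM_SIZE] = bwram
    return mailbox["done"]

def bump_sprite_status(ram):
    """Hypothetical sprite routine: increment the first sprite status byte."""
    ram[SPRITE_STATUS] = (ram[SPRITE_STATUS] + 1) & 0xFF

wram = bytearray(0x20000)    # 128 KB of WRAM
wram[SPRITE_STATUS] = 0x01
run_frame(wram, [bump_sprite_status, bump_sprite_status])
print(hex(wram[SPRITE_STATUS]))
```

The model runs everything in sequence. On real hardware the two CPUs would overlap, which is the whole point, so the completion check would have to be a polling loop or an IRQ rather than a plain `if`.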

Any thoughts? I've 'equipped' a copy of SMW with an SA-1 and the memory map doesn't conflict with the original one of SMW. LM doesn't seem to mind either. S-RAM and all works fine (just that it also appears at bank 40+). So, I'd like to ask:

-how feasible do you think this is?
-is there anything else taxing on the SNES that could also be offloaded?
-does LM significantly change the sprite routine system?

I'm just thinking out loud here, so I'm not sure if there's a horrible flaw in this plan.
From what I've heard, most of the slowdown seen in hacks is due to blocktool's ASM code, which takes more cycles to execute each time a custom block is inserted. Here is a partial disassembly of the code used:

Code
;;;;;;;;;;;;;;;;;;;;;;;;;
;blktool ASM disassembly;
;;;;;;;;;;;;;;;;;;;;;;;;;
org $86F690

$86/F690: JSL Below           ;entry point for below offset

org $90BB2E

Below:
$10/BB2E: REP #$20          
$10/BB30: LDA #$BB97             
$10/BB33: STA $7EBD00
$10/BB37: LDA #$BBA3          
$10/BB3A: STA $7EBD02
$10/BB3E: LDA #$0006             
$10/BB41: STA $7EBD04
$10/BB45: LDA #$BBAF              
$10/BB48: STA $7EBD06
$10/BB4C: JSR Main

org $90BFEA

Main:
$10/BFEA: PHX           
$10/BFEB: PHY                 
$10/BFEC: PHB                    
$10/BFED: PHK                     
$10/BFEE: PLB                   
$10/BFEF: REP #$30              
$10/BFF1: LDA $7EBD06
$10/BFF5: STA $05   
$10/BFF7: LDA $7EBD04    ;\
$10/BFFB: AND #$00FF     ; |load the number of blocks and store it into X       
$10/BFFE: TAX            ;/       
$10/BFFF: LDA #$0000     ;\       
$10/C002: TAY            ;/reset Y to $0000   
$10/C003: LDA $7EBD00    
$10/C007: STA $00   
loop:
$10/C009: LDA ($05),y
$10/C00B: BEQ label2
$10/C00D: LDA ($00),y
$10/C00F: CMP $03  
$10/C011: BEQ toblockcode
$10/C013: BRA label1 
label2:
$90/C015: INY                 
$90/C016: SEP #$20             
$90/C018: LDA ($00),y
$90/C01A: DEY                    
$90/C01B: CMP $04  
$90/C01D: REP #$20              
$90/C01F: BEQ toblockcode
label1:
$10/C021: INY                    
$10/C022: INY                    
$10/C023: DEX                   
$10/C024: BNE loop
$10/C026: SEP #$30       ;\
$10/C028: PLB            ; |
$10/C029: PLY            ; |if no match found, exit
$10/C02A: PLX            ; |   
$10/C02B: RTS            ;/
toblockcode:
$10/C02C: REP #$20              
$10/C02E: NOP                ;\ 
$10/C02F: NOP                ; |w00t at useless NOPs
$10/C030: NOP                ;/   
$10/C031: LDA $7EBD02        ;\
$10/C035: STA $00            ; |load the pointer to the block code
$10/C037: LDA ($00),y        ; |
$10/C039: STA $00            ;/
$10/C03B: SEP #$30           
$10/C03D: PLB              
$10/C03E: PLY            
$10/C03F: PLX               
$10/C040: LDA #$00           
$10/C042: STA $7EBD06
$10/C046: JMP ($0000) ;jump to custom block code
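
For anyone who finds the loop hard to follow in disassembly form, here is a rough Python model of what Main appears to do. It is a reading of the listing above, not blktool's actual source: `find_block` and the sample table are made up, and the interpretation of the flag word (nonzero means compare the full 16-bit block number, zero means compare only the high byte) is an assumption taken from the two branches of the loop.

```python
# Model of the lookup loop in Main, as the disassembly above reads.
# Each entry is (block_number, exact): exact=True compares the full 16-bit
# number, exact=False compares only the high byte (a whole "page").
# The sample table below is made up for illustration.

def find_block(entries, block):
    """Return (index, comparisons) of the first match, or (None, comparisons)."""
    comparisons = 0
    for index, (number, exact) in enumerate(entries):
        comparisons += 1
        if exact:
            if number == block:
                return index, comparisons
        elif (number >> 8) == (block >> 8):
            return index, comparisons
    return None, comparisons

entries = [(0x0200 + i, True) for i in range(6)]

print(find_block(entries, 0x0203))   # match part-way through the table
print(find_block(entries, 0x0025))   # ordinary block: every entry is checked
```

The second call is the expensive case: a block with no custom code still walks the whole table before the routine gives up.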


The Main part is executed 8 times (once for each blocktool offset). I already "fixed" it by converting the main part to SuperFX ASM:
Code
SNES Part:

org $10BFF1
           

;;;;;;;;;;;;;;;;;;;;;;;;;;;
;Blktool optimisation code;
;;;;;;;;;;;;;;;;;;;;;;;;;;;
blktoolfix:
            LDA $7EBD06      ;\
            STA $7F16        ; |
            LDA $7EBD04      ; |
            AND #$00FF       ; |
            STA $7F14        ; |
            LDA $7EBD00      ; |
            STA $7F10        ; |
            LDA $7EBD02      ; |
            STA $7F12        ; |
            LDA $03          ; |store the various RAM addresses used by the ASM to SRAM so the SuperFX can access them
            STA $7F1C        ; |
            SEP #$30         ; |
            PHB              ; |
            PLA              ; |
            STA $7F0F        ;/
            LDA #$03         ;\
            STA $7FFF        ;/ set function 3 (blktool optimiser) to be executed by the SuperFX
Finishcode: JSL !SuperFXinit ;start SuperFX
            LDA $7FFF
            BEQ Finishcode
            LDA $7F18        ;\ if no custom block was found...
            BNE toblockcode  ;/ ...return
            PLB
            PLY
            PLX
            RTS
toblockcode:
            PLB
            PLY
            PLX
            LDA #$00         ;\
            STA $7EBD06      ;/ not sure what the point of this is, but the original blktool ASM does it
            JMP ($7F1A)      ;jump to the custom block code
SuperFX Part:

superblktool: 
            lm r2, (#1F10)   ;
            lm r3, (#1F12)   ;
            lm r12, (#1F14)  ; load the data previously stored by the SNES CPU
            lm r4, (#1F16)   ;
            ibt r5, #00      ; r5 = byte offset into the tables
            lm r0, (#1F0F)
            lm r6, (#1F1C)
            romb             ; ROM bank = r0 (the data bank saved by the SNES)
            cache
            move r13, r15    ; r13 = top of the loop (target of "loop")
            move r14, r4
            with r14
            add r5
            getb
            inc r14
            getbh
            add r0           ; r0 + r0, only to set the zero flag
            beq label2
            nop
            move r14, r2
            with r14
            add r5
            getb
            inc r14
            getbh
            cmp r6
            beq blockfound
            nop
            bra label1
            nop
label2:
            move r14, r2
            with r14
            add r5
            inc r14
            with r1
            getb
            from r6
            hib
            cmp r1
            beq blockfound
            nop
label1:
            inc r5
            loop             ; decrement r12, branch to r13 while it is nonzero
            inc r5           ; delay slot
            ibt r0, #00
            sm (#1F18), r0
            stop
            nop
            jmp r10 ;Wannabe rts P
            nop
blockfound:
            ibt r0, #01
            sm (#1F18), r0
            move r14, r3
            with r14
            add r5
            getb
            inc r14
            getbh
            sm (#1F1A), r0
            ibt r0, #00
            sm (#1FFF), r0
            stop
            nop
            jmp r10 ;Wannabe rts P
            nop
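
For reference, the parameter block the two halves share can be laid out in a short Python sketch. The offsets are the ones that appear in the code above ($7Fxx on the SNES side, #1Fxx on the SuperFX side) and the sample values are the ones loaded at the "Below" entry point; the field names and the `write_field` / `read_field` helpers are made-up labels, not anything from blktool, and the 8 KB buffer size is an assumption.

```python
import struct

# Sketch of the shared parameter block, using the offsets from the code above.
# Field names are illustrative labels only. "<B" is one byte, "<H" is a
# little-endian 16-bit word.
FIELDS = {
    "data_bank":    (0x1F0F, "<B"),
    "block_table":  (0x1F10, "<H"),
    "code_table":   (0x1F12, "<H"),
    "block_count":  (0x1F14, "<H"),
    "flag_table":   (0x1F16, "<H"),
    "found":        (0x1F18, "<H"),
    "code_pointer": (0x1F1A, "<H"),
    "block_number": (0x1F1C, "<H"),
    "function":     (0x1FFF, "<B"),
}

def write_field(ram, name, value):
    """Store a value at the field's offset in the shared RAM buffer."""
    offset, fmt = FIELDS[name]
    struct.pack_into(fmt, ram, offset, value)

def read_field(ram, name):
    """Read a value back from the field's offset."""
    offset, fmt = FIELDS[name]
    return struct.unpack_from(fmt, ram, offset)[0]

ram = bytearray(0x2000)                  # 8 KB buffer, an assumption
write_field(ram, "block_table", 0xBB97)
write_field(ram, "code_table", 0xBBA3)
write_field(ram, "block_count", 0x0006)
write_field(ram, "flag_table", 0xBBAF)
write_field(ram, "function", 0x03)       # function 3 = blktool optimiser

print(ram[0x1F10:0x1F18].hex())
```

Writing the layout down like this makes it easier to spot when the two sides disagree about an offset or a field width.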


The problem is that the blktool ASM's location is dynamic, which means I can't make a patch to change it, since the location is not the same for all hacks :S
SA-1 is the one used in Megaman X2 and X3 right?

also: wouldn't it be easier for blocks to use a table or something? ^_^;
I was thinking maybe it's time to make a new blocktool system... Actually I sorta want to make something that can insert ASM, blocks, AND sprites. XD But I probably won't. At least until I get to the point where I need it.
Originally posted by KilloZapit
SA-1 is the one used in Megaman X2 and X3 right?

No, it's the one used in Kirby Super Star, Kirby's Dream Land 3 and Super Mario RPG. Mega Man X2 and X3 used the Cx4.
Originally posted by KilloZapit
also: wouldn't it be easier for blocks to use a table or something? ^_^;

From what I've heard, BTO (if it ever gets released) will use a pointer table instead, which will fix the slowdown issue.
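
To show why a pointer table would help, here is a rough Python sketch of the idea. It is not how BTO works (that is not public as far as this thread says); `make_table`, `run_block`, the handlers and the block numbers are all hypothetical.

```python
# Sketch of a pointer-table dispatch: the block number indexes the table
# directly, so the cost does not grow with the number of custom blocks.
# The handlers and block numbers are hypothetical.

def make_table(size, handlers):
    """Build a table with one slot per block number; None means no custom code."""
    table = [None] * size
    for block, handler in handlers.items():
        table[block] = handler
    return table

def run_block(table, block, *args):
    """Look up the handler with a single indexed read and call it if present."""
    handler = table[block]
    if handler is None:
        return None
    return handler(*args)

def hurt_block():
    return "hurt"

def bounce_block():
    return "bounce"

table = make_table(0x400, {0x0200: hurt_block, 0x0201: bounce_block})
print(run_block(table, 0x0201))
print(run_block(table, 0x0025))
```

The trade-off is space: the table needs a slot for every block number, whether or not it has custom code.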
Crud. Those are the three games that run slow as heck on my PSP emulator which I play most of my stuff on. :/ Oh well.

Also: Ya think an xkas preprocessor that searched for free space in a ROM would be nice? XD And maybe let you insert sprites/blocks too. I dunno.
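
The free-space search itself is the easy part. A minimal sketch in Python, assuming free space simply means a long enough run of one fill byte (0x00 here, which is an assumption), with `find_free_space` being a made-up name:

```python
# Minimal free-space search: find the first run of `fill` bytes that is at
# least `size` long. Treating 0x00 as "free" is an assumption; a real tool
# would also need to respect whatever protection scheme the ROM uses.

def find_free_space(rom, size, fill=0x00, start=0):
    """Return the offset of the first run of `size` fill bytes, or -1."""
    return rom.find(bytes([fill]) * size, start)

rom = bytearray(b"\x4c\x00\x80" + b"\xea" * 5 + b"\x00" * 16 + b"\x60")
print(find_free_space(rom, 8))
print(find_free_space(rom, 32))
```

The harder part would be hooking the result into the assembler so the `org` addresses get filled in automatically.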
Totally forgot about custom blocks. I was thinking all the focus was on the sprites. Compounding blocktool's inefficient implementation with sprites (and moreso with buoyancy) makes for some ugly slowdown.

While there are several things that would have to be worked around, I just realised the most prohibitive one would be like the following bank change:

PHB
PHK
PLB

Or anything that changes the bank, really. The SA-1 only has 2 KB of I-RAM where the SNES has 8 KB of RAM mirrored at the beginning of each ROM bank, so it'd be too much of a hassle to work with. Just about every sprite memory access uses 16-bit addressing targeting that area of RAM. If the SA-1 had 8 KB as well, it would be more workable. Apart from blocks and sprites, I can't think of anything else that would help significantly if offloaded to a coprocessor. Oh well.
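
The size mismatch is easy to see with a couple of the addresses mentioned earlier in the thread. A small Python sketch, where the sizes come from the post, `fits_in_iram` is a made-up helper, and treating both windows as starting at $0000 is a simplification:

```python
# Which 16-bit addresses would still work if the sprite code ran on the SA-1?
# Sizes come from the post: 8 KB of low RAM on the SNES side, 2 KB of I-RAM
# on the SA-1 side. Treating both as starting at $0000 is a simplification.

SNES_LOW_RAM = 0x2000    # 8 KB
SA1_IRAM = 0x0800        # 2 KB

def fits_in_iram(address):
    """True if a 16-bit low-RAM address also falls inside the 2 KB window."""
    if not 0 <= address < SNES_LOW_RAM:
        raise ValueError("not a low-RAM address")
    return address < SA1_IRAM

for address in (0x0300, 0x14C8, 0x1FFF):
    status = "ok" if fits_in_iram(address) else "outside I-RAM"
    print(f"${address:04X}: {status}")
```

OAM at $0300 would fit, but the sprite tables around $14C8 would not, which is exactly the problem.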
Originally posted by KilloZapit
I was thinking maybe it's time to make a new blocktool system...

Well there is the Japanese one, blite, but it seems to just be a makeover'd Block Tool.
Yep. I wish Sukasa would hurry up with Block Tool Omega. D:
Originally posted by KilloZapit
Crud. Those are the three games that run slow as heck on my PSP emulator which I play most of my stuff on. :/ Oh well.

Also: Ya think an xkas preprocessor that searched for free space in a ROM would be nice? XD And maybe let you insert sprites/blocks too. I dunno.

Same here, especially Yoshi's Island... Maybe we should just rebuild the Snes9x emu for the PSP. I was thinking of using ZSNES v1.42 (I think). It is one of the really good emus, to me.