Why couldn't developers optimize their assembly?
Forum Index - Non-SMW Hacking - Misc. ROM Hacking - Why couldn't developers optimize their assembly?
Pages: « 1 » Link - Thread Closed
This has been bothering me for years. Why did the developers for the Super Nintendo always use the slowest algorithms they could find? 65816 ASM is very fast when you do everything the easiest way, but a lot of developers chose complicated methods to solve easy problems instead. I don't understand why they did it.

There were some later SNES games with well-optimized assembly that were extremely fast, but when comparing them to other games, it just makes you wonder if there was something mentally wrong with the other programmers.
SNES games weren't programmed in 65816 ASM, per se, but it was what the code was compiled to. For instance, if you program a game in C, that is a high-level language meant for people to easily understand and work with. 65816 ASM is a low-level language, what the machine works with. It could be that the compiler would be to blame for poor optimization, or that the programmer, as you said, didn't code it the best in the original language that it was written in. As time went on, they probably made better compilers for optimizing data.

Think of it like an online translator - if you translate something from English to Spanish and look at the Spanish that is output, it might not be in exactly correct grammar. Similarly with a compiler, an instruction compiled to ASM might turn up something ridiculous like LDA $000019 instead of just LDA $19.

Just look above you...
If it's something that can be stopped, then just try to stop it!
If developers had relied more on assembly instead of using compilers, there wouldn't have been all the "Genesis has blast processing, Super Nintendo doesn't" crap.
kyoseron: the amount of games programmed for the SNES in a high level language makes that point completely negligible. the processor is far too limited to get reasonable speed out of code generated by a C compiler or such outside of very few applications. no SNES programmer would choose C over assembly unless the game is an interactive book or something, or the processing load at any given moment is small.

Quote
LDA $000019 instead of just LDA $19


compilers from the 1970s were capable of optimizing multiplies by powers of two into bit shifts, i wouldn't expect this to turn up unless the compiler is absolute garbage or the programmers used defines which didn't get optimized into DP access.

dragonboy: can you cite a few examples of the horror code you mentioned and post it in code tags here?

as for possible reasons why it is like that, imagine having to write a hundred thousand lines of assembly or so with a deadline creeping up on you. in these cases the priority would definitely be on code that functions rather than code that functions quickly. miyamoto (i think) himself stated that the time allocated for smw development meant a few things were left out; the same could go for developers optimizing the code. in the end though, smw ran fine and rarely slowed down.
Also, "Blast Processing" was really just a Sega marketing gimmick. Most of the people who bought into that term didn't even know what the heck it meant.

Originally posted by Vanilla Lake 2
in the end though, smw ran fine and rarely slowed down.


Do you have the SMAA:SMW version? I have an actual cartridge of that game and it rarely ever slows down, but on my ZSNES emulator ROM, there is more slowdown in it for some odd reason.


I have a game called "Super Star Wars: The Empire Strikes Back" and it literally slows down every time a new enemy comes on the screen. It's very obnoxious and I'm wondering what the hell was wrong with the guy who programmed this game?

No Super Nintendo game should ever have this much slowdown, considering that when you have all 128 sprites onscreen running at the full 60 fps, you have approximately 400 cycles per sprite to work with, and in high action games most of the sprites will be used as confetti, and as parts of bigger sprites anyway.
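
As a rough sanity check on the "approximately 400 cycles per sprite" figure, here is a minimal sketch. It assumes the 2.68 MHz and 3.58 MHz CPU clocks quoted later in this thread, and it assumes (unrealistically) that the whole frame is available to sprite logic, with nothing spent on DMA, vblank work, or anything else. The function name is made up for illustration.

```python
# Back-of-the-envelope per-sprite cycle budget.
# Assumptions: clock figures are the ones quoted in this thread;
# the entire frame is spent on sprite logic (not true in a real game).
FRAMES_PER_SECOND = 60
SPRITE_COUNT = 128


def cycles_per_sprite(clock_hz, fps=FRAMES_PER_SECOND, sprites=SPRITE_COUNT):
    """Return the average number of CPU cycles available per sprite per frame."""
    cycles_per_frame = clock_hz / fps
    return cycles_per_frame / sprites


slow = cycles_per_sprite(2_680_000)  # 2.68 MHz
fast = cycles_per_sprite(3_580_000)  # 3.58 MHz
print(round(slow), round(fast))
```

Under these assumptions the budget lands on either side of 400 depending on the clock, so the figure in the post is the right order of magnitude but is an upper bound, not what game code actually gets.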


EDIT: Just ran Super Star Wars through the 65816 ASM program, and apparently it is FULL of unnecessary jumping and returning to subroutines. It might also have poorly designed compression techniques, but it will take me a long time to find that out.
snes's processor was already slow on its release and many companies didn't want to invest the time in fine tuning their code i suppose. if it ran at the same speed as the ppu (5mhz~) that would've done a lot for its speed. the speed the engineers went along with wasn't fast, but it turned out to be adequate since it did win over the most gamers.

piecing together sprites takes a good chunk of frame time but there's a lot of other stuff that happens behind the scenes, like assembling a strip of level graphics to upload / collision detection / complex enemy behaviour / assembling hdma tables; the list goes on. code doesn't take up that much space and ideally if you have space to burn you'd use macros over subroutines when possible, though most of the time developers had to cram as much as they could into whatever cart size they were budgeted for. that'd be a limiting factor, along with time dedicated to optimizing.

zsnes isn't known for being accurate so that would explain any unexpected slowdown.
I wouldn't say the 65816 was slow like everybody says it is. It was clocked twice as fast as the NES's 6502, and its instructions take several times fewer cycles than the 68000's, so just saying "its cpu was slow" is NOT an answer. I don't care what Wikipedia says, they are extremely biased, and specifications do not tell the whole story.

And, how can optimizing code take up more time? It would take up less time to do it, at least MY kind of code optimizing. Let's say I want to program a sprite moving right across the screen. I would make my job easier and just make it add the speed to the x-coordinate every frame. I wouldn't waste my time programming a formula to calculate the displacement from the initial coordinate using change of time. That would take up both my time and the 65816's time.

I also don't believe in the "make a special algorithm to do everything to make the game easier to program despite being really slow on the system." I know some programmers were able to do it, and it was easy for them to do, and let them program games faster in a limited amount of time, but I don't know how to make an algorithm like that, and it WOULD be a LOT faster for me to just program the game by programming it instead of working on creating this complicated game engine that slows the system down anyway.

For sprite compression, why use algorithms that are hard to program for planar format, such as RLE and LZSS, when it is easy to program compression methods such as using 1x8 tiles or using variable bit depths per line or tile?

For collision detection I believe I accidentally created the fastest algorithm known to man without realising it at first.

If 0 < ( XS1 + XC1 - XC2 ) < XS1 + XS2
and 0 < ( YS1 + YC1 - YC2 ) < YS1 + YS2
then a collision occurred

XS1 means x-size of sprite 1
YC2 means y-coordinate of sprite 2
you could figure out the rest by yourself.

It works even faster when you subtract XS1 from ( XS1 + XC1 - XC2 ) because you can just compare it with XS2 and you're done.
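
For anyone who wants to check the range test above, here is a small sketch of it next to the usual four-comparison bounding-box test. The function names are made up for illustration; `c` is a coordinate and `s` a size, matching the XC/XS and YC/YS naming in the post. It says nothing about speed on real hardware, only that the two forms agree.

```python
def overlaps_1d(c1, s1, c2, s2):
    """Range form from the post: 0 < (S1 + C1 - C2) < S1 + S2 on one axis."""
    return 0 < (s1 + c1 - c2) < (s1 + s2)


def collides(xc1, yc1, xs1, ys1, xc2, yc2, xs2, ys2):
    """Two boxes collide when the range test passes on both axes."""
    return overlaps_1d(xc1, xs1, xc2, xs2) and overlaps_1d(yc1, ys1, yc2, ys2)


def collides_reference(xc1, yc1, xs1, ys1, xc2, yc2, xs2, ys2):
    """Conventional bounding-box overlap test, used only for comparison."""
    return (xc1 < xc2 + xs2 and xc2 < xc1 + xs1 and
            yc1 < yc2 + ys2 and yc2 < yc1 + ys1)


# Two 16x16 boxes offset by 8 pixels overlap; boxes that only touch do not.
print(collides(0, 0, 16, 16, 8, 8, 16, 16))
print(collides(0, 0, 16, 16, 16, 0, 16, 16))
```

Both forms treat boxes that merely touch along an edge as not colliding, because every comparison is strict.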
Quote
I wouldn't say the 65816 was slow like everybody says it is. It was clocked twice as fast as the NES's 6502


nope, you're misinformed (and yes it is as slow as 'everybody says', but it was adequate in the end). it only ran at 3.58mhz~ when accessing memory from a fastrom area with compatible roms. 68k's huge register count + width plus the 3x clock advantage means that there's really no hope in arguing that the 816 is comparable in performance. (65xx may be able to do more clock for clock but you aren't comparing a 2.68mhz 68k here).

also, 68k's address registers and overall size means it has to access external memory much less frequently than the 816, like having a pointer in an address register or having a much easier time doing multiprecision arithmetic etc, not to mention the obvious advantage of 16bit bus vs 8bit. there's no arguing against the sluggish speed of the 816, especially in comparison to the genesis 68k.

Quote
And, how can optimizing code take up more time?


this quote sums it up pretty nicely:

Quote
Optimization can reduce readability and add code that is used only to improve the performance. This may complicate programs or systems, making them harder to maintain and debug. As a result, optimization or performance tuning is often performed at the end of the development stage.


it's normally left to the end, and optimizing code especially assembly can impact just how readable it is. prematurely optimizing on a 2MB ROM or something, say 200,000 lines of assembly is probably not a good idea and the problem is compounded when working in teams (read: always).

sprite compression is largely irrelevant since a poorly written decompressor will only result in a couple extra frames of black screen during loading a level. real time decompression is just about impossible, even the FastROM SMK left decompressed graphics in ROM because there wasn't enough room in RAM to store them decompressed.

consoles have been doing collision detection for the past 30 or so years, and speed has been critical that whole time. creating new algorithms at this point doesn't really happen since chances are it's already been done.

Quote
MY kind of code optimizing. Let's say I want to program a sprite moving right across the screen. I would make my job easier and just make it add the speed to the x-coordinate every frame. I wouldn't waste my time programming a formula to calculate the displacement from the initial coordinate using change of time. That would take up both my time and the 65816's time.


that's more blatant common sense than actual optimizing. practically every game from the NES era and beyond that uses speeds does this.
65816 can access memory every cycle. 68000 can only access memory every 4 cycles. The 65816 can access memory in less cycles than the 68000 can even access its internal registers. That gives the 65816 256 pseudo registers, compared to only 16 registers of the 68000. Oh, and the 65816 can use 8-bit and 16-bit addresses to free up cycles by accessing less memory, the 68000 is glued to using the big 32-bit addresses which wastes 8 cycles due to it only being able to access memory every 4 cycles.


And, the 68000 instructions do not take a little more cycles to execute, they take up 3 or 4 times as many cycles as similar 65816 instructions, and that IS excluding all the "super powerful" instructions which take up 50 cycles.

And since when does 3 x 3 = 7? No, 3 x 3 = 9. That proves you still didn't graduate from 3rd grade math.
Quote
65816 can access memory every cycle. 68000 can only access memory every 4 cycles


actually, 68k can access a word from memory faster than the 816 can access a word from DP despite this. this is where those address registers give it real nice performance.

Code
move.w (a0), d0 ;read from address in a0 into d0, 8 cycles for word access


Code
LDA $00 ;read 16bits into A assume m bit clr. 4 cycles.


2.68mhz speed vs 7.67mhz approx means snes is 2.86~ times slower (2.14 in fastROM), here the 68k can access memory quicker than the SNES can provided the programmer is actually taking advantage of the address registers in their code. when using absolute addresses snes has the advantage although vast majority of cases using an address reg is the way to go. in fastrom cases when reading only just a byte snes is the victor, although fastrom itself is a premium most developers went without...
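
To make the comparison above concrete, here is a minimal sketch that turns the cycle counts and clock speeds from this post into wall-clock time per instruction. The function name is made up for illustration, and the numbers are the ones quoted in the post, not independent measurements.

```python
def instruction_time_us(cycles, clock_mhz):
    """Time for one instruction in microseconds: cycles / (cycles per microsecond)."""
    return cycles / clock_mhz


m68k = instruction_time_us(8, 7.67)       # move.w (a0), d0
snes_slow = instruction_time_us(4, 2.68)  # LDA $00 at 2.68 MHz
snes_fast = instruction_time_us(4, 3.58)  # LDA $00 at 3.58 MHz

print(f"68k {m68k:.3f} us, 816 slow {snes_slow:.3f} us, 816 fast {snes_fast:.3f} us")
print(f"clock ratios: {7.67 / 2.68:.2f}x and {7.67 / 3.58:.2f}x")
```

With these figures the 68k's 8-cycle read finishes sooner than the 816's 4-cycle read at either clock, which is the point being made, though the gap at 3.58 MHz is small.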

Quote
The 65816 can access memory in less cycles than the 68000 can even access its internal registers


see above.

also, register to register transfer is 4 cycles on a 68k (move.w/.l d0,d1) and 32bits can be moved at a time, this actually means genesis can move from register to register faster than the snes can, along with memory.

Quote
That gives the 65816 256 pseudo registers, compared to only 16 registers of the 68000


as stated, code that actually uses address register can outpace these pseudo registers in terms of speed.

Quote
Oh, and the 65816 can use 8-bit and 16-bit addresses to free up cycles by accessing less memory, the 68000 is glued to using the big 32-bit addresses which wastes 8 cycles due to it only being able to access memory every 4 cycles


nope, 68k has absolute short addressing for 16bit addresses. again, address registers can mean it only has to access that big bad 32bit address once when loading it into an address register, think of a pointer to compressed graphics where the SNES has to constantly reload a pointer (say you're using LDA [$00],y) whereas the 68k just has to load it once, plus you get free post inc / pre-dec though SNES requires you to INY. also, another bonus is that when reading 16bits at a time 68k's auto increment is adjusted to the data size read, so incrementing on a 16bit read advances the address by two (two INY / INXes required in sequence for the SNES).

another example would be storing tilemap data in ram to upload to vram later on. store the buffer address in an address register once and you're already capable of accessing memory faster than the snes by a fair margin (comparing something like STA $1234,x INX INX to something like move.w d0,(a0)+, no contest).

Quote
And, the 68000 instructions do not take a little more cycles to execute, they take up 3 or 4 times as many cycles as similar 65816 instructions


see above. also, similar sets of 816 instructions are rolled into one 68k instruction:

Code
;set up pointers and count and such
loop:
LDA [$00],y ;read word
STA $1000,y ;store somewhere
INY
INY
DEX ;until x runs to zero
BNE loop


Code
lea source, a0 ;note i'm loading these once
lea dest, a1
move.w #count-1, d0 ;immediate count-1, since dbra loops until d0 reaches -1
loop:
move.w (a0)+,(a1)+ ;roll lda + sta + iny + iny into one :0)
dbra d0,loop ;roll DEX + BNE loop into one


68k has a really snazzy instruction set that means you have to look past clock speeds and memory access cycles to see just how useful it is. hope i shed some light on all of this for you. if you're still not convinced, consider asking a snes emulator author / genesis enthusiast etc. since i don't think i've ever seen someone claim the 816 is superior. it's a really abstract argument to make and i've seen several emulator authors claim that there's simply no contest and it's pretty easy to see.

Quote
And since when does 3 x 3 = 7? No, 3 x 3 = 9.


2.68 != 3. i assume slowrom since it was the most common application. 3 came from approximating 2.86 (7.67 / 2.68), although the few games that used fastrom bring this to 2.14, and this only applies to memory accesses from the cart ROM. even then, 68k would still edge it out for most cases. check out its instruction set, it does a lot to pretty much negate the slow memory accesses.

Quote
That proves you still didn't graduate from 3rd grade math.


hey now, i like this sort of discussion and i hardly see how it calls for insults. that proves you still didn't graduate from 3rd grade maturity :p. i'd be happy to go on but just keep the childish stuff out of it.

fake edit: damn that's a huge reply.
I am not saying the Super Nintendo's 65816 is faster than the Sega's 68000. I am saying that, if both cpus were clocked the same speed, the 65816 would be faster than the 68000 which is still very obvious in your post. I'm just proving that it is still faster than a third or a half the speed of the 68000, even though it was clocked at a third or a half the speed.

EDIT: And, about that INC, DEC stuff. You don't need to increment Y or X, you just use the next offset instead.
Quote
I am saying that, if both cpus were clocked the same speed, the 65816 would be faster than the 68000 which is still very obvious in your post.


you didn't say any of this earlier, but it's really silly to make this point regardless. how many times have you seen a 4mhz 6502 compared to a 4mhz z80? there's good reason that 1mhz vs 4mhz was the norm.

take the 68k and '816. sure, if the 816 was running at 7.67mhz there'd be no contest, but considering the snes normally ran at 2.68mhz and in 1980 the very first, slowest 68k ran at 4mhz, it's clear they aren't supposed to run with comparable clocks:

65xx cpus need to be able to access memory every cycle, which means the logic it is tied to has to be fast enough to respond in time of course. if you ran the 816 at 7.67mhz when accessing ROM, you'd need extremely fast roms, and consider that even 3.58mhz ROM access was not the norm for SNES games but rather a luxury if the developer's budget was right. 68k's access memory only every 4 cycles like you said, which means that a 7.67mhz 68k would only need external logic to run at 7.67/4 = 1.92~ mhz. well look at that, you could even use slower logic than you would with the snes! don't compare 68k's running at 65xx clocks and vice versa, they were never intended to run at each other's speeds. 2mhz 816 vs 8mhz 68k or 4mhz 816 vs 16mhz 68k etc. would be a fair comparison since once you start ramping up the speed of the 816 you have to pay a great deal extra for the faster logic (this makes the PC engine rather strange, an abnormally fast 65c02).
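
The memory-speed argument above is simple division, sketched here under the post's own simplification: the 65xx family touches memory every cycle and the 68k once every 4 cycles. The function name is made up for illustration.

```python
def required_access_mhz(cpu_clock_mhz, cycles_per_access):
    """Rate at which external memory must respond, in millions of accesses per second."""
    return cpu_clock_mhz / cycles_per_access


m68k_bus = required_access_mhz(7.67, 4)          # 68k: one access per 4 cycles
snes_slow_bus = required_access_mhz(2.68, 1)     # 816: one access per cycle
snes_fast_bus = required_access_mhz(3.58, 1)
hypothetical_816 = required_access_mhz(7.67, 1)  # an 816 at the Genesis clock

print(m68k_bus, snes_slow_bus, snes_fast_bus, hypothetical_816)
```

Under that simplification the 7.67 MHz 68k asks less of its memory than even the 2.68 MHz 816 does, while an 816 at 7.67 MHz would need memory about four times faster.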

Quote
I'm just proving that it is still faster than a third or a half the speed of the 68000, even though it was clocked at a third or a half the speed.


3/4 or 1/2 speed is still a great deal slower, although any 68k programmer with any sense would easily increase the performance gap between the two cpus. 68k powerful instruction set and register count does wonders for its performance.

Quote
EDIT: And, about that INC, DEC stuff. You don't need to increment Y or X, you just use the next offset instead.


ever worked with pointers before? take the ever so common LDA [$xx],y. you read one byte, how do you reach the next byte? you either increment the pointer in RAM or increment the index register. the latter is less expensive. 68k gives you this for free, just one of the perks of its instruction set.
Originally posted by Vanilla Lake 2
Quote

ever worked with pointers before? take the ever so common LDA [$xx],y. you read one byte, how do you reach the next byte? you either increment the pointer in RAM or increment the index register. the latter is less expensive. 68k gives you this for free, just one of the perks of its instruction set.


No that is not what I mean.

Take for instance:

LDA $00,y
STA $00,x
INY
INY
INX
INX
LDA $00,y
STA $00,x
INY
INY
INX
INX

that could be simplified by doing

LDA $00,y
STA $00,x
LDA $02,y
STA $02,x

Heck, you can even save more cycles by doing it by Direct Page to index, instead of index to index.

LDA $00
STA $00,x
LDA $02
STA $02,x

"3/4 or 1/2 speed is still a great deal slower, although any 68k programmer with any sense would easily increase the performance gap between the two cpus. 68k powerful instruction set and register count does wonders for its performance."

Well those programmers obviously never met me. I'm a lot faster at 65816 ASM than anyone I know. For example, take for instance the Load and Increment stuff you mentioned, you relied on programming it to figure out the next value, while I just programmed in the next value. It's little things like this that programmers do that they don't realize that makes it always go a lot slower than it could have been.
Originally posted by Vanilla Lake 2
Quote
65816 can access memory every cycle. 68000 can only access memory every 4 cycles


actually, 68k can access a word from memory faster than the 816 can access a word from DP despite this. this is where those address registers give it real nice performance.

Code
move.w (a0), d0 ;read from address in a0 into d0, 8 cycles for word access


Code
LDA $00 ;read 16bits into A assume m bit clr. 4 cycles.





This is why you are a moron. 4 cycles is half the amount of cycles 8 is, not the other way around. This is what I meant by it is able to access its memory faster than the 68000 is able to access its internal registers if both cpus are clocked at the same speed.

Since they are not clocked at the same speed, since Super Nintendo's cpu is clocked at 7/20 of the Genesis in slow mode, and 7/15 the speed in fast mode, to get the exact speed for this particular instruction you have to multiply the opposite cpu's number of cycles it takes to perform the operation to the clock speed.

7 x 8 = 56

15 x 4 = 60

56/60 = 14/15

the 3.58 MHz 65816 is 14/15 the speed of the 7.67 MHz 68000

7 x 8 = 56

20 x 4 = 80

56/80 = 7/10

the 2.68 MHz 65816 is 7/10 the speed of the 7.67 MHz 68000
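
The fractions above can be checked with exact arithmetic. This is a minimal sketch using the clock ratios (7/15 and 7/20) and cycle counts (4 and 8) from the post; the function name is made up for illustration, and the result only describes this one pair of instructions, not overall CPU performance.

```python
from fractions import Fraction


def relative_speed(clock_ratio, cycles_a, cycles_b):
    """Speed of CPU A relative to CPU B for one instruction pair.

    clock_ratio is clock_a / clock_b; speed is clock / cycles, so the
    ratio works out to clock_ratio * cycles_b / cycles_a.
    """
    return clock_ratio * Fraction(cycles_b, cycles_a)


fast = relative_speed(Fraction(7, 15), cycles_a=4, cycles_b=8)
slow = relative_speed(Fraction(7, 20), cycles_a=4, cycles_b=8)
print(fast, slow)
```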


LEARN HOW TO DO MATH! Just because it is clocked different doesn't mean the individual performance of different cpus identically clocked doesn't matter. It isn't just some kind of coin flipping you only do when there is a tie. The cpu's individual performance has an effect on the performance even if you clock it slower or faster than another chip.

for instance let's say a and b are the performance of two different but equally clocked chips. Let's say c and d are the performance of the same two cpus, only now the second one is clocked twice as fast as before while the first keeps its clock.

a = c
2b = d

Is 2c = d always true just because it has the same coefficient?
Quote
Well those programmers obviously never met me. I'm a lot faster at 65816 ASM than anyone I know.


how many real life snes programmers from the commercial snes days do you know? if any of them saw this topic they'd be lollin' as hard as me right now; arrogant AND egotistical. put the ego away and trust me if you were to present this argument elsewhere you'd get a similar response. try nesdev.com forums if you're still convinced you're right here, they have some of the most knowledgeable folk from both the homebrew and emulation scene. i've already tried to spell it out for you.

Quote
For example, take for instance the Load and Increment stuff you mentioned, you relied on programming it to figure out the next value, while I just programmed in the next value.


it's called an unrolled loop in the context i have brought it up in. it's one of the most basic optimizations there is, although at the obvious costs.
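
For readers who haven't met the term, here is a minimal sketch of the same idea outside of assembly. The function names and the fixed count of four words are made up for illustration; the first version updates an index on every pass (the INY/INX step), the second hard-codes each offset (the LDA $02,y step).

```python
def copy_words_loop(src, dst, count):
    """Copy count items, updating the index on every iteration."""
    i = 0
    while i < count:
        dst[i] = src[i]
        i += 1  # per-iteration index update and loop test


def copy_four_words_unrolled(src, dst):
    """Same copy for exactly four items, with every offset written out."""
    dst[0] = src[0]
    dst[1] = src[1]
    dst[2] = src[2]
    dst[3] = src[3]


source = [0x1111, 0x2222, 0x3333, 0x4444]
looped = [0] * 4
unrolled = [0] * 4
copy_words_loop(source, looped, 4)
copy_four_words_unrolled(source, unrolled)
print(looped == unrolled)
```

The unrolled version skips the index updates and loop test, but it only handles a count fixed when the code was written and its size grows with that count, which is the cost mentioned above.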

Quote
It's little things like this that programmers do that they don't realize that makes it always go a lot slower than it could have been.


believe me, a paid full time professional programmer in the SNES days with years of experience knows what an unrolled loop is, they didn't miss anything don't worry. there's reasons for and against using them...

honestly, the rest of the post looks like an angry 9 year old wrote it so here goes:

Quote
This is why you are a moron.


This is why i get SO ANGRY in technical discussions that i resort to ad hominem in the form of calling people morons.

Quote
LEARN HOW TO DO MATH!


BLARGH LEARN HOW TO READING COMPREHENSION! *steam shoots out of ears* i'll just direct you to the paragraph right below the code block you quoted which points out the speed differences. yes, i can see that 4 cycles is half the cycles of 8 cycles, perhaps you should read the whole post where i do compensate for speed differences between the two. is the snes only marginally slower in fastrom mode? yeah, but you can't gauge relative performance of commercial games off a single instruction on two very different cpus, particularly when the instruction set / addressing modes etc etc are far superior on one of those, and when fastrom is a luxury in itself.

closing points:
-3.58mhz was a rare premium, and only applied to ROM access.
-how a cpu competes with another in terms of one instruction is irrelevant in the big picture, commercial games won't be achieving 7/10 or 14/15 speeds in comparison for points i've already mentioned. the only reason i posted that block of code was to show the genesis' superior performance compared to the snes for a point you brought up. that, and DP memory access is one of the '816's defining performance features and fastrom games do approach the 68k's speed, too bad it doesn't tell the whole story. all in all, it's a major performance gap. i suggest you actually try to put together some actual homebrew examples since in all honesty the only way you can come to the conclusion that the snes '816 is on a level playing field with the 68k is misinformation and a narrow minded approach of comparing the two.

since you seem to make your points based on not reading whole posts, ad hominem attacks and a comical level of arrogance i have to suggest you don't act like a child in your future posts. it makes it difficult for people to take you seriously.
http://dsultimate.net/Board/upload/showthread.php?t=15927

you'll find a demo I made here.

I don't care that they'll laugh at me. All programmers are dimwitted brainwashed arrogant trolling loudmouthed dickheads.

Why don't you just get away from me, and argue with another one of your stupid arrogant dickhead programming friends, who loves to argue as much as you do.
me and my stupid arrogant dickhead programming friends are laughing at how you got so angry at a simple discussion on an internet forum.

Quote
It also proves that I'm also way cooler than the idiots at Sega-16.com. Whenever they try make a demo for the Super Nintendo, it lags horribly because they're too stupid to think up the easiest and most obvious tricks. You would never see any of the Sega-16.com morons having a demo that has remotely as many sprites on screen that is anywhere near as lagless as mine.

I'm like the James Bond of programming.


my stupid arrogant dickhead programming friends are also laughing at your mind boggling level of narcissism.

welp, this thread's run its course. tone it down with the rage replies or you won't be around for much longer =O
Copyright © 2005 - 2021 - SMW Central