[Fast3D] Improving the way we handle textures in custom levels

Over the past year and a half, the SM64 hacking community has made significant progress on getting ROM hacks to work on the original N64 console. macN64 has even created a guide on how to fix textures in some older hacks so that they work on console, which is fantastic. See this thread for the current progress on console compatibility.

A major problem we are having now is the unplayable amount of lag that larger levels cause on console. There are many ways to try to reduce lag, and the one I'm going to introduce in this post is a way to optimize Fast3D texture processing. I will make a simple level with two 32x32 textures as an example.



I first draw the grass, then the orange rock wall, and then finally I draw the top part with the grass texture. Once imported, the fast3d commands for loading the textures will look something like the image diagram below.

(TMEM = Texture memory cache)


As you can see, each texture is loaded and bound to the triangles that are drawn after it. However, looking at the image above, we see that the game has to switch from Tex1 to Tex2 and then back to Tex1. Why does it do this when we know that all of the grass triangles are going to be drawn anyway? Also, why are we not taking full advantage of TMEM? Every time we want to switch textures, we always just load the texture from RAM. This is just a small example level, but imagine a huge level from Last Impact or Star Road. The game would have to keep loading textures into TMEM from RAM tens to hundreds of times per frame!

My hypothesis

I believe that a large number of 0xF3 (G_LOADBLOCK) commands can cause a large level to lag on console, because every one of them has to wait for data to be copied from RDRAM into TMEM. If we reduce the number of times the game has to load textures, then performance should improve to a semi-playable level. At the very least we can reduce the number of Fast3D commands the game has to process, which should have a significant benefit on performance.
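
A rough way to put a number on this is to count how often each opcode shows up in a level's display list. The sketch below is Python, not something the game runs, and it assumes the display list has already been dumped as raw bytes. `count_opcodes`, `cmd` and the sample data are made up for illustration. It relies on Fast3D commands being 8 bytes each with the opcode in the first byte, and it does not follow branches into other display lists.

```python
from collections import Counter

G_LOADBLOCK = 0xF3  # texture load command discussed above
G_ENDDL = 0xB8      # Fast3D end-of-display-list opcode


def count_opcodes(display_list: bytes) -> Counter:
    """Count Fast3D commands by opcode, stopping at the first G_ENDDL."""
    counts = Counter()
    for offset in range(0, len(display_list) - 7, 8):
        opcode = display_list[offset]
        counts[opcode] += 1
        if opcode == G_ENDDL:
            break
    return counts


def cmd(opcode: int) -> bytes:
    """Build a placeholder 8-byte command: real opcode, zeroed parameters."""
    return bytes([opcode]) + bytes(7)


# Hypothetical list: load, draw, load, draw, load, draw, end.
sample = (cmd(0xF3) + cmd(0xBF)) * 3 + cmd(G_ENDDL)
counts = count_opcodes(sample)
print("G_LOADBLOCK commands:", counts[G_LOADBLOCK])
print("total commands:", sum(counts.values()))
```

Running this over a level before and after an optimization pass gives a simple before/after figure to compare against what you see on console.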

Does this really matter?

On console, I believe the answer is yes. See the two paragraphs above.

On emulators, the answer is no. The RAM on your modern machine is monumentally faster than anything in a game console from 1996, so moving data from an emulated N64 RAM to an emulated N64 TMEM cache is basically just copying memory around inside your computer.

2 Solutions

Group by texture

The way levels are rendered depends on how you draw them in SketchUp. The first triangles you draw usually get rendered first (assuming your textures are fully opaque). In the simple level I made above I drew the floor first, so that is where the game starts rendering. Then it switches over to the wall texture, because I started drawing the walls after the floor. The game then has to load the first texture again, because I am using the grass texture for the top part. That is why the game has to load the grass texture twice.

If we want to minimize the number of times the textures have to switch, then we need to group all the triangles that use the same texture together. This way we only have to load the texture once, and then the game can draw all the triangles that go with that texture. I found a free SketchUp plugin that can do this easily. It's called GroupByTexture and was created by Rick Wilson. You can find a download for this plugin here: http://www.smustard.com/script/GroupByTexture

This plugin will explode all the groups in the model, and then regroup all the faces according to their texture. This will cause the exported .obj file to be organized properly according to the textures, and not draw order. I would only recommend using this plugin once your level is finalized, so you don't have to keep ungrouping all the faces you want to change.
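
To show why the grouping matters, here is a small Python sketch of the idea (this is not the plugin's code, and the face names are invented). It counts how many times the texture changes in a given draw order, then regroups the faces by texture while keeping the order in which each texture first appeared.

```python
def count_texture_loads(draw_order):
    """Count how often the active texture changes while walking the draw order."""
    loads = 0
    current = None
    for texture, _face in draw_order:
        if texture != current:
            loads += 1
            current = texture
    return loads


def group_by_texture(draw_order):
    """Reorder faces so that all faces sharing a texture are adjacent."""
    first_seen = {}
    for texture, _face in draw_order:
        first_seen.setdefault(texture, len(first_seen))
    # sorted() is stable, so faces keep their relative order within a texture.
    return sorted(draw_order, key=lambda entry: first_seen[entry[0]])


# The example level: floor first, then walls, then the grass on top.
level = [
    ("grass", "floor_0"), ("grass", "floor_1"),
    ("rock", "wall_0"), ("rock", "wall_1"),
    ("grass", "top_0"),
]
grouped = group_by_texture(level)
print(count_texture_loads(level), "->", count_texture_loads(grouped))
```

For the example level this takes the draw order from three texture loads down to two, one per texture.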

Multi-Texture loading

Remember the unused 2KB of TMEM? Since there is enough room, why not use all of it? All we have to do is change the number of texels (texture pixels) that are loaded with the 0xF3 command to account for both textures. Think of it like loading a 32x64 texture, of which we only see half at a time. If we need to switch to the other texture, then we can issue a 0xF5 (G_SETTILE) command to change the TMEM offset to read from.
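
As a sketch of the bookkeeping involved (Python for illustration only, with `plan_multi_load` being a made-up helper), the function below works out how many texels a single load has to cover and where each texture starts in TMEM. It assumes all textures have the same size and bit depth, and that TMEM offsets are expressed in 64-bit (8-byte) words, which is my understanding of how the tile's TMEM address is specified.

```python
TMEM_BYTES = 4096
TMEM_WORD_BYTES = 8  # assumption: TMEM offsets are counted in 64-bit words


def plan_multi_load(width, height, bits_per_texel, count):
    """Return (total texels to load, TMEM word offset of each texture)."""
    texture_bytes = width * height * bits_per_texel // 8
    if texture_bytes % TMEM_WORD_BYTES != 0:
        raise ValueError("texture size is not a multiple of a TMEM word")
    if texture_bytes * count > TMEM_BYTES:
        raise ValueError("textures do not fit in TMEM")
    offsets = [i * texture_bytes // TMEM_WORD_BYTES for i in range(count)]
    return width * height * count, offsets


# Two 32x32 RGBA16 textures loaded as if they were one 32x64 texture.
texels, offsets = plan_multi_load(32, 32, 16, 2)
print(texels, [hex(o) for o in offsets])
```

In this example the second texture would start at word offset 0x100, which is the value the tile setup would need to point at when switching textures.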

Optimization

Using both the Group by texture and Multi-Texture loading techniques, we can reduce the amount of effort it takes to set up textures. Compare the diagram below to the previous one above. We went from 3 load texture block commands down to just 1, and the number of Fast3D commands has also been reduced significantly, from 21 commands down to just 8.



So we can only load two (32x32) textures at a time? Big deal.

It's true that RGBA textures are expensive in terms of data size, but think about the 4-bit & 8-bit texture formats. With the 8-bit texture formats, I8 and IA8, you can load up 4 (32x32) textures at a time. If you can somehow get away with using 4-bit textures like I4 and IA4, then you can load up 8 (32x32) textures at a time. CI textures are a little different as half of the TMEM is reserved for the color palettes, so only 2 (32x32) CI8 textures and only 4 (32x32) CI4 textures can fit inside the TMEM.

The actual number of textures you can load varies based on the resolution of the image and the bit depth. As long as the data doesn't exceed 4096 bytes, you can load it to the TMEM.

Note: You can only multi-load textures with the same bit-sizes. You can load up I8 & IA8 textures at the same time, but not RGBA16 & I8.
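
The numbers above can be checked with a few lines of Python. This is only arithmetic on the figures from this post (4096 bytes of TMEM, with half of it reserved for palettes when using CI formats); `max_textures` and the `FORMATS` table are names I made up for the example.

```python
TMEM_BYTES = 4096

# format name -> (bits per texel, whether half of TMEM is reserved for palettes)
FORMATS = {
    "RGBA16": (16, False),
    "IA8": (8, False),
    "I8": (8, False),
    "IA4": (4, False),
    "I4": (4, False),
    "CI8": (8, True),
    "CI4": (4, True),
}


def max_textures(fmt, width=32, height=32):
    """How many textures of this format and size fit in TMEM at once."""
    bits, uses_palette = FORMATS[fmt]
    usable = TMEM_BYTES // 2 if uses_palette else TMEM_BYTES
    texture_bytes = width * height * bits // 8
    return usable // texture_bytes


for name in FORMATS:
    print(name, max_textures(name))
```

Passing a different width and height shows how quickly the count drops for larger textures, for example a 64x32 RGBA16 texture fills TMEM on its own.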

Testing on console hardware. (Does this actually work?)

Yes, but more testing is needed. I did a quick test on my flash cartridge before writing this post, and the optimized Fast3D code above does seem to work. I cannot tell you how this affects performance yet, so take everything I say with a grain of salt. I'll try to do some more testing if I can once I'm done with my finals in less than two weeks.

If you have any questions or updates on the post, then please leave a reply below and I'll try to respond as soon as I can.
Thanks for the shout-out! #tb{^V^}

This looks very promising! Two questions: Do you think this could be applied to hacks that have already been released, or would we just be looking at new hacks going forward? And you say that more testing needs to be done - is there anything I can do to help, testing-wise? Although, I probably won't be getting much done now until after Christmas.

Originally posted by macN64
Thanks for the shout-out! #tb{^V^}

This looks very promising! Two questions: Do you think this could be applied to hacks that have already been released, or would we just be looking at new hacks going forward? And you say that more testing needs to be done - is there anything I can do to help, testing-wise? Although, I probably won't be getting much done now until after Christmas.


Yeah, we can probably apply this to older hacks. Someone would need to make a tool that can read the display lists, and then rewrite them to be more efficient.

I did some testing on SM74's third Bowser course (Bowser's Crystal Castle). I basically removed all the textures in the level, and removed 1400 triangles to lower the polygon count. I managed to reduce the number of Fast3D commands by 40%. You can see the results of that here:



Sadly, optimizing textures doesn't solve all our problems. The largest cause of lag is the number of triangles being rendered on screen. It seems like the Super Mario 64 engine just can't handle the large custom levels that people like to make. I don't even know exactly how many triangles can be on screen before the game starts slowing down. My best guess would be around 1800 to 2000 triangles.

We can write custom assembly code to cull out display lists, but I don't even know where I would start with that sort of thing.
Would be nice to see if there are any settings for draw distance. Large levels and cave levels would benefit greatly if you could lower it. You could even combine fog and low draw distance to make it look better (Though there is the issue of the fog not looking really great with the whole not affecting the skybox thing.)

Originally posted by Luigixhero
Would be nice to see if there are any settings for draw distance. Large levels and cave levels would benefit greatly if you could lower it.


...

I spent so much time trying to figure out a complicated solution that I completely forgot about the simplest one. Thank you very much Luigixhero #tb{:)}

Yes, we can easily change the render distance by changing a value in the geometry layout script of the level. The geometry layout 0x0A command controls how much the camera can see, and the camera's far value determines the render distance. In SM74 the value is set to 0x7530 (30,000 in decimal), which is close to the maximum.

Code
------ SM74 Bowser course 3 geometry layout ------

01DAE610 / 19001700 [ 08 00 00 0A 00 A0 00 78 00 A0 00 78 ] // Start Geometry layout
01DAE61C / 1900170C [ 04 00 00 00 ]

/********************** Background image/color skybox **********************/
01DAE620 / 19001710    [ 0C 00 00 00 ] // disable z-buffer
01DAE624 / 19001714    [ 04 00 00 00 ]
01DAE628 / 19001718       [ 09 00 00 64 ]
01DAE62C / 1900171C       [ 04 00 00 00 ]
01DAE630 / 19001720          [ 19 00 00 00 00 00 00 00 ] // render background image/color (set to the color black)
01DAE638 / 19001728       [ 05 00 00 00 ]
01DAE63C / 1900172C    [ 05 00 00 00 ]

/********************** 3D Geometry **********************/
01DAE640 / 19001730    [ 0C 01 00 00 ] // enable z-buffer 
01DAE644 / 19001734    [ 04 00 00 00 ]
01DAE648 / 19001738       [ 0A 01 00 2D 00 64 75 30 80 29 AA 3C ] // set camera frustum properties (fov, near, far, etc)
01DAE654 / 19001744       [ 04 00 00 00 ]
01DAE658 / 19001748          [ 0F 00 00 10 00 00 07 D0 17 70 0C 00 00 00 EE 00 80 28 7D 30 ]
01DAE66C / 1900175C          [ 04 00 00 00 ]
01DAE670 / 19001760             [ 15 01 00 00 0E 05 58 F0 ] // render level geometry
01DAE678 / 19001768             [ 17 00 00 00 ] // render level objects
01DAE67C / 1900176C             [ 18 00 00 00 80 27 61 D0 ]
01DAE684 / 19001774          [ 05 00 00 00 ]
01DAE688 / 19001778       [ 05 00 00 00 ]
01DAE68C / 1900177C    [ 05 00 00 00 ]

/********************** HUD stuff **********************/
01DAE690 / 19001780    [ 0C 00 00 00 ] // disable z-buffer
01DAE694 / 19001784    [ 04 00 00 00 ]
01DAE698 / 19001788       [ 18 00 00 00 80 2C D1 E8 ] // render cannon HUD circle mask
01DAE6A0 / 19001790    [ 05 00 00 00 ]

01DAE6A4 / 19001794 [ 05 00 00 00 ]
01DAE6A8 / 19001798 [ 01 00 00 00 ] // End Geometry layout

I changed the far value to a much lower number, and the beginning part of the level became a lot more playable.

Code
 [ 0A 01 00 2D 00 64 14 00 80 29 AA 3C ] 



(You can see where the level geometry is getting cut off at the top-left)
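
For anyone who wants to script this change instead of hex editing, here is a hedged Python sketch that decodes the 0x0A command from the dump above and writes a new far value. The field layout (flag byte, then 16-bit fov, near and far, then a 32-bit function address) is my reading of the command and should be treated as an assumption; `decode_frustum` and `with_far` are made-up helper names.

```python
import struct


def decode_frustum(command: bytes) -> dict:
    """Split a 12-byte 0x0A command into its assumed big-endian fields."""
    opcode, flag, fov, near, far, function = struct.unpack(">BBhhhI", command)
    if opcode != 0x0A:
        raise ValueError("not a 0x0A geometry layout command")
    return {"flag": flag, "fov": fov, "near": near, "far": far, "function": function}


def with_far(command: bytes, far: int) -> bytes:
    """Return a copy of the command with bytes 6-7 (the far value) replaced."""
    return command[:6] + struct.pack(">h", far) + command[8:]


original = bytes.fromhex("0A 01 00 2D 00 64 75 30 80 29 AA 3C")
fields = decode_frustum(original)
patched = with_far(original, 0x1400)

print(fields)
print(patched.hex(" ").upper())
```

The patched bytes match the modified command shown above, so the same helper could be pointed at any other far value you want to try.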

Originally posted by Luigixhero
You could even combine fog and low draw distance to make it look better (Though there is the issue of the fog not looking really great with the whole not affecting the skybox thing.)


One way to get around this is to make the background the same color as the fog. It might not look as nice as having a textured background, but it works. You can change the background color by modifying the geometry layout 0x19 command, as there isn't an option in Skelux's SM64 editor yet (as of v2.0.6).



I could probably make an optimizer tool that lets you change settings like this, optimizes a level's display lists, and fixes older ROM hacks using queueRAM's and macN64's fixes from the console compatibility thread. I'm not going to guarantee anything, but I'll see what I can come up with.



this is really neat stuff, good work. even if its not a big enough speedup to make really high-poly levels suddenly playable, it sounds like a pretty significant optimization. it would be interesting to see if theres a noticeable difference in how short the draw distance needs to be for a level to run nicely with this optimization versus without.

could the draw distance vary depending on where you are in a level, or is it just a constant? thats beyond the scope of what an optimizer tool could do automatically I guess but I can see it being useful to increase the draw distance for more enclosed areas within a level.

how bad does the draw distance reduction need to be for some of these hacks though? fog to hide the draw distance looks fine up to a point but if shits gonna look like silent hill thats not exactly ideal lol. there's a certain point where it starts to affect the gameplay. manually reducing the polygon count and maybe using the 0D command to swap between low and hi poly geometry depending on distance might be better in a lot of cases but I guess that would require actual work.


Originally posted by ergazoobi
this is really neat stuff, good work. even if its not a big enough speedup to make really high-poly levels suddenly playable, it sounds like a pretty significant optimization.


Oh definitely. Something I noticed in the original Bowser 3 is that the level is slow even if there are no triangles shown on screen. I believe that this is caused by the game processing too many F3D commands. Reducing the number of commands from 10,000 to 6,000 made the game run much better when not looking at the castle.

Originally posted by ergazoobi
it would be interesting to see if theres a noticeable difference in how short the draw distance needs to be for a level to run nicely with this optimization versus without.


It really just depends on how densely packed all the triangles are from what I've seen. An open field level like Bob-omb Battlefield with only 950 triangles can render fine with max draw-distance, while a small castle level with 3,000 triangles can cause lag with a short render distance.

Originally posted by ergazoobi
could the draw distance vary depending on where you are in a level, or is it just a constant? thats beyond the scope of what an optimizer tool could do automatically I guess but I can see it being useful to increase the draw distance for more enclosed areas within a level.


Yes, we can easily modify the render distance using custom ASM code at any point in real time.

(Warning: Technical stuff including assembly code lies below)

Geometry layout scripts are made up of nodes. All the commands that go in-between the 0x04 & 0x05 commands make up a single node, and each node can contain multiple children with their own nodes. You can think of it like an XML file.

Code
<GeoLayout scrX="0xA0" scrY="0x78" scrW="0xA0" scrH="0x78">
	<Background zBuffer="disabled"> ... </Background>
	<Geometry3D zBuffer="enabled"> 
		<Frustum near="0x64" far="0x7530" fov="45">
			...
		</Frustum>
	</Geometry3D>
	<Hud zBuffer="disabled"> ... </Hud>
</GeoLayout>

(Note: this is just a visual representation, SM64 doesn't actually use XML files)
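
The same nesting can be rebuilt from the raw commands. The Python sketch below is only an illustration of the 0x04/0x05 rule described above, not how the game parses the script. It assumes the commands have already been split apart as in the dump earlier in the thread, and `build_tree` is a made-up name.

```python
OPEN_NODE = 0x04   # the commands that follow become children of the previous command
CLOSE_NODE = 0x05  # go back up to the parent


def build_tree(commands):
    """Turn a flat list of geometry layout commands into nested nodes."""
    root = {"cmd": None, "children": []}
    stack = [root]
    last = root
    for cmd in commands:
        opcode = cmd[0]
        if opcode == OPEN_NODE:
            stack.append(last)
        elif opcode == CLOSE_NODE:
            if len(stack) == 1:
                raise ValueError("0x05 without a matching 0x04")
            last = stack.pop()
        else:
            node = {"cmd": cmd, "children": []}
            stack[-1]["children"].append(node)
            last = node
    return root


# Shortened version of the SM74 script: start, background, 3D geometry, end.
script = [bytes.fromhex(h) for h in (
    "08 00 00 0A 00 A0 00 78 00 A0 00 78", "04 00 00 00",
    "0C 00 00 00", "04 00 00 00", "19 00 00 00 00 00 00 00", "05 00 00 00",
    "0C 01 00 00", "04 00 00 00",
    "0A 01 00 2D 00 64 75 30 80 29 AA 3C", "05 00 00 00",
    "05 00 00 00", "01 00 00 00",
)]
tree = build_tree(script)
start = tree["children"][0]
print([hex(child["cmd"][0]) for child in start["children"]])
```

In this shortened script the start command ends up with two 0x0C children, and the 0x0A frustum command sits under the second one, which is the same shape as the XML-style picture.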

What we would need to do is get the "far" value from the "Frustum" node. To do this we need to find the location of the "Frustum" node in RAM. I won't go into much detail on this, but the general path to get to it is: (Root node -> child node -> next child node -> grandchild node)

The pointer to the current level's root node is located at the RAM address 0x8033B910

With that information, we can get a pointer to the render distance value with only 6 ASM instructions. This should work for any level in any ROM hack (assuming that the hack creator didn't change the overall structure of the geometry layout script).

Code
lui t0, 0x8034
lw t0, 0xB910(t0) // t0 = rootNode
lw t0, 0x10(t0) // t0 = rootNode->childNode
lw t0, 0x8(t0) // t0 = rootNode->childNode->nextNode
lw t0, 0x10(t0) // t0 = rootNode->childNode->nextNode->childNode
// Now we just add 0x22 to the pointer to get the far value's pointer.
addiu t0, t0, 0x22 // t0 = ptr to camera's far value
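
If the lui 0x8034 looks wrong for an address that starts with 0x8033, that is because the 16-bit offset in lw is sign-extended, so 0xB910 counts as a negative number. Here is a small Python sketch of that arithmetic and of the same pointer walk. It runs against a fake RAM dictionary with invented node addresses, so it only shows the logic and is not something you can run on the game itself.

```python
def sign_extend_16(value):
    """Interpret a 16-bit immediate the way lw/addiu do."""
    return value - 0x10000 if value & 0x8000 else value


def lui_lw_address(upper, offset):
    """Address reached by 'lui reg, upper' followed by 'lw reg, offset(reg)'."""
    return ((upper << 16) + sign_extend_16(offset)) & 0xFFFFFFFF


def far_value_pointer(read_word, root_pointer_address):
    """Follow root -> child -> next -> child, then add the 0x22 field offset."""
    node = read_word(root_pointer_address)  # rootNode
    node = read_word(node + 0x10)           # rootNode->childNode
    node = read_word(node + 0x08)           # ...->nextNode
    node = read_word(node + 0x10)           # ...->childNode
    return node + 0x22                      # pointer to the camera's far value


root_address = lui_lw_address(0x8034, 0xB910)

# Fake RAM: every node address here is invented for the example.
fake_ram = {
    root_address: 0x80100000,
    0x80100010: 0x80200000,
    0x80200008: 0x80300000,
    0x80300010: 0x80400000,
}
pointer = far_value_pointer(fake_ram.__getitem__, root_address)
print(hex(root_address), hex(pointer))
```

The first printed value is the root node pointer address given above, which confirms that the lui/lw pair really does read from 0x8033B910.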


I wrote some code to let me increase/decrease the render distance manually using the D-pad. The code should be compatible with most ROMs, and also works well with the 8MB ROM.

See it in action: https://gfycat.com/InfatuatedCheeryBettong
Source code (CajeASM v7.24): http://pastebin.com/raw/LFtPZTp1

It wouldn't be difficult to write some custom collision code that would change the render distance once Mario steps on a certain piece of geometry (like entering a room).

Originally posted by ergazoobi
how bad does the draw distance reduction need to be for some of these hacks though? fog to hide the draw distance looks fine up to a point but if shits gonna look like silent hill thats not exactly ideal lol. there's a certain point where it starts to affect the gameplay. manually reducing the polygon count and maybe using the 0D command to swap between low and hi poly geometry depending on distance might be better in a lot of cases but I guess that would require actual work.


I'd probably say that a good range would be between 0x1400 and 0x2000. The absolute minimum would probably have to be 0xB00, but a low render distance won't always mean a good framerate.

You are completely right about having to manually reduce polygons. Ultimately, a hack creator will have to find a way to reduce the polygon count themselves. No amount of display list optimization can possibly make the game run perfectly smooth. I would tell people to split up their levels into multiple sub areas, but even that is not very convenient at the moment. I just hope Skelux decides to natively add it to the SM64 Editor sometime in the future.
Incredible stuff here guys! I tried out your code David to change the render distance on the fly (very cool being able to do that!), pity it doesn't seem to help the slowdown much on its own, from what I can see.

When I set the render distance very low (with Mario's model being essentially the only triangles on screen), I noticed I could make Mario long jump into a non-rendered wall and collide with it. That made me wonder about the non-visible collision map (unsure of my terminology here). If a very small area of a high-polygon level is visually rendered on-screen, does that mean the console still has to process the collision for the rest of the unseen level? I've no idea how resource intensive or otherwise dealing with collision is, it's just a thought that occurred to me.

Originally posted by macN64
I noticed I could make Mario long jump into a non-rendered wall and collide with it. That made me wonder about the non-visible collision map (unsure of my terminology here). If a very small area of a high-polygon level is visually rendered on-screen, does that mean the console still has to process the collision for the rest of the unseen level? I've no idea how resource intensive or otherwise dealing with collision is, it's just a thought that occurred to me.


In SM64 the collision map is loaded and processed separately from the visual map. You can view both maps in TT64.



I haven't dug too deep on how collision actually works, but from what I understand the game looks for the collision triangle directly under Mario. If a collision triangle cannot be found, then the game will instantly kill Mario. Each triangle has a "collision type id" associated with it, like id 0x01 is for lava that burns Mario on contact. Each collision triangle also has height information (based off of the collision type?). You can see the death floor in the above right image. When Mario falls off the level, the death floor can detect how far Mario is above it; so the game knows to warp out of the level if Mario gets too close to it.

The collision data is loaded from the level script command 0x2E from a segmented address.
Here are some notes by Kaze on how the data is loaded.

Kaze wrote his own optimized collision routine to improve the framerate on emulators, but I can't seem to get it to work myself with the original game. I think it only works for ROMs that have the "Extend Level Boundaries" patch, as I know that patch creates a noticeable amount of lag on emulators.
I'd imagine that extraneous collision data would cause some slowdown, so if the collision is more optimized, that would ideally cause smoother play.

The purpose of this site is not to distribute copyrighted material, but to honor one of our favourite games.

Copyright © 2005 - 2020 - SMW Central