I forgot to reply to this and I should feel bad.
Since 2011 I've been working on various rotation algorithms. Rotation is not the most efficient thing the SA-1 can do: it requires looping through every pixel of the image, which for a 32x32 image means 1024 iterations, each one adding and subtracting sine values and reading and storing through pointers.
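To give an idea of what that per-pixel loop looks like, here is a rough sketch in C (not my actual SA-1 assembly). It assumes one byte per pixel, 8.8 fixed-point sine/cosine values (256 = 1.0) passed in by the caller, nearest-neighbor sampling and index 0 as transparent. The name `rotate_image` and the buffer layout are made up for the example.

```c
#include <stdint.h>

#define SIZE 32  /* image is SIZE x SIZE pixels, one byte per pixel */

/* Hypothetical helper: rotate src into dst around the image center.
   cos_fp and sin_fp are 8.8 fixed point (256 = 1.0).
   Destination pixels that map outside the source get index 0. */
void rotate_image(const uint8_t *src, uint8_t *dst,
                  int32_t cos_fp, int32_t sin_fp)
{
    const int32_t m = SIZE - 1;  /* twice the center coordinate (15.5 * 2) */

    for (int y = 0; y < SIZE; y++) {
        int32_t dy2 = 2 * y - m;  /* twice the vertical offset from center */

        /* Source position for x = 0 on this row, in 1/512 pixel units. */
        int32_t u = m * 256 - m * cos_fp + dy2 * sin_fp;
        int32_t v = m * 256 + m * sin_fp + dy2 * cos_fp;

        for (int x = 0; x < SIZE; x++) {
            int32_t ru = u + 256;  /* add half a pixel to round to nearest */
            int32_t rv = v + 256;
            uint8_t pixel = 0;

            if (ru >= 0 && rv >= 0) {
                int32_t sx = ru >> 9;
                int32_t sy = rv >> 9;
                if (sx < SIZE && sy < SIZE)
                    pixel = src[sy * SIZE + sx];
            }
            dst[y * SIZE + x] = pixel;

            /* Moving one pixel right only needs one add and one subtract. */
            u += 2 * cos_fp;
            v -= 2 * sin_fp;
        }
    }
}
```

The inner loop runs SIZE * SIZE = 1024 times for a 32x32 image, and each pass is just a couple of additions plus one read and one write, which is the cost I was describing above.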
For that reason, I recommend putting the rotation code in parallel mode. Parallel mode code only runs while the SA-1 CPU is idle, so it won't affect the main game's performance. If it can't rotate the image within a single frame, the next frame is simply dropped. So in the worst case, when the game is busy processing lots of sprites, the rotation runs at around 30 FPS or so, but the user will rarely notice, which is cool.
Here is the latest version of my rotation code. It's slightly faster than the old version, but I still recommend running it in parallel mode for the best performance with minimal impact. See the SA-1 Pack readme for more information about it. Feel free to ask here if you have trouble porting to it.
The code should not be very hard to adapt to other sizes like 16x16 or 64x64. To rip graphics, use SnesGFX's 4BPP Planar format and a .png image with the exact size you want.
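In case the planar layout is unfamiliar, here is a small C sketch of how one 8x8 tile is packed in the standard SNES 4bpp format: 32 bytes, with bitplanes 0 and 1 interleaved row by row in the first 16 bytes and bitplanes 2 and 3 in the last 16. I'm assuming here that SnesGFX's output follows this standard layout; `encode_tile_4bpp` is just a name for the example, not part of any tool.

```c
#include <stdint.h>

/* Hypothetical helper: pack an 8x8 tile of 4-bit palette indices
   (64 bytes, row-major, one pixel per byte) into the 32-byte
   standard SNES 4bpp planar tile layout. */
void encode_tile_4bpp(const uint8_t *pixels, uint8_t *out)
{
    for (int row = 0; row < 8; row++) {
        uint8_t bp0 = 0, bp1 = 0, bp2 = 0, bp3 = 0;

        for (int x = 0; x < 8; x++) {
            uint8_t p = pixels[row * 8 + x] & 0x0F;
            uint8_t mask = (uint8_t)(0x80 >> x);  /* leftmost pixel is bit 7 */

            if (p & 1) bp0 |= mask;
            if (p & 2) bp1 |= mask;
            if (p & 4) bp2 |= mask;
            if (p & 8) bp3 |= mask;
        }

        out[row * 2]          = bp0;  /* planes 0 and 1: bytes 0..15  */
        out[row * 2 + 1]      = bp1;
        out[16 + row * 2]     = bp2;  /* planes 2 and 3: bytes 16..31 */
        out[16 + row * 2 + 1] = bp3;
    }
}
```

A 32x32 image is 16 of these tiles, so 512 bytes in total.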