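Keep in mind that direct page is only faster when the low byte of D is $00. So if you set it to, for example, $2100, that's fine, but with $2001 every direct-page access takes one extra cycle. To my understanding, that makes it as slow as the equivalent absolute (16-bit address) read/write, although the instruction is still one byte shorter.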
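A minimal sketch of the timing difference, assuming asar syntax, an 8-bit store (M = 1), and that NMI/IRQ are already disabled, since changing D while they can fire is unsafe in SMW. The target $0001 is assumed to be free scratch RAM in your context. The counts are base CPU cycles from the 65816 instruction tables; real time in master clocks also depends on which memory region is accessed.

```asm
; Store the same byte to $0001 three ways and compare cycle costs.
; Assumption: interrupts are disabled and $0001 is free scratch RAM here.
    PHP             ; save the caller's register sizes
    SEP #$20        ; 8-bit A (M = 1)
    PHD             ; save the caller's D
    LDA #$00

    PEA $0000
    PLD             ; D = $0000 (low byte $00)
    STA $01         ; direct page -> $0001: 3 cycles

    PEA $0001
    PLD             ; D = $0001 (low byte $01)
    STA $00         ; direct page -> $0001: 4 cycles (+1 because D's low byte is nonzero)

    STA $0001       ; absolute -> $0001 (DB = $00 or $7E): 4 cycles, same as the case above

    PLD             ; restore the caller's D
    PLP             ; restore the caller's register sizes
```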
To see how many cycles an operation takes, refer to this, or use the cycle command (e.g. cycle lda #$00 : sta $00).
Edit: also, SMW never backs up D (the direct page register) during interrupts, so if you change its value and an IRQ/NMI occurs, the game will most likely crash. Either hack the NMI/IRQ handlers to preserve D (and set it to $0000 on entry), or use SEI : STZ $4200 to disable interrupts while D is changed (and re-enable them afterwards). That hassle is the main reason not to use it: the cycles saved are negligible either way, unless you're in a time-consuming operation such as a blank.
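A sketch of the SEI : STZ $4200 approach, assuming asar syntax and that DB is in $00-$3F or $80-$BF so that $4200 reaches the CPU register. $4200 is write-only, so it can't be read back; the restore value below (`!NMITIMEN_NORMAL`) is a hypothetical define you'd set to whatever your game normally writes there (for example, include the IRQ enable bits if the game uses an IRQ). Note that this clobbers A.

```asm
!NMITIMEN_NORMAL = $81      ; hypothetical: NMI enabled + auto-joypad read; use your game's real value

    PHP                     ; save the I flag and register sizes
    SEP #$20                ; 8-bit A, so STZ/STA touch only $4200 and not $4201
    SEI                     ; mask IRQ at the CPU
    STZ $4200               ; disable NMI (SEI can't mask it), H/V IRQ and auto-joypad read
    PHD                     ; save the current D
    PEA $2100
    PLD                     ; D = $2100: low byte $00, so no extra cycle per access
    ; ... direct-page work here, e.g. STA $00 now writes $2100 (INIDISP) ...
    PLD                     ; restore D before interrupts come back
    LDA #!NMITIMEN_NORMAL
    STA $4200               ; re-enable NMI (and whatever else the value turns on)
    PLP                     ; restore the I flag (re-allows IRQ if it was clear) and sizes
```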
; when the NMI/IRQ vector is taken, the processor flags (P) are pushed automatically,
; and RTI pulls them back.
; the game's handlers also do PHP : PLP, which isn't needed,
; so you can replace those with PHD : PLD to preserve direct page (all four are one-byte opcodes).
; then hijack somewhere else and do PEA $0000 : PLD so the handler runs with D = $0000.
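A generic handler skeleton showing where those changes land. This is a hypothetical layout, not SMW's actual NMI/IRQ code; the register save order is an assumption, and in a real hack the 4-byte PEA $0000 : PLD would go in a hijack because it doesn't fit in place.

```asm
; Hypothetical NMI/IRQ handler skeleton (native mode).
; On entry the CPU has already pushed PB, PC and P; RTI pulls them back.
Handler:
    PHD             ; was PHP (same 1-byte size): save the interrupted code's D
    REP #$30        ; 16-bit A/X/Y so the full registers are saved
    PHA
    PHX
    PHY
    PHB
    PEA $0000       ; push $0000...
    PLD             ; ...and pull it into D, so the handler runs with D = $0000
    ; ... original handler body ...
    REP #$30        ; 16-bit again so PLY/PLX/PLA pull 2 bytes each
    PLB
    PLY
    PLX
    PLA
    PLD             ; was PLP (same 1-byte size): restore the interrupted code's D
    RTI             ; pulls P, PC and PB
```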