it's a low-level optimization.
it works like this:
- many things can be done in many ways,
- executing each assembler instruction costs a different amount of resources, such as processor cycles, registers, memory at various levels, ...
- often it's better to consider many possible options & select the best one for the given resources, counting costs such as processor cycles & memory use along the way,
- different processors have different parameters: the same operation costs a different amount of resources, & a machine may have different electronics, such as a blitter chip, or a different number of registers. optimal code for a given processor uses these features skilfully & wisely.
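a minimal sketch of "many things can be done in many ways", assuming a multiplication by 10 as the operation. the function names are hypothetical, made up for illustration. both variants return the same value, but one uses a multiply & the other uses two shifts & an add; which one is cheaper depends on the target processor, & an optimizing compiler will often make that choice on its own.

```c
#include <assert.h>
#include <stdint.h>

/* variant A: a single multiplication (hypothetical example function). */
uint32_t times10_mul(uint32_t x) {
    return x * 10u;
}

/* variant B: the same result as x*8 + x*2, using two shifts & one add. */
uint32_t times10_shift(uint32_t x) {
    return (x << 3) + (x << 1);
}

/* checks that both variants agree for every x in 0..limit-1,
 * returns how many values were checked. */
uint32_t check_variants_agree(uint32_t limit) {
    uint32_t checked = 0;
    for (uint32_t x = 0; x < limit; ++x) {
        assert(times10_mul(x) == times10_shift(x));
        ++checked;
    }
    return checked;
}
```

the cost of each variant in cycles is not shown here, because it differs between processors; it would have to be measured or looked up for the concrete target.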
but let's remember that it's often better to write code in a higher-level programming language such as C, then recompile it for a newer processor when it appears.
a few commands & we have fairly optimal code for a new architecture.
the cost of this operation is low & we get a faster system thanks to newer hardware & a higher-level compiler.
we can still use assembler inserts for bottlenecks.
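a hedged sketch of such an assembler insert, assuming GCC or Clang extended inline assembly on an x86 target; on any other compiler or processor the code falls back to plain C, so it still recompiles everywhere. the function names are hypothetical, & a 32-bit rotate is only an illustrative stand-in for a real bottleneck (modern compilers usually turn the portable version into a rotate instruction by themselves).

```c
#include <stdint.h>

/* portable version: works on any processor a C compiler targets. */
uint32_t rotl32_portable(uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* version with an assembler insert for the assumed target,
 * falling back to the portable C code elsewhere. */
uint32_t rotl32(uint32_t x, unsigned n) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* x86 "rol": rotate x left by the count held in register cl.
     * the instruction changes the flags, hence the "cc" clobber. */
    __asm__("roll %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
#else
    return rotl32_portable(x, n);
#endif
}
```

keeping the portable function next to the insert means the rest of the program never depends on the assembler path, & the two can be compared against each other when moving to a new architecture.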
see also, if you wish or need: Optimization Triad.