Replace the current 2-instruction 2-step tripling code by a
corresponding single instruction leveraging ARMv8-A's "flexible second
operand as a register with optional shift". This has the added benefit
(albeit arguably negligible) of reducing the final code size.
Fix the comment as the tripled cache level is placed in x12, not x0.
Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
/* x15 <- return address */
loop_level:
- lsl x12, x0, #1
- add x12, x12, x0 /* x0 <- tripled cache level */
+ add x12, x0, x0, lsl #1 /* x12 <- tripled cache level */
lsr x12, x10, x12
and x12, x12, #7 /* x12 <- cache type */
cmp x12, #2