t0 is the start address of gd, t1 is the end address of gd (t0 + GD_SIZE).
[PTR_ADDIU t0, PTRSIZE] is in the delay slot of [blt t0, t1, 1b], so it
will be executed before the branch operation.
However the comparison for the BLT instruction is done before executing the
delay slot. This means when the last word just before k1 is cleared, the
loop will continue to run once. This will clear an extra word at k1, which
is outside the global data.
Global data is placed at the top of the stack. If the initial stack is a
SRAM or locked cache, the area outside them may be inaccessible. A write
operation performed in this area may cause an exception.
To solve this, [PTR_ADDIU t0, PTRSIZE] should be placed before the BLT
instruction.
Reviewed-by: Daniel Schwierzeck <daniel.schwierzeck@gmail.com> Reviewed-by: Stefan Roese <sr@denx.de> Signed-off-by: Weijie Gao <weijie.gao@mediatek.com>