When Core-0 handle SMP_ASK_C0COUNT IPI, we should make other cores to
see the result as soon as possible (especially when Store-Fill-Buffer
is enabled). Otherwise, C0_Count syncronization makes no sense.
BTW, array is more suitable than per-cpu variable for syncronization,
and there is a corner case should be avoid: C0_Count of Core-0 can be
really 0.