From: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Date: Tue, 15 Mar 2011 15:23:47 +0000 (-0400)
Subject: fsl_ddr: Don't use full 64-bit divides on 32-bit PowerPC
X-Git-Url: http://git.dujemihanovic.xyz/?a=commitdiff_plain;h=e820a131f4084397a212c6ffe3ed8ea0ce43631f;p=u-boot.git

fsl_ddr: Don't use full 64-bit divides on 32-bit PowerPC

The current FreeScale MPC-8xxx DDR SPD interpreter is using full 64-bit
integer divide operations to convert between nanoseconds and DDR clock
cycles given arbitrary DDR clock frequencies.

Since all of the inputs to this are 32-bit (nanoseconds, clock cycles,
and DDR frequencies), we can easily restructure the computation to use
the "do_div()" function to perform 64-bit/32-bit divide operations.

On 64-bit this change is basically a no-op, because do_div is
implemented as a literal 64-bit divide operation and the instruction
scheduling works out almost the same.

On 32-bit PowerPC a fully accurate 64/64 divide (__udivdi3 in libgcc) is
over 1.1kB of code and thousands of heavily dependent cycles to compute,
all of which is linked from libgcc.  Another 1.2kB of code comes in for
the function __umoddi3.

It should be noted that nothing else in U-Boot or the Linux kernel seems
to require a full 64-bit divide on my 32-bit PowerPC.

Build-and-boot-tested on the HWW-1U-1A board using DDR2 SPD detection.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Acked-by: York Sun <yorksun@freescale.com>
Cc: Andy Fleming <afleming@gmail.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
---

diff --git a/arch/powerpc/cpu/mpc8xxx/ddr/util.c b/arch/powerpc/cpu/mpc8xxx/ddr/util.c
index b9a5a690cb..02908b4561 100644
--- a/arch/powerpc/cpu/mpc8xxx/ddr/util.c
+++ b/arch/powerpc/cpu/mpc8xxx/ddr/util.c
@@ -8,9 +8,17 @@
 
 #include <common.h>
 #include <asm/fsl_law.h>
+#include <div64.h>
 
 #include "ddr.h"
 
+/* To avoid 64-bit full-divides, we factor this here */
+#define ULL_2e12 2000000000000ULL
+#define UL_5pow12 244140625UL
+#define UL_2pow13 (1UL << 13)
+
+#define ULL_8Fs 0xFFFFFFFFULL
+
 /*
  * Round mclk_ps to nearest 10 ps in memory controller code.
  *
@@ -20,35 +28,51 @@
  */
 unsigned int get_memory_clk_period_ps(void)
 {
-	unsigned int mclk_ps;
+	unsigned int data_rate = get_ddr_freq(0);
+	unsigned int result;
+
+	/* Round to nearest 10ps, being careful about 64-bit multiply/divide */
+	unsigned long long mclk_ps = ULL_2e12;
 
-	mclk_ps = 2000000000000ULL / get_ddr_freq(0);
-	/* round to nearest 10 ps */
-	return 10 * ((mclk_ps + 5) / 10);
+	/* Add 5*data_rate, for rounding */
+	mclk_ps += 5*(unsigned long long)data_rate;
+
+	/* Now perform the big divide, the result fits in 32-bits */
+	do_div(mclk_ps, data_rate);
+	result = mclk_ps;
+
+	/* We still need to round to 10ps */
+	return 10 * (result/10);
 }
 
 /* Convert picoseconds into DRAM clock cycles (rounding up if needed). */
 unsigned int picos_to_mclk(unsigned int picos)
 {
-	const unsigned long long ULL_2e12 = 2000000000000ULL;
-	const unsigned long long ULL_8Fs = 0xFFFFFFFFULL;
-	unsigned long long clks;
-	unsigned long long clks_temp;
+	unsigned long long clks, clks_rem;
 
+	/* Short circuit for zero picos */
 	if (!picos)
 		return 0;
 
-	clks = get_ddr_freq(0) * (unsigned long long) picos;
-	clks_temp = clks;
-	clks = clks / ULL_2e12;
-	if (clks_temp % ULL_2e12) {
+	/* First multiply the time by the data rate (32x32 => 64) */
+	clks = picos * (unsigned long long)get_ddr_freq(0);
+
+	/*
+	 * Now divide by 5^12 and track the 32-bit remainder, then divide
+	 * by 2*(2^12) using shifts (and updating the remainder).
+	 */
+	clks_rem = do_div(clks, UL_5pow12);
+	clks_rem <<= 13;
+	clks_rem |= clks & (UL_2pow13-1);
+	clks >>= 13;
+
+	/* If we had a remainder, then round up */
+	if (clks_rem)
 		clks++;
-	}
 
-	if (clks > ULL_8Fs) {
+	/* Clamp to the maximum representable value */
+	if (clks > ULL_8Fs)
 		clks = ULL_8Fs;
-	}
-
 	return (unsigned int) clks;
 }