From: Ray Chason
Subject: Re: Fast multiplication and division
Date: 22 Oct 1999 05:56:15 GMT
Newsgroups: sci.math
Keywords: Karatsuba multiplication of large integers

Quanta wrote:
>Hello, do you know where I can find any full description of the
>Karatsuba or Knuth algorithm to multiply and divide two big integers
>as fast as possible, to translate into C?

Most fast integer multiplication algorithms are really polynomial
multiplication algorithms.  You break the integers into pieces, consider
each of the pieces a coefficient of a polynomial, multiply the
polynomials, and then evaluate the product polynomial.

For instance:  1234 * 5678
    (x^3 + 2*x^2 + 3*x + 4)(5*x^3 + 6*x^2 + 7*x + 8)    (x = 10)
    5*x^6 + 16*x^5 + 34*x^4 + 60*x^3 + 61*x^2 + 52*x + 32
which is 7006652 at x=10.

So, on to some algorithms.

1) Karatsuba multiplication is simple to explain:

   * Start with your factors A and B.

   * Consider each to be a binomial (A1*x+A0) and (B1*x+B0), choosing x
     so as to break the two numbers in half.

   * The product of these two binomials is:
         A1*B1*x^2 + (A1*B0+A0*B1)*x + A0*B0

   * The trick is to evaluate the center term as:
         (A1+A0)*(B1+B0) - A1*B1 - A0*B0
     Since you need to evaluate A1*B1 and A0*B0 only once, this formula
     requires only one additional multiplication.  So you multiply
     numbers twice as long with only three times as many operations,
     where the grammar school algorithm would require four times as
     many.

   * Use recursion to evaluate the three products.  Fall back to the
     grammar school algorithm once the factors are small enough (it's
     much like optimizing a Quicksort with an insertion sort).
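To make the trick concrete, here is a one-level sketch in C, multiplying
32-bit factors split into 16-bit halves (so x = 2^16).  This is only an
illustration: the function name is made up, and a real bignum routine
would apply the same formula recursively to arrays of digits rather than
to machine words.

    #include <stdio.h>
    #include <stdint.h>

    /* One level of Karatsuba: multiply two 32-bit numbers with three
       16x16-bit multiplications instead of four.
       Write A = A1*x + A0 and B = B1*x + B0 with x = 2^16. */
    static uint64_t karatsuba32(uint32_t a, uint32_t b)
    {
        uint32_t a1 = a >> 16, a0 = a & 0xFFFF;
        uint32_t b1 = b >> 16, b0 = b & 0xFFFF;

        uint64_t hi = (uint64_t)a1 * b1;        /* A1*B1 */
        uint64_t lo = (uint64_t)a0 * b0;        /* A0*B0 */
        /* center term: (A1+A0)*(B1+B0) - A1*B1 - A0*B0
                      = A1*B0 + A0*B1 */
        uint64_t mid = (uint64_t)(a1 + a0) * (b1 + b0) - hi - lo;

        return (hi << 32) + (mid << 16) + lo;
    }

    int main(void)
    {
        /* prints 7006652, as in the example above */
        printf("%llu\n", (unsigned long long)karatsuba32(1234, 5678));
        return 0;
    }

The recursive version splits each digit array in half, calls itself for
the three sub-products, and falls back to the grammar school loop below
some threshold length.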
2) I suppose anything called a "Knuth algorithm" would be in one of his
   books.  By this, do you perhaps mean an FFT-based algorithm?

   A Fast Fourier Transform evaluates a polynomial at a set of complex
   roots of unity.  Once you've done this with your two factors, you
   multiply each element of FFT(A) by the corresponding element of
   FFT(B).  Follow with the inverse transform and you have your product
   polynomial.

   There is a trap in this algorithm: floating point arithmetic is only
   so accurate.  There will be some roundoff noise in the product, and
   you want to be sure it's to the right of the decimal point, or your
   product will be wrong.  Use long double if your compiler supports
   it, and put only a few bits in each coefficient.

3) A variant of the above uses the number theoretic transform.  An NTT
   is an FFT carried out in a finite integer field (arithmetic modulo a
   suitable prime).  The convolution with NTTs yields the coefficients
   of the product polynomial only as residues modulo the modulus
   (argh!) of the finite field; but you can do multiple NTT-based
   convolutions (typically three, with different moduli) and apply the
   Chinese Remainder Theorem to recover the coefficients.

   NTT-based multiplication can be faster than FFT, despite having to
   do three sets of transforms, because it uses only integer
   arithmetic; it also doesn't suffer from roundoff noise, and so it
   can put more bits in each coefficient.

--
--------------===============<[ Ray Chason ]>===============--------------
       PGP public key at http://www.smart.net/~rchason/pubkey.asc
                        Delenda est Windoze!

==============================================================================

From: Randy Poe
Subject: Re: Fast multiplication and division
Date: Sun, 24 Oct 1999 10:22:39 -0400
Newsgroups: sci.math

Ray Chason wrote:
>
> Quanta wrote:
>
> >Hello, do you know where I can find any full description of the
> >Karatsuba or Knuth algorithm to multiply and divide two big integers
> >as fast as possible, to translate into C?
>
> Most fast integer multiplication algorithms are really polynomial
> multiplication algorithms.  You break the integers into pieces,
> consider each of the pieces a coefficient of a polynomial, multiply
> the polynomials, and then evaluate the product polynomial.
> For instance:  1234 * 5678
>     (x^3 + 2*x^2 + 3*x + 4)(5*x^3 + 6*x^2 + 7*x + 8)    (x = 10)
>     5*x^6 + 16*x^5 + 34*x^4 + 60*x^3 + 61*x^2 + 52*x + 32
> which is 7006652 at x=10.

[snip a nice post]

If your polynomials use the decimal digits, what you are doing is
equivalent to long multiplication as we were taught to do in school.

I have implemented efficient long multiplication in assembly language.
For efficiency, you can use a natural base of the computer.  For
instance, break your numbers down into 16-bit words and use 16-bit
multiply operations (and 32-bit adds).  You are in effect doing long
multiplication in base 2^16.

The same thing can easily be done in C, using shift and logical AND
operations to extract the bits you need.  Be VERY careful about the use
of signed and unsigned integers when you do this.  An arithmetic right
shift of a number with a 1 in the high bit will fill the result with
1's from the left.  This is fine if you are interpreting the result as
a negative number, but disastrous if not.  I like assembly because it
gives me explicit control of such fine points.
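For reference, here is a minimal sketch of that base-2^16 long
multiplication in portable C.  The function name, digit layout, and
demo values are all illustrative; note the unsigned types throughout,
per the warning above.

    #include <stdio.h>
    #include <stdint.h>

    /* Grammar school multiplication in base 2^16.  a and b are arrays
       of 16-bit "digits", least significant first; out must have room
       for na+nb digits. */
    static void longmul16(const uint16_t *a, int na,
                          const uint16_t *b, int nb, uint16_t *out)
    {
        int i, j;

        for (i = 0; i < na + nb; i++)
            out[i] = 0;

        for (i = 0; i < na; i++) {
            uint32_t carry = 0;
            for (j = 0; j < nb; j++) {
                /* 16x16 -> 32-bit multiply, plus digit and carry;
                   this cannot overflow a 32-bit unsigned */
                uint32_t t = (uint32_t)a[i] * b[j] + out[i + j] + carry;
                out[i + j] = (uint16_t)t;    /* keep the low 16 bits */
                carry = t >> 16;             /* propagate the high 16 */
            }
            out[i + nb] = (uint16_t)carry;
        }
    }

    int main(void)
    {
        /* 0x12345678 * 0x9ABCDEF0, each as two base-2^16 digits,
           least significant digit first */
        uint16_t a[2] = { 0x5678, 0x1234 };
        uint16_t b[2] = { 0xDEF0, 0x9ABC };
        uint16_t p[4];

        longmul16(a, 2, b, 2, p);
        printf("%04X%04X%04X%04X\n", p[3], p[2], p[1], p[0]);

        /* cross-check against a direct 64-bit multiply */
        printf("%016llX\n", 0x12345678ULL * 0x9ABCDEF0ULL);
        return 0;
    }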
When I read a little farther in your post, I realized that you were not
talking about long multiplication at all, but actually the FFT
algorithms I have heard of.

> * The trick is to evaluate the center term as:
>       (A1+A0)*(B1+B0) - A1*B1 - A0*B0
>   Since you need to evaluate A1*B1 and A0*B0 only once, this formula
>   requires only one additional multiplication.  So you multiply
>   numbers twice as long with only three times as many operations,
>   where the grammar school algorithm would require four times as
>   many.

This is, I believe, the "butterfly" at the heart of the FFT.

Thanks for a very nice review of FFT-based multiplication algorithms.
It goes in my save file.

                 Randy Poe
                 VisionPlace.com
                 info1@visionplace.com

==============================================================================

From: Ray Chason
Subject: Re: Fast multiplication and division
Date: 25 Oct 1999 06:20:30 GMT
Newsgroups: sci.math

Randy Poe wrote:
>Ray Chason wrote:
>>
>> Quanta wrote:
>>
>> >Hello, do you know where I can find any full description of the
>> >Karatsuba or Knuth algorithm to multiply and divide two big
>> >integers as fast as possible, to translate into C?
>>
>> Most fast integer multiplication algorithms are really polynomial
>> multiplication algorithms.  You break the integers into pieces,
>> consider each of the pieces a coefficient of a polynomial, multiply
>> the polynomials, and then evaluate the product polynomial.
>> For instance:  1234 * 5678
>>     (x^3 + 2*x^2 + 3*x + 4)(5*x^3 + 6*x^2 + 7*x + 8)    (x = 10)
>>     5*x^6 + 16*x^5 + 34*x^4 + 60*x^3 + 61*x^2 + 52*x + 32
>> which is 7006652 at x=10.
>
>[snip a nice post]
>
>If your polynomials use the decimal digits, what you are doing is
>equivalent to long multiplication as we were taught to do in school.

Well, yes.  But viewing an integer as a polynomial points the way to
the faster algorithms, such as Karatsuba or the FFT.

>I have implemented efficient long multiplication in assembly language.
>For efficiency, you can use a natural base of the computer.  For
>instance, break your numbers down into 16-bit words and use 16-bit
>multiply operations (and 32-bit adds).  You are in effect doing long
>multiplication in base 2^16.
>
>The same thing can easily be done in C, using shift and logical AND
>operations to extract the bits you need.  Be VERY careful about the
>use of signed and unsigned integers when you do this.

It's also much faster in assembly language, as I'm sure you've noticed.
I'd do it in C only for something I was going to distribute, so that it
could be made to work on any hardware.

I see from your headers that you're using Windows 98.  That means at
least a Pentium.  Have you tried using 32-bit multiplies?  The MUL
instruction can multiply two 32-bit unsigned integers into a 64-bit
product; it makes a good fast long-multiplication function that much
easier to write.  Maybe you're using a DOS-based compiler.  Check out
http://www.delorie.com/djgpp/ if you need some development tools that
support the 32-bit instructions.

>When I read a little farther in your post, I realized that you were
>not talking about long multiplication at all, but actually the FFT
>algorithms I have heard of.

Mainly the Karatsuba algorithm, since you mentioned it by name.

>> * The trick is to evaluate the center term as:
>>       (A1+A0)*(B1+B0) - A1*B1 - A0*B0
>>   Since you need to evaluate A1*B1 and A0*B0 only once, this formula
>>   requires only one additional multiplication.  So you multiply
>>   numbers twice as long with only three times as many operations,
>>   where the grammar school algorithm would require four times as
>>   many.
>
>This is, I believe, the "butterfly" at the heart of the FFT.

Actually this is the key to the Karatsuba method, and has nothing to do
with FFTs.  The "butterfly" is really a two-element DFT and looks like
this:

Decimation in frequency:
    (A[i], A[i+step]) <-- (A[i]+A[i+step], w*(A[i]-A[i+step]))

Decimation in time:
    (A[i], A[i+step]) <-- (A[i]+w*A[i+step], A[i]-w*A[i+step])

(A short C sketch of the decimation-in-time form follows this post.)

Here are some FFT links for you:

http://www.fftw.org/
http://www.jjj.de/fxt/
http://www.eptools.com/tn/T0001/INDEX.HTM

--
--------------===============<[ Ray Chason ]>===============--------------
       PGP public key at http://www.smart.net/~rchason/pubkey.asc
                        Delenda est Windoze!
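To make the two butterflies above concrete, here is a minimal C sketch
of the decimation-in-time form.  The struct and function names are
illustrative, not taken from any particular FFT package; w would be the
appropriate root of unity at each stage of a full FFT.

    #include <stdio.h>

    /* One radix-2 decimation-in-time butterfly on complex values:
       (x, y) <-- (x + w*y, x - w*y),
       i.e. (A[i], A[i+step]) <-- (A[i]+w*A[i+step], A[i]-w*A[i+step]) */
    typedef struct { double re, im; } cplx;

    static void butterfly_dit(cplx *x, cplx *y, cplx w)
    {
        cplx t;                        /* t = w * y */
        t.re = w.re * y->re - w.im * y->im;
        t.im = w.re * y->im + w.im * y->re;
        y->re = x->re - t.re;          /* new A[i+step] */
        y->im = x->im - t.im;
        x->re += t.re;                 /* new A[i] */
        x->im += t.im;
    }

    int main(void)
    {
        /* with w = 1 this is a two-point DFT: (3, 5) -> (8, -2) */
        cplx a = { 3.0, 0.0 }, b = { 5.0, 0.0 }, w = { 1.0, 0.0 };
        butterfly_dit(&a, &b, w);
        printf("(%g, %g)\n", a.re, b.re);
        return 0;
    }

Note how this differs from the Karatsuba center-term trick quoted
above: the butterfly is a linear transform applied at every stage of
the FFT, while Karatsuba's identity saves a multiplication in a product
of binomials.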