From: stevenj@alum.mit.edu (Steven G. Johnson)
Newsgroups: sci.math
Subject: Re: primitive roots and parallel FFT
Date: Sat, 25 Oct 1997 20:36:39 -0400

kpolit@irio.pz.zgora.pl (Kamil Politowicz) wrote:
>2. The only FFT algorithm I know requires that the vector's length n
>is power of 2. Is there any other algorithm suitable without this
>limitation?

Many other FFT algorithms exist, most of which are not limited to powers of two. In fact, there are even FFT algorithms for prime lengths.

The most commonly-implemented FFT algorithm is Cooley-Tukey, which takes a DFT of composite size N = N1 * N2 and expresses it as N2 DFTs of size N1 followed by N1 DFTs of size N2 (with some multiplications by phase factors in between). You then do this recursively to get an O(N lg N) algorithm. The algorithm you are thinking of is just Cooley-Tukey in the special case of N2 = 2, which is known as the "radix-2" Cooley-Tukey algorithm. Implementations which handle other factors are typically called "mixed-radix" FFTs.

Of course, the recursion in Cooley-Tukey has to stop when you reach a prime length. Most implementations require that N be factorizable into small primes (e.g. < 10), and simply use the O(N^2) algorithm when those sizes are reached. However, there are algorithms to compute FFTs of prime lengths in O(N lg N) time. These are typically faster than the O(N^2) algorithm for anything more than very small primes, but are much slower than FFTs of nearby composite lengths. The most commonly-implemented algorithm of this sort is known as the "chirp-z" algorithm and is due to Bluestein.

There are many other FFT algorithms, e.g. prime-factor algorithms, Winograd's algorithm, Bruun's algorithm, and the QFT algorithm, to name a few. It is actually possible...and potentially useful...to mix several of these algorithms in a single program. I would recommend that you stick to Cooley-Tukey in the beginning, though; you can always implement other algorithms later.
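[Editor's note: to make the mixed-radix decomposition above concrete, here is a minimal serial sketch in Python (an illustration, not FFTW's actual code). It splits off the smallest prime factor N1 at each level in the common decimation-in-time arrangement, and falls back to the O(N^2) DFT at prime sizes:]

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, used as the base case at prime sizes."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def smallest_factor(N):
    """Smallest prime factor of N (returns N itself if N is prime)."""
    f = 2
    while f * f <= N:
        if N % f == 0:
            return f
        f += 1
    return N

def fft(x):
    """Mixed-radix Cooley-Tukey: split N = N1 * N2 and recurse."""
    N = len(x)
    N1 = smallest_factor(N)
    if N1 == N:                      # prime size: fall back to O(N^2) DFT
        return dft(x)
    N2 = N // N1
    # N1 sub-DFTs of size N2 over the strided sub-sequences x[r], x[r+N1], ...
    inner = [fft(x[r::N1]) for r in range(N1)]
    # Multiply by the phase ("twiddle") factors exp(-2*pi*i*r*k2/N).
    for r in range(N1):
        for k2 in range(N2):
            inner[r][k2] *= cmath.exp(-2j * cmath.pi * r * k2 / N)
    # N2 sub-DFTs of size N1 combine the results: X[k1*N2 + k2].
    out = [0j] * N
    for k2 in range(N2):
        col = dft([inner[r][k2] for r in range(N1)])
        for k1 in range(N1):
            out[k1 * N2 + k2] = col[k1]
    return out
```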
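[Editor's note: the post names Bluestein's chirp-z algorithm without explaining it; here is a hedged toy sketch of the idea. The identity n*k = (n^2 + k^2 - (k-n)^2)/2 rewrites a DFT of arbitrary (e.g. prime) length N as a convolution, which can be zero-padded to a power-of-two length and done with ordinary radix-2 FFTs:]

```python
import cmath

def fft_pow2(x, sign=-1):
    """Radix-2 FFT (sign=-1) or unscaled inverse (sign=+1); len(x) = 2^m."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_pow2(x[0::2], sign)
    odd  = fft_pow2(x[1::2], sign)
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

def bluestein_dft(x):
    """Exact DFT of arbitrary length N via Bluestein's chirp-z trick."""
    N = len(x)
    # "Chirp" w[n] = exp(-i*pi*n^2/N); reduce n^2 mod 2N (the exponential
    # has period 2N in n^2) to avoid large-argument accuracy loss.
    w = [cmath.exp(-1j * cmath.pi * ((n * n) % (2 * N)) / N) for n in range(N)]
    M = 1
    while M < 2 * N - 1:     # convolution length, rounded up to a power of two
        M *= 2
    a = [x[n] * w[n] for n in range(N)] + [0j] * (M - N)
    b = [0j] * M             # circularly wrapped conjugate chirp
    for n in range(N):
        b[n] = w[n].conjugate()
        if n:
            b[M - n] = w[n].conjugate()
    # Circular convolution of a and b via the power-of-two FFT.
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], sign=+1)
    return [w[k] * conv[k] / M for k in range(N)]
```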
(Note that all the algorithms I am alluding to here are exact...I am not referring to any sort of zero-padding, which introduces errors into the spectrum.)

Most parallel 1D FFT implementations that I am aware of do something like the following: Do a standard Cooley-Tukey algorithm using N1 = sqrt(N) (or the closest factor of N to sqrt(N)):

1) Do the N2 FFTs of size N1. You can set up the array so that these transforms are all uniprocessor transforms.

2) Perform a global transpose of the array (viewed as an N2 by N1 2D array). This requires communication among the processors.

3) Multiply by the phase factors ("twiddle factors").

4) Do the N1 FFTs of size N2. Again, because of the transpose, these will all be uniprocessor transforms, and require no communication.

5) Optionally, transpose the array back to its original order.

This method is sometimes called the "4-step" FFT, and was first described explicitly by Bailey. It is, however, essentially just a specialized form of Cooley & Tukey's 1965 FFT algorithm.

You can find links to more information on FFTs, along with source code, at the FFTW home page:

	http://theory.lcs.mit.edu/~fftw

FFTW, a C FFT library (not limited to powers of two), also contains parallel, distributed memory, multi-dimensional FFT code which could be adapted to do parallel 1D FFTs (by the above method) without great difficulty. I would recommend that you make use of some pre-existing FFT implementation, rather than writing one yourself from scratch, as implementing a fast FFT program turns out to be non-trivial. If FFTW doesn't suit your needs for some reason, the FFTW home page contains links to many other sources of FFT code on the web.

Good luck!

Cordially,
Steven G. Johnson

PS. Unfortunately, the information available on-line regarding FFT algorithms other than radix-2 Cooley-Tukey is extremely poor. If you really want to get into this, you'll have to dig into the print literature.
As a starting point, you might look at the references listed in our paper on FFTW (available at the above web site).
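[Editor's note: the 4-step method described in the post can be sketched serially as below. This is a toy Python illustration, not a parallel code: in a real distributed-memory implementation each row of the 2D view would live on its own processor, and only the transposes (steps 2 and 5) would involve communication.]

```python
import cmath

def dft(x):
    """Naive DFT, standing in for whatever uniprocessor FFT each node runs."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def four_step_fft(x, N1, N2):
    """Serial sketch of the '4-step' FFT for N = N1 * N2."""
    N = N1 * N2
    assert len(x) == N
    # View x as an N2 x N1 2D array: row n2 holds x[n2], x[N2+n2], x[2*N2+n2], ...
    rows = [[x[N2 * n1 + n2] for n1 in range(N1)] for n2 in range(N2)]
    # Step 1: N2 independent FFTs of size N1 (no communication).
    rows = [dft(r) for r in rows]
    # Step 2: global transpose (the all-to-all communication step).
    cols = [list(c) for c in zip(*rows)]          # now N1 rows of length N2
    # Step 3: multiply by the twiddle factors exp(-2*pi*i*n2*k1/N).
    for k1 in range(N1):
        for n2 in range(N2):
            cols[k1][n2] *= cmath.exp(-2j * cmath.pi * n2 * k1 / N)
    # Step 4: N1 independent FFTs of size N2 (again, no communication).
    cols = [dft(r) for r in cols]
    # Step 5: transpose back to natural order: X[k1 + N1*k2] = cols[k1][k2].
    out = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            out[k1 + N1 * k2] = cols[k1][k2]
    return out
```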