From: ionels@mailcity.com (Ionel Santa)
Subject: Re: ECM factoring
Date: 5 Mar 00 23:11:52 GMT
Newsgroups: sci.math.numberthy

Some days ago, I asked something about the elliptic curve method of
factoring large numbers.

Paul Zimmermann recommended to me the paper "Factorization of the tenth
Fermat number" by Richard Brent, available at:

ftp://ftp.comlab.ox.ac.uk/pub/Documents/techpapers/Richard.Brent/rpb161.ps.gz

Also, a lot of information about ECM and gmp-ecm (a fast ECM
implementation by Paul Zimmermann) is available at:

http://www.loria.fr/~zimmerma/records/ecmnet.html

And last but not least, here is part of the answers that Gerhard Niklasch,
one of the authors of the PARI package, kindly sent me in a couple of
emails.

+-------------------------+
| Choosing the exponent
+-------------------------+

> What I do for the moment: supposing I try to factor N, which has n
> bits, I generate x0, y0, A, B such that y0^2 = x0^3 + A*x0 + B (mod N).
> I know how to calculate (x0,y0)^M (in the group structure which
> results), except that I don't know exactly how to optimally choose M
> (for different n values).

There's a fair amount of theory and some experimentation here, but mainly
M should be chosen highly composite: the product of all prime powers not
exceeding some bound B1 depending on the size of N.  And instead of
computing (x0,y0)^M in one swoop, it is better to apply the prime factors
of the desired M one after another (minor gains are ultimately possible
by combining factors depending on their bit patterns, but that is an
advanced optimization).

> ... if I understood correctly: if I want for example to search for a
> factor of length 30, since I've found on Paul Zimmermann's page that
> the optimal B1 for 30 is about 250,000,

This depends slightly on the implementation, but his numbers are
certainly good starting points.

> I calculate M = product(p^e_p) where e_p = floor(log_p(B1)), and then
> (x0,y0)^M in the group.  But it is better, instead of pre-calculating
> M, to calculate (((x0,y0)^(2^e_2))^(3^e_3) ... )^(Pmax^e_Pmax), where
> Pmax is the largest prime not exceeding B1.

Yes.

+-------------------------+
| Optimisations
+-------------------------+

> Also, I don't know Peter Montgomery's fast modular multiplication
> algorithm for speeding up the process.

One part of it is working on several curves in parallel, which allows you
to trade divisions mod N for multiplications mod N, 3 of the latter for
one of the former.  The other is a fancy way of decomposing the (x0,y0)^p
operation into additions and doublings of points on the curve in such a
way as to keep the number of operations low.  It is good, but not always
optimal; on the other hand, it is far simpler than the optimal procedure
(for which the order of operations essentially needs to be determined in
advance for each single case, and stored in a table).

> For calculating (x0,y0)^n, currently I decompose n in base 2, and I
> make one addition when there is a 1 in the base-2 representation, and
> one doubling for every step.

Right.
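[To make the first-phase organisation concrete, here is a minimal Python
sketch of my own (not PARI's or gmp-ecm's actual code): every prime power
p^e_p <= B1 is applied to the point one prime at a time, each [p]P being
computed by the plain binary double-and-add method, and the point
arithmetic uses the affine addition formulas quoted further down in this
post.  The names stage1, scalar_mul, ec_add, inv_mod and FactorFound are
mine; sympy is assumed only for enumerating primes.  A non-invertible
denominator mod N is exactly the event that hands us a factor.]

    from math import gcd
    from sympy import primerange

    class FactorFound(Exception):
        """A non-invertible denominator mod N has revealed a factor of N."""
        def __init__(self, d):
            self.factor = d

    def inv_mod(a, N):
        d = gcd(a % N, N)
        if d != 1:
            raise FactorFound(d)    # this "failure" is what ECM is waiting for
        return pow(a, -1, N)

    def ec_add(P, Q, N, A):
        # Affine addition/doubling on y^2 = x^3 + A*x + B mod N, using the
        # slope formulas quoted later in this post; None = point at infinity.
        if P is None:
            return Q
        if Q is None:
            return P
        (x1, y1), (x2, y2) = P, Q
        if (x1 - x2) % N == 0 and (y1 + y2) % N == 0:
            return None             # P + (-P) = point at infinity
        if (x1 - x2) % N == 0:
            m = (3 * x1 * x1 + A) * inv_mod(2 * y1, N) % N   # doubling
        else:
            m = (y2 - y1) * inv_mod(x2 - x1, N) % N          # generic addition
        x3 = (m * m - x1 - x2) % N
        y3 = (-(y1 + m * (x3 - x1))) % N
        return (x3, y3)

    def scalar_mul(n, P, N, A):
        # [n]P by the plain binary double-and-add method.
        R = None
        while n:
            if n & 1:
                R = P if R is None else ec_add(R, P, N, A)
            n >>= 1
            if n:
                P = ec_add(P, P, N, A)
        return R

    def stage1(P, N, A, B1):
        # Apply each prime power p^e_p <= B1 to P, one prime at a time,
        # instead of forming the huge exponent M first.
        for p in primerange(2, B1 + 1):
            pe = p
            while pe * p <= B1:     # largest power of p not exceeding B1
                pe *= p
            while pe >= p:
                P = scalar_mul(p, P, N, A)
                pe //= p
        return P

[In an actual run one would wrap this in a loop over many random curves:
pick x0, y0, A mod N at random, set B = y0^2 - x0^3 - A*x0 mod N so that
the point lies on the curve, call stage1, and catch FactorFound to read
off the divisor.]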
There's a constant factor to be gained here on average.  One strategy is
to write (x0,y0)^n = A^d + B^e (eek, this power notation is ghastly -
most of the literature would use [n]P to denote the result of multiplying
the point P by the integer n, thus [2]P is doubling on the curve), where
A, B, d, e are initialized in some simple way, more or less A = (x0,y0)
and B = [2]A and d = n and e = 0, and then transformed in a sequence of
simple steps until one of A, B contains the result.

This is _one_ of Montgomery's many contributions!  The simple steps here
depend on properties of the current d, e mod 2 and mod 3, and amount to
things like doubling one of A, B, replacing one of them by their sum on
the curve, etc.  I don't remember the details offhand (and they depend on
the way points on the curve are represented - the algorithm is more
complex when homogeneous coordinates (x,y,z) are used, but can then
dispense with most divisions mod N; in the PARI environment, we use
affine coordinates (x,y) and a simpler form of the algorithm).

This Montgomery trick gains, on average, a constant factor over the
obvious binary shift algorithm, in terms of the total number of doublings
and additions performed.  (And by the way, it works only when the
multiplier is prime.)  Slightly more can be squeezed out by constructing
really optimal sequences of additions and doublings for each possible
multiplier.  One may need more than two intermediate points for these,
though (instead of just the A, B in Montgomery's method).

_Another_ optimization is to observe that, when using affine coordinates,
the computation time is likely to be dominated by the divisions mod N,
and when N is at all large, a single division takes more time than three
multiplications mod N.  On the other hand, we want to look at very many
curves (with the same M, before switching to a larger M).  Montgomery's
next trick here is to observe that you can compute a^(-1), b^(-1), ...,
x^(-1) mod N simultaneously, with just one division mod N, trading three
multiplications for each of the other divisions.  Compute and remember
ab, abc, ..., abc...x mod N, then take (abc...wx)^(-1), then recover
x^(-1) from (abc...w)*(abc...wx)^(-1) and (abc...w)^(-1) from
x*(abc...wx)^(-1), and proceed backwards until you have a^(-1).

> The fast modular multiplication which I didn't know about was:

This is _yet_ _another_ of Montgomery's bright ideas!  (Also starring in
Knuth, TAOCP Vol. 2, 4.3.1, exercise 41.)

> If we want to do multiplication mod N, where N has n bits and is odd,
> we use the relations:
>
>   A  -> A' = A*2^n (mod N)
>   A' [*] B' = A'*B'*2^(-n) (mod N)
>   A' -> A  = A' [*] 1
>
> where [*] is a new operation, and * the usual product.  I knew about
> the relations, but I couldn't figure out how this might help, until I
> realized that for calculating A' [*] B', we calculate A'*B' and then,
> by adding proper multiples of N, we can make the last n bits equal to
> 0, so we no longer need any division after the product.  For my
> implementation this resulted in a gain of a factor of 30 in the speed
> of multiplication mod N.

The magnitude of the gain I find surprising - perhaps the
division-with-remainder implementation wasn't quite optimal...?  (Part of
the gain should come from the possibility of conserving space, by
interlacing the additions which make up the multiplication with fixing up
those low-order bits which are already determined.)

> You were right.  After I looked over the division function, I found
> that I was making quite a lot of useless shifts.  After fixing that,
> the factor gained was ~ 8.
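[Both of Montgomery's ideas above are short enough to sketch.  The
following Python fragment continues the earlier sketch and is again only
an illustration - batch_inverse, montgomery_setup and redc are names I
invented, not anything from PARI or gmp-ecm.  It shows the simultaneous
inversion of a whole batch of values with a single division mod N, and
then the [*] operation, usually called Montgomery reduction, for an odd
n-bit modulus.]

    def batch_inverse(vals, N):
        # Invert a, b, ..., x mod N with a single modular inverse:
        # remember the prefix products ab, abc, ..., ab...x, invert the
        # last one, then walk backwards, trading one division for about
        # three multiplications per element.
        prefix = [1]
        for a in vals:
            prefix.append(prefix[-1] * a % N)
        inv = inv_mod(prefix[-1], N)        # the single expensive inversion
        out = [0] * len(vals)
        for i in range(len(vals) - 1, -1, -1):
            out[i] = prefix[i] * inv % N    # (ab...w) * (ab...wx)^(-1) = x^(-1)
            inv = inv * vals[i] % N         # now inv = (ab...w)^(-1)
        return out

    def montgomery_setup(N):
        # N odd with n bits; R = 2^n > N, and Nprime = -N^(-1) mod R.
        n = N.bit_length()
        R = 1 << n
        return n, (-pow(N, -1, R)) % R

    def redc(T, N, n, Nprime):
        # The [*]-reduction: return T * 2^(-n) mod N.  Adding the multiple
        # ((T*Nprime) mod 2^n) * N clears the low n bits of T, so the
        # "division" is just a shift.
        mask = (1 << n) - 1
        m = (T & mask) * Nprime & mask
        t = (T + m * N) >> n
        return t - N if t >= N else t

    # Tiny usage example (toy modulus):
    N = 1000003
    n, Nprime = montgomery_setup(N)
    a, b = 123456, 654321
    aR, bR = (a << n) % N, (b << n) % N     # A' = A*2^n mod N, B' likewise
    abR = redc(aR * bR, N, n, Nprime)       # A' [*] B'
    assert redc(abR, N, n, Nprime) == a * b % N   # A' [*] 1 converts back

[In an affine-coordinate ECM implementation, batch_inverse would be fed
the denominators x'-x of one addition step taken across many curves at
once, and redc replaces each "multiply, then divide by N with remainder"
by a multiply plus a shift.]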
+-------------------------+
| Second phase of ECM
+-------------------------+

> ... and I don't know which is supposed to be the second phase of ECM.

The second phase consists in observing that many composite numbers have
only one large prime factor, and the second largest is a lot smaller.
Therefore it is advantageous to shoot for curves and points (x0,y0) on
them whose orders are not divisors of M but rather of the form
(divisor of M) * (one prime between B1 and B2), where B2 is typically
chosen about 100*B1.  This can be checked a _lot_ faster than the first
phase.  Instead of powering ((x0,y0)^M)^p for each prime in the range, we
proceed from ((x0,y0)^M)^p to ((x0,y0)^M)^p' by "multiplying" the former
with ((x0,y0)^M)^(p'-p).  (Actually, this amounts to an addition on the
curve.)  (x0,y0)^M remains fixed throughout this phase, and since p'-p
frequently takes small even values, and many of these repeatedly, a lot
of work is saved by precomputing these.

Moreover, we do not actually need to finish the computation of any of the
((x0,y0)^M)^p; we only need to determine whether one of them will give us
a factor of N.  Thus it is enough to compute denominators and forget
about the numerators in the addition formula.

> I use for additions the formulas:
>
>   x'' = -(x+x') + m^2,
>   y'' = -[y + m*(x''-x)],
>
> where:
>
>   m = (y'-y)/(x'-x)   if y <> y'
>   m = (3x^2+A)/(2y)   if y = y' (when it is the same point)
>
> What does it mean to forget about the numerators?

If you want to know whether the computation of P+Q on the curve will
reveal a factor of N, you don't have to compute P+Q.  You only need to
work out x'-x (and take its gcd with N).  Provided you don't need P+Q for
later computations, of course; but during the second phase most such
sum-points will indeed never be needed again.

If I have already computed (in the first phase) Q=[M]P from my original
P, with B1=181 say (just a toy example), and now want to check whether
any of [191]Q, [193]Q, [197]Q, [199]Q, [211]Q, [223]Q, [229]Q, [233]Q
will lead to a factor of N, I compute [211]Q along with a table of small
even multiples of Q, and then check the x-x' arising from

  [211]Q-[20]Q, [211]Q-[18]Q, [211]Q-[14]Q, [211]Q-[12]Q,
  [211]Q+[12]Q, [211]Q+[18]Q, [211]Q+[22]Q

without working out the x,y coordinates of any of these points.  (In
fact, I only need a table of x-coordinates of small even multiples of Q.)

(And you don't want to compute gcd(x-x',N) for each individual pair x,x'.
Instead, accumulate a product of a few hundred x-x' mod N, and then take
the gcd of the product with N.  If small prime factors, less than 10^8
say, have been removed from N by trial division and Pollard rho before
launching ECM, the chances that you might miss a factorization N=RS by
picking up one x-x' divisible by R and another divisible by S are going
to be remote, and if it happens, you still have the option to go back and
take individual gcds, or simply to throw the current curves away and
start with a fresh set of curves.)
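[Putting the pieces together, here is a hedged Python sketch of this
second phase in the spirit of the toy example above.  It reuses ec_add,
scalar_mul, gcd and FactorFound from the earlier sketches; the block
width D, the batch size and the choice of pivots are my own assumptions,
not anything recommended in the emails.  One pivot [c]Q per block is
fully computed, a table holds the x-coordinates of the small even
multiples of Q, and only the differences x - x' are accumulated, with a
batched gcd.]

    def stage2(Q, N, A, B1, B2, D=210, batch=500):
        # Table of x-coordinates of [2]Q, [4]Q, ..., [D]Q.
        twoQ = ec_add(Q, Q, N, A)
        xtab, T = {}, twoQ
        for j in range(2, D + 1, 2):
            xtab[j] = T[0]
            T = ec_add(T, twoQ, N, A)
        acc, cnt = 1, 0
        c = (B1 // D) * D + 1            # odd pivot, slid forward in steps of D
        Pc = scalar_mul(c, Q, N, A)      # [c]Q: the only fully computed points
        PD = scalar_mul(D, Q, N, A)      # [D]Q, used to slide the pivot
        for p in primerange(B1 + 1, B2 + 1):
            while p > c + D // 2:        # keep |p - c| <= D/2 (and even)
                Pc = ec_add(Pc, PD, N, A)   # a factor hiding in the pivot
                c += D                      # already shows up here
            j = abs(p - c)
            if j == 0:
                continue                 # p is the pivot itself; see above
            acc = acc * (Pc[0] - xtab[j]) % N   # only the denominator x - x'
            cnt += 1
            if cnt == batch:             # batched gcd, as described above
                d = gcd(acc, N)
                if 1 < d < N:
                    raise FactorFound(d)
                acc, cnt = 1, 0
        d = gcd(acc, N)                  # don't forget the last partial batch
        if 1 < d < N:
            raise FactorFound(d)

[A complete, if slow, factoring attempt then amounts to: pick a random
curve and point, run stage1, feed the result to stage2, and catch
FactorFound along the way.]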
+-------------------------+
| Miscellaneous
+-------------------------+

> I've found somewhere that it is possible to write the same curve in
> different ways.  Is it better to use other formulas?

That depends highly on the properties of your implementation.
Conventional wisdom has it that projective coordinates should lead to a
faster algorithm, mainly by getting rid of almost all the divisions mod N
at the cost of an increased number of multiplications, but when I
re-wrote the PARI implementation a couple of years ago, I found this was
not borne out in practice, given PARI's underlying long-integer and
modular arithmetic.

Another aspect is that something can be gained by working only with
curves where one knows in advance that the order of the initial point
will be divisible by 4; however, the price comes in the form of a more
complicated equation for the curve, and thus more complicated addition
and doubling formulas.  (More work on each curve, but you expect that you
won't have to check as many curves before you hit upon a factor of N.)

> Or is it better to work with fractions like xi=xni/xdi, yi=yni/ydi?

If you do it consistently, then you are working with projective
coordinates on the curve. ;)

For an implementation along these lines, check out the file
src/basemath/ifactor1.c in the PARI/GP distribution
(http://www.parigp-home.de/ and follow your nose from there :) ...
and by all means check out Paul's code for comparison!  The PARI code is
heavily commented, but of course it is quite a mouthful...

Standard reference for much of this, besides Knuth, is Henri Cohen's
"A Course in Computational Algebraic Number Theory", Springer GTM 138.