From: cmonico@nd.edu (Chris Monico) Subject: Re: Finding repetition cycle Date: Sat, 03 Apr 99 09:15:05 GMT Newsgroups: sci.math To: sven@customized.com Keywords: Floyd's cycle-finding algorithm In article <3705B3E3.11D2@customized.removethis.com>, Sven wrote: >Given a stream of numbers, unending, yet with the provision that after >an initial group of "random" ones, there will eventually be a sequence >that repeats, what is the algorithm to search for this sequence? > >It is not known what the size of sample in the sequence, nor the size of >the initial non-sequential set. I know it exists - but I can't remember >it. Something to do with keeping track of an ever-increasing distance >between sample numbers... the sequence may not be identified >immediately, but will be found, guaranteed. > >Can someone please point me to the answer? > >Sven I think it sometimes goes by the name of "Floyd's cycle finding algorithm". The idea is this: Let {x_k} denote the sequence. for i=1,2,... compare x_i and x_{2i} Suppose we've found x_j = x_{2j}, and the cycle length is L (unknown). Then 2j = mL + j, for some m. ==> j = mL ==> L | j. Now factor j, and find the smallest divisor of j, d, such that x_j = x_{j+d}. This d will be the cycle length. i.e., L=d. Now, the practical question: How long will this method take? The easiest way to see this (which, I hear, is where the Pollard Rho method gets it's name): We represent the sequence with a letter "rho" in the following way: Let a be the least positive integer such that x_n = x_{n+L} for all n>=a. i.e., where the first cycle begins. The sequence begins at the bottom of the "rho", and x_a will be the point where the "rho" self-intersects. It would help to draw the picture, but from there, it's easy to see that this method will succeed before (or when) x_j is exactly half-way around the circular part of the "rho". That is, if the first cycle begins at 'a' and the cycle length is L, we will find x_j = x_{2j} after at most a + L/2 comparisons. So, we have to examine 2a + L total elements of the sequence, but the number of comparisons we do is very low and the memory requirement for this algorithm is constant (independent of the sequence length). It's a good algorithm. If you draw the picture I tried to describe, it should be clear that this is, in fact, a good method for the problem. If it's not clear, look up Floyd's cycle finding algorithm or Pollard Rho factorization, in your favorite book of algorithms. Hope this helps.