From: cmonico@nd.edu (Chris Monico)
Subject: Re: Finding repetition cycle
Date: Sat, 03 Apr 99 09:15:05 GMT
Newsgroups: sci.math
To: sven@customized.com
Keywords: Floyd's cycle-finding algorithm
In article <3705B3E3.11D2@customized.removethis.com>,
Sven wrote:
>Given a stream of numbers, unending, yet with the provision that after
>an initial group of "random" ones, there will eventually be a sequence
>that repeats, what is the algorithm to search for this sequence?
>
>It is not known what the size of sample in the sequence, nor the size of
>the initial non-sequential set. I know it exists - but I can't remember
>it. Something to do with keeping track of an ever-increasing distance
>between sample numbers... the sequence may not be identified
>immediately, but will be found, guaranteed.
>
>Can someone please point me to the answer?
>
>Sven
I think it sometimes goes by the name of "Floyd's cycle finding
algorithm". The idea is this:
Let {x_k} denote the sequence.
for i=1,2,... compare x_i and x_{2i}
Suppose we've found x_j = x_{2j}, and the cycle length is L (unknown).
Then
2j = mL + j, for some m. ==> j = mL ==> L | j. Now factor j, and find
the smallest divisor of j, d, such that x_j = x_{j+d}. This d will be
the cycle length. i.e., L=d.
Now, the practical question: How long will this method take? The
easiest way to see this (which, I hear, is where the Pollard Rho
method gets it's name): We represent the sequence with a letter "rho"
in the following way:
Let a be the least positive integer such that x_n = x_{n+L} for all
n>=a. i.e., where the first cycle begins. The sequence begins at the
bottom of the "rho", and x_a will be the point where the "rho"
self-intersects. It would help to draw the picture, but from there,
it's easy to see that this method will succeed before (or when) x_j is
exactly half-way around the circular part of the "rho". That is, if
the first cycle begins at 'a' and the cycle length is L, we will find
x_j = x_{2j} after at most a + L/2 comparisons.
So, we have to examine 2a + L total elements of the sequence, but the
number of comparisons we do is very low and the memory requirement for
this algorithm is constant (independent of the sequence length). It's
a good algorithm.
If you draw the picture I tried to describe, it should be clear that
this is, in fact, a good method for the problem. If it's not clear,
look up Floyd's cycle finding algorithm or Pollard Rho factorization,
in your favorite book of algorithms.
Hope this helps.