21. Factoring Numbers I


1. Introduction


The RSA type encoding can be broken if an intruder can factor the number N of its public key. We here describe some of the methods that people have devised for factoring numbers.


Nowadays by distributing computations over many processors and many machines it is possible to perform up to roughly 1015 operations in attempting to do such a thing.


The mindless way to factor a large number N is to do it by testing every prime up to the square root of N to see if it is a factor. Even with the massive efforts required to perform 1015 operations and even if testing a prime to see if it is a factor could be accomplished with one operation, you could not factor a number much above 1030 this way.


There is a second algorithm, which we will call the tortoise and hare algorithm that is somewhat better, but not great, that is much more fun, and we will describe it.


Finally there are two methods which have roughly the same prospects and are considered to be the best available, and we will describe them briefly.


We begin by asking, what information can lead us to a factor of N. We will suppose that N is a product of two distinct primes p and q.


(In all our discussions we will ignore the possibility that N is not a prime but a power of primes, like 9, 27, 125 and so on. For any N it is easy to check whether each of its k-th roots are integers for k up to log1000N and whether N is divisible by any number up to 1000, and that is all we need do to investigate this possibility. Therefore we will assume here that N is the product of two relatively prime numbers ab. The hardest case will be when a and b are prime, so we shall assume that as well.)


By the Chinese Remainder Theorem, we know that we can describe any remainder x mod N by its remainders mod p and mod q, (xmodp,xmodq).


If we should stumble across an x whose pq remainders are of the form (0modp,amodq) and a is not 0, we can factor N by applying Euclid’s algorithm to x and N.


x of this form will be a multiple of p, and so is N so the gcd of the two, which Euclid’s algorithm will disclose, will be p, and we will have factored N.


Thus we want to seek x of the form (0modp,amodq).

Two of our methods of factoring are based on clever ways to seek such numbers. The third is based on looking for pairs of numbers whose squares, mod N are the same.


Suppose we can find two remainders mod N, x and y, that take the form (amodp, bmodq) and (amodp, (q-b)modq). These numbers do not sum to N, but have the same square.


If we take their difference, or their sum, we get (0modp, 2bmodq) or (2amodp, 0modq) and we will have a number either way of the form that allows us to factor N.


2. The Tortoise and Hare Algorithm


This approach is based upon the fact, that if you wander at random among p vertices, you can expect to visit a vertex a second time after roughly p1/2 steps.


This claim is related to the fact that with 23 people there is an even chance that two of them have the same birthday.


For, the probability that you visit a new vertex on your next step after you have been to k of them is (1-k/p). And similarly if a new person, x, enters a room after k are already present and all have different birthdays, the probability that x has birthday different from all is (1-k/365).


And here is the method. We choose a polynomial f like x2 + x + 1, and define f(k)(x) to be its k-th iterate mod N: that is, we define


f(k)(x) = f(f(k-1)(x)) mod N.


The Chinese remainder theorem tells us that the representation of remainders mod N by pairs of remainders mod p and mod q is preserved under arithmetic operations like those involved in the formation and iteration of f.


As a result we can keep track of the behavior of these iterates mod p and mod q separately.


And what will that behavior be like?


Well, mod either p or q this function, starting anywhere, can be expected to wander around, in some unknown fashion, until it hits an integer that it has hit before.

After that it will follow its old path around the cycle formed by its steps between these hits on forever.


For p=23, for example, the function 1 + x + x2 goes from 1 to 3 to 13 to 22 and then to 1 and around again.


While this is happening mod p, something similar will happen mod q, but quite probably will cycle at some slightly different step.


For example, the same equation for q=19 goes from 1 to 3 to 13 to 12 to 5 and then to 12 then 5 12 5 12 on forever.


How do we exploit this behavior?


Imagine that we have a tortoise and a hare that race each other by iterating f. The tortoise makes one iteration each step, while the hare makes 2.


This means the hare will enter the cycle mod p or mod q before the tortoise does, but once the tortoise enters the cycle the hare will catch up to the tortoise by one vertex of the cycle, until they meet. They will meet again in a number of steps given by the length of the cycle, on forever.


Notice that once the tortoise enters the cycle, the hare will catch up in a number of steps that is at most the length of the cycle. Thus, at worst, the tortoise and hare will meet (that is, agree) mod p or mod q in a number of steps given by the step number at which the iteration hit it a value that it had previously hit.


So our algorithm consists of forming f2k)(x0) – f(k)(x0) mod N for each k, and applying Euclid’s algorithm to N and this difference.


It so happens that each of these actions is extremely easy to do on a spreadsheet, so you can factor by this method very conveniently.


Since we guess it will take roughly p1/2 or q1/2 steps for a cycle to be formed, and since one of p and q has to be less than N1/2, this method can be expected to take on the order of N1/4 steps, and this is a bit too much for really large n, (above 50 or 60 decimal digits).


By the way if you guess wrong and happen to choose a function for which you do not get cycling after N`1/4 steps, you can suspect that N is a prime, or try a different function. (More than likely there is a bug in your algorithm though!)


However you can factor numbers whose size is half the precision of your computation on a spreadsheet with hardly any effort.


Here is what the first few columns should look like


factor N

33897117

1 + x + xx)




1

=MOD(1+B2+B2*B2,$B$1)

=B2

=B1


=MOD(1+C2+C2*C2,$B$1)

=MOD(1+B3+B3*B3,$B$1)

=MOD(1+D2+D2*D2,$B$1)

=E2


=MOD(1+C3+C3*C3,$B$1)

=MOD(1+B4+B4*B4,$B$1)

=MOD(1+D3+D3*D3,$B$1)

=E3


=MOD(1+C4+C4*C4,$B$1)

=MOD(1+B5+B5*B5,$B$1)

=MOD(1+D4+D4*D4,$B$1)

=E4


=MOD(1+C5+C5*C5,$B$1)

=MOD(1+B6+B6*B6,$B$1)

=MOD(1+D5+D5*D5,$B$1)

=E5


=MOD(1+C6+C6*C6,$B$1)

=MOD(1+B7+B7*B7,$B$1)

=MOD(1+D6+D6*D6,$B$1)

=E6


=MOD(1+C7+C7*C7,$B$1)

=MOD(1+B8+B8*B8,$B$1)

=MOD(1+D7+D7*D7,$B$1)

=E7


=MOD(1+C8+C8*C8,$B$1)

=MOD(1+B9+B9*B9,$B$1)

=MOD(1+D8+D8*D8,$B$1)

=E8


=MOD(1+C9+C9*C9,$B$1)

=MOD(1+B10+B10*B10,$B$1)

=MOD(1+D9+D9*D9,$B$1)

=E9


=MOD(1+C10+C10*C10,$B$1)

=MOD(1+B11+B11*B11,$B$1)

=MOD(1+D10+D10*D10,$B$1)

=E10


=MOD(1+C11+C11*C11,$B$1)

=MOD(1+B12+B12*B12,$B$1)

=MOD(1+D11+D11*D11,$B$1)

=E11


=MOD(1+C12+C12*C12,$B$1)

=MOD(1+B13+B13*B13,$B$1)

=MOD(1+D12+D12*D12,$B$1)

=E12


=MOD(1+C13+C13*C13,$B$1)

=MOD(1+B14+B14*B14,$B$1)

=MOD(1+D13+D13*D13,$B$1)

=E13


=MOD(1+C14+C14*C14,$B$1)

=MOD(1+B15+B15*B15,$B$1)

=MOD(1+D14+D14*D14,$B$1)

=E14


=MOD(1+C15+C15*C15,$B$1)

=MOD(1+B16+B16*B16,$B$1)

=MOD(1+D15+D15*D15,$B$1)

=E15


And then to the right of these:






=ABS(B3-D3)

=MOD(E3,F3)

=MOD(F3,G3)

=MOD(G3,H3)

=ABS(B4-D4)

=MOD(E4,F4)

=MOD(F4,G4)

=MOD(G4,H4)

=ABS(B5-D5)

=MOD(E5,F5)

=MOD(F5,G5)

=MOD(G5,H5)

=ABS(B6-D6)

=MOD(E6,F6)

=MOD(F6,G6)

=MOD(G6,H6)

=ABS(B7-D7)

=MOD(E7,F7)

=MOD(F7,G7)

=MOD(G7,H7)

=ABS(B8-D8)

=MOD(E8,F8)

=MOD(F8,G8)

=MOD(G8,H8)

=ABS(B9-D9)

=MOD(E9,F9)

=MOD(F9,G9)

=MOD(G9,H9)

=ABS(B10-D10)

=MOD(E10,F10)

=MOD(F10,G10)

=MOD(G10,H10)

=ABS(B11-D11)

=MOD(E11,F11)

=MOD(F11,G11)

=MOD(G11,H11)

=ABS(B12-D12)

=MOD(E12,F12)

=MOD(F12,G12)

=MOD(G12,H12)

=ABS(B13-D13)

=MOD(E13,F13)

=MOD(F13,G13)

=MOD(G13,H13)

=ABS(B14-D14)

=MOD(E14,F14)

=MOD(F14,G14)

=MOD(G14,H14)

=ABS(B15-D15)

=MOD(E15,F15)

=MOD(F15,G15)

=MOD(G15,H15)

=ABS(B16-D16)

=MOD(E16,F16)

=MOD(F16,G16)

=MOD(G16,H16)


The output looks like


33897117

1 + x + xx)












1

3

1

33897117










13

183

3

33897117

10

7

3

1

0

#DIV/0!

#DIV/0!

#DIV/0!


33673

15299742

13

33897117

33660

1497

726

45

6

3

0

#DIV/0!

#DIV/0!

19995619

33277233

183

33897117

19995436

13901681

6093755

1714171

951242

762929

2E+05

9677

4450

31732378

19730835

33673

33897117

31698705

2198412

920937

356538

207861

148677

59184

30309

28875

23066836

2119890

15299742

33897117

7767094

2828741

2109612

719129

671354

47775

2504

199

116

25445716

29810436

19995619

33897117

5450097

1196535

663957

532578

131379

7062

4263

2799

1464

17448766

21896967

33277233

33897117

15828467

2240183

147186

32393

17614

14779

2835

604

419

25138633

27004455

31732378

33897117

6593745

928392

95001

73383

21618

8529

4560

3969

591

7045297

24395184

19730835

33897117

12685538

8526041

4159497

207047

18557

2920

1037

846

191

21352090

19489635

23066836

33897117

1714746

1316943

397803

123534

27201

14730

12471

2259

1176

18976879

1463031

2119890

33897117

16856989

183139

8201

2717

50

17

16

1

0

27717028

18979434

25445716

33897117

2271312

2098749

172563

27993

4605

363

249

114

21

193564

10901376

29810436

33897117

29616872

4280245

3935402

344843

142129

60585

20959

18667

2292

5413102

27724314

17448766

33897117

12035664

9825789

2209875

986289

237297

37101

14691

7719

6972

12661243

27719085

21896967

33897117

9235724

6189945

3045779

98387

94169

4218

1373

99

86

21164344

25377987

25138633

33897117

3974289

2102805

1871484

231321

20916

1245

996

249

0


Notice what happens in the last row here.



3. The Quadratic Sieve Method of Factoring


This method involves a systematic search for two remainders that do not sum to N that have the same square mod N.


The basic plan is to square many numbers, and pick out those whose squares have only small prime factors.


When you accumulate enough of these, you will be able to find a product of them z, mod N that is a perfect square of small prime factors.


This means a number that can be written as p12s1*p22s2 …pk2sk.


We can then identify p1s1*p2s2 …pksk as a square root of z, and also we have a square root of z that is the product of the numbers whose squares had product z mod N.


These two square roots could be the same, or negatives of one another, but they are just as likely to be completely different, in which case they allow you to factor N. After you accumulate enough such z you will be able to factor N with very high probability.


We can describe this algorithm by describing how to get lots of squares of numbers mod N efficiently, and how to find a product of them that is a perfect square.


The awkward part of dealing with large numbers is that it takes time and effort to do anything with them However, one thing that is relatively easy to do is to divide one by a small number and examine the remainder obtained.


So here is the plan for getting lots of squares with only small prime factors.


We evaluate the greatest integer less than N1/2 (or (kN)1/2 for small k) ; call it y.

Then numbers of the form y+j for reasonably small j will have squares that are of the form y2 + jy + j2.


But we have y2 <N and (y + 1)2 >0, Since the difference between these two squares is 2y+1 we know that both y and (y+1)2 are numbers that mod N are of the order of the square root of N.


. This means that all the numbers of the form y2 + 2jy + j2 will have order N1/2 and will be perfect squares mod N.


For any prime p, we determine (y+1)2 and y mod p and these allow us to determine what all the numbers of the form (y+j)2 are mod p, by manipulating only small numbers.


So here is the plan. You pick a few million numbers of the form (y+j)2, and find what each of them are mod each prime p up to say a million.


From this information you can divide each square by each prime p that divides it until the result is no longer a multiple of p. If you reduce the result to 1 in this way, then the number you are dealing with is the product of powers of your small primes.


Now you accumulate a large number of such squares. As we saw in the previous Chapter, the proportion of numbers having this property is roughly log M/log N1/2.where M is the size of the largest prime considered small here.


Each of the squares s obtained can be described by a vector each of whose components corresponds to a prime p, and whose entry is a 1 if s/p2k+1 is an integer that is not a multiple of p, and 0 otherwise.


You will find a perfect square if you can find a linear dependence among these vectors, mod 2.


To see this let us take a mini-example


Suppose we found squares that mod N were 10. 15 18 and 3.


We would describe these as the following vectors, with component order 2,3,5.


(101), (011), (100), and (010)

.

The standard technique called row reduction, allows us to find a linear dependence.


It can be described as follows. Assign to each vector v its largest non-zero component unless that component has already been assigned to another vector w; in that case replace v by v+w and repeat using this step starting with v+w.


In our case the first vector is assigned to the third component. The second is replaced by its sum with the first (110) which is assigned to the second component. The third is assigned to the first component.

The fourth vector can be assigned no component and this means we have a linear dependence. It is first replaced by its sum with the first and second (100), and this is replaced by its sum with the third (000).


You would try to assign it to the second component, but that already has the sum of the first two vectors assigned to it. Adding the fourth to that leaves the first component but that has been assigned to the third vector. Adding that in gives the (0) vector, and the linear dependence is the sum of all four vectors, which is (0) mod 2.



This tells us that 10*15*18*3 is a perfect square, and in fact it is the square of 90. It also would be the square of the product of the four numbers that were used to produce these squares.


And that is the quadratic sieve method for factoring.


When you find two numbers with the same square, of course you will use Euclid’s algorithm starting with N and the difference (or sum) of the two, and obtain a prime factor of N as the gcd.


Exercise: 1.Form the product of two 4 digit primes and factor it with the tortoise hare algorithm. Try again with the product of 5 digit primes.

2. Explain the quadratic sieve algorithm in your own words. Which steps will take the most time?