Building on the prior lesson of the Division Algorithm and Modular Arithmetic, we move onto Prime Factorization and Euclid’s Algorithm. These are not only extremely interesting facets of math and number theory, but one of the most utilized pieces of math in the modern digital world. Prime Factorization is fundamental to almost all encryption, and it is by the sheer scale of the math that allows it to be so — for even simple encryption breaking, a conventional computer could take over a hundred years running full bore before it could crack some types of encryption. This is also what makes things like Quantum Computers so terrifying — they have the capacity to break prime factorization, and thus, our entire modern digital security infrastructure.
But, until they figure out coherency, encryption is still our go-to method for keeping communications secure. This begins with defining prime factorization itself — a number is considered prime if it is an integer greater than 1 and its only factors are 1 and that number, while a composite number is any number that has a factor other than 1 and itself. These two pieces apply to every single number greater than 1, and have been studied for thousands of years.
Prime factorization is breaking down another number into its prime factors, in other words breaking it down into two numbers that hold only 1 and itself as its own factors. For example, 25 can be broken down into 5 and 5 — a prime factorization since 5 is prime, as it can only be made up of 5 and 1; there is no other number that can be multiplied to become 5 besides 5. Sometimes numbers can have multiple prime factorizations, such as 120.
120 can be broken down into $2 \times 2 \times 3 \times 2 \times 5$ and $2 \times 5 \times 2 \times 2 \times 3$ — these are considered different because their ordering matters. A nondecreasing sequence is one that grows larger with each number — essentially just sorting your factors into a standard pattern you can reference. For 120, it would be $2 \times 2 \times 2 \times 3 \times 5$. These can be consolidated through multiplicity. Multiplicity is just counting how many times a factor appears in the prime factorization of a number. For 120, 2 would be said to have a multiplicity of 3. 3 would have a multiplicity of 1, since it only appears once. Same for 5, just 1.
Thus, you can write 120’s prime factorization as $2^3 \cdot 3^1 \cdot 5^1$. Notably, once sorted, every single number has a unique prime factorization. This is simply understood, because there is no other number besides 120 you can get from $2^3 \cdot 3 \cdot 5$. With that in mind, we can start to imagine just how impactful something like this could be — especially when you start to consider the very implications of encryption. If you sent out a message where you hid the 3 and 5, but publicly you signed it with $2^3$ — people who “know” that you need 3 and 5 could decrypt your message, rebuilding the “120” out of something otherwise unknown. Modern encryption does something similar, but with numbers on the order of hundreds of digits — not just 120.
GCD and LCM
The prime factorization has another use — finding the greatest common divisor (GCD) and least common multiple (LCM). GCD is a number that both $x$ and $y$ share as a divisor, but specifically the largest positive integer they both share. 10 and 4, for example, would have a GCD of 2 — because both can be divided by 2 and no other number larger than that (with no remainder, mind you). LCM is the smallest multiple that two numbers $x$ and $y$ share. 2 and 5, for example, would have an LCM of 10, since both numbers’ smallest common multiple is just 10 — there would be no reason to go to 20, which would be the next largest one.
Two numbers can be considered relatively prime if the greatest common divisor of the two numbers is 1. 7 and 6, for example, would be considered relatively prime — as there is no number other than 1 that could suffice to divide them both equally. This is also an important factor when you’re considering encryption later on.
Euclid’s Algorithm
It is easy to find the GCD and LCM once you have the prime factorization of a number, but getting the prime factorization can sometimes be hard. Instead, there are methods that can expedite the process of finding the GCD and LCM, the primary being known as Euclid’s Algorithm. This was found thousands of years ago, and has been a powerful tool in the pocket of every number theorist since.
Euclid’s Algorithm is simple, but it can seem far more complicated when you’re trying to keep track of it. In its barest essence, Euclid’s Algorithm works off a single principle — if $x$ and $y$ are both divisible by $d$, then they are both multiples of $d$. This is just the inverse of the division statement, like putting $x/d = n$ into $x = n \cdot d$. Let’s take two numbers we can easily track — 10 and 15. We know both of these numbers can be divisible by some number $d$, which is the GCD.
So, we do something really simple — we take $x$ and we subtract $y$. In this case, $15 - 10 = 5$. Notice something? 15, 10, and 5 are all multiples of 5. In a very straightforward way, we actually found an interesting note: that linear combinations are also divisible by $d$. In our 10 and 15 case, you could say it was $15 \cdot 1 + (-1) \cdot 10 = 5$. We can take this deeper, however — because the magic of Euclid isn’t in the pattern, but the result.
We showed with a simple case that 10 and 15 have a GCD of 5 by a simple subtraction. Let’s take this to a higher number — the GCD of 675 and 210. $\gcd(675, 210)$ might seem complicated, but using the steps Euclid laid out we can build quick intuition as to why it works. First, we set up our problem:
$$Y = 675$$ $$X = 210$$
We want to find the mod of these two numbers — $675 \mod 210$. This is the same kind of operation as what we saw before with $15 \mod 10$. Notice how it rhymes? $675 \mod 210$ is 45.
So, we have 45 — but did we get any closer to the answer? Unequivocally yes! We just repeat what we did before, but moving the numbers over. Instead of $675 \mod 210$, now we do $210 \mod 45$ — giving us 30. The remainder isn’t zero, so we keep going. This time, we do $45 \mod 30$, getting 15. One more time and we see $30 \mod 15 = 0$. We have hit the floor, and thus can’t go any further. What do we have to show for it?
The second we hit 0, we see that 15 is the GCD of 675 and 210. All we had to do was keep applying the mod operation to our remainders until we had no more remainder. The question can be asked then, why does this work? How did someone even figure that out? Let’s take a look at what we created and see how it looks all laid out.
Writing down our steps, we have:
$$675 = 3 \cdot 210 + 45$$ $$210 = 4 \cdot 45 + 30$$ $$45 = 1 \cdot 30 + 15$$ $$30 = 2 \cdot 15 + 0$$
Critically, each remainder (the $+$ some number) is a linear combination of the previous two numbers in the chain. Working bottom up, we see this through simple algebraic rearranging across the equal sign:
$$15 = 45 - 1 \cdot 30$$ $$30 = 210 - 4 \cdot 45$$
Notice how we can fill things back in?
$$15 = 45 - (210 - 4 \cdot 45)$$ $$15 = 5 \cdot 45 - 210$$
Looking back, we can see that 45 is also able to be represented by another form of the equation:
$$45 = 675 - 3 \cdot 210$$
Thus:
$$15 = 5 \cdot (675 - 3 \cdot 210) - 210$$ $$15 = 5 \cdot 675 - 16 \cdot 210$$
This comes right back to the proof we saw before — that $d = 675 \cdot n + 210 \cdot m$, where $n$ is now clearly 5 and $m$ is $-16$! This can apply to any GCD — it is generalizable because the only properties we used were the division algorithm for integers and the well-ordering principle from before. There was no special characteristic of 675 or 210 that we relied on. This tracking, this seeming verbosity, is actually an extension of Euclid’s Algorithm called the Extended Euclidean Algorithm. The linear combination itself, however, is something known as Bézout’s Identity — the idea that for any nonzero integers $a$ and $b$, there exist integers $x$ and $y$ (referred to as Bézout’s coefficients) such that $ax + by = \gcd(a, b)$. In our case, that’s exactly $5 \cdot 675 + (-16) \cdot 210 = 15$.