More often than not, the first mathematical object we are exposed to is the integers: whole numbers like 1, 2, and 3. They are simple, yet surprisingly profound in their ability to represent quantities and otherwise unrelated ideas. In fact, the study of integers, a field known as Number Theory, has gone on for thousands of years with no real discernible purpose until quite recently.

That is to say, not all science comes with a clear end goal; often it is the search in and of itself that pays dividends. Number Theory is one such case: from it we have built systems that encrypt vast amounts of data using little more than the difficulty of prime factorization for exceptionally large numbers, a fact that, until recently, was almost entirely useless in practice.

Today, however, cryptography is one of the foundational elements of modern information security, building upon itself in every iteration of the modern internet. Once, attackers could intercept the ‘conversation’ between two computers in what is known as a ‘man in the middle’ attack. Without encryption, these attacks could steal every piece of information transferred along the line, and were it not for Number Theory and the mathematicians behind it, that could very well still be the case. Instead, we implement public keys, private keys, hashing, and more to ensure data stays valid and secure. HTTPS, SSH, and many other protocols implement these very strategies, built on cryptography and Number Theory as their bread and butter in stopping hackers and other nefarious agents from spying on your data.

But understanding Number Theory and cryptography requires that you understand a few of the most basic pieces of how they function: from division, to modular arithmetic, to factoring, primality testing, Euclid’s algorithm, number representation, fast exponentiation, and more.

The Division Algorithm

Beginning with the most simple and building our way up, we begin with ‘The Division Algorithm’, a basic set of steps that allows us to divide any integer by some other number. The definition of such is as follows:

Let x and y be two integers. Then x divides y (denoted x ∣ y) if and only if x ≠ 0 and there is an integer k such that y = kx. “x does not divide y” is denoted by x ∤ y. If x divides y, then y is said to be a multiple of x, and x is a factor or divisor of y.

Now, what does that truly mean? In simple terms, it means that one number divides another only if it fits into the other perfectly, some whole number k of times. For example, if 2 divides 8, how many times does it fit into 8? It fits 4 times: 2 + 2 + 2 + 2 = 8. That means that 2 ∣ 8, such that 8 = 4 × 2, where 4 is k. It is a convoluted way (but not nearly as convoluted as it could be) to explain division in as atomic an idea as possible. Clarity in math is extremely important, and it is in definitions and proofs that we see the most atomized clarity we possibly can. Don’t believe me? See how long the proof that 1 + 1 = 2 is, and see just how complicated a truly atomic proof can be (see Whitehead and Russell’s Principia Mathematica).
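The definition translates almost directly into code. Here is a minimal sketch in Python (the function name `divides` is my own, not from any library):

```python
def divides(x: int, y: int) -> bool:
    """Return True if x | y, i.e. x != 0 and y = k*x for some integer k."""
    return x != 0 and y % x == 0

print(divides(2, 8))   # True: 8 = 4 * 2
print(divides(3, 8))   # False: 3 leaves a remainder when dividing 8
print(divides(0, 5))   # False: the definition rules out x = 0
```

Note how the x ≠ 0 check from the definition appears explicitly; without it, `y % x` would raise a division-by-zero error.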

Linear Combinations

Another idea to internalize is the concept of linear combinations: sums of multiples of other numbers. In most settings, the number in front of each variable (its coefficient) can be any real number. Here, following the material I studied from, we will instead assume the coefficients are integers. To express this directly, take two variables x and y and their coefficients s and t. A linear combination of x and y is then sx + ty.

That is what a linear combination is, but here is an important property: if a number z divides both x and y, then z also divides any linear combination of x and y. For example: 6 ∣ 12 and 6 ∣ 18, so 6 also divides 462, since 462 is a linear combination of 12 and 18 (19 × 12 + 13 × 18 = 462).
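You can spot-check the property numerically. A quick sketch, using the same numbers as the example above:

```python
x, y = 12, 18          # both divisible by 6
s, t = 19, 13          # the coefficients from the example; any integers work
combo = s * x + t * y  # 19*12 + 13*18 = 462
print(combo)           # 462
print(combo % 6)       # 0, so 6 divides the combination as well
```

Swapping in any other integer coefficients for s and t still leaves the result divisible by 6, which is exactly what the proof below establishes in general.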

The proof for this is as follows:

Assume x ∣ y. Assume also that x ∣ z. Since x ∣ y, then y = kx for some integer k. Similarly, since x ∣ z, then z = jx for some integer j. A linear combination of y and z can be expressed as:

sy + tz = s(kx) + t(jx) = (sk + tj)x

for some integers s and t. Since sy + tz is an integer multiple of x, it follows that x ∣ (sy + tz).

What use does this have, you may ask? It actually serves as the foundation of Bézout’s identity, which is what makes the Extended Euclidean Algorithm work. That is directly useful in cryptography, because finding modular inverses (essential for RSA) relies on exactly this property. Diophantine equations like “can I make exactly x cents using only 5-cent and 12-cent coins?” reduce to asking exactly that: is x a linear combination of 5 and 12 with non-negative integer coefficients? In Linear Algebra, span, basis, and dimension are all built on linear combinations. The logic you can connect to linear combinations is almost endless for how foundational a concept this is.
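To make the Bézout connection concrete, here is a sketch of the Extended Euclidean Algorithm in Python. It returns the gcd of a and b together with coefficients s and t satisfying s·a + t·b = gcd(a, b); the recursive formulation is one common way to write it:

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b = g (Bezout's identity)."""
    if b == 0:
        return a, 1, 0                  # gcd(a, 0) = a = 1*a + 0*0
    g, s, t = extended_gcd(b, a % b)    # recurse on the remainder
    return g, t, s - (a // b) * t       # back-substitute the coefficients

g, s, t = extended_gcd(12, 18)
print(g)                  # 6
print(s * 12 + t * 18)    # 6, confirming Bezout's identity
```

When gcd(a, m) = 1, the s coefficient returned by extended_gcd(a, m), taken mod m, is the modular inverse of a, which is the step RSA key generation depends on.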

The Mod Operation

However, if x ∤ y, then dividing y by x leaves a non-zero remainder. The Division Algorithm states that for integers y and x with x > 0, there exist unique integers q and r such that y = qx + r and 0 ≤ r < x; that is, the quotient and the remainder are unique. It should equally be noted that the Division Algorithm is not an algorithm at all, but a statement of a mathematical fact.
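Python happens to expose the quotient-and-remainder pair directly through the built-in divmod, which makes the statement easy to check:

```python
y, x = 13, 6
q, r = divmod(y, x)      # quotient and remainder in one call
print(q, r)              # 2 1

# The pair (q, r) is exactly the one the Division Algorithm promises:
assert y == q * x + r
assert 0 <= r < x
```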

Most computers can compute this directly using the mod (%) operation, which performs the division you’re asking for and returns only the remainder. For positive numbers, this is simple: 13 mod 6 returns 1, because 12 (6 × 2) is one short of 13. Things become more complicated with negative numbers, however. Different languages can return entirely different answers for negative mod operations, but the mathematical definition from the Division Algorithm makes the interpretation of the mod operation completely well defined.

For instance, −7 mod 4 returns 1. Why is that? We might expect something like 3, or −3, but the mathematical mod is never negative. Instead, we take 4 × −2 = −8, the largest multiple of 4 that does not exceed −7 (notice how in our previous example, 12 was likewise the largest multiple of 6 not exceeding 13). Getting from −8 to −7 requires adding 1, thus −7 mod 4 returns 1, since the closest multiple we can reach below −7 is −8, and 1 remains between those two numbers.
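Python’s % already follows the mathematical convention (the result always lands in [0, m) for positive m), while languages like C and Java truncate toward zero instead, giving −7 % 4 = −3. A small sketch, including a portable workaround for truncating languages:

```python
print(-7 % 4)    # 1: Python matches the Division Algorithm's definition
print(13 % 6)    # 1

def math_mod(a: int, m: int) -> int:
    """Always return the mathematical remainder in [0, m), even where % truncates."""
    return ((a % m) + m) % m

print(math_mod(-7, 4))   # 1
```

The double-mod trick in math_mod is the usual idiom when porting modular code to a language whose % can go negative.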

Arithmetic Mod m

Mod can also be used to build arithmetic on a finite set. If an integer m > 1 (read: any integer greater than one) is selected, then mod m becomes a function that takes a single integer x as input and outputs x mod m, a number from the set {0, 1, 2, …, m − 1}. For example, if m = 5, then the mod 5 operation maps 12 to 2, and −11 to 4. Given the set {0, 1, 2, 3, 4}, addition and multiplication are defined on its elements in the usual way, except that the mod 5 function is applied afterwards to ensure that the result is again from that set. In practice, this takes 2 + 4 and returns 1, since 2 + 4 = 6, and 6 mod 5 is 1; this is called addition mod m, and multiplication mod m works the same way. These structures, sets with addition and multiplication defined mod m, are called rings, something you’ll encounter more formally in abstract algebra.
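A short sketch of arithmetic mod 5, reproducing the examples above (the helper names are mine):

```python
m = 5

def add_mod(a: int, b: int) -> int:
    return (a + b) % m      # add as usual, then reduce back into {0, ..., m-1}

def mul_mod(a: int, b: int) -> int:
    return (a * b) % m      # multiply as usual, then reduce

print(12 % m, -11 % m)   # 2 4: the mod-5 mapping from the text
print(add_mod(2, 4))     # 1, since 2 + 4 = 6 and 6 mod 5 = 1
print(mul_mod(3, 4))     # 2, since 3 * 4 = 12 and 12 mod 5 = 2
```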

In real-world use, this has immediate implications. Because mod m always maps its input into the range 0 to m − 1, we can confine arbitrary numbers to a fixed range no matter the input, and that gives programs predictable, intentional behavior. Hash tables, for example, use exactly this: given m slots, key mod m maps any key to a valid index.
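As an illustration, here is a toy slot-selection scheme for a hypothetical 8-slot table. The hash (summing character codes) is deliberately naive and only for demonstration; real hash tables use far better hash functions, but the final mod step is the same:

```python
m = 8  # hypothetical number of slots in the table

def slot(key: str) -> int:
    # Toy hash: sum the character codes, then mod m to get a valid index.
    return sum(ord(c) for c in key) % m

for k in ("alice", "bob", "carol"):
    print(k, "->", slot(k))   # every result lands in 0..7
```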

Cryptographic hash functions like SHA-256 build on the same foundational principle — mod arithmetic keeping values within fixed ranges — but layer on far more complexity to ensure the mapping is effectively irreversible. If you’ve ever verified a download hash for something like Tails or Kali Linux, you’ve relied on that irreversibility.

Congruence

When two integers reduce to the same value in a ring (x and y are equivalent if and only if x mod m = y mod m), you can draw an equivalence relation between them. This is called congruence mod m. It can further be shown that two integers are congruent exactly when (x − y) mod m is 0. For example, 17 and 5 are congruent mod 12, since 17 mod 12 = 5 and 5 mod 12 = 5, or equivalently, (17 − 5) mod 12 = 0.
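Both characterizations of congruence are easy to express and compare directly; a quick sketch (the function name is mine):

```python
def congruent(x: int, y: int, m: int) -> bool:
    """x is congruent to y mod m exactly when m divides their difference."""
    return (x - y) % m == 0

print(congruent(17, 5, 12))              # True: 17 and 5 both reduce to 5 mod 12
print(congruent(17, 6, 12))              # False
print(17 % 12 == 5 % 12)                 # True: the equivalent formulation
```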

In mathematical writing, mod’s precedence (in the order of operations) is the same as multiplication and division, so it takes precedence over addition; after that, operations are done left to right. Programming languages, however, vary regarding the precedence of the mod operator, so always check language-specific conventions before making assumptions.

Mod on Intermediate Results

You can also use mod on intermediate results, such as 13¹⁵⁹ mod 71. In this example, doing this in the straightforward manner (13¹⁵⁹ then mod 71) would take time, and directly inputting that to a computer would cause an overflow. Instead, you could let mod 71 be used on every intermediate product. Due to the length of the proof that shows mod on intermediate products does not change the result of arithmetic and multiplication, I won’t post it here — but I encourage you to look it up! (Computing arithmetic operations mod m proof, for the lazy.)
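One standard way to exploit this is square-and-multiply (fast modular exponentiation), where every intermediate product is reduced mod m so values never grow beyond m². A sketch, checked against Python’s built-in three-argument pow, which computes the same thing:

```python
def pow_mod(base: int, exp: int, m: int) -> int:
    """Compute base**exp mod m, reducing after every multiplication."""
    result = 1
    b = base % m
    while exp > 0:
        if exp & 1:                  # current bit of the exponent is set
            result = (result * b) % m
        b = (b * b) % m              # square step, reduced immediately
        exp >>= 1
    return result

print(pow_mod(13, 159, 71) == pow(13, 159, 71))   # True
print(pow_mod(2, 10, 1000))                       # 24, i.e. 1024 mod 1000
```

This loop runs in time proportional to the number of bits in the exponent rather than the exponent itself, which is what makes RSA-sized exponentiations feasible at all.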


This is the basic idea behind some of the most important concepts in computer science. For those who have not yet learned about mod, let this serve as a good way to understand it so you don’t have to struggle like I did!