# An introduction to the LLL-algorithm

Just a note in case you are here to evaluate this blogpost for my GSoC-proposal: This is my old website, I am currently building a new one here (which should be finished before the summer starts). When it is done I will move this blogpost. So that’s just an explanation for why the website might seem a bit outdated.

In this blogpost we will take a closer look at what exactly the LLL-algorithm does. We will not describe the internal workings of the algorithm, but rather try to explain what it tries to achieve.

## The case of regular vector spaces

Given a set of vectors $$v_1,\cdots,v_n$$, the vector (sub)space generated by these vectors is defined as $$V = \{\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n \mid \lambda_i \in \mathbb{R}\}$$, i.e. all linear combinations of these basis vectors. Note that, given such a space $$V$$, we can pick any set of $$n$$ vectors as a basis, as long as they are linearly independent. However, it quickly becomes clear that some bases are just ‘nicer’ than others. Consider the following two bases of $$\mathbb{R}^3$$:

$(1, 0, 0)^T, (0, 0, 1)^T, (0, 1, 0)^T$

$(1, 2, 3)^T, (23, 54, 6254)^T, (\pi, \mathbf{e}, -1)^T$

While the second set of vectors is indeed linearly independent, working with this basis would be quite a pain. Why? For two reasons: 1) the vectors are not orthogonal, and 2) the vectors do not have length 1.

## A ‘nice’ basis

While not having an orthonormal basis is not an insurmountable disaster, a lot of things become a lot easier when your basis is orthonormal. For example, given a vector that is the linear combination of the basis vectors with coefficients $$\lambda_1, \lambda_2,\cdots, \lambda_n$$, what is the projection of this vector on the i’th basis vector? If the basis is orthonormal, this is simply $$\lambda_i$$, whereas with a non-‘nice’ basis you would have to calculate the projection. In other words, we can define a ‘nice’ basis for a vector (sub)space as a basis that is orthonormal. But then the next question is: if we only have an ‘ugly’ basis, can we get to a ‘nice’ one quickly? It turns out we can! The Gram-Schmidt process is a simple algorithm that orthonormalizes a given set of linearly independent vectors using $$\mathcal{O}(n^2)$$ vector operations, such that the generated space is the same as before. The algorithm essentially works like this: for each basis vector $$v_i$$, in order, normalize it and project all $$v_j$$ ($$j > i$$) into $$v_i^\perp$$ (the orthogonal complement of $$v_i$$). We can observe that each projected vector is a linear combination of $$v_i$$ and $$v_j$$ with a non-zero coefficient for $$v_j$$, and thus the generated vector space is preserved. An implementation I made of this algorithm can be found here.
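A minimal sketch of this process in Python, following the description above (normalize the current vector, then project all later vectors into its orthogonal complement); the function name and the list-of-lists representation are my own choices, not from any particular library:

```python
import math

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors in place-order."""
    basis = [[float(x) for x in v] for v in vectors]
    for i in range(len(basis)):
        # normalize v_i to length 1
        norm = math.sqrt(sum(x * x for x in basis[i]))
        basis[i] = [x / norm for x in basis[i]]
        # project every later vector into the orthogonal complement of v_i
        for j in range(i + 1, len(basis)):
            coeff = sum(a * b for a, b in zip(basis[j], basis[i]))
            basis[j] = [a - coeff * b for a, b in zip(basis[j], basis[i])]
    return basis
```

Running this on the ‘ugly’ basis from before, `gram_schmidt([[1, 2, 3], [23, 54, 6254], [math.pi, math.e, -1]])`, yields three pairwise orthogonal unit vectors spanning the same space.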

## Lattices

We move on to lattices. The definition of a lattice is quite simple: we are again given a set of basis vectors $$v_1, \cdots, v_n$$, and look at its lattice: $\mathcal{L}(v) = \{\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n \mid \lambda_i \in \mathbb{Z} \}$ What has changed? We only allow integer linear combinations of the basis! It turns out this complicates things considerably. For a start, we cannot just replace our $$n$$ basis vectors by $$n$$ other linearly independent vectors from the same space. As an example, consider $$(1, 0)$$ and $$(0, 1)$$, which generate $$\mathbb{Z}^2$$, whereas $$(2, 0)$$ and $$(0, 2)$$ generate $$2\mathbb{Z}^2$$ (all pairs of even numbers). This implies that our Gram-Schmidt process won’t work anymore either: it would convert the second basis in our example to the first one, which does not lie in the lattice, because $$(1, 0)$$ is not an integer multiple of $$(2, 0)$$. As another example, consider the basis consisting of $$(1, 0)$$ and the same vector rotated 60 degrees. This results in the following lattice:

*(Image of the resulting hexagonal lattice, taken from Wikimedia.)*

After playing with this basis a bit on paper, it becomes clear that there is no orthonormal basis for this lattice.

## Can we still have nice bases?

So, when we cannot find a nice basis anymore, what do we do? We redefine ‘nice’, of course! On a more serious note, it is not entirely clear what a ‘nice’ basis for a lattice is. Ideally we would like the vectors in our basis to be short and as orthogonal as possible. In their 1982 paper “Factoring polynomials with rational coefficients”, Arjen Lenstra, Hendrik Lenstra and László Lovász introduced both a concrete notion of ‘a nice lattice basis’ and an algorithm to compute one: the LLL-algorithm. So what is the new ‘nice’? Given our basis $$v_1,\cdots, v_n$$, let $$v_1^*,\cdots,v_n^*$$ be the Gram-Schmidt orthogonalization of this basis, without the normalization step (i.e. the ‘nice’ basis if we forgot for one second that we were in a lattice). Then an LLL-reduced basis satisfies the following conditions:

- The basis is size-reduced: $$|({v_i \cdot v_j^*}) / ({v_j^* \cdot v_j^*})| \leq \frac{1}{2}$$ for $$1\leq j < i \leq n$$, where $$\cdot$$ denotes the inner product. What does this mean? Note that the left hand side is the (absolute value of the) coefficient of the projection of $$v_i$$ onto $$v_j^*$$. If this were the projection of $$v_i^*$$, this value would be zero, as $$v_i^*$$ and $$v_j^*$$ are orthogonal. By requiring the value to be at most $$\frac{1}{2}$$, we are asking $$v_i$$ to simultaneously be short and sufficiently orthogonal to $$v_j^*$$ (note that since $$j < i$$, $$v_j^*$$ does not depend on $$v_i$$; for $$j > i$$, the condition holds trivially, since $$v_j^*$$ is orthogonal to $$v_i$$).
- The Lovász condition: $$|v_i^* + \mu_{i,i-1} v_{i-1}^*|^2 \geq \delta |v_{i-1}^*|^2$$ for some fixed $$\frac{1}{4} < \delta < 1$$, where $$\mu_{i,i-1} = ({v_i \cdot v_{i-1}^*}) / ({v_{i-1}^* \cdot v_{i-1}^*})$$. Note that the left hand side of this expression is the squared length of the projection of $$v_i$$ into the orthogonal complement of the first $$i - 2$$ vectors, and the right hand side is $$\delta$$ times the squared length of the projection of $$v_{i-1}$$ into that same complement. In other words, $$v_i$$ should not be much shorter than $$v_{i-1}$$ once both are projected into this complement.

It turns out that using these conditions, we can compute a ‘nice’ basis for a lattice in polynomial time! In particular, the LLL-algorithm does this in $$\mathcal{O}(d^5 \cdot n \cdot \log^3 B)$$ bit operations, for $$B$$ the length of the largest of the $$v_i$$ (under the Euclidean norm). We will look deeper into this algorithm later, but if you can’t wait I would recommend either the original paper (Factoring Polynomials with Rational Coefficients) or the book ‘A Course in Computational Algebraic Number Theory’ by Henri Cohen.
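To make the two conditions concrete, here is a small, unoptimized sketch of LLL reduction in Python, using exact rational arithmetic. The overall structure (size-reduce, then test the Lovász condition and swap on failure) follows the textbook description, but the function names and the choice to recompute the Gram-Schmidt data from scratch after every change are my own simplifications, so treat this as an illustration rather than a serious implementation:

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(basis):
    """Gram-Schmidt without normalization: returns v* and the coefficients mu[i][j]."""
    n = len(basis)
    ortho, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        w = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = dot(basis[i], ortho[j]) / dot(ortho[j], ortho[j])
            w = [wi - mu[i][j] * oj for wi, oj in zip(w, ortho[j])]
        ortho.append(w)
    return ortho, mu

def lll(basis, delta=Fraction(3, 4)):
    basis = [list(v) for v in basis]
    n = len(basis)
    k = 1
    while k < n:
        ortho, mu = orthogonalize(basis)
        # size-reduction: force |mu[k][j]| <= 1/2 by subtracting integer multiples
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])  # nearest integer
            if q != 0:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
                ortho, mu = orthogonalize(basis)
        # Lovász condition, rearranged: |v_k*|^2 >= (delta - mu^2) * |v_{k-1}*|^2
        if dot(ortho[k], ortho[k]) >= (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1]):
            k += 1
        else:
            basis[k - 1], basis[k] = basis[k], basis[k - 1]
            k = max(k - 1, 1)
    return basis
```

For example, `lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])` returns a basis of short, nearly orthogonal vectors generating the same lattice.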

# Mersenne Digits

Have you ever noticed that whenever a record-breaking prime is found, the following sentence always pops up somewhere: “… and its last 10 digits are: xxxxxxxxxx”. How are these final digits found? Do they just calculate the whole number and then look at the tail? Does it take a supercomputer to actually find these digits?

Turns out: no, not really. Even you can do it.

## Where do the last digits come from?

Let’s first consider where these final digits actually come from. Say that we want to find the last 2 digits of 7531 * 8642. Don’t pull out a calculator; instead, think about where these last 2 digits would actually come from. Would they be affected by all digits in our numbers? Of course not. If we split each number into a head and a tail (this is just the area model of multiplication), 7531 = 7500 + 31 and 8642 = 8600 + 42, the product expands into four parts: 7500 * 8600, 7500 * 42, 31 * 8600 and 31 * 42. Only the last part affects the last 2 digits of our number.

This is because the last 2 digits of all the other products are ’00’: each of them contains a factor ending in at least two zeros. Thus, the product of any two numbers whose last 2 digits are ’31’ and ’42’ will have the same last 2 digits, every time. Specifically, 31 * 42 = 1302, so the last two digits of 7531 * 8642 are ’02’.
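This observation is exactly arithmetic modulo 100, which Python’s % operator gives us directly; a quick sanity check:

```python
a, b = 7531, 8642

# only the last two digits of each factor matter for the last two digits of the product
assert (a * b) % 100 == (31 * 42) % 100 == 2
```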

## Mersenne Primes

Time to go a little bigger. Mersenne Primes are primes given by $$2^p - 1$$ where $$p$$ itself is prime as well. Pretty much all of the largest primes we know are Mersenne Primes. In fact, the largest prime we know is a Mersenne Prime: $$2^{57885161} - 1$$. Want to know what its last 10 digits are?

The problem is somewhat similar to the one we faced previously, except instead of two numbers, we have to multiply 57885161 numbers together. Luckily, since all of these numbers are the same, there is a quick way to do this! Here’s a Python implementation of the algorithm:

    def mypow(x, n):
        if n == 0:
            return 1
        if n == 1:
            return x
        if n == 2:
            return x * x
        if n % 2 == 0:
            return mypow(x * x, n // 2)
        return x * mypow(x * x, (n - 1) // 2)

However, if we now ask it to calculate mypow(2, 57885161) % 10000000000, nothing seems to happen. That’s because this query does return only the last 10 digits in the end, but it first calculates all of them (roughly 17 million digits), which is exactly what we were trying to avoid.

The key now is to split our gigantic power into the string of multiplications it really is, and after a multiplication, drop all digits that aren’t in the last 10. As you saw above, to do this we use % 10000000000 or % 10**10 ($$10^{10}$$). Here’s what the code looks like:

    def mypow(x, n, digits):
        m = 10 ** digits
        if n == 0:
            return 1
        if n == 1:
            return x % m
        if n == 2:
            return x * x % m
        if n % 2 == 0:
            # reduce x * x before recursing, so intermediate values stay small
            return mypow(x * x % m, n // 2, digits)
        return x * mypow(x * x % m, (n - 1) // 2, digits) % m

As you can see, I went for an extra argument specifying how many digits we are looking for. First, test it a little and see if it works. For example, 2^10 = 1024, so mypow(2, 10, 2) should return 24. After that, try mypow(2, 57885161, 10) - 1 (don’t forget the -1!). It should return 1724285951, the last 10 digits of our prime!
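As an aside, Python’s built-in pow accepts an optional third argument that performs exactly this modular exponentiation (and is implemented in C, so it is even faster than our hand-rolled version):

```python
digits = 10

# last 10 digits of the Mersenne prime 2**57885161 - 1
print(pow(2, 57885161, 10 ** digits) - 1)  # 1724285951
```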

## Bigger!

If you have a decent computer, the answer should appear in a few seconds. We should be able to handle even larger Mersenne numbers with this method. prime(3443958), our 57885161 from before, is the largest known prime $$p$$ for which $$2^p-1$$ is prime, but, hypothetically, let’s just assume that prime(10000000) = 179424673 satisfies this condition as well! The number would be astronomically large: WolframAlpha estimates it at approximately $$3.43\cdot{10}^{54012208}$$, which means it has around 54 million digits. But in a few seconds, our function tells us the last ten digits of this number are 6819899391!

## Does this always work?

You might be wondering whether this works for any sort of large number. Unfortunately, it doesn’t always work out as nicely as for Mersenne Primes. Generally, any number that can be built up from multiplications and additions only is fine, since we can reduce modulo $$10^{10}$$ after every step. Take, for example, the non-Mersenne but still enormously large prime 19249 * 2^13018586 + 1. The following query gives its last 10 digits: (19249 * mypow(2, 13018586, 10) + 1) % (10 ** 10), returning 8618996737.
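The same calculation with the built-in pow, reducing after the multiplication and addition as well (the expected value is the one quoted above):

```python
# last 10 digits of 19249 * 2**13018586 + 1
last10 = (19249 * pow(2, 13018586, 10 ** 10) + 1) % 10 ** 10
print(last10)  # 8618996737
```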

Anyway, that’s it for now. Happy digit-hunting.

# Stirling’s Formula

Meet Stirling’s Formula:

$\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi{n}}\cdot n^{n} \cdot e^{-n}} = 1$

Essentially, this formula says that for very large n, the expression $$\sqrt{2\pi{n}}\cdot{n}^{n}\cdot{e}^{-n}$$ approximates $$n!$$, since the ratio between these two terms approaches one. This is very useful for approximating large $$n!$$, since it is a lot faster to evaluate.
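We can watch the ratio approach 1 numerically. To avoid overflowing floating point with huge values of $$n!$$, the sketch below compares logarithms instead, using math.lgamma for $$\log n!$$:

```python
import math

def log_stirling(n):
    # log of sqrt(2*pi*n) * n**n * e**(-n)
    return 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n

# ratio n! / approximation for growing n; it is roughly 1 + 1/(12n)
for n in (10, 100, 1000):
    ratio = math.exp(math.lgamma(n + 1) - log_stirling(n))
    print(n, ratio)
```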

## Faster squaring

But why, you might ask? Calculating $$n!$$ takes $$n-1$$ multiplications, since you are multiplying $$n$$ numbers together, while the term $$\sqrt{2\pi{n}}\cdot(\frac{n}{e})^{n}$$ seems to take at least as many, since a) it involves an $$n$$-th power, and b) it has a square root, and other scary things like $$\pi$$ and $$e$$.

Turns out, there is a really neat method called exponentiation by squaring that allows us to calculate powers (in our case $$(\frac{n}{e})^{n}$$ ) a lot faster. Here is how it works:

Let’s say we want to calculate the following power: $$3^8$$. If we just multiply eight 3’s together, we’ll need 7 multiplications. However, remember that we can split exponents, specifically:

$a^{nm} = (a^n)^m$

We can do the same thing with our power by dividing out the two’s:

$3^8 = (3^2)^4 = ((3^2)^2)^2$

This is pretty neat – after all, squaring a number is just one multiplication, so now we only need 3 multiplications rather than 7! Let’s try that again, now with something larger:

$2^{28} = (2^2)^{14} = ((2^2)^2)^7 = 16^7$

Now we’ve run into a problem: an odd exponent. We can’t split the 7 into 2 and 3.5, since then we’d have to take roots, and our goal was to find a faster way to calculate powers. In fact, since 7 is prime, there is no way to split it into two other integer exponents using the above method. However, not everything is lost! We’ll use this other rule for exponents:

$a^{n+m} = a^n\cdot{a}^m$

How about we rewrite 7 to 6 + 1?

$16^7 = 16\cdot{16}^6 = 16\cdot(16^2)^3 = 16\cdot(256)^3 = 16\cdot{256}\cdot(256)^2$

In its final form our equation only needs 3 more multiplications, and it took us another 3 multiplications to get the 16’s and 256’s. That makes 6 multiplications in total, as opposed to the original 27 we’d need for $$2^{28}$$. This is pretty cool. Now let’s try coming up with a formal algorithm for this method.

## The algorithm

Let’s assume we are given a value x and a positive integer exponent n. We’re not going to worry about non-positive exponents since those would be somewhat trivial to handle (Cue “this proof is left as an exercise for the reader.“), and fractional exponents are a whole different beast. In other words, n is positive and whole, or, if you’re a mathematician, $$n\in\mathbb{N}$$.

The best way to do this would be by using recursion, since our algorithm continuously splits powers until the exponents are small enough (<= 2). Let’s start by defining a function that does this:

    function power(x, n)
        if(n == 1)
            return x;
        elseif(n == 2)
            return x * x;
        else ?

Those are the special cases; now we get to the recursion. First, we’ll have to check whether the exponent n is even or odd. If it is even, we can divide it by two and raise x * x to this power. If it is odd, we subtract 1 and try again.

    function power(x, n)
        if(n == 1)
            return x;
        elseif(n == 2)
            return x * x;
        elseif((n % 2) == 0)
            return power(x * x, n / 2);
        else
            return x * power(x, n - 1);

(Where % is the modulo operator: if c is the result of a % b, then c is the remainder after the division a/b. So (n % 2) == 0 if and only if n is even.)

This actually works fairly well, but we can make a slight optimization. We already noticed how, after dividing our exponent by two, there’s no good way to predict whether the new exponent will be even or odd. However, when we subtract one, we do know: if n is odd, then n - 1 must be even! We can skip a step in our recursion by changing the last line to this:

    return x * power(x * x, (n - 1) / 2);

The new exponent (n - 1) / 2 might of course be odd again, but this time there is no simple way to predict that. Anyway, our full (pseudo)code now looks like this:

    function power(x, n)
        if(n == 1)
            return x;
        elseif(n == 2)
            return x * x;
        elseif((n % 2) == 0)
            return power(x * x, n / 2);
        else
            return x * power(x * x, (n - 1) / 2);

So there you have it! Exponentiation by squaring, a simple method to greatly reduce the number of multiplications needed for calculating $$x^n$$. If you’re a programmer and you think you should maybe implement this for your projects: don’t worry, compilers and math libraries typically use this (or a similar) method already. It’s just a nice example of how some creative math can get you a long way when it comes to optimizing code.
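The pseudocode above translates almost verbatim into Python (for positive integer exponents, as assumed throughout):

```python
def power(x, n):
    """Compute x**n using O(log n) multiplications (n a positive integer)."""
    if n == 1:
        return x
    if n == 2:
        return x * x
    if n % 2 == 0:
        return power(x * x, n // 2)
    # n is odd, so (n - 1) / 2 is a whole number
    return x * power(x * x, (n - 1) // 2)

print(power(3, 8))   # 6561
print(power(2, 28))  # 268435456
```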

## Back to Stirling

There are other reasons that Stirling’s Formula (or Stirling’s Approximation) is a useful method to approximate n!, but if you hadn’t noticed yet, it was just an excuse to talk about exponentiation by squaring. I would encourage you to research it on your own, though.