An introduction to the LLL-algorithm

In this blogpost we will take a closer look at what exactly the LLL-algorithm does. We will not describe the internal workings of the algorithm, but rather try to explain what it tries to achieve.

The case of regular vector spaces

Given a set of vectors \(v_1,\cdots,v_n\), the vector (sub)space generated by these vectors is defined as \(V = \{\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n | \lambda_i \in \mathbb{R}\}\), i.e. all linear combinations of these basis vectors. Note that, given such a space \(V\), we can pick any set of \(n\) vectors as a basis, as long as they are linearly independent. However, it quickly becomes clear that some bases are just ‘nicer’ than others. Consider the following bases of \(\mathbb{R}^3\): \[ (1, 0, 0)^T, (0, 0, 1)^T, (0, 1, 0)^T \] or \[ (1, 2, 3)^T, (23, 54, 6254)^T, (\pi, \mathbf{e}, -1)^T \] While the second set of vectors is indeed linearly independent, working with this basis would be quite a pain. Why? For two reasons: 1) the vectors are not orthogonal, and 2) the vectors do not have length 1.

A ‘nice’ basis

While not having an orthonormal basis is not an insurmountable disaster, a lot of things become a lot easier when your basis is orthonormal. For example, given some vector that is a linear combination of the basis vectors with coefficients \(\lambda_1, \lambda_2,\cdots, \lambda_n\), what is the projection of this vector onto the i’th basis vector? If the basis is orthonormal, this is simply \(\lambda_i\), whereas with a non-‘nice’ basis you would have to calculate the projection. In other words, we can define a ‘nice’ basis for a vector (sub)space as a basis that is orthonormal. But then the next question is: what if we only have an ‘ugly’ basis, can we get to a ‘nice’ one quickly? It turns out: we can! The Gram-Schmidt process is a simple algorithm that orthonormalizes a given set of linearly independent vectors in \(\mathcal{O}(n^2)\) vector operations, such that the generated space stays the same. The algorithm essentially works like this: for each basis vector \(v_i\), normalize it and project all \(v_j\) (\(j > i\)) into \(v_i^\perp\) (the orthogonal complement of \(v_i\)). We can observe that this projection replaces \(v_j\) by a non-trivial linear combination of \(v_i\) and \(v_j\), and thus preserves the generated vector space. An implementation I made of this algorithm can be found here.
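
To make this concrete, here is a minimal sketch of the process in Python (using numpy; this is my own illustration, not necessarily the implementation linked above):

import numpy as np

def gram_schmidt(vectors):
    # Orthonormalize linearly independent vectors; the returned basis
    # spans the same space as the input.
    basis = []
    for v in vectors:
        w = v.astype(float)
        # Project w into the orthogonal complement of the finished basis
        # vectors by removing its components along each of them.
        for u in basis:
            w = w - np.dot(w, u) * u
        basis.append(w / np.linalg.norm(w))  # normalize
    return basis

# The 'ugly' basis from above becomes an orthonormal one:
ugly = [np.array([1, 2, 3]), np.array([23, 54, 6254]), np.array([np.pi, np.e, -1])]
nice = gram_schmidt(ugly)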

Lattices

We move on to lattices. The definition of a lattice is quite simple: we are again given a set of basis vectors \(v_1, \cdots, v_n\), and look at the lattice they generate: \[ \mathcal{L}(v) = \{\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n | \lambda_i \in \mathbb{Z} \} \] What has changed? We only allow integer multiples of the basis! It turns out this complicates things considerably. For a start, we cannot just replace our \(n\) basis vectors by \(n\) other linearly independent vectors from the same space. As an example, consider \((1, 0)\) and \((0, 1)\), which generate \(\mathbb{Z}^2\), whereas \((2, 0)\) and \((0, 2)\) generate \( 2\mathbb{Z}^2 \) (all tuples of even numbers). This implies that our Gram-Schmidt process won’t work anymore either: it would convert the second basis in our example to the first one, which does not lie in the lattice, because \((1, 0)\) is not an integer multiple of \((2, 0)\). As another example, take the basis consisting of \((1, 0)\) and the same vector rotated 60 degrees. This results in the following lattice:

(Image of the resulting lattice, taken from Wikimedia.)

After playing with this basis a bit on paper, it becomes clear that there is no orthonormal basis for this lattice.

Can we still have nice bases?

So, when we cannot find a nice basis anymore, what do we do? We redefine ‘nice’, of course! On a more serious note, it is not entirely clear what a ‘nice’ basis for a lattice is. Ideally we would like the vectors in our basis to be short and as orthogonal as possible. In their 1982 paper “Factoring polynomials with rational coefficients”, Arjen Lenstra, Hendrik Lenstra and László Lovász introduced both a concrete notion of ‘a nice lattice basis’ and an algorithm to compute one: the LLL-algorithm. So what is the new ‘nice’? Given our basis \(v_1,\cdots, v_n\), let \(v_1^*,\cdots,v_n^*\) be the Gram-Schmidt orthogonalization of this basis, without the normalization step (i.e. essentially the ‘nice’ basis, if we forgot for one second that we were in a lattice). Then an LLL-reduced basis satisfies the following conditions:

  • The basis is size-reduced: \( |({v_i \cdot v_j^*}) / ({v_j^* \cdot v_j^*})| \leq \frac{1}{2} \) for \(1\leq j < i \leq n\), where \(\cdot\) denotes the inner product. What does this mean? Note that the left hand side is the (absolute value of the) coefficient of the projection of \(v_i\) onto \(v_j^*\). If this were the projection of \(v_i^*\), this value would be zero, as \(v_i^*\) and \(v_j^*\) are orthogonal. By requiring the value to be at most \(\frac{1}{2}\), we are asking \(v_i\) to simultaneously be short and sufficiently orthogonal to \(v_j^*\) (note that since \(j < i\), \(v_j^*\) is not based on \(v_i\); for \(j > i\), this condition trivially holds).
  • The Lovász condition: \(|v_i^* + \mu_{i,i-1} v_{i-1}^*|^2 \geq \delta |v_{i-1}^*|^2 \) for a parameter \(\frac{1}{4} < \delta < 1\), where \(\mu_{i,i-1} = ({v_i \cdot v_{i-1}^*}) / ({v_{i-1}^* \cdot v_{i-1}^*})\). Note that the left hand side of this expression is the squared length of the projection of \(v_i\) into the orthogonal complement of the first \(i-2\) vectors, and the right hand side is \(\delta\) times the squared length of the projection of \(v_{i-1}\) into that same complement. In other words, \(v_i\) should reach ‘further into this complement’ than \(v_{i-1}\) (up to the factor \(\delta\)). A small checker for both conditions is sketched below.
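
The algorithm that actually produces such a basis is a topic for the follow-up, but the conditions themselves are easy to verify. Here is a minimal checker I sketched for illustration (not the LLL-algorithm itself), which computes the Gram-Schmidt vectors \(v_i^*\) and tests both conditions:

import numpy as np

def orthogonalize(basis):
    # Gram-Schmidt WITHOUT normalization: these are the v* vectors.
    star = []
    for v in basis:
        w = v.astype(float)
        for u in star:
            w = w - (np.dot(w, u) / np.dot(u, u)) * u
        star.append(w)
    return star

def mu(v, u):
    # Coefficient of the projection of v onto the Gram-Schmidt vector u
    return np.dot(v, u) / np.dot(u, u)

def is_lll_reduced(basis, delta=0.75):
    star = orthogonalize(basis)
    n = len(basis)
    # Size-reduced: |mu(v_i, v_j*)| <= 1/2 for all j < i
    for i in range(n):
        for j in range(i):
            if abs(mu(basis[i], star[j])) > 0.5:
                return False
    # Lovász condition for every consecutive pair
    for i in range(1, n):
        lhs = star[i] + mu(basis[i], star[i - 1]) * star[i - 1]
        if np.dot(lhs, lhs) < delta * np.dot(star[i - 1], star[i - 1]):
            return False
    return True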

It turns out that using these conditions, we can compute a ‘nice’ basis for a lattice in polynomial time! In particular, the LLL-algorithm does this in \(\mathcal{O}(d^5 \cdot n \cdot \log^3 B)\) for a basis of \(d\) vectors in \(\mathbb{R}^n\), with \(B\) the length of the largest of the \(v_i\) (under the Euclidean norm). We will look deeper into this algorithm later, but if you can’t wait I would recommend either the original paper (Factoring Polynomials with Rational Coefficients) or the book ‘A Course in Computational Algebraic Number Theory’ by Henri Cohen.
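
If you want to experiment before then, the fpylll Python package wraps a production-quality LLL implementation (assuming you have it installed; the sketch below follows its documented API, which may differ between versions):

from fpylll import IntegerMatrix, LLL

# Rows are the basis vectors of the lattice
A = IntegerMatrix.from_matrix([[1, 1, 1],
                               [-1, 0, 2],
                               [3, 5, 6]])
LLL.reduction(A)  # reduces the basis in place
print(A)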

Warlight Bot Tactics

Just a couple of ideas regarding tactics for the Warlight AI Challenge. I haven’t had time to try all of them, and I’m not even sure whether I will, since the finals are approaching rather quickly, but I thought I’d pen them down anyway. Also note that my bot is open source, if you need a basis for yours.

Joint/Split invasions

Implemented: Yes.

The game takes place on a simple graph, rather than a straight line, so when you only consider attacking from a single region, you are missing a lot of opportunities. To see what I mean, consider the following situation:

This is a screenshot of round 99(!) of this match, and the blue bot is mine. I should definitely have lost, but my opponent’s attack logic probably looked something like this:

for each friendly region R
    for each non-friendly neighbour N of R
        if attack R -> N is feasible
            attack N from R

Now consider the following attack sequence. The red bot is mine (yes, I cherrypicked the examples):

The bot determines that the region containing 17 armies actually has 2 potential targets, and it has sufficient armies to attack both. Additionally, there are 2 other regions that could potentially aid in the attack.

As you can see this can make quite a difference. Not only do you avoid situations like we saw in the first picture, but you also move across the board much faster. This is roughly how I do it:

for each non-friendly region R
    # Gauge how many armies would be available for an attack on R
    for each friendly neighbor N of R
        R.availableArmies += N.CanMiss(R)
    # Attack if there are enough
    if R.availableArmies >= R.requiredArmies
        Attack R

The CanMiss(R) subroutine of N is essential here. It specifies how many armies the friendly region N could spare if region R weren’t there. In other words, this function loops through the hostile neighbors of N other than R, decides how many armies N needs to defend itself against them, and subtracts that from the armies N actually has (clamping at zero, of course).
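
Sketched in Python, the whole idea looks roughly like this (a sketch under my own assumptions about the data model; Region and the two heuristics are hypothetical stand-ins, not the actual challenge API):

from dataclasses import dataclass, field

@dataclass
class Region:
    # Hypothetical data model for illustration
    name: str
    armies: int
    friendly: bool
    neighbors: list = field(default_factory=list)

def defense_needed(against):
    # Toy heuristic: keep as many armies as the hostile neighbor has
    return against.armies

def attack_needed(target):
    # Toy heuristic: only attack with a comfortable margin
    return 2 * target.armies

def can_miss(n, r):
    # Armies that friendly region n could spare if region r weren't there
    needed = sum(defense_needed(h) for h in n.neighbors
                 if not h.friendly and h is not r)
    return max(0, n.armies - needed)

def plan_joint_attacks(regions):
    attacks = []
    for r in regions:
        if r.friendly:
            continue
        helpers = [n for n in r.neighbors if n.friendly]
        available = sum(can_miss(n, r) for n in helpers)
        if available >= attack_needed(r):
            attacks.append((r, helpers))
    return attacks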

This way, not only can you organize joint attacks on hostile regions, you can also do so without compromising the safety of the attacking regions.

However, there is a slight problem. Going back up to the image sequence showcasing this behavior, notice how the region containing 17 armies splits a bit unevenly, at 15:1 rather than, say, 12:4? This is because there is some discrepancy between the armies needed for an attack and the armies needed for defence.

To be more precise, the 17-army region R in question sees two neighboring regions: N1 containing 8 armies, and N2 containing 2 armies. It first of all decides that if it is going to attack N1, it still needs to be able to defend itself against N2. How many armies are needed for this? Let’s say two is enough. That means we can attack with fifteen!

Next up, it wants to attack N2. How many armies are necessary for the defence against N1? Uh, ten should be enough. Attack with seven!

I haven’t really implemented a solution to this sub-problem yet, but here are some ideas:

  • Execute all attacks a region is involved in, in ascending order of how many armies the region provides. Simple to implement, but probably won’t really work that well. Note that occasionally (but very rarely), two regions might both be involved in two different attacks that require them to be executed in opposite orders.
  • Once a region has attacked once, block all other attacks it is involved in (or, continue the attack without the relevant region, if possible).
  • Run over the final set of attacks several times, and recalculate attacks where the total amount of armies required from a region exceeds the available, potentially discarding one or more attacks if necessary.

Move idle armies to frontlines

Implemented: Yes.

Another thing I notice a lot is that when bots are done in an area, they just leave their excess armies there. For example:

And on the opposite side, look how this match progresses. Really, there is nothing too fancy going on. All you need is a simple implementation of breadth-first search that searches for the nearest region from a given list, rather than a single target. Just feed it the frontline regions as targets, run it from every region with idle armies, and head towards the closest one.
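
A sketch of that search, reusing the hypothetical Region model from the previous section (again, not the actual challenge API):

from collections import deque

def step_towards_frontline(start, frontline):
    # BFS from an idle region; returns the neighbor of `start` that lies
    # on a shortest path to the nearest frontline region, or None.
    targets = set(id(r) for r in frontline)
    queue = deque((n, n) for n in start.neighbors)  # (first step, current)
    seen = {id(start)}
    while queue:
        first_step, current = queue.popleft()
        if id(current) in seen:
            continue
        seen.add(id(current))
        if id(current) in targets:
            return first_step
        for n in current.neighbors:
            queue.append((first_step, n))
    return None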

Again, people are really missing opportunities here, especially since you can make as many moves as you want in a turn. As long as you do all the transferring after the attacking, you have nothing to lose.

Properly handle a fragmented playing field

Implemented: No.

Consider the following situation:

Not so good, right? In this case it would probably be best to give up the regions in Asia and North America, and focus on defending Australia. However, currently my bot (and probably many others) only cares that Ontario is significantly threatened, and gives it some extra armies.

Obviously, it should instead follow the US and give up on conquering Canada. This is actually harder than it looks because it requires the bot to look at the Big Picture. Currently my bot only does this by looking at how much of a region’s superregion is owned (proportionally), which evidently is not enough.

Ideally, the bot would pick a certain network of friendly regions (in this case Australia), and only focus on regions that are directly connected to it. I’m not yet sure how to react to this network being split up (as happened in China in the image above), but the best option is probably to continue with the largest network, or the network that contains the most (nearly) completed superregions. A sketch of how these networks can be computed follows below.
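
Finding those networks is a simple connected-components computation over the friendly regions; a minimal flood-fill sketch (same hypothetical Region model as before):

def friendly_networks(regions):
    # Group friendly regions into maximal connected networks
    seen = set()
    networks = []
    for r in regions:
        if not r.friendly or id(r) in seen:
            continue
        network, stack = [], [r]
        seen.add(id(r))
        while stack:
            current = stack.pop()
            network.append(current)
            for n in current.neighbors:
                if n.friendly and id(n) not in seen:
                    seen.add(id(n))
                    stack.append(n)
        networks.append(network)
    return networks

# e.g. fall back to the biggest network:
# main = max(friendly_networks(all_regions), key=len)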

Either way, that’s it for now. If you have any other ideas or comments on the tactics proposed above, let me know in the comments!

Mersenne Digits

Have you ever noticed that whenever a large prime is found in mathematics, the following sentence always pops up somewhere: “… and its last 10 digits are: xxxxxxxxxx”. How are these terminating digits found? Do they just calculate the whole number and then look at the tail? Does it take a supercomputer to actually find these digits?

Turns out: no, not really. Even you can do it.

Where do the last digits come from?

Let’s first consider where these terminating digits actually come from. Say that we want to find the last 2 digits of 7531 * 8642. Don’t pull out a calculator; instead, think about where these last 2 digits would actually come from. Would they be affected by all the digits in our numbers? Of course not: if we visualize this multiplication using the area model, only the red areas affect the last 2 digits of our number:

As you can see, the last 2 digits of all the other partial products are ’00’. Thus, the product of any two numbers whose last 2 digits are ’42’ and ’31’ will have the same last 2 digits, every time. Specifically, 42 * 31 = 1302, so the last two digits of 7531 * 8642 are ’02’.
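
In the language of modular arithmetic this says that \((a \cdot b) \bmod 100 = ((a \bmod 100) \cdot (b \bmod 100)) \bmod 100\). A quick Python check:

a, b = 7531, 8642
print((a * b) % 100)          # 2, i.e. the last two digits are '02'
print((a % 100) * (b % 100))  # 31 * 42 = 1302, same last two digits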

Mersenne Primes

Time to go a little bigger. Mersenne Primes are primes given by \(2^p - 1\) where \(p\) itself is prime as well. Pretty much all of the largest primes we know are Mersenne Primes. In fact, the largest prime we know is a Mersenne Prime: \(2^{57885161} - 1\). Want to know what its last 10 digits are?

The problem is somewhat similar to the one we faced previously, except instead of two numbers, we have to multiply 57885161 twos together. Luckily, since all of these numbers are the same, there is a quick way to do this! Here’s a Python implementation of the algorithm:

def mypow(x, n):
    # Exponentiation by squaring: O(log n) multiplications
    if n == 0:
        return 1
    if n == 1:
        return x
    if n == 2:
        return x * x
    if n % 2 == 0:
        return mypow(x * x, n // 2)
    # n is odd, so n - 1 is even
    return x * mypow(x * x, (n - 1) // 2)

However, if we now ask it to calculate mypow(2, 57885161) % 10000000000, nothing seems to happen. That’s because in the end this query will only return the last 10 digits, but it still calculates all of them first, which is exactly what we were trying to avoid.

The key now is to split our gigantic power into the string of multiplications it really is, and after each multiplication, drop all digits that aren’t among the last 10. As you saw above, to do this we use % 10000000000 or % 10**10 (\(10^{10}\)). Here’s what the code looks like:

def mypow(x, n, digits):
    # As before, but we now drop everything beyond the last `digits`
    # digits after every multiplication, including on the argument of
    # the recursive call, so intermediate results never grow huge.
    mod = 10 ** digits
    if n == 0:
        return 1
    if n == 1:
        return x % mod
    if n == 2:
        return x * x % mod
    if n % 2 == 0:
        return mypow(x * x % mod, n // 2, digits)
    return x * mypow(x * x % mod, (n - 1) // 2, digits) % mod

As you can see, I went for an extra argument specifying how many digits we are looking for. First, test it a little and see if it works. For example, 2^10 = 1024, so mypow(2, 10, 2) should return 24. After that, try mypow(2, 57885161, 10) - 1 (don’t forget the -1!). It should return 1724285951, the last 10 digits of our prime!
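
As an aside, Python’s built-in three-argument pow performs exactly this kind of modular exponentiation natively, so you can cross-check the result:

print(pow(2, 57885161, 10 ** 10) - 1)  # 1724285951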

Bigger!

If you have a decent computer, the answer should have appeared in a few seconds. We should be able to handle even larger Mersenne numbers with this method. Prime(3443958), i.e. 57885161, is the largest known prime \(p\) for which \(2^p-1\) is prime, but, hypothetically, let’s just assume that prime(10000000) = 179424673 satisfies this condition as well! The resulting number would be astronomically large: WolframAlpha estimates it at approximately \(3.43\cdot{10}^{54012208}\), which means it has about 54 million digits. But in a few seconds, our function tells us the last ten digits of this number are 6819899391!

Does this always work?

You might be wondering whether this works for any sort of large number. Unfortunately, it doesn’t always work out as well as it does for Mersenne Primes. Generally, any number that can be calculated by multiplications and additions only is fine. Take, for example, the non-Mersenne but still enormously large prime 19249 * 2^13018586 + 1. The following query gives its last 10 digits: (19249 * mypow(2, 13018586, 10) + 1) % (10 ** 10), returning 8618996737.

Anyway, that’s it for now. Happy digit-hunting.

Stirling’s Formula

Meet Stirling’s Formula:

\[ \lim_{n \to \infty} \frac{n!}{\sqrt{2\pi{n}}\cdot n^{n} \cdot e^{-n}} = 1\]

Essentially, this formula says that for very large n, the expression \( \sqrt{2\pi{n}}\cdot{n}^{n}\cdot{e}^{-n} \) approximates \( n! \), since the ratio between the two approaches one. This is very useful for approximating large \(n!\), since it is a lot faster to evaluate.
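
You can watch the ratio creep towards one without ever building the huge numbers themselves, by comparing logarithms (math.lgamma(n + 1) is \(\ln n!\)):

import math

def stirling_ratio(n):
    # log(n!) via the log-gamma function; log of the approximation directly
    log_fact = math.lgamma(n + 1)
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    return math.exp(log_fact - log_stirling)

for n in (1, 10, 100, 1000):
    print(n, stirling_ratio(n))  # 1.084..., 1.008..., 1.0008..., 1.00008...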

Faster squaring

But why is it faster, you might ask? Calculating \(n!\) takes \(n-1\) multiplications, since you are multiplying \(n\) numbers together, while the term \(\sqrt{2\pi{n}}\cdot(\frac{n}{e})^{n}\) seems to take at least as many, since a) it involves an n-th power, and b) it has a square root, and other scary things like \(\pi\) and \(e\).

Turns out, there is a really neat method called exponentiation by squaring that allows us to calculate powers (in our case \((\frac{n}{e})^{n}\) ) a lot faster. Here is how it works:

Let’s say we want to calculate the following power: \(3^8\). If we just multiply eight 3’s together, we’ll need 7 multiplications. However, remember that we can split exponents, specifically:

\[a^{nm} = (a^n)^m\]

We can do the same thing with our power by dividing out the two’s:

\[3^8 = (3^2)^4 = ((3^2)^2)^2\]

This is pretty neat – after all, squaring a number is just one multiplication, so now we only need 3 multiplications rather than 7! Let’s try that again, now with something larger:

\[2^{28} = (2^2)^{14} = ((2^2)^2)^7 = 16^7\]

Now we’ve run into a problem: an odd exponent. We can’t split the 7 into 2 and 3.5, since then we’d have to take roots, and our goal was to find a faster way to calculate powers. In fact, since 7 is prime, there is no way at all to split it into two other integer exponents using the above method. However, not everything is lost! We’ll use this other rule for exponents:

\[a^{n+m} = a^n\cdot{a}^m\]

How about we rewrite 7 to 6 + 1?

\[16^7 = 16\cdot{16}^6 = 16\cdot(16^2)^3 = 16\cdot(256)^3 = 16\cdot{256}\cdot(256)^2 \]

In its final form our equation only needs 3 more multiplications, and it took us another 3 multiplications to get the 16’s and 256’s. That makes 6 multiplications in total, as opposed to the original 27 we’d need for \(2^{28}\). This is pretty cool; now let’s try coming up with a formal algorithm for this method.

The algorithm

Let’s assume we are given a value x and a positive integer exponent n. We’re not going to worry about non-positive exponents since those would be somewhat trivial to handle (Cue “this proof is left as an exercise for the reader.“), and fractional exponents are a whole different beast. In other words, n is positive and whole, or, if you’re a mathematician, \(n\in\mathbb{N}\).

The best way to do this would be by using recursion, since our algorithm continuously splits powers until the exponents are small enough (<= 2). Let’s start by defining a function that does this:

function power(x, n)
	if(n == 1) 
		return x;
	elseif(n == 2) 
		return x * x;
	else ?

Those are the special cases; now we get to the recursion. First, we’ll have to check whether the exponent n is even or odd. If it is even, we can divide it by two and raise x*x to this power. If it is odd, we subtract 1 and try again.

function power(x, n)
	if(n == 1) 
		return x;
	elseif(n == 2) 
		return x * x;
	elseif((n % 2) == 0)
		return power(x * x, n / 2);
	else
		return x * power(x, n - 1);

(Where % is the modulo operator, i.e. if c is the result of a % b, then c is the remainder after the division a/b. So (n % 2) == 0 if and only if n is even.)

This actually works fairly well, but we can make a slight optimization. After dividing our exponent by two, there’s no good way to predict whether the new exponent will be even or odd. However, when we subtract one, we do know: if n is odd, then n-1 must be even! We can skip a step in our recursion by changing the last line to this:

return x * power(x * x, (n - 1) / 2);

The new exponent (n - 1) / 2 might very well be even as well, but this time there is no simple way to predict that. Anyway, our full (pseudo)code now looks like this:

function power(x, n)
	if(n == 1) 
		return x;
	elseif(n == 2) 
		return x * x;
	elseif((n % 2) == 0)
		return power(x * x, n / 2);
	else
		return x * power(x * x, (n - 1) / 2);

So there you have it! Exponentiation by squaring, a simple method to greatly reduce the number of multiplications needed for calculating \(x^n\). If you’re a programmer and you think you should maybe implement this for your projects: don’t worry, most of the time compilers use this (or a similar) method as well. It’s just a nice example of how some creative math can get you a long way when it comes to optimizing code.

Back to Stirling

There are other reasons that Stirling’s Formula (or Stirling’s Approximation) is a useful method to approximate n!, but if you hadn’t noticed yet, it was just an excuse to talk about exponentiation by squaring. I would encourage you to research it on your own, though.

Major website redesign!

If you are one of the millions of people following my site, you may have noticed a few sliiight changes to how it looks. I ditched the giant empty header, and switched to a sliiightly less depressing monochrome color scheme.

CMS

More importantly though, I switched to using WordPress for the entire site rather than just the blog. It was definitely time to switch to a CMS – to give you an idea, when I wrote my old website I had no idea how mod_rewrite worked, and had to create paths to new pages by hand (i.e. creating the whole folder structure), amongst other things.

WordPress is very customizable and it was quite easy to write a custom theme that allows me to display sets of pages (tutorials, games, etc.) on different pages in a custom format. All I have to do now is write a page in the WYSIWYG editor and file it under, say, ‘Tutorials’, add a custom field detailing what programming language it is about, and it’ll show up in the list.

I have added all my old games and software, and the majority of my tutorials. There are a handful left but those will get added later on. I didn’t add all the old blogposts as I felt they weren’t really interesting anyway. The exception to this is the blogpost about reskinning GM:Studio to GM8-looks, but that one is now filed under the tutorials.

Future plans

So let’s talk content. As I mentioned in an earlier, now deleted blogpost, there’s still a lot of tutorials (primarily about GameMaker) that haven’t made it to my website yet. I’m going to collect and add them, though that process might take a while.

In addition to that, I’m going to expand into other subject areas such as Javascript (for HTML5, mostly) and C#, as well as tutorials that are not language-specific, but rather present pseudocode or more theoretical knowledge. Game Design seems interesting as well.

I’m also going to try to write some blogs more often. This, admittedly, is an aspiration shared by many an amateur blogger, but hey, I’m different! Potential topics include Math, Game Design and Computer Science.