So I’m taking this awesome class this semester, Computer and Network Security, taught by Prof. Ron Rivest, and today the class talked a bit about the history of cryptography and security. After lecture, I felt the need to dig deeper into the number theory proofs and concepts that form the foundations of cryptography, and here’s the result. I’ve also been doing quite a bit of application building over the past couple of weeks, so this is certainly a nice switch back to theory.
To set the basis for modern cryptography, we need to know that:
- There are an infinite number of prime numbers. The proof is very straightforward.
We can prove this by contradiction. Assume that there are only finitely many primes, given by the set $P = \{p_1, p_2, \ldots, p_k\}$. Then we can create another number, $N$, such that $N = p_1 p_2 \cdots p_k + 1$. Given our assumption that $P$ contains every prime in the universe, $N$ must be composite. Therefore, some prime in $P$ must divide $N$. However, this is not possible, because that prime also divides the product $p_1 p_2 \cdots p_k$, which would imply that it also divides $N - p_1 p_2 \cdots p_k = 1$. Therefore, either another prime divides $N$ or $N$ is itself a prime. Both of these possibilities contradict our assumption. QED
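Here is a small Python sketch of the construction in the proof, just to make it concrete. The function names (`smallest_prime_factor`, `euclid_witness`) are my own for illustration: given any finite list of primes, it builds the product plus one and pulls out a prime that was not on the list.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_witness(primes: list[int]) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    big_n = prod(primes) + 1
    # Every listed prime leaves remainder 1, so none of them divides big_n.
    assert all(big_n % p == 1 for p in primes)
    return smallest_prime_factor(big_n)


print(euclid_witness([2, 3, 5, 7]))          # 211: here the product plus one is itself prime
print(euclid_witness([2, 3, 5, 7, 11, 13]))  # 59: here 30031 = 59 * 509 is composite
```

The two calls cover both cases from the proof: sometimes the new number is prime, and sometimes it is composite but has a prime factor outside the list.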
But how does knowing this lead to better cryptographic systems? Well, this knowledge wouldn’t carry as much weight if the RSA algorithm didn’t exist. The RSA algorithm is an example of what is known as public key cryptography. Public key cryptography is a simple scheme where someone named Alice can publish a public key, $PK_A$, that others can use to encrypt a message, $m$, and send the encrypted message, $c$, to Alice: $c = E(PK_A, m)$. Alice can then use a separate key, known as her secret key, $SK_A$, to decrypt the encrypted message, $c$, back to $m$: $m = D(SK_A, c)$.
This sounds great and all, except there is one condition that every safe and secure public key encryption scheme should satisfy: one should not be able to obtain the secret key, $SK_A$, from the public key, $PK_A$. In other words, it should be very hard (meaning it should take thousands and thousands of years) to decrypt a message encrypted using $PK_A$ without $SK_A$.
This is where RSA comes in.
- The RSA algorithm describes a public key cryptosystem. To generate the public and private keys using RSA, one would:
- Randomly generate two prime numbers, $p$ and $q$, and take their product, $n = pq$. Preferably, the two primes should have a similar number of bits. (As an extreme case, think of how hard it would be to factor $n = 17746761831$. Not too hard, right?)
- Then, compute Euler’s totient function, $\varphi(n) = (p - 1)(q - 1)$, where Euler’s totient function counts the positive integers that are less than the input and relatively prime to it.
- Using the output of the totient function, choose a number, $e$, such that $1 < e < \varphi(n)$ and $\gcd(e, \varphi(n)) = 1$. In other words, $e$ and $\varphi(n)$ are relatively prime. In our previous case with Alice, the public key Alice would distribute consists of $n$ and $e$: $PK_A = (n, e)$.
- Now, we need to determine Alice’s private key. First we define a number $d$ to be the multiplicative inverse of $e$ modulo $\varphi(n)$: $d \equiv e^{-1} \pmod{\varphi(n)}$. Then Alice’s secret key would consist of $n$ and $d$: $SK_A = (n, d)$. However, even though $p$, $q$, and $\varphi(n)$ are not included in the private key (because they are not necessary to decode the encrypted message that Alice receives), they should not be made public. In fact, the security of the system relies on keeping $p$, $q$, and $\varphi(n)$ private.
- Given Alice’s new public and secret keys, we can now encrypt and decrypt messages. To encrypt a message $m$, an outsider (why don’t we call him Bob) takes Alice’s public key and computes the encrypted message: $c = m^e \bmod n$. Then Alice would take $c$ and decrypt it by computing: $m = c^d \bmod n$. To see why this works, we have to consider another property that Euler discovered.
Euler’s Theorem: $a^{\varphi(n)} \equiv 1 \pmod{n}$ if $a$ and $n$ are relatively prime.
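If you want to convince yourself of the theorem numerically before using it, here is a brute-force Python sketch. The helper names (`totient`, `euler_holds`) are my own, and the counting approach is only sensible for tiny inputs.

```python
from math import gcd


def totient(n: int) -> int:
    """Count the integers in 1..n-1 that are relatively prime to n (brute force)."""
    return sum(1 for a in range(1, n) if gcd(a, n) == 1)


def euler_holds(n: int) -> bool:
    """Check a^phi(n) == 1 (mod n) for every a in 1..n-1 that is coprime to n."""
    phi = totient(n)
    return all(pow(a, phi, n) == 1 for a in range(1, n) if gcd(a, n) == 1)


print(totient(15), euler_holds(15))  # 15 = 3 * 5, so phi(15) = 2 * 4 = 8
print(pow(3, totient(15), 15))       # 6, not 1: gcd(3, 15) != 1, so the theorem does not apply
```

The last line is there as a reminder that the "relatively prime" condition in the statement is doing real work.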
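We can now prove that Alice recovers the original message that Bob sent by using the formula $m = c^d \bmod n$. NOTE: This is not correct. There is a slight flaw in the proof below. Try to find it or see the comments for the correct proof. We know from our definition of $d$ that $ed \equiv 1 \pmod{\varphi(n)}$. We can thus rewrite this as $ed = 1 + k\varphi(n)$, where $k$ is some non-negative integer.
Thus, $c^d \equiv (m^e)^d \equiv m^{ed} \equiv m^{1 + k\varphi(n)} \equiv m \cdot (m^{\varphi(n)})^k \equiv m \cdot 1^k \pmod{n}$ by Euler’s Theorem.
Simplifying even further, we get: $c^d \equiv m \pmod{n}$. QED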
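To see the whole scheme run end to end, here is a minimal Python sketch of key generation, encryption and decryption. The function names and the toy values ($p = 61$, $q = 53$, $e = 17$, $m = 65$) are my own choices for illustration. Real keys use primes that are hundreds of digits long, and this sketch assumes the message is an integer smaller than $n$.

```python
from math import gcd


def generate_keys(p: int, q: int, e: int):
    """Build (public, secret) RSA keys from primes p, q and public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)  # Euler's totient of n for distinct primes p and q
    if not (1 < e < phi) or gcd(e, phi) != 1:
        raise ValueError("e must satisfy 1 < e < phi and gcd(e, phi) == 1")
    d = pow(e, -1, phi)      # multiplicative inverse of e modulo phi (Python 3.8+)
    return (n, e), (n, d)


def encrypt(public_key, m: int) -> int:
    """Bob's side: c = m^e mod n."""
    n, e = public_key
    return pow(m, e, n)


def decrypt(secret_key, c: int) -> int:
    """Alice's side: m = c^d mod n."""
    n, d = secret_key
    return pow(c, d, n)


public_key, secret_key = generate_keys(p=61, q=53, e=17)
c = encrypt(public_key, 65)
print(public_key, secret_key)      # (3233, 17) (3233, 2753)
print(c, decrypt(secret_key, c))   # 2790 65
```

Note that `generate_keys` takes the primes as arguments instead of generating them randomly, which keeps the example reproducible but skips the first step of the algorithm.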
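The security of the RSA algorithm depends on how hard it is to factor $n$ given that no one besides Alice knows what $p$ and $q$ are. (We can see this because calculating $d$ depends on knowing $\varphi(n)$, and $\varphi(n) = (p - 1)(q - 1)$ is easy to compute from the factors of $n$. Once we know $\varphi(n)$, we can compute $d$.)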
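To make that dependence concrete, here is a rough Python sketch of what an attacker could do when $n$ is easy to factor. The function names are hypothetical, and plain trial division stands in for a real factoring method. It only works here because the factors are tiny or badly unbalanced.

```python
def factor_semiprime(n: int) -> tuple[int, int]:
    """Split n into (smallest factor, cofactor) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    raise ValueError("n has no nontrivial factor")


def recover_secret_exponent(n: int, e: int) -> int:
    """Recompute d from the public key (n, e) alone by factoring n."""
    p, q = factor_semiprime(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # same inverse Alice computed during key generation


print(factor_semiprime(17746761831))       # (3, 5915587277): the lopsided example from earlier
print(recover_secret_exponent(3233, 17))   # 2753, for the toy modulus 3233 = 53 * 61
```

With primes of similar, large size the loop in `factor_semiprime` would run for far longer than anyone can wait, which is exactly the property the scheme relies on.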
The entire algorithm hinges on the fact that we don’t know an efficient way to factor large numbers. I’ll talk about various factoring techniques in my next blog post since this is already a very long blog post and I should probably be doing homework and sleeping (not simultaneously of course).
EDIT (04/17/2014): There’s a slight problem in my proof above. There is also a simple fix for it. I will leave it as an exercise for you to find the flaw in the proof. Answer will be in the comments.