Friday, January 18, 2013

One-time pads with Python

A one-time pad is a kind of unbreakable encryption.  For most encryption, breaking it is a matter of throwing a lot of computational resources at the problem.  Typically, the amount of resources needed greatly exceeds the amount that is practical to obtain, so most modern cryptography is secure enough.  There are, however, some downsides to modern cryptography, the biggest of which is its complexity.  If we want to use crypto for something like bank transactions, complexity is not that big of a deal, because centralization can hide most of the complexity from end-users.  But if, for instance, you need to implement secure communications without a centralized certificate authority, effectively implementing secure communications becomes a lot harder.

But why would you not want to have a CA?  Well, suppose you're a dissident in an oppressive state like China.  You want to communicate securely with other dissidents, but can't or won't register with any sort of centralized organization.  You have a small group of closely-trusted individuals who you can meet in real life, but only infrequently, to avoid arousing suspicion.  You could still use something like a web of trust, but it's overkill for your plans; all you need to do is send short encrypted messages to individuals, and having dedicated PGP software on your computer could be incriminating.

So if modern cryptography doesn't fit these needs, what's so great about a one-time pad?  It's simple yet perfectly secure if done correctly.  Enough discussion, let's have some code.

Here's the code to create keys:

import os
import base64

LENGTH = (140 * 3)/4 # = 105
secret = os.urandom(LENGTH)
print base64.b64encode(secret, '#*') 

These keys are just small enough to be texted or tweeted, but that's a bad idea because you'd be doing so "in the clear."  Instead, you could save them to a flash drive or print them out as QR codes.  Printing them has the advantage that paper is easier to destroy or render unreadable than flash drives, and it's hard to be sure you've properly overwritten data on a solid-state device.  On the other hand, a flash drive is more concealable.

The os.urandom function is supposed to be "suitable for cryptographic purposes," but that doesn't mean it's perfectly random; one-time pads are a lot more sensitive to this sort of thing than typical modern cryptography. This could be a concern for the safeness of our key. On Linux in particular, it uses /dev/urandom when I'd personally be more comfortable using /dev/random and instructing the user to mash the keyboard until the program continues. On Windows, it seeds a PRNG with a lot of disparate sources of entropy, and is probably safe enough for 105 bytes at a time. I'd be concerned about longer keys, however.

Here's the code to encrypt/decrypt something:

import base64

key = "..." # Get from input or file
secret = base64.b64decode(key, '#*')
plaintext = u"Here's some secret text we want to encrypt.".encode('utf-8')
encrypted = bytearray()
(secret, plaintext) = (bytearray(secret), bytearray(plaintext))
for (pbyte, sbyte) in zip(plaintext, secret):
    encrypted.append(pbyte ^ sbyte) # remember, ^ is XOR
print base64.b64encode(str(encrypted), '#*')

If the plaintext is limited to 105 bytes of UTF-8, the encrypted output will be 140 characters or less (tweet-sized); moreover, those characters will be A-Z, a-z, 0-9, and the characters #, *, and =, meaning they'll be fine to text as well since the standard GSM 7-bit alphabet includes all three.  Very cheap handsets might not have a key for the = symbol, but since it only occurs at the end, and then only up to twice, it's not very difficult to work around (e.g. send a second text saying "That last one ended with two equals symbols.").  If the plaintext is longer than the key, it'll be truncated to the length of the key.

If you want to decrypt something, just encrypt it again with the same key.  You'll need to Base64 decode at the beginning instead of Base64 encoding at the end, but it's otherwise exactly the same.

It should go without saying that a key should be used only once, hence the name "one-time pad."  Reusing a key, even once, renders the encryption ineffective at best.

The obvious downsides to this system are symmetry and lack of authentication.  Symmetry means the key to encrypt is the same as the key to decrypt.  This poses practical challenges because the key needs to be secretly distributed to only those people you trust, while an asymmetric system allows public keys to be redistributed freely without endangering security.  The lack of authentication means there's no way for the sender to prove his identity to the recipient, nor vice versa.  If the authorities obtain a copy of the keys, they can masquerade as either party.