Friday, January 04, 2008

TrueCrypt explained

The most popular cryptographic software for Windows is probably TrueCrypt. In this article I will explain how TrueCrypt works and as a by-product a working Python implementation will be provided. This article is written from a programmer perspective and the math will be kept to the minimum. The emphasis is on how TrueCrypt uses cryptographic primitives such as AES and SHA-1, not how the primitives themselves work.

It should be noted this article was written with the current (as of writing) TrueCrypt 4.3 specification in mind, which has deprecated several cryptographic algorithms and modes of operation, although they are still in the program for compatibility reasons. This article will only describe the non-deprecated features namely AES, Serpent, Twofish and the LRW mode of operation. If you have an old TrueCrypt volume that uses for example Blowfish and CBC mode, you won’t learn how to read it in this article.

Before we get into the nitty-gritty details of TrueCrypt there are some basic things we must understand. If we remove the cryptographic part of TrueCrypt the only thing the software does is mount file systems, exactly like the program ‘mount’ on unixes and ‘daemon-tools’ (which can mount the CD and DVD file systems) on Windows. If we take an encrypted TrueCrypt-container and decrypt it, we’re left with a file system.

What we are going to do is take a TrueCrypt volume, we can call the file volume.tc, and decrypt this to a file system, say volume.fat32. We can then use user-level file system tools to extract files from the file system, just like we work with zip/tar archives. This is not very user friendly, but perfectly adequate since we are only interested in the cryptographic part of the program, not the win32-driver that does the mounting or the graphical user interface.

If you remove every part of TrueCrypt except the part that does the encryption and decryption, TrueCrypt is very easy to understand.

Preparations

In this article we are using the Python programming language. The good news is Python actually provides the SHA-1 algorithm in the standard library, and SHA-1 is one of the cryptographic primitives required to read a TrueCrypt volume.

The bad news is we don’t have Rijndael aka AES, Serpent (a cryptographic algorithm like AES), Twofish (another cryptographic algorithm like AES), RIPEMD-160 (a hash similar to SHA-1) or Whirlpool (an advanced hash algorithm). Fortunately for us, there are many third party Python modules with this functionality, such as the Python bindings for mcrypt or the Python Cryptographic Toolkit. It is also fairly straightforward to create modules from C code using Pyrex.

However, for the sake if portability and simplicity, I have provided some pure Python implementations of all the cryptographic algorithms needed. These implementations are converted directly from the C code and are thus very ugly Python, but the code works although it’s slow.

Code: serpent.py, twofish.py, rijndael.py, ripemd.py, whirlpool.py

The three ciphers have the same interface and should be easy to work with. The two hashes have the same interface as the sha and md5 module.

>>> from serpent import *
>>> cipher = Serpent()
>>> cipher.set_key('a'*32)
>>> cipher.encrypt('01234567abcdefgh')
'\xc0oN\xefw\\\xa8\x06GQG[\xcc\x94\x0e1'
>>> cipher.decrypt(_)
'01234567abcdefgh'

Please note that the cryptographic primitives above are not really part of this article. For the rest of this article I will simply assume the ciphers and hashes with the interfaces described are available. As mentioned previously, this article won’t explain how the five primitives works, just how TrueCrypt uses them.

To try the code in the article it is very useful to have a TrueCrypt volume ready. I strongly recommend against using your production volumes with all your secret files, mainly because your password may end up in log files or swap files. Here are some recommendations when you create the new volume. One. Select a small size, a few MB or so. A small volume is easier to work with. Two. Use a FAT file system. The FAT file system is much easier to read for humans than NTFS. Three. Select the SHA-1 hash. While RIPEMD-160 and Whirlpool works perfectly well, the SHA-1 code is much faster because it’s not implemented in pure Python like the other two. Four. Don’t forget to add some files to your volume! I suggest text files, which are very easy to read.

Once you have code for the cryptographic primitives and a working TrueCrypt volume, we can proceed.

High-level overview

You have a TrueCrypt volume and you want to decrypt it. The only thing you have at your disposal is the correct password and a programming language. Here is a high-level overview of what must be done. For now we will ignore hidden volumes. Those will be dealt with later.

Step 1. The password you chose when you created the TrueCrypt volume is not actually used directly, but indirectly. Using a so-called key strengthening algorithm, we use the password and some additional data to generate new better passwords (the correct terminology is keys). These keys will be needed later. The idea of key strengthening is to perform lots of time-consuming transformations on the initial password to make brute force attacks more difficult.

Step 2. Using the keys generated, the volume header will be decrypted. The volume header is the first few bytes of the TrueCrypt volume file and it contains information needed to decrypt the rest of the volume. The volume header also contains useful ancillary data such as time stamps.

Step 3. If Step 2 was successful, we now have at our disposal all the keys necessary to decrypt the actual file system.

These three steps are somewhat simplified since we don’t know which cryptographic algorithm is the correct one (when a TrueCrypt volume is created we can select between several, including chains of algorithms). We will deal with this later.

Key Strengthening

Do you remember when you created your last TrueCrypt volume? One of the steps required when creating the volume is you have to choose a hash algorithm, one of SHA-1, RIPEMD-160 and Whirlpool. This algorithm is used for the key strengthening. But I’m getting ahead of myself here.

As mentioned previously, key strengthening is the conceptual method of creating strong key(s) from a weak key. There are many methods to do this. A very simple method is to hash the key thousands of times. Consider for example the following Python code.

import sha
def strengthen(key):
    for i in xrange(50000):
        key = sha.new(key).digest()
    return key

>>> strengthen("password")
'+\xdeF\xb4\xd9\xb6\xf3\x1f\xde\xb9!*\xe6\xec\xa2K\xac!\xbbv'

An attacker armed with a dictionary will soon be discouraged because for each password he tries he has to perform several thousand additional computations. If the attacker has a list of 2000 potential passwords, he has to perform 2000 * 50000 SHA-1 computations if we are using the example above! Notice there is no way to parallelize our strengthening algorithm. This is by design.

As a clever by-product most key strengthening algorithms can also be used to derive several keys from one key. Lets say you already have a 128 bytes long key. In that case you can create several shorter keys by splitting it, for example into four 32 bytes long keys. But what if your key is short? Instead of repeatedly hashing the output as above, we can for example concatenate the output to the input and thus create a very long stream of bytes, which can be used to create several keys of various lengths. The important thing to realize is by some small modification to the strengthen routine we can return a key much longer than 20 bytes, which can be split into several keys, if needed.

import sha
def strengthen(key):
    for i in xrange(50000):
        key += sha.new(key).digest()
        if len(key) >= 128:
            key = key[-128:]
    return key

>>> strengthen("password")
'\xab$\xa1\x83l\xba\xa6Y\x94wi\x00\xe11\x96I\xb1\xc7\x84\x14\xcd'
'\x8fo\x97\xd64\xafw\xd0?J\xbd\xd6n\x80#X\xb0\x1cY%\x0f\xd4\x99\xf0'
'\x8c\x14\x885u?\xdd\xa2*\x87\x96\xd0\x97\xdf\xd9\x08\xc0\xe4\xd5k'
'\x9e=\xa3\xb5\r\xec\x88h\x12\xf9}\xb1A\x02\x20:\xd7<\xf4\x04\xce#7m'
'\xd5\x20\r\x86\x1a\x1b\xe0\xd7\x9ai\xc3(5\x8b\xe6\x1c\xf1\xa9:'
'\xfa\x94H\xbbV\x9f\xa3#^\x9f\xa3:\x9a/\xcb.A\xb0\xc3J'

TrueCrypt uses an advanced key strengthening algorithm known as PBKDF2, described in RFC 2898. It is not completely necessary for us to understand how PBKDF2 works; it is a cryptographic primitive like AES and SHA-1. We only have to know how to use it. The function PBKDF2 takes as arguments a hash-function, a password, a salt, an iteration count and a desired length of the output. (Note that the hash function is not the function directly but a HMAC version. This doesn’t change anything for us since it’s just another cryptographic primitive well understood). Here’s an example:

>>> PBKDF2(HMAC_RIPEMD160, "password", "\x12\x34\x56\x78", 5, 4)
'\x7a\x3d\x7c\x03' 

The first argument is the HMAC version of the RIPEMD-160 hash algorithm. The second argument is our password. The third argument is the salt. A salt is a randomly generated value, more about the salt later. The fourth argument is the iteration count. In this case we have selected 5 iterations, but more is better (but slower). This is basically the same thing as the 50000 iterations in our simple example above. The fifth argument is the desired length of the output. As seen, the output is 4 bytes long.

With our new acquaintance the cryptographic primitive PBKDF2 we can proceed. For each hash algorithm TrueCrypt uses we are now going to generate a 128 bytes long key. This key is needed later. We already know the password; after all we created the TrueCrypt volume. The iteration count is part of the TrueCrypt specification and is 2000 for SHA1, 2000 for RIPEMD-160 and 1000 for Whirlpool (because Whirlpool is more advanced and computationally heavy than the two others). Why generate a 128 byte long key and not a shorter or longer one? We only need 128 bytes, which we will see in the next step.

What then is the salt? Now we can finally start working with the actual TrueCrypt volume file. The salt is the 64 bytes at the beginning of the volume file. Open the TrueCrypt volume, read the first 64 bytes (don’t forget to read the file in binary mode) and we have the salt. Notice that the salt is not encrypted and it doesn’t need to be! Intuitively this doesn’t make any sense but it will become clear later in the article.

It should be noted that we are not supposed to know which of the three hash algorithms was used when the volume was created. We simply create three 128 bytes keys, one for each of SHA-1, RIPEMD-160 and Whirlpool, and find the correct key in the next step by trial-and-error (assuming, of course, we used the correct password). With that said, the only thing we have to do before proceeding to the next step is get the salt and generate the three keys.

password = "password"

fileobj = file("volume.tc", "rb")
salt = fileobj.read(64)
fileobj.close()

for hmac in [HMAC_SHA1, HMAC_RIPEMD160, HMAC_WHIRLPOOL]:
    iterations = 2000
    if hmac == HMAC_WHIRLPOOL:
        iterations = 1000
    key = PBKDF2(hmac, password, salt, iterations, 128)
    print repr(key)

Code: keystrengthening.py. Requires ripemd, whirlpool.

Decrypting the header

The good news is: from here on everything will be relatively easy. None of the hash algorithms are used after we have created the keys, and no new keys will be generated. Once we have the three 128 byte keys from the previous step we can forget everything about hash algorithms, key strengthening, Whirlpool, HMACs, PBKDF2 and what not. The bad news is this part is heavy on theory.

In this step we have to work very closely with the TrueCrypt volume file. More specifically, we have to work very closely with the first 512 bytes of the TrueCrypt volume file. Open your TrueCrypt volume, read the first 512 bytes, and forget about the volume file because this is the only thing we need for now.

If you remember the previous step, the first 64 bytes of the volume file is the salt. We don’t need the salt, so we can discard it. So what we are actually interested in is the 448 bytes from byte 64 to byte 512.

>>> data = file("volume.rc", "rb").read(512)
>>> header = data[64:512]
>>> len(header)
448

So, just to remind us, we now have a 448 bytes long encrypted header and three 128 bytes long keys. We are going to use one of the keys to decrypt the header. How do we know we are successful? Fortunately for us, the decrypted header begins with the magic bytes ‘TRUE’. So if we use one of our keys, decrypt the header with it and find that the first four bytes are ‘TRUE’, we have probably provided the correct password to the key strengthening algorithm. To make sure we have the correct password and the magic bytes were not just a coincidence, there are some additional data in the header we can use to verify the correctness of the header. This is not important now.

So what to do? Simple, you say: Just try the three keys in each cryptographic algorithm and see if the magic bytes are what we’re looking for when decrypting the header. Not so fast. TrueCrypt doesn’t use the cryptographic algorithm directly on the data. This is why: All the cryptographic algorithms we are using are so called block ciphers. They encrypt and decrypt blocks that are a certain number of bytes large. Let’s say the block size is 16 bytes. This means the cipher can only encrypt and decrypt data of length 16. If we have a large block of data we split it into 16 bytes long blocks and encrypt/decrypt them separately.

Here is the problem. What if two blocks contains the exact same data, for example file headers? Then the encrypted blocks will look exactly the same for an attacker looking at the cipher text (that is, the TrueCrypt volume file). This is highly undesirable, especially for such large files as file systems where there’s plenty of room for redundant information. If each block in the volume was decrypted separately with the same key an attacker would easily be able to guess the contents of the volume, based on the regularities alone.

Enter cryptographic modes. A cryptographic mode is a “scheme” how data should be encrypted and decrypted to remove weaknesses, such as the weakness we have described. A very simple cryptographic mode works like this: Fill the first block, block 0, with random bytes. The remaining blocks should contain the actual data. However before encrypting block one, XOR the data with the random bytes from block 0. Before encrypting block two, XOR the data with the encrypted data from block one, and so on. As each block is “chained” to the previous block, the cipher text will not look the same for two blocks with the same plain text. As with the salt in the previous steps, the random bytes don’t have to be encrypted. Using the common cryptographic notation, this mode can be described as:

Ci = EK1(Ci-1 xor Pi)

Where Ci is the cipher text, E is the encryption algorithm, K1 is the key, Ci-1 is the previous cipher text block and Pi is the plain text.

The mode described above is a simple form of the so-called CBC-mode, where CBC stands for Cipher Block Chaining. For the untrained eye CBC-mode seems plausible and useful but in fact it has many non-obvious weaknesses. For this reason, a new mode was designed. This is known as the LRW-mode from the authors Liskov, Rivest (the guy who designed MD5), Wagner, and it’s the mode TrueCrypt use.

The LRW-mode is one of the few cryptographic primitives very useful to understand how it works to better understand TrueCrypt. With the same notation as above, we can completely describe the LRW mode:

Ci = EK1(Pi xor (K2 mul i)) xor (K2 mul i)

If you understood the notation describing CBC above it should not be too difficult to understand the description of LRW. There are three problems. First of all, K2 is a second key, we can call this the LRW key. This LRW key is actually part of the 128 bytes long key we generated in the previous step, exactly like the cipher key K1. We know both K1 and K2. Second, the above mode mentions i. What is i? It’s just the block index, that is the index of the block Pi.

Now, unfortunately, comes the abstract algebra. In the description above, mul means multiplication in the Galois field GF(2128). This is the mathematician’s way of describing a kind of advanced bit hacking operation. Addition and subtraction in GF(2n) is the same thing as our old friend xor. In fact, the formal definition of LRW doesn’t “xor”, it “performs addition in GF(2n)”. (Note that n can be something other than 128, but it is 128 for the ciphers in this article.)

I will not describe exactly how this kind of multiplication is performed, I will just refer to the following Wikipedia article: Wikipedia: Finite File Arithmetic and give these encouraging words: if you can multiply large numbers with a pen and paper like you learned in elementary school, you can perform a multiplication in GF(2128).

We can now write some code! First of all we have to write a short module for arithmetic in the Galois fields GF(2n), because we need to do this where n=128 for the LRW mode and the ciphers TrueCrypt uses. There are many clever ways to quickly multiply numbers in this field but I will for simplicity go for a schoolbook implementation, it’s less than 100 lines. For example:

def gf2n_mul(a, b, mod):
    """Multiplication in GF(2^n)."""
    pass

def gf2pow128mul(a, b):
    """Multiplication in GF(2^128)."""
    return gf2n_mul(a, b, mod128)

>>> gf2pow128mul(0xb9623d587488039f1486b2d8d9283453, 0xa06aea0265e84b8a)
0xfead2ebe0998a3da7968b8c2f6dfcbd2

Code: gf2n.py.

Using the module we just wrote, we can now write some code for the LRW mode. The main function will look like this (see lrw.py for the actual implementation, it's just in code what we explained above):

def LRW(cipherfunc, lrwkey, i, block):
    pass

Where cipherfunc is a cryptographic function initialized with a key (K1) that performs either encryption or decryption, lrwkey is the LRW key K2, i is the block index and block is the block of plain text or cipher text, depending on whether we want to encrypt or decrypt some data.

The informed reader will notice a multiplication in our Galois field will at most output 16 bytes. This means the length of the block must be 16, or a multiple of 16. A helper function LRWMany will be provided which simply splits the block into 16 byte blocks and calls the LRW function for each block.

def LRWMany(cipherfunc, lrwkey, i, blocks):
    length = len(blocks)
    assert length % 16 == 0
    data = ''
    for b in xrange(length / 16):
        data += LRW(cipherfunc, lrwkey, i + b, blocks[0:16])
        blocks = blocks[16:]
    return data

Code: lrw.py. Requires gf2n.

We can now finally decrypt the TrueCrypt volume header. Remember the 128 bytes long key? The first 16 bytes of the key is the LRW key. The bytes from 32 to 128 are keys for the cryptographic algorithms themselves. More precisely byte 32-64 is cipher key 1, byte 64-96 is cipher key 2 and byte 96-128 is cipher key 3. For TrueCrypt volumes using multiple cascaded ciphers two, or all three, cipher keys are used. For volumes using a single cipher only the first key is used. In Python:

lrwkey = key[0:16]
key1 = key[32:64]
key2 = key[64:96]
key3 = key[96:128]

Let us for a short moment ignore cascaded ciphers and say we have a TrueCrypt volume using the Twofish algorithm. To decrypt the header we simply do the following:

>>> cipher = Twofish()
>>> cipher.set_key(key1)
>>> decryptedheader = LRWMany(cipher.decrypt, lrwkey, 1, header)

Where header is the 448 byte header and 1 is the block index. The header has block index 1. Here’s an example how the header may look like before and after decrypting it.

+\xf0\xbb\x0e\xeb\x82K\x0bQ\xfa\rs\x11\xb8c=\x1f!\xc6`2\xa5\xb8\xefZ\xd7\xf6\x82~\x1b
\xe0\x18Cl\x1eM\xffW\x85H\xc6k\xba\xb4\xb9\x86{G\x05(\xa4\x8b\x0c\x9b\x06\x13\xa5\x8d
\xe9\x06E\xa7\x01pz\x97\x87\x02\x8a\xb6;\xf9O\x8e\xfa\x14\x3ea\xd1\x911\xa2w\xfe?C\x1e&
\xd5\xfe\xdc\xceu\xd9\x04\xfb\xf2\xa3\xde\xa0-\xee\xdc\xb8\xe7\x91\x80[N\xc6)#l\x08q
\x89\xe0=\xa1\xd7\x0e\x90nO\xc9s\xde\x92\xbbv\x14\t\xeb_\x92\x85_\x18\xb9\xba\x1b]
\xb7\xf8\x17\x03\xc5\x86\xac\xcd]\xd7\xc5\xcb\x9f!\xc4\xec\x05\x85\xc9\x91\xb5Pm\xdb
\xdc\xc7\x14_\xe3I\x86k\xdcA\x3ef\xb9\x10"\xcd\x14I\xd1\xe3|\xd0\x95\xd0\xcb\x12\x83a,
\xa3K\xac\x0b+\x1b\x02-?g\xf8\xad\x93\x01\x96p^\xd6\xa3\x8dY\xa4\xc6\xa5+u\x80\x19
\xcexX\x8d\xd0\xa6\xb0\x0b\x97\x19\xa2h\xe8/D\xef,\xc4\x1dy\x7f\xdc\xf6\n\xe4\xafw\x80
\x97\x18\xb5jPT\x81m\xbc\x9a\x94\x1b\x3e\xa9\xf6\xe0\xa9\xc1M\xff\xed)D\x8f\x16\x8e\x9al
\x0c\xdbhl\xf7\xbb\xfc\x04%\x96\xd2\xa1n\xf3\xe9K\x96\xfe\xe1\x20\xec\'\xc4\xce\x89
\xbcG\x1d\xc9\x8e\xcd\xb3?\x15{_\xff^f9\xc5-\xdd\x9c6\xf5b\xa4\xb5\xf08v\x93\xed\xee
\xc1K\xd6\x95R7\xe6\x1c\xcad\xee\xe9\xa7\x10\xf7\xe0\x90\xaa\x81\x04\xb2c\xa2d\xd8T
\xe5\xa3\xe3f\xbcG;\xb7\xf8\x3e\xe2\xad\xcc5\r\xf0Z\xec\x98\xc9\xd9\xb6\xf3\x12\xa1fY
\xa0S\xd9E\xe5\xbc\x86p(&\x0c-\xf3\x8a\x80\x81\xb5\x95j\x9c\rfQ\xb2\xdcu\xfa=\x18\x01
\x96[\x1a\xab[\xb1{8\xaf{\x1e\xb2\\\xaf\x11(!\x00h\xb1\xe7\x85\xd4\n\xc5\x3c\x90\xb0QQ\xd5

'TRUE\x00\x02\x04\x10\x19\xe8\x89\x0c\x01\xc8=\xbes\xcd\xd9\x10\x01\xc8=\xbes\xcd\xd9
\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xcb\r\xf0_CRw
\x84\xa2\x81!:\x1d\xbd\x90\x16\x95\x99WS\x1do\xc5\xf4\xdd#\xe7\x10\xfc\x18\x86\xddl
\xbf/\xd6\x02M\x13\x11\xc7\x86H\xbe"\x05\x19\xe4\x1e\xf2fO4\xa2_\x15Lg\x80\x99~\xd1
\xd1K\x14H\xb7\xd7\xa0B\xf8\xc4\xb9`\xfcc\x14j\xa9\xd7\xe2\xcc\x9a\x81C\xa3\xc3\xfb~
\xfc\xc0\x83z\xb3\x8c\x10\x05z\x98\xaf\xcd,+\x91N\x1c#\xf6\xe3\x83\x88\x1b\x1e\x89r
\xba\r;3\xc4\x0b.\'\x15\x3cn9\xb3\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00'

As we can see, the decrypted header has the magic bytes ‘TRUE’ in the beginning, as promised. Before we go over the content of the header, we should be able to handle cascaded ciphers. This is actually trivial to implement because we don’t have to change the LRW functions, just supply a decrypt or encrypt function as we did before. Consider the following code, with (almost) the same interface as the cipher algorithms:

class CipherChain:
    def __init__(self, ciphers):
        self.ciphers = [ciph() for ciph in ciphers]
    def set_key(self, keys):
        i = 0
        for cipher in self.ciphers:
            cipher.set_key(keys[i])
            i += 1
    def encrypt(self, data):
        for cipher in self.ciphers:
            data = cipher.encrypt(data)
        return data
    def decrypt(self, data):
        for cipher in reversed(self.ciphers):
            data = cipher.decrypt(data)
        return data
    def get_name(self):
        return '-'.join(reversed([cipher.get_name() for cipher in self.ciphers]))

Ponder the code for a while. Especially consider the encrypt and decrypt function and the use of the Python function “reversed()”. Now consider a TrueCrypt volume using a three cipher cascade instead of simply Twofish. The cascade Rijndael, Twofish, Serpent is known as “Serpent-Twofish-AES” in the TrueCrypt user interface.

>>> cipher = CipherChain([Rijndael, Twofish, Serpent])
>>> cipher.set_key([key1, key2, key3])
>>> decryptedheader = LRWMany(cipher.decrypt, lrwkey, 1, header)

Please note that CipherChain([Rijndael, Twofish, Serpent]) is not the same cascade as CipherChain([Serpent, Twofish, Rijndael]).

In this section of the article we have covered lots of ground and we have taken some detours into the realms of abstract algebra and cryptographic modes. We can now do something that is probably more familiar to the reader: parse binary data. We can now temporarily forget everything above and concentrate fully on the decrypted header (such as the one above).

There are basically two parts of the decrypted header. First we have information. This is the data in the very beginning of the header. Then we have the curious bytes in the middle of the header, with all the zero bytes surrounding it. These bytes are very important because they are the keys for the rest of the volume. These keys are needed to decrypt the actual file system. We will call this key the master key.

Let’s get back to the information in the beginning of the decrypted header. The first 4 bytes are as previously mentioned the magic bytes TRUE (assuming, of course, we have decrypted the header correctly. If we used the wrong cipher or the wrong hash or the wrong password, the first four bytes will probably be garbage). Following the magic bytes are some big endian integers. The complete layout is as follows:

4 bytes: The string ‘TRUE’
2 bytes: A 16 bit big endian integer. The version number of the volume.
2 bytes: A 16 bit big endian integer: The minimum version of TrueCrypt required to read the volume.
4 bytes: A 32 bit big endian integer: A checksum.

In our decrypted header above, we can parse the first bytes by hand here and now. The version number is 2, the minimum TrueCrypt version is 0x0410 and the checksum is 0x19e8890c. This checksum is used to make sure we really have a correctly decrypted header. Although it’s highly improbable, decrypting a volume with the wrong password may result in the first four bytes being ‘TRUE’. The checksum is added to verify the decrypted header is really valid. If the checksum matches the CRC32 of byte 192-448 of the header, we have a valid header (with extremely high probability).

Following the checksum are three 64 bit big endian integers. The first two are timestamps and the third is the size of the volume if this is a hidden volume (ignore this for now). The first timestamp is the date and time the volume was created. The second timestamp is the date and time the header was modified.

Knowing all this, we can now write some code to decrypt the header using only the password we chose when the volume was created. First lets write some pseudo-code:

for each hash
    key = keystrengthening(hash, password)
    for each cipher
        decryptedheader = decrypt(key, header)
        if magic bytes matches and crc32 matches checksum:
            success!

As a recap, in the first step in this article we took the TrueCrypt volume password and generated keys needed to decrypt the header. These keys, in turn, we used in the second step to decrypt the volume header. In the middle of the decrypted volume header lays the master key needed to decrypt the rest of the volume.

Here’s the complete code for this and the previous step. This code will work like this:

fileobj = file("aestwoser.tc", "rb")
tc = TrueCryptVolume(fileobj, "passwordpasswordpassword", Log)
TCPrintInformation(tc)
fileobj.close()

TrueCryptVolume is a class representing a TrueCrypt volume and tc is the object representing the volume aestwoser.tc specifically, which has the password as shown. The constructor of TrueCryptVolume will generate the strengthened keys from password, as described in the first step in this article. It will then decrypt the header by trying each cipher and cascade. In essence, it works exactly like the pseudo code just written. A TrueCryptVolume object has some useful member variables, especially “master_lrwkey” which is the master LRW key extracted from the decrypted header, and “cipher”, which is the cipher algorithm relinitilized with the master keys also extracted from the decrypted header.

from rijndael import Rijndael
from serpent import Serpent
from twofish import Twofish
from lrw import *
from keystrengthening import *

#
# Utilities.
#

import struct
import time
import binascii

def Log(message):
    print >> sys.stderr, "Progress:", message

def CRC32(data):
    crc = binascii.crc32(data)
    # Convert from signed to unsigned word32.
    return crc % 0x100000000

def BE16(x):
    return struct.unpack(">H", x)[0]

def BE32(x):
    return struct.unpack(">L", x)[0]

def BE64(x):
    a, b = struct.unpack(">LL", x)
    return (a<<32) | b

def Win32FileTime2UnixTime(filetime):
    return filetime / 10000000 - 11644473600

class CipherChain:
    assert not "Code as above in the article"

Cascades = [
    [Rijndael],
    [Serpent],
    [Twofish],
    [Twofish, Rijndael],
    [Serpent, Twofish, Rijndael],
    [Rijndael, Serpent],
    [Rijndael, Twofish, Serpent],
    [Serpent, Twofish]
]

HMACs = [
    (HMAC_SHA1, "SHA-1"),
    (HMAC_RIPEMD160, "RIPEMD-160"),
    (HMAC_WHIRLPOOL, "Whirlpool")
]

class TrueCryptVolume:
    """Object representing a TrueCrypt volume."""
    def __init__(self, fileobj, password, progresscallback=lambda x: None):

        self.fileobj = fileobj
        self.decrypted_header = None
        self.cipher = None
        self.master_lrwkew = None
        self.hidden_size = 0

        salt = fileobj.read(64)
        header = fileobj.read(448)
        
        assert len(salt) == 64
        assert len(header) == 448

        for hmac, hmac_name in HMACs:
            # Generate the keys needed to decrypt the volume header.
            iterations = 2000
            if hmac == HMAC_WHIRLPOOL:
                iterations = 1000

            info = ''
            if hmac_name in "RIPEMD-160 Whirlpool":
                info = ' (this will take a while)'
            progresscallback("Trying " + hmac_name + info)
            
            header_keypool = PBKDF2(hmac, password, salt, iterations, 128)
            header_lrwkey = header_keypool[0:16]
            header_key1 = header_keypool[32:64]
            header_key2 = header_keypool[64:96]
            header_key3 = header_keypool[96:128]

            for cascade in Cascades:
                # Try each cipher and cascades and see if we can successfully
                # decrypt the header with it.
                cipher = CipherChain(cascade)
                
                progresscallback("..." + cipher.get_name())
                
                cipher.set_key([header_key1, header_key2, header_key3])
                decrypted_header = LRWMany(cipher.decrypt, header_lrwkey, 1, header)
                if TCIsValidVolumeHeader(decrypted_header):
                    # Success.
                    self.decrypted_header = decrypted_header
                    
                    master_keypool = decrypted_header[192:]
                    master_lrwkey = master_keypool[0:16]
                    master_key1 = master_keypool[32:64]
                    master_key2 = master_keypool[64:96]
                    master_key3 = master_keypool[96:128]

                    self.master_lrwkey = master_lrwkey
                    self.cipher = cipher
                    self.cipher.set_key([master_key1, master_key2, master_key3])
                    self.hidden_size = BE64(decrypted_header[28:28+8])
                    self.format_ver = BE16(decrypted_header[4:6])

                    # We don't really need the information below but we save
                    # it so it can be displayed by print_information()
                    self.info_hash = hmac_name
                    self.info_headerlrwkey = hexdigest(header_lrwkey)
                    self.info_headerkey = hexdigest(header_keypool[32:128])
                    self.info_masterkey = hexdigest(master_keypool[32:128])

                    progresscallback("Success!")
                    return
        # Failed attempt.
        raise KeyError, "incorrect password (or not a truecrypt volume)"

def TCIsValidVolumeHeader(header):
    magic = header[0:4]
    checksum = BE32(header[8:12])
    return magic == 'TRUE' and CRC32(header[192:448]) == checksum

def TCPrintInformation(tc):
    if not tc.decrypted_header:
        return

    header = tc.decrypted_header
    program_ver = BE16(header[6:8])
    volume_create = Win32FileTime2UnixTime(BE64(header[12:12+8]))
    header_create = Win32FileTime2UnixTime(BE64(header[20:20+8]))

    print "="*60
    print "Raw Header"
    print "="*60
    print repr(tc.decrypted_header)
    print "="*60
    print "Parsed Header"
    print "="*60    
    print "Hash          :", tc.info_hash
    print "Cipher        :", tc.cipher.get_name()
    if tc.hidden_size:
        print "Volume Type   : Hidden"
        print "Hidden size   :", tc.hidden_size
    else:
        print "Volume Type   : Normal"
    print "Header Key    :", tc.info_headerkey
    print "Header LRW Key:", tc.info_headerlrwkey
    print "Master Key    :", tc.info_masterkey
    print "Master LRW Key:", hexdigest(tc.master_lrwkey)
    print "Format ver    :", hex(tc.format_ver)
    print "Min prog. ver :", hex(program_ver)
    print "Volume create :", time.asctime(time.localtime(volume_create))
    print "Header create :", time.asctime(time.localtime(header_create))
    print "="*60

TCPrintInformation will display some information about the volume, such as the hash algorithm used, the cipher(s) used and the information in the header. It may look like this:

============================================================
Raw Header
============================================================
'TRUE\x00\x02\x04\x10\x19\xe8\x89[snip]
============================================================
Parsed Header
============================================================
Hash          : SHA-1
Cipher        : Rijndael-Twofish-Serpent
Volume Type   : Normal
Header Key    : dced57ba7bc3e11ab047e747d2c4a3a6e205f3d36dafd766fe3bc
                77e61597df12781597b7d8c4092bba7c0bfd78e3e58ca90eb6c63
                e68090bae282dbed0ca90fc85f408277ed3b7e26ba2f0d782dfdf
                a04985d4c49caf5d80d59b3f3d3d51567
Header LRW Key: e0977ad2c02991d1e9eda77e9df88583
Master Key    : 6cbf2fd6024d1311c78648be220519e41ef2664f34a25f154c678
                0997ed1d14b1448b7d7a042f8c4b960fc63146aa9d7e2cc9a8143
                a3c3fb7efcc0837ab38c10057a98afcd2c2b914e1c23f6e383881
                b1e8972ba0d3b33c40b2e27153c6e39b3
Master LRW Key: cb0df05f43527784a281213a1dbd9016
Format ver    : 0x2
Min prog. ver : 0x410
Volume create : Thu Dec 13 20:29:17 2007
Header create : Thu Dec 13 20:29:17 2007
============================================================

Reading the data

A short recap of what we have done. In step one we took our TrueCrypt volume and a password. We used this password to derive several new keys using a key strengthening algorithm. These keys were in turn used to decrypt the TrueCrypt volume header. In the volume header we found the keys for the rest of the volume. These are the keys we are going to use to decrypt the actual file system.

Fortunately for us, since we have already implemented the LRW functions, this step will be very easy! We just reinitialize the cipher we used to decrypt the header with the keys from the header and use the new master LRW key to decrypt the rest of the volume. We only have to add one new function to the code above and other than some math to compute the LRW block indices the code should not be too difficult to understand. Given an index starting from 1, the function TCReadSector will read 512 bytes long blocks of data. 512 bytes is the sector size defined in the TrueCrypt specification (note that the sector size doesn’t have anything to do with the underlying file system, it is just the TrueCrypt terminology for the smallest block of data we work with when decrypting the file system).

def TCReadSector(tc, index):
    """Read a sector from the volume."""
    assert index > 0
    tc.fileobj.seek(0, 2)
    file_len = tc.fileobj.tell()

    # The LRW functions work on blocks of length 16. Since a TrueCrypt
    # sector is 512 bytes each call to LRWMany will decrypt 32 blocks,
    # and each call to this function must therefore advance the block
    # index 32. The block index also starts at 1, not 0. index 1
    # corresponds to lrw_index 1, index 2 corresponds to lrw_index 33 etc.
    lrw_index = (index - 1) * 32 + 1

    # For a regular (non-hidden) volume the file system starts at byte
    # 512 (after salt+header).
    seekto = TC_SECTOR_SIZE * index

    # last_sector_offset is the beginning of the last sector relative
    # the end of the file. For a regular non-hidden volume this is simply
    # 512 bytes from the end of the file.
    last_sector_offset = TC_SECTOR_SIZE
    if seekto > file_len - last_sector_offset:
        return ''

    tc.fileobj.seek(seekto)
    data = tc.fileobj.read(TC_SECTOR_SIZE)
    
    return LRWMany(tc.cipher.decrypt, tc.master_lrwkey, lrw_index, data)

While we’re at it it’s useful to know how many sectors we can read. The function TCSectorCount works pretty much the same as the code above.

def TCSectorCount(tc):
    """How many sectors can we read with TCReadSector?"""
    tc.fileobj.seek(0, 2)
    volume_size = tc.fileobj.tell()
    # Minus the salt+header.
    volume_size -= 512
    return volume_size / TC_SECTOR_SIZE

The first block may look like this, for a FAT volume:

\xeb\x3c\x90MSDOS5.0\x00\x02\x01\x08\x00\x02\x00\x02\xffO\xf8P\x00\x01\x00\x01\x00?
\x00\x00\x00\x00\x00\x00\x00\x00\x00)\xe2m`GNO NAME    FAT16   \x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00U\xaa

We have now successfully decrypted a TrueCrypt volume and printed the first 512 bytes of the underlying file system! We can now write a small command line program that will write the complete file system to disk, like this:

$ truecrypt.py volume.tc password filesystem.fat

Here's the code.

def cmdline():
    scriptname = sys.argv[0]
    try:
        path, password, outfile = sys.argv[1:]
    except ValueError:
        print >> sys.stderr, "%s volumepath password outfile" % scriptname
        sys.exit(1)

    if os.path.exists(outfile):
        print >> sys.stderr, "outfile %s already exists. use another " \
              "filename and try again (we don't want to overwrite " \
              "files by mistake)" % outfile
        sys.exit(1)

    try:
        fileobj = file(path, "rb")
    except IOError:
        print >> sys.stderr, "file %s doesn't exist" % path
        sys.exit(1)

    try:
        tc = TrueCryptVolume(fileobj, password, Log)
    except KeyError:
        print >> sys.stderr, "incorrect password or not a TrueCrypt volume"
        fileobj.close()
        sys.exit(1)
    except KeyboardInterrupt:
        print >> sys.stderr, "aborting"
        fileobj.close()
        sys.exit(1)

    TCPrintInformation(tc)

    outfileobj = file(outfile, "ab")
    num_sectors = TCSectorCount(tc)
    num_written = 0
    try:
        for i in xrange(1, num_sectors + 1):
            if i % 100 == 0:
                Log("Decrypting sector %d of %d." % (i, num_sectors))
            outfileobj.write(TCReadSector(tc, i))
            num_written += 1
    except KeyboardInterrupt:
        print "Aborted decryption."
        pass
    outfileobj.close()
    print "Wrote %d sectors (%d bytes)." % (num_written,
                                            num_written * TC_SECTOR_SIZE)
    fileobj.close()
    sys.exit(0)

if __name__ == '__main__':
    cmdline()

This program is of course not very useful in practice but it is at least more user friendly than looking at the file system in the Python top loop. It should be mentioned: you should not use the above program on a “production” volume with all your secret files on them. Your shell may log the password to file, which is undesirable. Additionally you do not want your encrypted file system written to disk because traces may be left of your secret files even if you delete the extracted file system. Traces of your secret files on a non-encrypted filesystem is extremely undesirable.

Finally, don’t worry if the written file system is full of garbage – TrueCrypt writes random bytes to the empty areas of the file system during formatting. If you create a new volume with a FAT file system and then add some text files, you can find your files in the beginning of the file system.

Wrapping it up: hidden volumes

By now we have some code that can successfully decrypt most TrueCrypt volumes. We still have to handle hidden volumes, though. Fortunately for us this is very easy. Before we do that, here is a short explanation of hidden volumes.

A hidden volume is a second TrueCrypt volume residing in the same file as another TrueCrypt volume. Note that the hidden volume is not inside the regular volume, but beside – it is not necessary to decrypt the regular volume to access the hidden volume. A TrueCrypt volume with a hidden volume is basically the same thing as two regular TrueCrypt volumes concatenated together, but in a clever way so the file system in the first volume believes it is as large as the TrueCrypt file itself (it would be very suspicious if the TrueCrypt file was say 10 gigabytes, but the regular volume was only say 6 gigabytes when mounted).

The reason TrueCrypt supports hidden volumes is plausible deniability. If the user is forced for some reason to give up his password, he will give the adversary the password to the regular volume. There is no way whatsoever to tell if there is a hidden volume in the TrueCrypt file so from the adversary’s point of view the TrueCrypt volume just mounted looks like any other TrueCrypt volume, with or without a hidden volume. In fact, when the regular volume is mounted, the hidden volume may become overwritten if new files are added. Not even TrueCrypt knows there’s a hidden volume when the regular volume is mounted. (TrueCrypt does however know when a hidden volume is mounted, but this is not necessary for plausible deniability – we are not giving up the password to the hidden volume when we can give up the password to the regular volume!)

The only way to mount a hidden volume is to provide the correct password. This password must, of course, be different than the password to the regular volume; otherwise the regular volume would be mounted all the time regardless of our intentions. Here’s what happens when we use the hidden volume password instead of the regular password. Obviously, all the steps we have described previously will fail: the hidden volume password is not the same as the regular password. This is because we haven’t written any code for hidden volumes yet. The question is then: what should we do?

This is fortunately very straightforward. When we fail to decrypt the header for the regular volume, we will now go back to step 1, that is we are going to generate new keys using the key-strengthening algorithm. But instead of reading the 64 bytes salt from offset 0 in the TrueCrypt file we are going to read the salt from offset 1536 from the end of the TrueCrypt file. We now have new keys and we will try to decrypt the header. Not the same header, but another header from offset 1536+64 from the end of the TrueCrypt file.

In the code we have already written to read regular TrueCrypt volumes, we have the following two lines to read the salt and the header:

salt = fileobj.read(64)
header = fileobj.read(448)

To read a hidden volume, we only have to add a line to seek to 1536 bytes from the end:

fileobj.seek(-1536, 2)
salt = fileobj.read(64)
header = fileobj.read(448)

Because we should be able to handle both regular and hidden volumes, we only have to add a loop outside the code that decrypts the header:

for volume_type in ["normal", "hidden"]:
    fileobj.seek(0)
    if volume_type == "hidden":
        fileobj.seek(-1536, 2)

We can now decrypt both regular and hidden volume headers by using different passwords. The only thing remaining is to fix the TCReadSector function: the first sector is obviously not at the same offset (in the TrueCrypt file) as the regular volume. The first sector of a regular volume is 512. The first sector for the hidden volume depends on the size of the file and the size of the hidden volume.

def TCReadSector(tc, index):
    """Read a sector from the volume."""
    assert index > 0
    tc.fileobj.seek(0, 2)
    file_len = tc.fileobj.tell()

    # The LRW functions work on blocks of length 16. Since a TrueCrypt
    # sector is 512 bytes each call to LRWMany will decrypt 32 blocks,
    # and each call to this function must therefore advance the block
    # index 32. The block index also starts at 1, not 0. index 1
    # corresponds to lrw_index 1, index 2 corresponds to lrw_index 33 etc.
    lrw_index = (index - 1) * 32 + 1 

    # For a regular (non-hidden) volume the file system starts at byte
    # 512. However for a hidden volume, the start of the file system
    # is not at byte 512. Starting from the end of the volume, namely
    # byte file_len, we subtract the hidden volume salt+header (at offset
    # 1536 from the end of the file). We then subtract the size of the
    # hidden volume.
    mod = 0
    last_sector_offset = TC_SECTOR_SIZE
    if tc.hidden_size:
        mod = file_len - tc.hidden_size - TC_HIDDEN_VOLUME_OFFSET
        # We subtract another sector from mod because the index starts
        # at 1 and not 0.
        mod -= TC_SECTOR_SIZE
        last_sector_offset = TC_SECTOR_SIZE + TC_HIDDEN_VOLUME_OFFSET
    seekto = mod + TC_SECTOR_SIZE * index

    # last_sector_offset is the beginning of the last sector relative
    # the end of the file. For a regular non-hidden volume this is simply
    # 512 bytes from the end of the file. However for hidden volumes we
    # must not read past the headers, so the last sector begins 512 bytes
    # before the header offset.
    if seekto > file_len - last_sector_offset:
        return ''

    tc.fileobj.seek(seekto)
    data = tc.fileobj.read(TC_SECTOR_SIZE)
    
    return LRWMany(tc.cipher.decrypt, tc.master_lrwkey, lrw_index, data) 
def TCSectorCount(tc):
    """How many sectors can we read with TCReadSector?"""
    volume_size = 0
    if tc.hidden_size:
        volume_size = tc.hidden_size
    else:
        tc.fileobj.seek(0, 2)
        volume_size = tc.fileobj.tell()
        # Minus the salt+header.
        volume_size -= 512
    return volume_size / TC_SECTOR_SIZE

The complete program – truecrypt.py

## truecrypt.py - partial TrueCrypt implementation in Python.
## Copyright (c) 2008 Bjorn Edstrom <be@bjrn.se>
##
## Permission is hereby granted, free of charge, to any person
## obtaining a copy of this software and associated documentation
## files (the "Software"), to deal in the Software without
## restriction, including without limitation the rights to use,
## copy, modify, merge, publish, distribute, sublicense, and/or sell
## copies of the Software, and to permit persons to whom the
## Software is furnished to do so, subject to the following
## conditions:
##
## The above copyright notice and this permission notice shall be
## included in all copies or substantial portions of the Software.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
## EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
## OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
## NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
## HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
## WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
## OTHER DEALINGS IN THE SOFTWARE.
## --
## Changelog
## Jan 4 2008: Initial version. Plenty of room for improvements.

import struct
try:
    import psyco
    psyco.full()
except ImportError:
    pass

import sys
import os

from rijndael import Rijndael
from serpent import Serpent
from twofish import Twofish
from lrw import *
from keystrengthening import *

#
# Utilities.
#

import struct
import time
import binascii

def Log(message):
    print >> sys.stderr, "Progress:", message

def CRC32(data):
    """Compute CRC-32."""
    crc = binascii.crc32(data)
    # Convert from signed to unsigned word32.
    return crc % 0x100000000

def BE16(x):
    """Bytes to 16 bit big endian word."""
    return struct.unpack(">H", x)[0]

def BE32(x):
    """Bytes to 32 bit big endian word."""
    return struct.unpack(">L", x)[0]

def BE64(x):
    """Bytes to 64 bit big endian word."""
    a, b = struct.unpack(">LL", x)
    return (a<<32) | b

def Win32FileTime2UnixTime(filetime):
    """Converts a win32 FILETIME to a unix timestamp."""
    return filetime / 10000000 - 11644473600

#
# Ciphers.
#

class CipherChain:
    def __init__(self, ciphers):
        self.ciphers = [ciph() for ciph in ciphers]
    def set_key(self, keys):
        i = 0
        for cipher in self.ciphers:
            cipher.set_key(keys[i])
            i += 1
    def encrypt(self, data):
        for cipher in self.ciphers:
            data = cipher.encrypt(data)
        return data
    def decrypt(self, data):
        for cipher in reversed(self.ciphers):
            data = cipher.decrypt(data)
        return data
    def get_name(self):
        return '-'.join(reversed([cipher.get_name() for cipher in self.ciphers]))

Cascades = [
    [Rijndael],
    [Serpent],
    [Twofish],
    [Twofish, Rijndael],
    [Serpent, Twofish, Rijndael],
    [Rijndael, Serpent],
    [Rijndael, Twofish, Serpent],
    [Serpent, Twofish]
]

HMACs = [
    (HMAC_SHA1, "SHA-1"),
    (HMAC_RIPEMD160, "RIPEMD-160"),
    (HMAC_WHIRLPOOL, "Whirlpool")
]

#
# TrueCrypt.
#

TC_SECTOR_SIZE = 512
TC_HIDDEN_VOLUME_OFFSET = 1536

class TrueCryptVolume:
    """Object representing a TrueCrypt volume."""
    def __init__(self, fileobj, password, progresscallback=lambda x: None):

        self.fileobj = fileobj
        self.decrypted_header = None
        self.cipher = None
        self.master_lrwkew = None
        self.hidden_size = 0

        for volume_type in ["normal", "hidden"]:
            fileobj.seek(0)
            if volume_type == "hidden":
                fileobj.seek(-TC_HIDDEN_VOLUME_OFFSET, 2)

            progresscallback("Is this a " + volume_type + " volume?")
            
            salt = fileobj.read(64)
            header = fileobj.read(448)
            
            assert len(salt) == 64
            assert len(header) == 448

            for hmac, hmac_name in HMACs:
                # Generate the keys needed to decrypt the volume header.
                iterations = 2000
                if hmac == HMAC_WHIRLPOOL:
                    iterations = 1000

                info = ''
                if hmac_name in "RIPEMD-160 Whirlpool":
                    info = ' (this will take a while)'
                progresscallback("Trying " + hmac_name + info)
                
                header_keypool = PBKDF2(hmac, password, salt, iterations, 128)
                header_lrwkey = header_keypool[0:16]
                header_key1 = header_keypool[32:64]
                header_key2 = header_keypool[64:96]
                header_key3 = header_keypool[96:128]

                for cascade in Cascades:
                    # Try each cipher and cascades and see if we can successfully
                    # decrypt the header with it.
                    cipher = CipherChain(cascade)
                    
                    progresscallback("..." + cipher.get_name())
                    
                    cipher.set_key([header_key1, header_key2, header_key3])
                    decrypted_header = LRWMany(cipher.decrypt, header_lrwkey, 1, header)
                    if TCIsValidVolumeHeader(decrypted_header):
                        # Success.
                        self.decrypted_header = decrypted_header
                        
                        master_keypool = decrypted_header[192:]
                        master_lrwkey = master_keypool[0:16]
                        master_key1 = master_keypool[32:64]
                        master_key2 = master_keypool[64:96]
                        master_key3 = master_keypool[96:128]

                        self.master_lrwkey = master_lrwkey
                        self.cipher = cipher
                        self.cipher.set_key([master_key1, master_key2, master_key3])
                        self.hidden_size = BE64(decrypted_header[28:28+8])
                        self.format_ver = BE16(decrypted_header[4:6])

                        # We don't really need the information below but we save
                        # it so it can be displayed by print_information()
                        self.info_hash = hmac_name
                        self.info_headerlrwkey = hexdigest(header_lrwkey)
                        self.info_headerkey = hexdigest(header_keypool[32:128])
                        self.info_masterkey = hexdigest(master_keypool[32:128])

                        progresscallback("Success!")
                        return
        # Failed attempt.
        raise KeyError, "incorrect password (or not a truecrypt volume)"

    def __repr__(self):
        if not self.decrypted_header:
            return "<TrueCryptVolume>"
        return "<TrueCryptVolume %s %s>" % (self.cipher.get_name(), self.info_hash)

def TCIsValidVolumeHeader(header):
    magic = header[0:4]
    checksum = BE32(header[8:12])
    return magic == 'TRUE' and CRC32(header[192:448]) == checksum

def TCReadSector(tc, index):
    """Read a sector from the volume."""
    assert index > 0
    tc.fileobj.seek(0, 2)
    file_len = tc.fileobj.tell()

    # The LRW functions work on blocks of length 16. Since a TrueCrypt
    # sector is 512 bytes each call to LRWMany will decrypt 32 blocks,
    # and each call to this function must therefore advance the block
    # index 32. The block index also starts at 1, not 0. index 1
    # corresponds to lrw_index 1, index 2 corresponds to lrw_index 33 etc.
    lrw_index = (index - 1) * 32 + 1 # LRWSector2Index(index)

    # For a regular (non-hidden) volume the file system starts at byte
    # 512. However for a hidden volume, the start of the file system
    # is not at byte 512. Starting from the end of the volume, namely
    # byte file_len, we subtract the hidden volume salt+header (at offset
    # 1536 from the end of the file). We then subtract the size of the
    # hidden volume.
    mod = 0
    last_sector_offset = TC_SECTOR_SIZE
    if tc.hidden_size:
        mod = file_len - tc.hidden_size - TC_HIDDEN_VOLUME_OFFSET
        # We subtract another sector from mod because the index starts
        # at 1 and not 0.
        mod -= TC_SECTOR_SIZE
        last_sector_offset = TC_SECTOR_SIZE + TC_HIDDEN_VOLUME_OFFSET
    seekto = mod + TC_SECTOR_SIZE * index

    # last_sector_offset is the beginning of the last sector relative
    # the end of the file. For a regular non-hidden volume this is simply
    # 512 bytes from the end of the file. However for hidden volumes we
    # must not read past the headers, so the last sector begins 512 bytes
    # before the header offset.
    if seekto > file_len - last_sector_offset:
        return ''

    tc.fileobj.seek(seekto)
    data = tc.fileobj.read(TC_SECTOR_SIZE)
    
    return LRWMany(tc.cipher.decrypt, tc.master_lrwkey, lrw_index, data)          

def TCSectorCount(tc):
    """How many sectors can we read with TCReadSector?"""
    volume_size = 0
    if tc.hidden_size:
        volume_size = tc.hidden_size
    else:
        tc.fileobj.seek(0, 2)
        volume_size = tc.fileobj.tell()
        # Minus the salt+header.
        volume_size -= 512
    return volume_size / TC_SECTOR_SIZE

def TCPrintInformation(tc):
    if not tc.decrypted_header:
        return

    header = tc.decrypted_header
    program_ver = BE16(header[6:8])
    volume_create = Win32FileTime2UnixTime(BE64(header[12:12+8]))
    header_create = Win32FileTime2UnixTime(BE64(header[20:20+8]))

    print "="*60
    print "Raw Header"
    print "="*60
    print repr(tc.decrypted_header)
    print "="*60
    print "Parsed Header"
    print "="*60    
    print "Hash          :", tc.info_hash
    print "Cipher        :", tc.cipher.get_name()
    if tc.hidden_size:
        print "Volume Type   : Hidden"
        print "Hidden size   :", tc.hidden_size
    else:
        print "Volume Type   : Normal"
    print "Header Key    :", tc.info_headerkey
    print "Header LRW Key:", tc.info_headerlrwkey
    print "Master Key    :", tc.info_masterkey
    print "Master LRW Key:", hexdigest(tc.master_lrwkey)
    print "Format ver    :", hex(tc.format_ver)
    print "Min prog. ver :", hex(program_ver)
    print "Volume create :", time.asctime(time.localtime(volume_create))
    print "Header create :", time.asctime(time.localtime(header_create))
    print "="*60

def cmdline():
    scriptname = sys.argv[0]
    try:
        path, password, outfile = sys.argv[1:]
    except ValueError:
        print >> sys.stderr, "%s volumepath password outfile" % scriptname
        sys.exit(1)

    if os.path.exists(outfile):
        print >> sys.stderr, "outfile %s already exists. use another " \
              "filename and try again (we don't want to overwrite " \
              "files by mistake)" % outfile
        sys.exit(1)

    try:
        fileobj = file(path, "rb")
    except IOError:
        print >> sys.stderr, "file %s doesn't exist" % path
        sys.exit(1)

    try:
        tc = TrueCryptVolume(fileobj, password, Log)
    except KeyError:
        print >> sys.stderr, "incorrect password or not a TrueCrypt volume"
        fileobj.close()
        sys.exit(1)
    except KeyboardInterrupt:
        print >> sys.stderr, "aborting"
        fileobj.close()
        sys.exit(1)

    TCPrintInformation(tc)

    outfileobj = file(outfile, "ab")
    num_sectors = TCSectorCount(tc)
    num_written = 0
    try:
        for i in xrange(1, num_sectors + 1):
            if i % 100 == 0:
                Log("Decrypting sector %d of %d." % (i, num_sectors))
            outfileobj.write(TCReadSector(tc, i))
            num_written += 1
    except KeyboardInterrupt:
        print "Aborted decryption."
        pass
    outfileobj.close()
    print "Wrote %d sectors (%d bytes)." % (num_written,
                                            num_written * TC_SECTOR_SIZE)
    fileobj.close()
    sys.exit(0)

if __name__ == '__main__':
    cmdline()

Code: truecrypt.py. Requires rijndael, serpent, twofish, keystrengthening, lrw.

Conclusion

There we have it, a couple of hundred lines of code describing the essence of TrueCrypt. We have not fully implemented TrueCrypt – in fact there are some features missing such as key files not to mention the ability to read and write files to volumes directly – but we have described the most important parts of the specification. Hopefully the code and the article will be useful for some purpose.

Finally some technicalities: One: the author is not affiliated with the TrueCrypt foundation in any way. Two: see the source code for code licenses. Most of the primitives are public domain or almost public domain and most of the TrueCrypt specific code is MIT License. The article you are now reading, minus the code, is licensed under Creative Commons (Attribution-Share Alike 3.0).