blog.bjrn.se

Non-realtime publishing for censorship resistance

2020-01-04T09:57:00.000+01:00

“Because the Internet is so new, we still don't really understand what it is. We mistake it for a type of publishing or broadcasting, because that's what we're used to. So people complain that there's a lot of rubbish online” -- Douglas Adams

In this blog post I’m going to make the case that “real time communication” should not be a given requirement when designing censorship resistant services, if resiliency is a key feature. By making real-time communication optional but not required, a service can keep functioning even in the presence of interference from very resourceful attackers in control of the service implementation, or underlying infrastructure (like the Internet itself).

To make this concrete I’m going to propose a class of services I call zines. I’m going to apply the concept to three famous example of services that try to be more or less censorship resistant: WikiLeaks (a journalistic publishing platform), Silk Road (an anonymous market), and The Pirate Bay (a torrent website).

What is a service anyway?

WikiLeaks, Silk Road and The Pirate Bay are both ideas and technical implementations. WikiLeaks is the idea that a person can publish something anonymously. Silk Road is the idea that a person can buy/sell stuff without external interference. The Pirate Bay is a search index serving small pieces of data.

When people think of these three services however, they typically think of the implementation. WikiLeaks is an actual website running on a server somewhere. Silk Road was also a website, implementing eBay like functionality - with some twists - on a server running on top of the Tor network. The Pirate Bay: a website and server as well.

All of these three services are implemented this way because that is the most practical way to achieve real time communication in the usability sense of the term. As a curious citizen I can visit WikiLeaks in my web browser and immediately be served their website and content. This is taking full advantage of one of the main features of the Internet.

This is a weakness if censorship resistance is considered very important. The servers can be shut down, as happened with Silk Road and I believe The Pirate Bay as well. The WikiLeaks administrators seem to run a game of whack-a-mole where they are shut down by authorities and move servers elsewhere.

Services are out, Publishing is in

Here is a key realization: Let’s go back in time and learn from the old days of content distribution. By removing the real time communication requirement, we could get rid of all the servers and instead turn these “services” into a publishing problem. A neat thing with this kind of publishing, like me publishing this blog post, is that it is agnostic to the medium. For convenience I am publishing this blog post here on my website, but due to the nature of the “protocol” (the English language) I could also print this blog post on paper and give it to my neighbors. If I wanted to be anonymous and sneaky I could just sign it with my private key, hide the text (steganography) inside some image or music file and publish it on every single social media website on the Internet. That is fairly strong resiliency.

Contrast this with a service like Silk Road, that is so tied to the technical implementation that removal of the service infrastructure means complete service disruption. The Silk Road “protocol” was never meant to be agnostic to the platform it was running on and therefore it did not survive after shutting down the servers it was implemented on top of. This kind of centralized infrastructure is a weakness.

Zine: offline censorship resistance

Zine is an abstract framework, or way of thinking, for designing certain classes of resilient “services”. It sacrifices real time communication to gain resiliency from distribution potential.

The main idea is that you have one or more owners. Think of an owner as a public/private keypair. The owner manipulate (offline) a collection of files (a “zine”). This collection is encrypted, signed and published periodically. It doesn’t matter where they are published, could be anywhere on the Internet, in an actual magazine, or maybe on a Tor hidden service. It doesn’t matter.

The cryptographic signing here is important. It is the one proof that the zine has not been tampered with. The 3 example services give a certain degree of “identity proof” from the underlying infrastructure, for example the WikiLeaks domain name, and the TLS certificate. Because Zine is agnostic to all this, it relies on it’s own signing.

The users can perform actions on the zine by relaying (somehow) requests to the owners. The owners may then chose to apply the action and thus manipulate/update the collection, and then publish a new collection/zine for consumption by the users. In practice they will probably have a little application for manipulating the zine, like a command line tool, text editor, or something, that handles the formatting/marshalling.

The actions are highly dependent on product. WikiLeaks may have an “upload” action for example, whereas The Pirate Bay may have an “add-torrent” action for adding a torrent to the index.

An aspect of censorship resistance is anonymity. For the sake of generality, we can say that actions may in most cases by signed by the user key-pair, but in certain applications it may be okay to not do that or use throw-away key-pairs if even pseudonymity or traceability of previous actions is undesirable.

On the owner side, when they (somehow) get a request, they will probably first inspect it (including the user public key) and then they may apply the resulting mutation. The result may go into a new zine, or mutate state that is not published, but maintained. That depends on the application and “business logic”. For example, an application may be completely stateless i.e. everything except the owner private key is (encrypted) and embedded in the zine. In other cases, the owner may have an auxiliary database or similar that is used during zine generation but not published.

I believe this framework is enough to implement a plethora of different types of applications that may benefit from censorship resistance.

There are obvious drawbacks:

As can be seen, we have immediately sacrificed the usability of real-time updates, for stronger resilience.
The user will also initially (and maybe in the future) have the problem of discovery. Where is the latest collection anyway?
The distribution is left as an exercise to the reader and could be a real problem for larger data sets. Distributing smaller files should be an easier problem to solve.

There are some benefits though:

The Zine can be published anywhere and is therefor much harder to take down, be it denial of service or infrastructure takeover. When the cat is out of the bag it will live a life on it’s own, unlike services that are backed by real-time (online) databases and such.
All interactions with the Zine are offline (in the sense there are no client/server interaction) which could lead to less vulnerabilities.
The owners only have to care only about their keypair and no other sensitive things (servers etc).

Note that the owner can of course be automated, for near real time updates. Any existing service could also publish a zine as an out of band side channel, next to the real time “usable” implementation. Should the real time implementation be taken down, the zine can live on.

Example: WikiLeaks

An important feature of WikiLeaks, as with many other popular websites, is curation. WikiLeaks publish certain types of data, not any data. However, the current technical implementation of WikiLeaks have to care a lot about distribution: they publish large archives of material, searchable. This improves on usability.

To make a Zine out of the WikiLeaks idea, the owner could publish a main collection that is basically an index that points to more data/collections, that could be published anywhere. All of this is just optimizations though: “theoretically” the entirety of WikiLeaks, or certain parts of it, could be just published on any underlying platform and not their own servers.

Example actions: upload.

Example: Silk Road

I think this is my favorite example because the offline implementation is interesting. The Owner will have to periodically publish a collection. The collection will contain vetted users key pairs (buyers/sellers), a listing of items being sold, and auction state (that is, the current state of bids and bid history). To protect users from each other, parts of the collection will have to be encrypted cleverly. The owner(s) will have to maintain other offline state as well. There are a lot of devils in the details here, but of course that is also true for the online (real time) implementation.

Example actions: send-message-to-user, bid, add-listing.

Owner may have some backing infrastructure for managing transactions.

Example: The Pirate Bay

The collection will be an index of “magnet links” pointing to the BitTorrent Distributed Hash Table (so no tracker server is needed).

Example actions: add-torrent, dmca-takedown-request.

Conclusion

Designers and operators of censorship resistant services should consider if real time communication is strictly necessary. Does the benefits of a publishing approach like Zine, in their particular use case, outweigh the obvious drawbacks?

Thanks to Gunnar Kreitz for feedback on this post.

Cabinet of curiosities: A bunch of cryptographic protocol oddities

2016-08-13T17:50:00.000+02:00

In this short trivia article I will list a few unusual cryptographic design choices in some real applications/protocols. Specifically, I will focus on two types of “oddities”.

1) Unusual or weird design choices, maybe bordering voodoo/superstitions territory.

2) Choices that may have been very reasonable historically, but will be perceived as unusual for a reader in 2016 who has the benefit of hindsight.

I will not list design choices that have simply been broken, instead focusing on oddities that are maybe more interesting. Hopefully it will make you say WTF at least once. :-)

We have learned a lot over the years, and protocols designed by experts in the year 2016 are typically fairly sane, at least lacking in surprises. That was not always the case however, leading to choices that were made with a poorer understanding of cryptography engineering than what we have today.

The point of this article is not to point fingers or make fun of designs. It is simply to highlight design choices that may have made sense with the knowledge we had of crypto design at the time when the protocol was designed. Some of the choices may still be controversial.

Oddity: Unexplained “strengthenings”

Sometimes you come across a design that does some unusual construct to make the design “stronger”, without explaining why the construct is needed or how it works. This kind of cargo culting is especially common in non-cryptographers hobby designs, but sometimes it gets deployed broadly. One interesting example of this is in MS-CHAP v2.

MS-CHAP v2 is Microsoft's version of the CHAP protocol, used among others as an authentication option for PPTP. It implements a challenge-response mechanism that as a very high level works like this: The server sends the client a challenge message. The client hashes the combination of a challenge and a secret. The server verifies by doing the same (the secret is shared).

The protocol is described in RFC 2759 (so it’s quite old) and the main curiosity in this protocol is revealed in the function GenerateAuthenticationResponse. After (roughly) hashing the password/secret and the challenge, it ends with hashing in a magic 41 byte string called Magic2. So roughly it does the following, ignoring some detail.

Digest = SHA1(password || challenge || Magic2)

What, you may ask, is Magic2? It’s the following string: "Pad to make it do more than one iteration".

The reason for hashing in this string is not explained, but you can only guess that the protocol designer wanted to make the SHA function do more work, or be “more random”, by lengthening the input string enough to make it longer than 512 bits, the block size of the SHA-1 algorithm. An input longer than the block size makes SHA-1 do one more iteration in the hashing step, as explained by the string.

Oddity: Hedging against future breaks (in hindsight)

Being conservative is often good when designing cryptographic protocols. There are many cases when protocols and applications decide to use two different primitives in the same class to still offer some security in case one of the primitives is broken. That’s often a good idea, but may lead to surprises down the road for people who have the benefit of hindsight. One such surprise can be found in TLS version 1.1 and older, for people who read the spec today (or in recent years).

When setting up a TLS connection the client and the server need to construct keying material that is used as keys for various cryptographic primitives. A pseudo random function is used to construct new pseudo random strings from an original pseudo-random string. In TLSv1.2, this is simply done using SHA-256 in a specified way.

In TLSv1.1 and older, however, there were concerns which hash algorithm to use. The outcome? A function that uses both SHA-1 and MD5, XOR:ed together. (For completeness: there is another part of the TLS 1.1 protocol that instead use a MD5 and SHA-1 hash concatenated together).

The original rationale was that, if one of the hash functions is broken, there’s a “hedge” by still having hopefully one that is not broken. Is this reasonable? Most certainly. The decision to change the function to use SHA-256 only, in the design of TLSv1.2, was controversial. If you are interested in the discussion, see for example the discussion “[TLS] PRF in TLS 1.2” from 2006, on the TLS IETF mailing list.

Oddity: Protocols designed in the 90:s have weaknesses due to iterative improvements on a legacy base

Highlighting problems with OpenPGP is maybe beating a dead horse, but it highlights well the generic problem of bolting on security on a 90:s era protocol.

The current version of the OpenPGP specification, as implemented in for example GPG, is mostly backwards compatible with older versions of the specification, which were designed in the olden days. This leads to a few problems with OpenPGP that are well known. I will list two of them.

#1. If you sign, but not encrypt, a message, then the signature does not include some of the metadata in the OpenPGP message. For example, there’s a metadata field that includes the proposed file name of the signed data. This field name can be tampered with and the signature check will still hold.



$ echo "hello world" > helloworld.txt

$ gpg --sign helloworld.txt

$ sed 's/helloworld.txt/tamperedxx.txt/' helloworld.txt.gpg >

helloworld.txt.gpg.tampered

$ gpg --verify helloworld.txt.gpg.tampered

... gpg: Good signature from …

#2. In the original Pretty Good Privacy specification there were no support for integrity protecting messages. Nowadays it’s standard practice to not touch (decrypt) a message before the MAC has been verified. That was not the case in the 90:s. When the OpenPGP RFC 4880 spec was written, integrity checking was “bolted on” the old design. It works like this:

You hash the plaintext with SHA-1. Then you append the SHA-1 hash to the end of the plaintext. Then you encrypt the combination of the plaintext and the hash. So it’s “MAC”-then-Encrypt.

The RFC is pretty apologetic about this design choice. I quote RFC 4880 section 5.13 (here “MDC” means Modification Detection Code):



      It is a limitation of CFB encryption that damage to the ciphertext

      will corrupt the affected cipher blocks and the block following.

      Additionally, if data is removed from the end of a CFB-encrypted

      block, that removal is undetectable.  (Note also that CBC mode has

      a similar limitation, but data removed from the front of the block

      is undetectable.)



      The obvious way to protect or authenticate an encrypted block is

      to digitally sign it.  However, many people do not wish to

      habitually sign data, for a large number of reasons beyond the

      scope of this document.  Suffice it to say that many people

      consider properties such as deniability to be as valuable as

      integrity.



      OpenPGP addresses this desire to have more security than raw

      encryption and yet preserve deniability with the MDC system.  An

      MDC is intentionally not a MAC.  Its name was not selected by

      accident.  It is analogous to a checksum.



      Despite the fact that it is a relatively modest system, it has

      proved itself in the real world.  It is an effective defense to

      several attacks that have surfaced since it has been created.  It

      has met its modest goals admirably.



      Consequently, because it is a modest security system, it has

      modest requirements on the hash function(s) it employs.

Oddity: Too many knobs leads to difficulty reasoning about security

This entry is maybe less exciting, and more problematic, than the other ones, but are still worth discussing. A design feature of IPSEC is that it allows a lot of flexibility. Maybe a bit too much flexibility. With great power comes great responsibility, and it’s quite easy to configure IPSEC in a potentially problematic way.

To recap, IPSEC supports a few different protocols, each with their own cryptographic goal. One of the protocols is called ESP, Encapsulated Security Protocol. It can provide confidentiality and (optional) data origin authentication for the encapsulated packet. Another protocol is called AH, for Authentication Header. It provides only data origin authentication for the payload.

The flexibility gotchas of IPSEC are many. The administrator can configure his deployment to use either AH or ESP for data origin authentication. You could also skip authentication altogether, as the ESP packet has support for not using this. To further complicate things, AH and ESP could be applied in any order: first AH then ESP. Or first ESP then AH. To even further complicate things, IPSEC then allows for both encrypt-then-MAC and MAC-then-encrypt configurations.

This does not even mention the various cryptographic primitives each of the protocol supports, that give even more choice during setup.

IPSEC is, arguable, an instance of the problem of giving too many knobs to the user, and very little guidelines how to not shoot yourself in the foot. Fortunately as of 2016 there are lots of advice on how to configure IPSEC by the vendor implementations and other documentation (see for example RFC 7321 from 2014), but it can be argued it’s still problematic that the protocol offer so many options. Newer protocols typically go the other way: allow as little room for screwing up as possible.

(Minor) Oddity: Status quo vs personal choice

Daniel J Bernstein is arguably one of the most influential cryptographers (and cryptographic engineers) working for the internet today. He also seems to be the only one who has a soft spot on using little endian representations of integers in his cryptographic designs. This in a world that is largely dominated by big endian protocols.

Is this a security issue? Of course not. And it may or may not lead to some performance improvements in some cases. It does, however, lead to to less unified designs, and maybe some bikesheds.

Conclusion

In this article I’ve shared a few unusual creations. If you know others, please share your favorites.

I challenge you to compute this: Specializing BREACH and Cross-Site Search attacks to steal private keywords

2015-12-23T14:19:00.000+01:00

In the real world there are many cases when we are more interested in how well a system protects a few specific keywords, rather than how well it can secure data “in a broader sense”. If an attacker can learn those keywords (say medical history/diagnosis), then the system’s security fails from the perspective of the user.

In this article I will try to raise awareness of two side channel attacks that are often practical in stealing such strings. Specifically, I want to emphasize how these two known attacks can be used against websites to exfiltrate private information about a user by associating the protected data with a set of sensitive strings, such as from a domain specific dictionary.

The question that I want you to keep in the back of your mind when reading this article is: As an engineer, what kind of data am I actually protecting? Is the data I’m protecting a sensitive set of words from a small corpus, and how do I deal with the fact that the data I’m protecting is “low entropy”?

As an attacker, can I learn something sensitive even without decrypting your TLS channel, circumventing the same-origin policy, or by taking over your account? Everyone likes to steal session tokens, but do I really have to, to get the user data? Can’t I just throw my sensitive keyword at the target and see if it sticks?

A Contrived Example To Get Started: Payload length

Say that your doctor office launch a website, https://getmydiagnosis.foo (to the horror of the security and privacy community). After logging in you are presented with a /diagnosis endpoint which, due to this being an early beta, will print one of the following two strings to the user:

“You are healthy!”
“You have cancer! Here’s some information about that…”

What do I do as an attacker sniffing your connection? I look at the TLS length fields or TCP packet lengths. The data is very well encrypted, but because the content that is encrypted is only one of two options I can use some other information (packet lengths) to figure out what is encrypted. I have learned the plaintext even without decrypting it.

The important point in this silly example is as follows: what we really want to protect here is the keywords “healthy” and “cancer” (or even just the binary predicate of the presence of one of those). The description text is often not really important at all from a privacy perspective.

About BREACH: A compression side channel

Let us take a less contrived example. Go to any popular website and look at the HTTP traffic. You are likely to find that the payload, be it HTML or JSON or XML, is compressed.

Content-Encoding: gzip

The whole idea of compression is to make the output data “shorter” in a sense. The string "hello world, the world says hello!" will intuitively compress better than a string of random characters, because of the re-use of the substrings “hello” and “world”. If I as an attacker can inject a string that will be reflected in a payload that is then compressed, then I can exfiltrate some private data by trial, error and observation.

Let us look at the doctor example again. This time the website is out of beta and is capable of listing all possible health issues, not just two options. Once logged in you are taken to the following page that unfortunately tells you that you have cancer and that you should contact your doctor's office for more information.

https://getmydiagnosis.foo/diagnosis?office=Stockholm

.. Cancer .. Stockholm

The payload is compressed and sent over TLS. For the sake of this example the query parameter “office” is reflected in the payload, whereas the rest of the data is not (say it’s pulled from a database). If I can make the target I’m attacking visit the specially crafted page:

https://getmydiagnosis.foo/diagnosis?office=Cancer

.. Cancer .. Cancer

Then the payload will compress slightly better if the diagnosis (the static content) is Cancer compared to strings not in the static content. Armed with a dictionary of diseases, I want to force the target to visit all these pages and record how well the data compress, by looking at packet lengths:

https://getmydiagnosis.foo/diagnosis?office=Flu

.. Cancer .. Flu

https://getmydiagnosis.foo/diagnosis?office=Mononucleosis

.. Cancer .. Mononucleosis

https://getmydiagnosis.foo/diagnosis?office=Cancer

.. Cancer .. Cancer

Doing this, I can maybe learn your diagnosis.

How do I make the target visit those specially crafted pages? For one, if you can inject data in a WiFi network, just look after a non-https traffic (the target may have many tabs open for example) and inject something like

<script src="https://...">

Which often works, especially if the website lack CSRF protection. You can also trick the user into visiting your website that does the above.

This class of attacks works better the less noisy the target endpoint is. JSON xhr endpoints that reflect input via a GET query parameter are often very nice to attack as they are much less noisy than if data is embedded directly into a large HTML payload.

Let's take a step back here and think about what the above attack means:

If I as an attacker have domain knowledge about what the plaintext can be, and the plaintext can be from a limited set of strings, I can quite easily (in some cases) learn your information without decrypting the TLS channel.

About Cross Site Search attacks: A timing side channel

A consequence of the BREACH attack is that it only works in practice if I can record packet lengths. That is how I learn if the injected data compress well or not.

A similar side channel attack that works in many cases is to do a cross site request to instruct a browser to make search queries on the target website, and record the timing. The purpose is, of course, to to find keywords for a user when the target website has a per-user search function. Results with words that are in the search index will often return with different timings than when the words are not indexed, depending on how the search result is constructed. This can be due to server-side processing, caching, database lookups done on the search results etc.

As a concrete example, if I search in GMail for a random string, that will typically return in about 0.3 seconds. If however I search for a string that is in my index, it will take significantly longer (0.6 seconds or more) due to GMail fetching the individual results server side and constructing the response.

Quite often, a search function on a website is non-mutating GET request, so it may not have CSRF protection.

Going back to the doctor website example, say that I can search my journal for medical history. Armed with the same dictionary as I used in the BREACH attack, I just construct a special website that asynchronously visits the below endpoints, and records the timing it takes for the page to load.

https://getmydiagnosis.foo/searchmedicalhistory?query=...

I no longer need to be in the middle to learn interesting data, I just have to trick the target into visiting my attack website, and record how long the “challenge” searches take compared to a known hit or miss. Often a fairly simple statistical model is needed to learn the search output, depending on how big the search space you are interested in is.

This attack can also be fairly advanced if done correctly, even constructing whole keywords from parts. Amir Herzberg and Nethanel Gelernter [1] successfully used this timing attack against GMail to figure out the name of the account holder, by a special divide and conquer search approach that is faster than just trying names in a dictionary one by one. Their paper offer many super cool techniques to make this kind of attack very practical.

I won’t spoil the paper for you but I just want to mention one technique that I believe is awesome: By short circuiting challenge search queries, you can often trick a search engine to perform some very time consuming work if the first search keyword is not found. That will aid your data gathering.

The concern with XS (Cross Search) is the same as for BREACH: Relatively few queries are needed to learn the presence of a single keyword from a dictionary (more advanced searches may require more queries and a better model).

Sensitive Data Everywhere!

There are many places on the Internet where the content being protected is from a small set of strings, and those places can sometimes be attacked using these techniques, with a relatively small set of queries.

Before writing this article I, by cheating a little bit, used the compression attack against Google Location History. It reflects query parameters in the payload and it does contain the countries/places you have visited, which lends itself nice to this kind of “dictionary” attack. Note that in practice this is difficult to do against Google Location History because the compressed payload is very noisy, so both myself and Google Security Team consider this attack against them to be more on the academic than practical side. If Location History was less noisy, the attack may have worked in practice.

Of course, I am fairly certain that the general class of attack can be successfully mounted against other targets. Consider for example:

Medical data requires a small dictionary of diseases.
Search history requires a small dictionary of sensitive keywords that are interesting to the attacker.
Communication/chat data requires a small dictionary of common first/last names to enumerate contact information.
Location history data requires a small dictionary of the place names in the world (countries, cities etc).
Financial data requires a small dictionary of, for example, names (money transfer recipients), company names (stock investments), numbers (credit card numbers, salary data) etc.
… and others. A particularly scary case is if the information you are after is a simple predicate, or a set of very small options.

If you can exploit search functionality, then that is typically a very viable approach as you don’t need to be in the middle of the TLS connection. If not, be the man in the middle and look for a compressed endpoint, with a normally small/clean output, where you can inject the challenge string you are looking for.

Protective Measures

Search fields can be enhanced with CSRF protection. There may be other places that are “search like” that will also leak timing data. It can be fun and terrifying to ask what other GET queries may leak significant timing data.

BREACH can be avoided by not compressing sensitive output. Here the best bang for the buck would be to disable compression on endpoints that reflect query parameters or (especially) have a small output size (because it’s typically less noisy and thus easier to compromise than the larger payloads).

Rate limiting can sometimes help, if the search space is large enough that a significant number of queries is required to exhaust the dictionary. Of course, even a strict rate limiting can not protect content where the set of possibilities is very small.

A noisy output can help against BREACH by obscurity, by design, or by accident. A very ugly workaround that may increase the cost of the attack in some cases could be to introduce a random byte string in the payload that has a random length. Invoking information theory the fact that the string is random makes it generally not compress, so by also making the “token” random length will result in the output getting a jittered length as well. This will often not work though, for two reasons. First of all, the compression algorithm may break this band-aid in various ways; window sizes/lookback and compression contexts may make your random string not affect the compression. Secondly, it may be possible to differentiate between a hit (that compress) and a miss anyway, by doing enough requests.

Conclusion

Hopefully this article will raise awareness of these side channel attacks and how they can be of particular concern to systems that are meant to protect a small set of keywords.

In general, a useful question to ask when designing secure systems is this: What is the most important data I am protecting, and how large is the domain?

Thanks to my friends Gunnar Kreitz, Nenad Stojanovski and Alex Rad for valued feedback on this post. Any mistakes are solely my own.

[1] http://dl.acm.org/citation.cfm?id=2813688

Let’s construct an elliptic curve: Introducing Crackpot2065

2015-07-07T19:19:00.000+02:00

This article is meant to demystify some aspects of elliptic curve cryptography. The target audience is people who already have a basic understanding of applied cryptography but would like to close some gaps in knowledge when it comes to elliptic curves in general, and elliptic curve domain parameters in particular.

The first half of the article will lay the foundation on how ECC works, with examples from the “sage” mathematical software system. It will cover elliptic curves, curve equations, finite fields, addition/multiplication within the system, the Elliptic Curve Discrete Logarithm Problem (ECDLP), the group laws and the meaning of orders and subgroup orders. It will briefly mention some systems using the curves, such as ECDH and ECDSA, but the main focus is on the curves themselves and their properties.

The second half will focus on how to create a curve from scratch, suitable for cryptographic use. I will introduce a curve I call Crackpot2065 which is “safe” according to one criteria. I do not advise anyone to use the curve I’ve created for real purposes as there are many more well studied curves that have better security guarantees and implementations. I am just doing this for educational/expository reasons and that because I think it’s fun. With that said...

Part 1: How Does Elliptic Curve Cryptography Work?

At a high level, Elliptic Curve Cryptography can be split in two broad areas. The first broad area concerns the mathematical “environment” in which we work, that is the elliptic curves themselves. The second broad area concerns the algorithms and systems using these structures to perform some interesting cryptographic operation, for example signing or key agreement. To namedrop some terms that you may have heard before: The first broad area is about structures such as the elliptic curve Curve25519 or NIST P-256. The second broad area is about algorithms such as ECDH and ECDSA.

Informally we can describe the “environment” as follows: ECC uses a mathematical structure called an elliptic curve over a finite field. An elliptic curve is a set of (x, y) points satisfying an equation that typically looks something like y^2 = x^3 + ax + b, where a and b are curve parameters that have constraints. The finite field is often the field of integers mod a prime number. That means that the set of points on the curve are also on integer coordinates. It is possible to define addition, and by extension multiplication, on these elliptic curve points. ECC works because if Alice has a secret integer s and a public key point s*P, then Eve cannot easily find s from the published point s*P. This is called the Elliptic Curve Discrete Logarithm Problem and it is believed that this is a very difficult problem to solve. The ECDLP problem form the basis of the security of the algorithms.

Let us understand the above paragraph by some examples. The examples below will use the computer algebra system Sage:

sage: ec = EllipticCurve(GF(2**255-19), [0,486662,0,1,0])
sage: ec
Elliptic Curve defined by y^2 = x^3 + 486662*x^2 + x over Finite Field of size 57896044618658097711785492504343953926634992332820282019728792003956564819949

First of all we have (as seen above) created an elliptic curve over a finite field, called “ec”. The number 2**255-19 is a big prime number. The equation is an elliptic curve where the equation is given on a Montgomery form. The three equation forms that we will discuss in this article are:

The short Weierstrass form: y^2 = x^3 + ax + b, where 4a^3+27b^2 is nonzero in GF(p).

The Montgomery form By^2 = x^3 + Ax^2 + x, where B(A^2-4) is nonzero in GF(p).

The Edwards form x^2 + y^2 = 1 + dx^2y^2, where d(1-d) is nonzero in GF(p).

All of the above equations are elliptic curves. It is possible to convert a curve written on one form to the other. Normally in the cryptographic literature the most common way to express an elliptic curve is on the short Weierstrass form. Many modern curves are using the other forms for performance reasons.

sage: Point = ec([9, 14781619447589544791020593568409986887264606134616475288964881837755586237401])
sage: Point
(9 : 14781619447589544791020593568409986887264606134616475288964881837755586237401 : 1)

Next we create a point on the curve on (x, y) coordinates. We have chosen a specific point with x coordinate 9. Using the (x, y)-coordinates directly in the obvious way are called “affine coordinates”. For performance reasons it is often desirable to use other coordinate systems instead. On the “projective coordinate system” the curve points are instead written as a 3-tuple (X : Y : Z) where x=X/Z and y=Y/Z which means that (x, y) = (x : y : 1). These are just two examples of coordinate systems that can be used when computing on elliptic curves.

sage: 80 * Point
(23817343013128792117854356851130396784845341481516229275406643496458380679609 : 35116296009168191515727253524688666433668212002350538291544025151813503173711 : 1)
sage: Point + Point
(14847277145635483483963372537557091634710985132825781088887140890597596352251 : 8914613091229147831277935472048643066880067899251840418855181793938505594211 : 1)

It is possible to do multiplication and addition on the points, as shown above, which will be discussed soon. What is not shown above is that elliptic curves often have a special point called the “point at infinity”. When dealing with curves on Montgomery or short Weierstrass form, the point at infinity exists. Edwards curves do not have a point at infinity.

What all curves have though, regardless of form, is a neutral element. Addition over the curve forms an abelian group. Consider the addition operation on the curve, then the laws for the abelian group hold (shamelessly copied from Wikipedia):

Closure: For all a, b in A, the result of the operation a • b is also in A.
Associativity: For all a, b and c in A, the equation (a • b) • c = a • (b • c) holds.
Identity element: There exists an element e in A, such that for all elements a in A, the equation e • a = a • e = a holds.
Inverse element: For each a in A, there exists an element b in A such that a • b = b • a = e, where e is the identity element.
Commutativity: For all a, b in A, a • b = b • a.

Lets have a look at this.

sage: ec(0) # The point at infinity is the neutral point for our curve
(0 : 1 : 0)
sage: Point + ec(0) == Point # Identity
True
sage: some_point = ec.lift_x(4)
sage: Point + some_point == some_point + Point # Commutativity
True

sage: -Point + Point # Additive inverse
(0 : 1 : 0)

sage: some_point_2 = ec.lift_x(1)

sage: Point + (some_point + some_point_2) == (Point + some_point) + some_point_2 # Associativity
True

Multiplication is defined as repeated addition. That is, 3*P = P+P+P. Multiplication can be implemented programmatically using repeated doubling and addition using an algorithm called double-and-add, or, more preferably, a similar approach called a Montgomery ladder that is similar but can be made to run in constant time instead, preventing timing attacks leaking information about the scalar (which is often the secret key).

sage: 3*Point == Point+Point+Point
True

As mentioned briefly above, the two primitive algorithms that must be implemented in an ECC system to get multiplication are double(P) and add(P, Q). The addition function often has the constraint that P != Q, which is why the specialization of addition, double(P), exists instead of just doing add(P, P) directly. If you go out of the way to try to derive an addition formula from the curve equation in short Weierstrass or Montgomery form, you will find that there can be a special “division by zero” step in the add(P, P) case. This is why most treaties on ECC always include algorithms for both add and double, to avoid making add(P, Q) riddled with special cases (the naive add(P, Q) for Short Weierstrass curves has many special cases as it is).

Exactly what is the meaning of addition on an elliptic curve, and how are the algorithms for add(P, Q) and double(P) implemented? Unlike pretty much all other ECC introduction articles on the Internet, I’m gonna leave this outside the scope of the document for the simple reason that I believe it will make the topic more confusing to understand. I can recommend Wikipedia for pointers and cool graphical illustrations. For the remainder of this article it’s enough to trust that addition follow the group laws above.

Elliptic curves on finite fields will by definition have a finite number of points. We call this the order of the curve. Counting the number of points on the curve can be done somewhat efficiently using somewhat complicated algorithms called Schoof’s algorithm or the more efficient improved algorithm Schoof-Elkies-Atkin, aka SEA.

sage: ec.order() # the number of valid points on the curve
57896044618658097711785492504343953926856930875039260848015607506283634007912

Consider an arbitrary point on the curve, for example the point “Point” in the sage example above. Repeated addition using this point forms a subgroup on it’s own: addition is cyclic in the this group which means that if you just add a point to itself (that is 0P, 1P, 2P, 3P…) you will eventually end up at the identity element 0P=0 where you started. We say that the point Point generates a cyclic subgroup using the point as the generator. This subgroup has an order on it’s own (the order in this case is the smallest n>0 such that nP=0):

sage: Point.order()
7237005577332262213973186563042994240857116359379907606001950938285454250989

sage: Point.order() * Point # order*P == 0*P
(0 : 1 : 0)

Due to Lagranges Theorem of group theory, the order of the subgroup will divide the order of the group.

sage: ec.order() / Point.order()
8

The above number is called the cofactor.

In Elliptic Curve Cryptography, the “domain parameters” of the curve include a base point which generates a subgroup of a large order. It is within this subgroup that we perform our curve operations with the assumption that the reversal of the public key s*P is a hard problem. The domain parameters must include a base point as it’s required to agree upon which subgroup to work in prior to perform cryptographic operations.

(Briefly) Introducing Curve25519 and ECDH

The elliptic curve given above, that is the equation y^2 = x^3 + 486662*x^2 + x over Finite Field GF(2**255-19) together with the base point (9, <large number>) gives the domain parameters for an elliptic curve called Curve25519, constructed by famous crypto guy Daniel J Bernstein. Curve25519 defines a public key as the x-coordinate of the point s*P where s is the secret key and P is the base point. The secret key is a random 255 bit number with the constraints that the number is a multiple of the cofactor (to prevent leaking a few bits of info during a certain class of attack called a small subgroup attack). The secret key is also constructed such that the highest bit is always set. This together with the Montgomery ladder ensure that no timing information will leak when using the secret key.

Curve25519 is not a general purpose curve in the sense that you can just take the curve and apply it to any algorithm. It is specifically constructed to be used with the Elliptic Curve Diffie Hellman algorithm. Some other curves (with “curve” here I mean the domain parameters including the base point, not just the equation) are more general purpose and can be used for other algorithms as well, such as ECDSA or others. You cannot take djb:s definition of Curve25519 as is and apply it to ECDSA because those algorithms require both (x, y) coordinate to work, and Curve25519 is only defined for the x-coordinate. y can be computed from x using the curve equation however this will result in two options due to the equations including a square root: either you will get the point A, or the points inverse -A. Curve25519 does not define which point to use in that situation. djb:s similar curve used for signing, Ed25519, does define which point to use in that case, and is as such more generic.

ECDH is the elliptic curve version of the Diffie Hellman key agreement. Alice and Bob would like to establish a shared secret. Alice has a secret key a and a public key A. Bob has a secret key b and a public key B. They both know each other public key. Alice computes aB and Bob computes bA. They use the x coordinate of the result as a shared secret. Because of the group laws, the result is the same (A is a*P and B is b*P so it follows that b*a*P = a*b*P).

sage: aliceSecretKey = 2**254 + 8 * randint(1, 2**251-1)
sage: A = aliceSecretKey * Point
sage: bobSecretKey = 2**254 + 8 * randint(1, 2**251-1)
sage: B = bobSecretKey * Point

sage: (aliceSecretKey * B).xy()[0]
16006640369170327653803989998811672408911551588929921312340351204923053532044
sage: (bobSecretKey * A).xy()[0]
16006640369170327653803989998811672408911551588929921312340351204923053532044

The observing attacker cannot recover the secret without solving the ECDLP, so ECDH is safe under the assumption that ECDLP is hard. ECDH is typically the least complicated Elliptic Curve Cryptography schema to understand. Other algorithms includes ECDSA which is a variant of DSA and it’s used for signing. Daniel J Bernstein has also created an algorithm called EdDSA, which is also an algorithm for cryptographic signatures with some other properties than ECDSA. It’s a neat algorithm where signatures are deterministic and they do not have the same catastrophic failure of ECDSA in case random numbers (ECDSA, as DSA, requires a random number “k” for signatures) are re-used by accident, which is what broke the Playstation protection a few years back.

This concludes our short overview of Elliptic Curve Cryptography fundamentals. The rest of this article will describe how to generate your own domain parameters, that is the prime for the finite field, the curve equation, and the base point.

Part 2: Let us create our own domain parameters!

For reasons that I have trouble to explain to my friends I got the strange idea to construct my own elliptic curve. That is I wanted to find a suitable prime for the finite field, a suitable equation and a suitable base point. Thankfully the SafeCurves project define a plethora of gotchas and requirements for a curve to be considered “safe”. I figured a good goal would be to create a curve that meets the SafeCurves requirements.

Step 1: Let’s choose a prime number

Because I have limited computational resources and limited patience, I quickly redefined my goal to find a curve that meets the requirements just enough. The smaller the numbers are, the faster I can find them with my 2009-era budget workstation. One of the main computational definitions of security in the SafeCurve criteria is that the work required to solve ECDLP (using Rho’s method) should be at least 2**100 operations. That means that the subgroup order O should be roughly 0.886*sqrt(O) > 2**100. Armed with this knowledge I set out to find good prime numbers around 2**200.

For performance reasons, most curves in practice have prime numbers that are very close to a power of 2. This means that the field operations, especially the modular step, can be implemented quickly using bit hacks. I enumerated all the prime numbers of the form 2**e-c where c is as small as possible and found some possibilities:

2**200 - 75
2**201 - 55
2**202 - 183
2**203 - 159
2**204 - 167
2**205 - 81
2**206 - 5
2**207 - 91
2**208 - 299
2**209 - 33

Using common reasoning I figured that the first few would be too small. So I settled for 2**206 - 5 just because the constant was the smallest which was aesthetically pleasing to me. It may be possible that the 204 or 203 one also meet the Rho requirement, but I made my choice.

It is highly desirable to pick prime numbers of the form p == 3 (mod 4) or p == 5 (mod 8) because in those cases it’s efficient to compute the square root in the finite field. This is beneficial for point compression: Given the point (x, y) you can typically “compress” that to just one of the numbers: the “receiver” can compute it on his side of the wire: for example given x you can compute y. Doing this requires the square root calculation.

The prime number I chose happens to be 3 (mod 4) which lends to quick square rooting.

Step 2: Let’s find a curve equation

Okay, now it gets a bit more complicated. What we want to do now is to, given the finite field we’ve chosen, find a curve equation that will given a base point in the curve generate a subgroup of very large order.

To make this easier, I decided to go with an Edwards curve. Edwards curves are parametrized on “d” only. By brute forcing all “d” I tried to find a curve that has a curve order that is a very large prime factor. That means I can later choose a base point that will (due to Lagranges theorem) generate a subgroup of the large prime factor order.

The pseudo code for sage would look something like this:

for all d = 2, 3, 4, …

convert “d” to a curve on short weierstrass or montgomery form (so we can use the EllipticCurve constructor)

find the order of the group using Schoof or SEA

factor the number above

pick a d such as the factors above contain one very large prime factor and some very small ones

The SafeCurves criteria additionally contains a requirement that the “twist” of the curve - which is to say another elliptic curve that is related to the one at hand - should have a large order as well. This is outside the scope of this article, but can be checked in a similar fashion. In fact, the SafeCurves criteria contain many other requirements that I’ve decided not to cover in this article at this point.

After around 8-12 hours on my slow computer I found the following:

d = 15688 gives

curve order = 2^2 * 25711008708143844408671393477455870117170824866529792321311457

twist order = 2^2 * 25711008708143844408671393477461333163539670934519578408332573

Alright! Those orders give a Rho cost of approximately 2**101.8 which is (slightly) greater than the required 2**100. The cofactor is a multiple of 4 which is another of the SafeCurves requirements, which is good for our purpose.

Step 3: Let’s find a base point

Given the curve x^2 + y^2 = 1 + 15688x^2y^2 over GF(2**206-5) we want to find a base point with the order given above, that is 25711008708143844408671393477455870117170824866529792321311457. Again remember that Lagranges theorem states that the order of the subgroup will be a divisor of the parent group order. So we:

let L = desired order 25711008708143844408671393477455870117170824866529792321311457

let Possibilities = {divisors of curve order}

for all y = 2, 3, 4, …

find point P = (x, y) on the curve, P != neutral

check that L*P = neutral and that a L is the smallest integer in Possibilities where L*P = neutral

Fortunately the search was pretty quick because the first y I tried, y=2, happened to be one that had the desired order.

That gives us the base point

(21562982200376473978929601187519256941293801462850163935410726, 2). Actually there is another choice: the inverse of this point. But we go for this one because it has the smallest x-coordinate of the two choices. A common convention is to define the “negative” point to be the one with the larger integer, similar to the convention with signed integers having the high bit set.

Step 4: Let’s sanity check against SafeCurves

The SafeCurves project actually contains a verification script that will try all the criterias and output the result. Lets try that on the new curve (before we do this we have to find a point generating the entire curve, similar to above, and also we have to supply lots of prime numbers so the SafeCurves program doesn’t have to do the factoring itself. I used msieve to compute the factorizations).

$ sage verify.sage data/curves/crackpot2065/
$ cat data/curves/crackpot2065/verify-safecurve
True

Of course, just because this curve meets the requirements doesn’t mean it’s safe: that requires that the implementation of the curve is actually safe too, and that it’s used sanely with the algorithm (ECDH, ECDSA, EdDSA etc.). And just because the curve is small doesn’t mean it’s fast either. In practice, if you want to use a “small” curve go with Curve25519 or Ed25519, depending on use case, or use something such as NaCL or libsodium.

Step 5: Now what?

The above description barely scratches the surface of finding good elliptic curve domain parameters. Even assuming the parameters are safe in a sense, there are still problems that need to be solved in order to use a curve in practice. One big issue would be to implement the curve in a safe and high performing way. Another big issue would be to use the curve in a correct way with the available algorithms. Is it tempting to look at the curves separate from the algorithms because that will give a warm fuzzy feeling for software engineers who like composability. However, there’s a danger in just taking a curve and “plugging it in” in any arbitrary algorithm. That’s a part that requires some knowing what you are doing.

With that said, in practice, it is often possible to compose a curve and an algorithm and end up at a “system” (i.e. a curve and an algorithm together in a specified way) that actually works. This is the case with some of the NIST families of curves and ECDSA. In other cases more info and care needs to be taken, such as with Curve25519, which is really the “system” of the curve defined in the Curve25519 paper, together with the ECDH algorithm.

Can my Crackpot2065 curve be used in practice? There are two sides to this. One the negative side, using this curve would require more work, such as defining the secret key space (which should probably be a multiple of 4 with a high bit set), and a good implementation, and some clear thought on how to use it with the algorithms without doing something moronic. On the positive side, if everyone used the same curves then an attacker with lots of resources (or at least the theoretical attacker) would be inclined to study or do pre-computations on those curves, which would be an argument for a more diverse set of domain parameters than are commonly used today.

As a fun thought experiment, a friend of mine mentioned a scenario where curve parameters can be generated so quickly that two parties can compute them together before commencing with the rest of the protocol in question. This is of course not very practical, but imagine a situation where generating domain parameters can be done very quickly, instead of requiring hours of computations.

Conclusion

My goal with this article was to demystify the elliptic curve domain parameters. By choosing how one might go about and pick those, I hope that this has cleared up at least some confusion on the topic. I have basically written the article I would have liked to read myself before I started this project. That would have saved me about of week of work.

If you are more interested in domain parameters, please check out the excellent SafeCurves project which includes lots of good information and references on parameter selection.

… Having said that, I may have lied a bit when I told you that my curve is “safe” according to the SafeCurves criteria. Because the verify.sage script has an input called “rigid”, which is up to the person running the verification script to decide. I decided to call Crackpot2065 “fully rigid”, according to the SafeCurves definition: “Fully rigid: The curve-generation process is completely explained.” This blog post is hopefully rigid enough. :-)

Have fun and don’t shoot yourself in the foot with your new crypto gun.

Obligatory caveat: The author of this article, Björn, is not a real cryptographer. There may be mistakes in this article. I apologize for any inconvenience caused by this. If you find a problem or have questions, just email me at be@bjrn.se.

JavaScript Cryptography Considered Somewhat Useful In Some Special Cases

2014-08-03T13:48:00.002+02:00

Over the years there have been a few articles published about the “dangers” of JavaScript cryptography. In this article I am going to make the claim that this is often misunderstood, and that there are in fact some specialized uses of JS crypto. Hopefully it will be interesting for someone.

JavaScript - the language

This article is mostly about JavaScript cryptography delivered by web services, but before we go into this let’s just briefly discuss the language itself, for example as it pertains to NodeJS and similar.

Claim: In general, JavaScript is an unsuited language to implement interactive, real-time cryptographic protocols.

The reason for this is the difficulty in implementing cryptographic primitives in an interpretive language that does not leak information, for example subtle timing differences. In a compiled language such as C, you can more easily (though it’s still difficult) write constant time algorithms without having to worry about the interpreter doing optimizations making it suspicable to timing attacks (or other side channel attacks for that matter).

I’d personally be on my guard if I saw something like “pure JavaScript server implementation of TLS”, or any kind of JavaScript implementation that allows an untrusted party real-time interaction with the implementation (think key exchanges and similar).

Caveat to above claim: If the cryptographic primitives are implemented in a low level language but exposed to JavaScript, such as the new W3 Web Cryptography API proposal, then there may be possible to implement interactive protocols in JavaScript.

Claim: There’s nothing inherently wrong with using JavaScript for non-interactive, non-real-time cryptography - outside of the web browser, with normal caveats about cryptography applied.

Assuming you have code that you trust that does some “offline” cryptographic operation - the fact that it’s written in JavaScript doesn’t make it less secure than if it was written in another language. The “normal caveat” in the claim is that the code should of course be written by a security expert.

A good example of this is the new Google project “End-to-End”, which is an OpenPGP implementation in JavaScript. This is a web browser extension that happens to use JavaScript for cryptography, and it is not timing sensitive.

JavaScript in the web browser

First of all I am going to make the same claim as everyone else:

Claim: In general, JavaScript Cryptography in the web browser is a fundamentally bad idea...

(...Hold on to your hats, because here is the main point of this article...)

Claim: … However, there are some potential uses cases for non-interactive, non-real-time JavaScript cryptography if the JavaScript cryptography as well as the entire website is delivered over a secure connection, such as TLS, and the information that the JavaScript cryptography is meant to protect is not more sensitive than would otherwise be acceptable to give to the server.

There we have it, the main point of this article.

Discussion

The fallacy that most “don’t use JavaScript cryptography” articles make is the assumption that a web site would use JavaScript cryptography instead of TLS. The second fallacy is that if the website uses TLS, then there is no point in using JavaScript cryptography. Lets have a look at both.

JavaScript delivered over an unsecure connection (such as the normal http protocol) is easily intercepted by an attacker and cannot be trusted. That JavaScript could do anything. So in this case all the articles are correct. Do not send JavaScript cryptography over plain http.

However, if JavaScript is delivered over a secure connection then that JavaScript code can be trusted to the extent that everything else on that website can be trusted. This can for example be used to protect against some information leakage on the server side.

As an example to this, consider key strengthening. Pretty much all websites nowadays have some username/password login form delivered over a secure connection. If you already trust that website with your password, then there is nothing wrong with trusting that website with a JavaScript transformed version of your password (unless the presence of this transformation would make you use a different password that you would not otherwise trust to the server). That may help the website owner in case he doesn’t want your cleartext password show up in server logs, crash dumps, debugging tools (strace, gdb...) or similar.

The same argument can be made about different applications. For example say there is a website that offers some kind of JavaScript based chat feature. The owner of that service may decide to encrypt your messages before they are sent to the server, for the same reason (avoid plaintext in logs and similar). The presence of this JavaScript cryptography should not, as the claim above state, make you use the chat for more sensitive information than you would without the JavaScript crypto.

Conclusion

This short article hopefully shows that there are some specialized use cases for JavaScript cryptography in different settings.

Holiday planning by Debian packages

2012-10-20T17:30:00.001+02:00

Today I am facing a rather difficult decision. Should I travel to Lauterbrunnen for a week of BASE jumping or should I stay with my friends in Sweden for the last weekend of skydiving this season (with the obligatory party), and try out my brand new wingsuit? Facing this decision, I did what any rational human being would do: I decided to ask the aptitude package manager for help in making this decision.

Setting up a local Debian repository

If you already have a local debian repository, you can skip this, otherwise it's useful to set one up. Install a web server of your choice (I happened to be running lighttpd already) and then install, for example, reprepro. I am not going to elaborate on this, here:s a guide that may be useful: Local Debian Repository.

Creating a debian project

A very simple debian project consists of a debian debian directory with the 3 files "control", "rules" and "changelog", so let's create this.

bjorn@terra:~/src/$ mkdir -p decision-example/debian
bjorn@terra:~/src/$ cd decision-example
bjorn@terra:~/src/decision-example$ touch debian/control
bjorn@terra:~/src/decision-example$ cat debian/rules # create this
#!/usr/bin/make -f

%:
 dh $@
bjorn@terra:~/src/decision-example$ chmod +x debian/rules
bjorn@terra:~/src/decision-example$ cat debian/changelog # create this
decision-example (0.0.0) unstable; urgency=low

  * Initial release. (Closes: #XXXXXX)

 -- Björn Edström   Sat, 20 Oct 2012 11:37:03 +0200

While optional, I am also going to create the "compat" file because it will supress a bunch of warnings when building the package.

bjorn@terra:~/src/decision-example$ cat debian/compat # create this
7

The interesting file is the control file, which specifies one or more debian packages to be built. Just to get this out of the way, we are going to add a source package on top, and a single package representing the decision to go skydiving (this package is called gryttjom becayse my home dropzone is a place called Gryttjom which is north of Stockholm in Sweden):

Source: decision-example
Section: devel
Priority: optional
Maintainer: Root Key
Build-Depends: debhelper (>= 7)
Standards-Version: 3.8.4

Package: decision-gryttjom
Architecture: amd64
Description: Go to gryttjom

Having set this up, we can now build our package(s)

bjorn@terra:~/src/decision-example$ debuild -d -us -uc

Essentially that means to build binary (non-source), ignore dependencies when building, and not signing the package. If successful this will create a bunch of files in the parent directory.

Let's publish this package to our debian repository:

bjorn@terra:~/src$ dput -f -u decision decision-example_0.0.0_amd64.changes
Uploading to decision (via local to localhost):
Successfully uploaded packages.
bjorn@terra:~/src$ sudo /usr/bin/reprepro -b /var/www/debian processincoming decision
Exporting indices...

Given that you have configured your /etc/apt/sources correctly that package will now be available after an aptitude update:

bjorn@terra:~$ aptitude search decision
p   decision-gryttjom            - Go to gryttjom

Because we are going to create a bunch of packages from now on, let us create a build script:

bjorn@terra:~/src$ cat ./decision-example/publish.sh #/bin/bash
sudo rm -r /var/www/debian/{pool,db,dists} # lazy way to delete previous, only do this if you don't have other packages
dput -f -u decision decision-example_0.0.0_amd64.changes
sudo /usr/bin/reprepro -b /var/www/debian processincoming decision
sudo aptitude update

Decisions and dependencies

Aptitude, which is a package manager for debian, has a rather advanced dependency resolver. Specifically, you can use the keywords "Depends", "Conflicts", "Suggests", "Enhances", "Recommends" and a bunch of other to specify relations between packages. This is what we are going to use to make our decision.

On one level, we can specify our problem as "go to gryttjom or go to the alps". So let's create two packages for each option.

Package: decision-gryttjom
Provides: decision-root
Priority: standard
Conflicts: decision-alps
Architecture: amd64
Description: Go to gryttjom

Package: decision-alps
Provides: decision-root
Priority: standard
Conflicts: decision-gryttjom
Architecture: amd64
Description: Stay in the alps

Let us build these packages (debuild step) and publish (using the build script). Now we have:

bjorn@terra:~$ aptitude search decision
p   decision-alps                - Stay in the alps
p   decision-gryttjom            - Go to gryttjom
v   decision-root                -

We have now codified a decision problem as shown:

bjorn@terra:~$ sudo aptitude install decision-root
...
"decision-root" is a virtual package provided by:
  decision-gryttjom decision-alps
You must choose one to install.

bjorn@terra:~$ sudo aptitude install --schedule-only decision-gryttjom decision-alps
bjorn@terra:~$ sudo aptitude install # we scheduled above and execute here to avoid installing the default
...
The following packages are BROKEN:
  decision-alps decision-gryttjom
0 packages upgraded, 2 newly installed, 0 to remove and 1 not upgraded.
Need to get 1,966B of archives. After unpacking 57.3kB will be used.
The following packages have unmet dependencies:
  decision-alps: Conflicts: decision-gryttjom but 0.0.0 is to be installed.
  decision-gryttjom: Conflicts: decision-alps but 0.0.0 is to be installed.
The following actions will resolve these dependencies:

Keep the following packages at their current version:
decision-alps [Not Installed]

Score is 117

Accept this solution? [Y/n/q/?] n

The following actions will resolve these dependencies:

Keep the following packages at their current version:
decision-gryttjom [Not Installed]

Score is 117

Accept this solution? [Y/n/q/?]

What we have done is we have created two packages and a root "virtual" package and specified a relationship between them. Using the excellent debtree utility we can visualize this as follows:



bjorn@terra:~/src$ debtree -R -S decision-alps | dot -T png > decision-example/img/debtree1.png

Above is the dependency graph centered on the decision to go to the Alps. The root decision shows the two options: alps or gryttjom, while the red arrow shows a conflict.

Adding data points, additional decisions

The decision I am making is heavily influenced by weather forecasts. Furthermore, if I go to the Alps I a have the option to either stay in Lauterbrunnen or travel to Mt Brento in Italy to meet up with a friend. So lets create some data points and sub decisions:

Package: decision-gryttjom
Provides: decision-root
Priority: standard
Conflicts: decision-alps
Architecture: amd64
Description: Go to gryttjom

Package: decision-alps
Provides: decision-root
Priority: standard
Conflicts: decision-gryttjom
Suggests: decision-alps-brento, data-good-weather-alps
Architecture: amd64
Description: Stay in the alps

Package: decision-alps-brento
Depends: decision-alps
Architecture: amd64
Description: Go to brento

Package: data-good-weather-alps
Architecture: amd64
Enhances: decision-alps
Description: Good weather in the alps

Package: data-bad-weather-gryttjom
Architecture: amd64
Enhances: decision-alps
Recommends: decision-alps
Description: Bad weather at home

Here I have basically set up a bunch of hints or indications that if the weather forecast for Sweden looks bad then that leans towards going to the Alps. Building and publishing gives us:

bjorn@terra:~$ aptitude search decision
p   decision-alps                   - Stay in the alps
p   decision-alps-brento            - Go to brento
p   decision-gryttjom               - Go to gryttjom
v   decision-root                   -
bjorn@terra:~$ aptitude search ^data- | egrep 'alps|gryttjom'
p   data-bad-weather-gryttjom       - Bad weather at home
p   data-good-weather-alps          - Good weather in the alps

Now, let's say the forecast says it seems to be bad weather in Sweden. Let's see what aptitude says:

bjorn@terra:~$ sudo aptitude install --schedule-only decision-gryttjom decision-alps data-bad-weather-gryttjom

bjorn@terra:~$ sudo aptitude install
...
The following packages are BROKEN:
  decision-alps decision-gryttjom
The following NEW packages will be installed:
  data-bad-weather-gryttjom
0 packages upgraded, 3 newly installed, 0 to remove and 1 not upgraded.
Need to get 3,010B of archives. After unpacking 86.0kB will be used.
The following packages have unmet dependencies:
  decision-alps: Conflicts: decision-gryttjom but 0.0.0 is to be installed.
  decision-gryttjom: Conflicts: decision-alps but 0.0.0 is to be installed.
The following actions will resolve these dependencies:

Keep the following packages at their current version:
decision-gryttjom [Not Installed]

Score is 117

Accept this solution? [Y/n/q/?] n
The following actions will resolve these dependencies:

Keep the following packages at their current version:
decision-alps [Not Installed]

Leave the following dependencies unresolved:
data-bad-weather-gryttjom recommends decision-alps
Score is -83

As can be seen, given the simple dependencies we have set up shows that aptitude prefers BASE jumping if the weather is bad in Sweden.

Conclusion

To not make this article too long and tedious, I will stop here and not add more data points. However, by using Debian packaging "Suggest", "Ehances" and "Recommends" on some other data points (cost/travel overhead of getting to Brento, if my cold develops further, how much money I have etc) you can actually get something useful. As the number of data points increase it is not immediately obvious (as it is in the example above) what to do, and aptitude actually gives some suggestions.

Of course, I do not recommend anyone to base their life on what a Linux package manager says, and for that reason this article is a little bit tounge-in-cheek, and I realize this article is also the result of some procrastination on my part. However, actually formalizing the data points and relations between decisions helped me out, and when I have the data points I have formalized (travel costs, weather forecasts etc) will give aptitude a spin and hear it out.

Cheers,

Björn

Fun with the TLS handshake

2012-07-28T17:22:00.001+02:00

This article is about the TLS handshake in general, with a focus on how cryptographic signing is used in the protocol. Specifically, the article will attempt do to the following:

Explain how the TLS handshake works, using Python code.
Highlight how some parts of the protocol allow for signing short amount of client-supplied data by the server certificate.
... and abuse the above parts of the protocol, and common implementations of the protocol, to build poor-mans trusted time-stamping.

The companion code to this article is the little project "toytls" hosted at Github, which implements the client part of the handshake and some surrounding utilities to accomplish point 2 and 3 above.

See the toytls page at github for the code.

An overview of the TLS handshake

TLS, and it's predecessor SSL, is a protocol for secure (encrypted, authenticated) communication over an untrusted network. The "TLS handshake" is the initial part of the protocol, where a client and a server negotiate how (if) to accomplish the goal of secure communication.

The handshake consists of multiple messages being sent back and forth the server and client, with the ultimate goal of: a) allowing the client to verify the authenticity of the server, usually using cryptographic certificates, b) allowing the server to verify the client and c) to securely agree on a method to proceed with the bulk transfer of data.

The handshake typically look like this:



>>> ClientHello

<<< ServerHello

<<< Certificates

<<< ServerHelloDone

>>> ClientKeyExchange

>>> ChangeCipherSpec

>>> Finished

<<< ChangeCipherSpec

<<< Finished

This means that the client first sends a message ClientHello to the server, and will get the response ServerHello back, etc. The handshake ends with both client and server sending Finished. After that point, a secure channel has been set up for the transfer of data.

Cryptographic primitives and cipher suites

TLS has the concept of a "cipher suite". This is a predefined list of cryptographic primitives in different classes (encryption, hashing, authentication etc) to use in the handshake and communication. In this article, we will concern ourselves with three cipher suites:



TLS_RSA_WITH_RC4_128_SHA = 0x0005

TLS_DHE_RSA_WITH_AES_128_CBC_SHA = 0x0033

TLS_ECDHE_RSA_WITH_RC4_128_SHA = 0xc011

The first one, suite TLS_RSA_WITH_RC4_128_SHA (which has id 5), is implemented and supported by almost all clients and servers. The last two are not as common (the servers at Google and Facebook support one each, but not both) and are selected specifically because they have interesting signing behavior, which will be discussed further below.

In English, the three cipher suites above mean:

A cipher suite using RSA for secure key exchange, RC4 for bulk data encryption and SHA1 as a hash.
A cipher suite using Diffie-Hellman for secure key exhange, RSA signatures for authentication, AES-128 in CBC mode for bulk encryption and SHA1 as hash.
A cipher suite using Elliptic Curve Diffie-Hellman for secure key exchange, RSA signatures for authentication, RC4 for bulk data encryption and SHA1 as hash.

As can be seen, a full implementation of one of the versions of TLS requires a full arsenal of different cryptographic primitives (the three examples given are just a small subset of all cipher suites specified in the RFC:s). For the remainder of this article, we will mostly concern ourselves with the first suite, and only concern
ourselves with the second suit to implement the "clever hacks" mentioned in the very beginning of this article.

To implement a handshake using TLS_RSA_WITH_RC4_128_SHA, you need, of course, an implementation of RSA, RC4 and SHA1. In addition, TLS version 1.1, which is by far the most common version on the Internet today, requires an implementation of MD5 which is used for authenticating the handshake itself. This is not a requirement for TLS version 1.2, which use SHA-256 instead.

For practical use of RSA, the cipher suites mentioned employ the PKCS1v5 padding schema, which is a method padding a byte-string securely before encrypting it.

In toytls, the cryptographic primitives are implemented in toytls/rsa.py and toytls/rc4.py.

Message structure

To get a general feel for the TLS handshake we will dissect a sample ClientHello message, being sent from the client to the server. A simple ClientHello can look like below, with line breaks and comments added for readability.



16 03 02 00 31 # TLS Header

01 00 00 2d # Handshake header

03 02 # ClientHello field: version number (TLS 1.1)

50 0b af bb b7 5a b8 3e f0 ab 9a e3 f3 9c 63 15 \

33 41 37 ac fd 6c 18 1a 24 60 dc 49 67 c2 fd 96 # ClientHello field: random

00 # ClientHello field: session id

00 04 # ClientHello field: cipher suite length

00 33 c0 11 # ClientHello field: cipher suite(s)

01 # ClientHello field: compression support, length

00 # ClientHello field: compression support, no compression (0)

00 00 # ClientHello field: extension length (0)

All TLS messages have the TLS header which consists of a content type (first byte), version number (second two bytes) and a length (last two bytes). The content type describes the type of the message, and is 0x16 for the plaintext handshake and 0x17 för encrypted application data.

The handshake messages have a second handshake header describing the type of the handshake message (ClientHello, ServerHello etc). The handshake header also has a length field.

Following the handshake header is payload data specific to the handshake message. For ClientHello, this field contains some random data used for encryption purposes, a list of cipher suites supported by the client, information about compression support and optionally support for TLS extensions, which will not be covered in this article.

It is worth mentioning that, although the ChangeCipherSpec message is sent in the handshake procedure, it is not considered a handshake message and thus has a different content type (0x14) and does not have the handshake header. All the other messages mentioned above are handshake messages with a handshake header and a handshake content type.

Error handling

If something goes wrong during the handshake, the server can send an Alert message, which has content type 0x21. If you send incorrect TLS messages to the server (such as implementing authentication incorrectly, using incorrect decryption keys, attempting to use unsupported features, or otherwise implementing the protocol wrong)
you will get alert messages indicating some kind of failure.

In this article we will mostly ignore these messages. Just be aware that if you play around with toytls and change messages you will surely get these messages from the server.

The handshake from above

In this section of the article we will try to ignore the details of how the different TLS messages are marshalled, and instead focus on the content of each message. If you are interested in the bits and bytes, please have a look at the toytls source code and/or look at Wireshark output. In the description below, we will focus on the cipher suite TLS_RSA_WITH_RC4_128_SHA. Other cipher suites may have
slightly different message contents.

>>> ClientHello

Client says it supports TLS_RSA_WITH_RC4_128_SHA, no compression, and no extensions. It sends some random data used for cryptographic purposes.

<<< ServerHello

Server responds with some random bytes of it's own, also says that it accepts the cipher suite and compression option the client suggested. The server also gives a session ID which can be used if the client wants to reuse the TLS connection later. We will skip over the session ID and sessions in this article.

<<< Certificates

Server sends a list of certificates to the client (for HTTPS, these are the certificates shown in the web browser when the website is opened).

<<< ServerHelloDone

Server sends a message indicating that the client should continue with the handshake.

>>> ClientKeyExchange

The client generates a 48 byte random encryption key called the "pre master secret" and encrypts that with the public key in the server certificate. This encrypted key is sent to the server.

>>> ChangeCipherSpec

The client sends the control message ChangeCipherSpec indicating that everything from now on will be encrypted.

>>> Finished

The client sends an encrypted message that contains a hash of all the messages in the handshake sent and received. This makes it possible to detect tampering. This will be described further below.

<<< ChangeCipherSpec

The server sends the control message ChangeCipherSpec indicating that everything from now on will be encrypted.

<<< Finished

The server responds with a Finished message on it's own, sending an encrypted hash of the handshake messages.

Cryptography in the handshake

As mentioned above the Finished message contains a hash of the handshake sequence. For TLS 1.1, this hash is a slightly unusual construct: it is the concatenation of an MD5 and a SHA1 hash. The hashed data is the content of the handshake messages, without the TLS header.

This means that, when the client constructs ClientHello, it will also update the MD5-SHA1-hash with the content of the message. When it receives ServerHello, it will also update the hash, so the client and server can compare hashed in the Finished messages.

In addition to the MD5-SHA1-hash, TLS 1.1 also use a rather peculiar pseudo-random function (PRF) based on the two hashes. This pseudo-random function is used to generate keying material for the cryptographic primitives and MAC:s. The PRF is a combination of HMAC-MD5 and HMAC-SHA1. Please see toytls/hash.py for details.

When the cipher suite is TLS_RSA_WITH_RC4_128_SHA, the PRF is seeded with the pre-master secret, client random and server random data to generate a master secret. This master secret is then seeded to the PRF, again together with the random data, to create keying material. In this case, the keying material is for the RC4 stream cipher and HMAC-SHA1.

Both sides (client/server) hold encryption/MAC contexts for itself and the other side, so the client in our case has an RC4 encryption key for both the client and server.

Demonstrating the handshake

In the toytls project, please try out the script toytls-handshake.py. This will, given a server and port, complete a TLS 1.1 handshake.

Fun with signatures

With the cipher suite TLS_RSA_WITH_RC4_128_SHA, the handshake is authenticated because the pre master secret is encrypted with the public key in the server ceritificate, making sure only the holder of the private key can decrypt it. From the client side, this is some proof the server is valid.

When the cipher suit sets up a secure channel with Diffie Hellman, this kind of authentication is lacking. Diffie Hellman is anonymous and anyone could impersonate the server. To mitigate this, the TLS cipher suites using Diffie Hellman use RSA or some other algorithm for signing the handshake and encryption parameters. In TLS jargon, this is known as "Ephemeral Diffie Hellman", and is what DHE stands for in
the cipher suite names.

To inform the client of this signature, an extra message is introduced in the TLS handshake, known as ServerKeyExchange. This is sent between Certificate and ServerHelloDone. The signature is over the client and server random data, and the encryption parameters.

This means that a client can, using the client random field (as sent in ClientHello and used for encryption purposes), get up to 32 bytes of data signed by the server ceritificate. This is not completely useful on it's own. However, the TLS specification says that the first part of the random fields should be a timestamp.

Poor mans trusted timestamping

Trusted timestamping is cryptographically associating a timestamp with some data. Typically, you have a document you want to assert you wrote at a certain date. You take a hash of the document, send it to a trusted third party, and have them sign the hash together with a timestamp.

By using the client random field in the TLS handshake, together with a Diffie Hellman cipher suite, and using the fact that TLS implementations contain a timestamp from the server, you can use Google or Facebook as a trusted third party for timestamping up to 32 bytes of data, sufficient for for example a SHA-256 hash.

In the toytls project there are two scripts, toytls-sign.py and toytls-verify.py.



$ cat statement

In this statement I claim that I wrote the following documents the year 2012:



cae3614264895c0201525ec7efff4ca6bb34dfc2  toytls/x509.py

f087da9fe9ad13e8064e6ad951b6aac8c3d54799  scripts/toytls-sign.py



Cheers

be@bjrn.se



$ sha256sum statement

08c247a658bcfe4668d853192dfe9a27c4f7bbc75ca6bc567fdc4726b1628ee8  statement



$ sha256sum statement | awk '{print $1}' |

python -c "import sys; sys.stdout.write(sys.stdin.read().strip().decode('hex'))" |

PYTHONPATH=. scripts/toytls-sign.py www.google.com 443 signed_statement

Signing message:

08c247a658bcfe4668d853192dfe9a27c4f7bbc75ca6bc567fdc4726b1628ee8



Bytes signed:

08c247a658bcfe4668d853192dfe9a27c4f7bbc75ca6bc567fdc4726b1628ee8

5013fa565da4ee2d51c9441cf307578161d1f631a3b4784f193bdd7e9d55b598

03001741049c096ff72fe6a7a1bbc5227a7b9806ab0d12129212a3e700138070

42e35fd8f60efc9cda1ecd9bbf61464a179299b43c3cf195956eedd635a7f859

8091910bf3



Signature:

7c7e3ef5aeea49674e3311112b503c6cfe8149810d5615392d0405939667bf15

95a8c9693cc8d4105a52c85615e7132467757939f72f01354a74882f59463e4d

d76b7eb4ec0de9b6922e2fc3e74336eb0ae619f90f53a2384a1465970a11a9d5

66afd335d3ae9cb2e8f7fd757d5cb5fad530923d29b3df195a963ef699711141



Server certificate:

308203213082028aa00302010202104f9d96d966b0992b54c2957cb4157d4d30

...

20e90a70641108c85af17d9eec69a5a5d582d7271e9e56cdd276d5792bf72543

1c69f0b8f9



$ PYTHONPATH=. scripts/toytls-verify.py signed_statement

Signature Verification SUCCESS



Bytes signed:

08c247a658bcfe4668d853192dfe9a27c4f7bbc75ca6bc567fdc4726b1628ee8

5013fa565da4ee2d51c9441cf307578161d1f631a3b4784f193bdd7e9d55b598

03001741049c096ff72fe6a7a1bbc5227a7b9806ab0d12129212a3e700138070

42e35fd8f60efc9cda1ecd9bbf61464a179299b43c3cf195956eedd635a7f859

8091910bf3



Bytes signed, user supplied messsage (hex):

08c247a658bcfe4668d853192dfe9a27c4f7bbc75ca6bc567fdc4726b1628ee8



Bytes signed, user supplied messsage (repr):

"\x08\xc2G\xa6X\xbc\xfeFh\xd8S\x19-\xfe\x9a'\xc4\xf7\xbb\xc7\\\xa6\xbcV\x7f\xdcG&\xb1b\x8e\xe8"



Bytes signed, server unix timestamp:

1343486550



Bytes signed, server UTC timestamp:

2012-07-28 14:42:30



Signature:

7c7e3ef5aeea49674e3311112b503c6cfe8149810d5615392d0405939667bf15

95a8c9693cc8d4105a52c85615e7132467757939f72f01354a74882f59463e4d

d76b7eb4ec0de9b6922e2fc3e74336eb0ae619f90f53a2384a1465970a11a9d5

66afd335d3ae9cb2e8f7fd757d5cb5fad530923d29b3df195a963ef699711141



Server certificate. For more details, do:

$ openssl asn1parse -inform DER -in signed_statement.certificate.der



308203213082028aa00302010202104f9d96d966b0992b54c2957cb4157d4d30

...

20e90a70641108c85af17d9eec69a5a5d582d7271e9e56cdd276d5792bf72543

1c69f0b8f9

Conclusion

Happy hacking. :)

A proposal for better IRC encryption

2009-01-22T22:57:00.014+01:00

January 25 update: The IRCSRP propsal has been updated to version 1.1, fixing a minor replay attack.

February 12 update: The IRCSRP propsal has been updated to version 2.0, adding message authenticaton. This blog post has NOT been updated, for version 2.0 see the IRCSRP page.

This article assumes a basic understanding of cryptography and IRC terminology (such as “channel”).

IRC is mostly used for insensitive discussions in public channels, such as #math or #haskell on Freenode. As anyone can join these discussions, the plain text nature of the IRC protocol is usually not considered a problem.

Although plain text and thus not suited for private discussions, the protocol offers some features that lends itself to this usage behavior. First of all, it is possible for two users on an IRC network to talk to each other outside channels, similar to chat using instant messaging software. It is also usually possible to form password-protected channels, a feature often used by a group of friends.

While the IRC protocol itself does not specify means for encrypting these private conversations, several methods have been implemented over the years. Some IRC clients and servers now support SSL/TLS; however as most large networks do not, and SSL still allows the server admin to eavesdrop on private conversations, other methods are more common.

This short article will attempt to describe how some of the more common methods work, as well as suggesting an improved SRP-based protocol fixing some shortcomings in the available methods.

A short primer on the IRC protocol

The methods described in this article all use the IRC protocol itself for transporting ciphertext. To understand these methods it is thus necessary to understand the IRC protocol, at least to some extent. Fortunately the protocol is not very complicated. Consult RFC 2812 for the gritty details.

In this example, Alice has connected her IRC client to Freenode. On connection, a small handshake takes place where Alice and Freenode negotiate a unique nickname. Alice is not allowed to enter the server if her desired nickname is already in use. With a unique nickname, Alice can talk to other users on the network.

Next, Alice joins #math and asks a question about some mathematical topic. Her client sends to the server:

PRIVMSG #math :How can I prove this set has this property?\r\n

The server then parses this message, and sends to all other users in channel #math:

:alice!alice@host.com PRIVMSG #math :How can I prove this set has this property?\r\n

The first part of the message, after the initial colon and before the first space, is a string of the form nickname!username@hostname. The other users can thus know who sent the message.

The PRIVMSG message is also used for “private messages” (note that due to the name of the command, this name is unfortunate). If Alice and Bob talk to each other outside a channel, the message exchange looks something like this:

Alice sends to server:
PRIVMSG bob :Hi bob!\r\n

Server sends to bob:
:alice!alice@host.com PRIVMSG bob :Hi bob!\r\n

Bob sends to server:
PRIVMSG alice :Howdy alice!\r\n

Server sends to Alice:
:bob!bob@bobshost.com PRIVMSG alice :Howdy alice!\r\n

There is a second method to send messages to a user or channel. This is known as a NOTICE message, which has the exact same form as PRIVMSG. Client software usually displays these messages differently than PRIVMSG. Some of the software mentioned in this article use the NOTICE message for different purposes.

Encryption

All methods for encrypting IRC conversations within the protocol (that is, excluding layers such as IRC over SSL/TLS) work the same way. The IRC client, or some plugin, or some other software between the client and the IRC server, rewrites the content of the PRIVMSG (or NOTICE) messages before it’s sent to the server.

For example, the message:

PRIVMSG bob :Hi bob!\r\n

is rewritten to, using one common method described shortly:

PRIVMSG bob :+OK BRurM1bWPZ1.\r\n

before being sent to the server. (Note that the characters between the colon and the line break is not part of the IRC protocol itself).

Bob’s software then decrypts this message and the client displays it on the screen, assuming of course Bob knows the details needed for decryption, such as keys.

From here on in the article, the message content (as it would be sent without encryption) will be referred to as the “plaintext”, and the replacement will be referred to as “ircmessage”. In the example above:

plaintext = "Hi bob!"
ircmessage = "+OK BRurM1bWPZ1."

Blowfish-ECB (”blowcrypt”)

Probably the most common software for encrypting IRC is Fish, which is a plugin for several IRC clients. Fish implements an encryption method originally used in a program called blowcrypt. This method is supported by several other plugins and clients, such as Mircryption.

blowcrypt users (Alice and Bob talking privately, or a whole channel) agree on a password used as key. Messages are then encrypted with Blowfish using this key, and formatted in a special way before being sent to the server.

There are two points of interest. Blowfish is used in ECB mode, which is no mode at all. Each message is split into blocks of 8 bytes, encrypted individually. A block shorter than 8 bytes is padded with zeroes.

Because of this design decision, blowcrypt encryption is very easy to implement for software developers working on IRC clients. The shortcoming is weaker security. For example, the message “Hi” is encrypted to the same string every time, for a given key.

Before the ciphertext is sent to the server, it is formatted like so:

ircmessage = "+OK " || BlowcryptBase64(Blowfish(key, plaintext))

Where || denote concatenation and BlowcryptBase64 is a non-standard base64 implementation. The prefix “+OK “ is used as an identifier for the software handling the decryption. A conversation may look like this, where the string “password” is used as key:

PRIVMSG bob :Hi bob!\r\n

is rewritten to

PRIVMSG bob :+OK BRurM1bWPZ1.\r\n

Here’s some Python code.

>>> b = Blowfish("password")
>>> blowcrypt_pack("Hi bob!", b)
'+OK BRurM1bWPZ1.'
>>> blowcrypt_unpack(_, b)
'Hi bob!'

Blowfish-CBC (“Mircryption”)

An obvious shortcoming of blowcrypt is the use of ECB mode. The Mircryption plugin, which supports several common IRC clients, supports a new encryption method using Blowfish in CBC mode. (For more information about cryptographic modes in general and the CBC in particular, see the TrueCrypt article on this website).

For each message an 8 byte IV is randomly generated. The message is encrypted with the key and the IV. The message sent to the server is

ircmessage = "+OK *" || Base64(IV || BlowfishCBC(key, IV, plaintext))

Here Base64 is the standard MIME base64 implementation, so other than the use of Blowfish this encryption method is different from, and better than, blowcrypt in most ways.

Here’s some Python code:

>>> b = BlowfishCBC("keyTest")
>>> mircryption_cbc_pack("Hello world!", b)
'+OK *5RQreHBF54PH3wFxsFmf2o1i6dh5ykeA'
>>> mircryption_cbc_unpack(_, b)
'Hello world!'

Diffie-Hellman key exchange (“DH1080”)

A problem for users of blowcrypt, Fish, Mircryption or similar software is handling the keys. It is often cumbersome to establish a secret key with the person you want to talk to securely.

The Fish developers have solved this with the software DH1080, which is an implementation of the Diffie-Hellman key exchange for IRC. Diffie-Hellman is a cryptographic protocol where two parties establish a secret key over an insecure channel, thus getting rid of the need for users to remember/establish the key themselves.

Alice, using DH1080, instructs the software to establish a key with Bob. DH1080 then automatically does the necessary computations and communications with Bob to establish a secret key. When everything is done Alice and Bob both know a secret key that can be used by Fish (or other software).

DH1080 can only be used for “private message” conversations. Due to constraints both from the Diffie-Hellman protocol itself and how IRC works, it is not feasible to use DH1080 for establishing a key for a whole channel. So if you want to encrypt messages to a channel, you’ll have to stick with a pre-determined key or use the proposed method discussed later in this article.

That said. DH1080 is a standard Diffie-Hellman implementation, with few surprises. The IRC specific part is how the message is formatted and sent. Using the terminology from the Diffie-Hellman Wikipedia article:

1. Alice and Bob agree to use a prime number p (see below) and a base g = 2.

2. Alice chooses a secret integer 2 < a < p-1 as private key, computes her public key A = g^a mod p and sends Bob, in a NOTICE message:

ircmessage = "DH1080_INIT " || DH1080Base64(IntAsBytes(A))

3. Bob chooses a secret integer 2 < b < p-1 as private key, computes the public key B = g^b mod p and sends Alice, in a NOTICE message:

ircmessage = "DH1080_FINISH " || DH1080Base64(IntAsBytes(B))

4. Alice checks that B is valid and if it is computes the secret key s = B^a mod p. B is valid if 1 < B < p.

5. Bob checks that A is valid and if it is computes the secret key s = A^b mod p. A is valid if 1 < A < p.

Here, DH1080Base64 is another non-standard base64-implementation. IntAsBytes is a variable length, big-endian representation of an integer. For example, the integer 0xabbcc is represented as the three-byte string "\x0a\xbb\xcc".

A point of interest: DH1080 uses a curious 1080 bit prime number p in the default implementation. This prime number is constructed in such a way it yields a message in English when the prime number is base64 encoded.

In base 16:

p = 
FBE1022E23D213E8ACFA9AE8B9DFADA3EA6B7AC7A7B7E95AB5EB2DF858921
FEADE95E6AC7BE7DE6ADBAB8A783E7AF7A7FA6A2B7BEB1E72EAE2B72F9FA2
BFB2A2EFBEFAC868BADB3E828FA8BADFADA3E4CC1BE7E8AFE85E9698A783E
B68FA07A77AB6AD7BEB618ACF9CA2897EB28A6189EFA07AB99A8A7FA9AE29
9EFA7BA66DEAFEFBEFBF0B7D8B

And when the prime is represented as bytes, coded with the non-standard base64 implementation:

++ECLiPSE+is+proud+to+present+latest+FiSH+release+featuring+e
ven+more+security+for+you+++shouts+go+out+to+TMG+for+helping+
to+generate+this+cool+sophie+germain+prime+number++++/C32L

While the statement is arguably incorrect – p is a safe prime and (p-1)/2 is Sophie Germain – this doesn’t affect the security of the key exchange as the prime chosen satisfies all the desired properties for a “strong” prime.

By RFC 2631, a prime chosen is considered secure if it can be written as p = jq + 1, q is a large prime and j => 2. In this case, j = 2 by construction and both p and q=(p-1)/2 are prime, so the equality holds.

Here’s some Python code:

>>> alice = DH1080Ctx()
>>> bob = DH1080Ctx()
>>> dh1080_pack(alice)
'DH1080_INIT qStH1LjBpb47s0XY80W9e3efrVSh2Qfq19291XAuwa7C9UFvW0sYY
424FOS6JNVsoYVH85lj6oPkr8w3KRZDDqVoV+7yCVtLmhCcC3dHyz4Ynbe93HEtR3n
26+Q1dWPm+JgZEGSYnhNunk7FOqsXFUR/2O9dkbpnxZDh2UFFmg0uGbukgG+FA'
>>> dh1080_unpack(_, bob)
True
>>> dh1080_pack(bob)
'DH1080_FINISH mjyk//fqPoEwp5JfbJtzDmlfpzmtMEFw5Ueyk51ydAjXBd8cjqz
m5oW9V0/VTk8ag0DzzKw8+9C6hPotvonaI8PwkSplHlDHjGJgcpirIn7C07jHw2WEt
NUpZtyz2UNT/RStGLe/s+4MrmoUC6vOaIZMUu7sgCXyUbDfYk5QCcbQVRx6rH+hA'
>>> dh1080_unpack(_, alice)
True
>>> dh1080_secret(alice)
'tfu4Qysoy56OYeckat1HpJWzi+tJVx/cm+Svzb6eunQ'
>>> dh1080_secret(bob)
'tfu4Qysoy56OYeckat1HpJWzi+tJVx/cm+Svzb6eunQ'

Misc.

Other than the two Blowfish based solutions mentioned, there exist several other, less common, methods. Here are two of them.

PsyBNC

PsyBNC is a so-called bouncer, a kind of IRC proxy that offers several features. One of these features is encryption. The encryption is very similar to Blowfish-ECB, however instead of Base64 another serialization method is used instead.

During the code review, an innocent but dangerous implementation error was discovered in PsyBNC:s Blowfish code. Until this problem is fixed, users are advised to switch to a different encryption method. Cryptanalysis is available here, courtesy of David Wagner: sci.crypt: Strength of Blowfish with broken F-function?.

OTR

OTR is worth mentioning not because it’s very common, but because it should be. OTR, or Off-the-Record Messaging, is designed for encrypting instant messaging-style conversations and offers several desirable features for these kinds of conversations.

While OTR cannot encrypt IRC channels due to technical reasons, it is ideal for private message conversations. An OTR plugin currently exists for irssi.

Summary of available methods

There are two different cases to consider when choosing a solution for encrypting IRC conversations.

For “private message” conversations IRC does not severely limit the available choices, so more options are available:

Blowfish-ECB should due the ECB mode not be used at all.

Blowfish-CBC and a shared password: If the two users can keep the shared password secret, this method is decent in its simplicity. The fact that the users share a secret offers some authentication. A problem is if the password is compromised. Then all previous conversations are available to the attacker.

Blowfish-CBC with unauthenticated key exchange: This solution has the benefit of getting rid of the need of the two users to remember/establish a shared secret. Also, as the key can be constantly renewed, perfect forward secrecy is achieved. There are, however, several important shortcomings. Diffie Hellman alone can be attacked by a man in the middle – in fact it would be trivial for a server admin to automatically perform this attack when a DH1080 exchange takes place on his server. Adding some sort of authentication solves this, however this may requires more involvement from the users.

OTR: This is the ideal solution; in fact it was designed for instant message-style conversations. Right now the only problem is lack of implementations. As of January 2009, there only exist one OTR plugin, for irssi (while popular, mIRC is probably even more so).

For encrypting a whole channel, there aren’t as many reasonable choices. Of the methods described above, Blowfish-CBC with a shared password is recommended. A better method will be described shortly.

All of these methods can be combined with Tor or similar software to achieve anonymity, remembering the Tor end node can, like the IRC server admin, attack the DH key exchange.

Future improvements

There are currently no known attacks on Blowfish, so replacing this block cipher with something newer (AES, Twofish) is not necessarily an improvement – of course it doesn’t hurt. A more considerable improvement would be using a cryptographic protocol that somehow solves all the following problems:

Perfect forward secrecy.
Authentication.
Resistance against the attacks mentioned above (e.g. MITM).
A compromised password should do as little damage as possible.

With the following constraints

Should be easy to use for end users, i.e. not requiring significant public key infrastructure.
Should be applicable to group conversations, i.e. IRC channels.
Should be reasonably straightforward to implement.
Should not abuse the IRC network (this rules out solutions where all conversations take place in private messages, but client software rewrites the recipient giving the appearance of group conversation).

Here is my proposal to the IRC community. Comments are welcome.

IRCSRP version 1.1

February 12 update: The IRCSRP propsal has been updated to version 2.0, adding message authenticaton. This blog post has NOT been updated, for version 2.0 see the IRCSRP page.

This new method of IRC encryption is based on the “optimized” SRP-6, the Secure Remote Password Protocol. It is described in detail here: http://srp.stanford.edu/doc.html (the first paper is especially readable). SRP is a protocol for password-based authentication between a user and a host, where the user only has to remember a simple password. No public key infrastructure is necessary.

The protocol as described in this article has been adapted for IRC usage.

Sample setup

Alice, Bob, Carol and Dave talk in a channel #friends on a public IRC network. Dave is the most technical user of the four, and will be given a special purpose.

Overview

Instead of everyone on the channel sharing a password together, each one of Alice, Bob and Carol share a so-called verifier with Dave.

The basic idea of the IRC adapted protocol is each user does an authenticated key exchange with Dave. Alice and Dave share knowledge of a password (Alice knows her password, Dave has the verifier), which is used for authentication. If successful, Dave sends the current session key to Alice, which is used for decrypting the content of the channel.

Once in a while Dave generates a new session key and broadcasts it to everyone on the channel. Thus forward secrecy is achieved.

Details – channel encryption

The messages sent in the actual channel (#friends in our example) is:

ircmessage = "+aes " || Base64(IV || AESCBC(sessionkey, IV, "M" || info || plaintext))

Here Base64 is the standard MIME base64 implementation with padding characters left intact. AESCBC is AES in CBC-mode. sessionkey is a 256 bit key randomly generated by Dave. info is a short information string:

info = len(username) || username || timestamp

username is a string, also known as 'I'. It will be described shortly. len(username) is a single byte telling the length of the string. timestamp is a 32 bit Unix timestamp, represented in big endian.

As the IRC protocol is limited to messages 512 characters in length (this includes the whole message, including the PRIVMSG command) client software should split plaintext into parts if the complete IRC message gets too long.

When Dave broadcasts a new, updated session key, the message is:

ircmessage = "+aes " || Base64(IV || AESCBC(old_sessionkey, IV, "\xffKEY" || new_sessionkey))

Details – the authenticated key exchange (preparations)

The interesting part of the setup is establishing the session key for a user who doesn’t already know it. Here are some constants we are going to use, using the same terminology as the SRP paper.

N = The prime number from the 2048-bit MODP Group as described in RFC 3526.
g = 2
H = the SHA-256 hash algorithm. Depending on context, the hash is either a 32-byte string or said string interpreted as a 256 bit big-endian integer.

Before the key exchange, Alice and Dave have to share some information.

1) Alice selects a username I and a password P. The username should be constant and not derived from Alice IRC nickname or host.

2) Alice generates a random salt s then computes the verifier v:

s = random 256 bit integer
x = H(s || I || P)
v = g^x (mod N)

3) Alice gives Dave s and v, which he stores together with Alice username I. From now on, Alice only has to remember I and P. She can and should discard s, x and v.

It is very important Alice gives s and v to Dave in person, or through an existing authenticated secure channel (such as a GPG encrypted e-mail.)

Details – the authenticated key exchange

The key exchange works as follows, where all ircmessage are sent in a NOTICE:

1) Alice sends Dave her username I. This initiates the exchange.

ircmessage = "+srpa0 " || I

2) Dave looks up Alice information (s, v), computes and sends:

b = random integer with 1 < b < N.
B = 3v + g^b (mod N)
ircmessage = "+srpa1 " || Base64(s || B)

3) Alice asserts B != 0 (mod N), then computes and sends:

a = random integer with 1 < a < N.
A = g^a (mod N)
x = H(s || I || P)
u = H(A || B)
S = (B – 3g^x)^(a + ux) (mod N)
K = H(S)
M1 = H(A || B || S)
ircmessage = "+srpa2 " || Base64(M1 || IntAsBytes(A))

4) Dave verifies M1, then if Alice is trusted, computes and sends:

u = H(A || B)
S = (Av^u)^b (mod N)
K = H(S)
M2 = H(A || M1 || S)
ircmessage = "+srpa3 " || Base64(IV || AESCBC(K, IV, sessionkey || M2))

5) Alice verifies M2, and decrypts the session key using K. If the verification holds, then Dave and the session key are trusted.

Exactly why and how this works is described in the SRP paper.

Problems

The bottleneck in the protocol as described is the dependence on Dave. There are two notable problems to consider:

Problem 1: What happens when Dave is off-line

The protocol as described is better suited for medium sized than very small groups of friends. For the case with 10 or more users, it is seldom a problem in practice to find one or two users with good enough uptime to act as Dave.

The problem is for smaller groups, such as the sample setup described above. Assume all four users are talking to each other. Carol then disconnects, and later Dave changes the session key for the channel. Due to network trouble Dave then disconnects from the network. While Alice and Bob can still talk to each other, Carol can no longer decrypt the messages. With Dave offline, she can’t get the session key either.

Solution 1:

There are several potential solutions for this problem. Unfortunately most of the obvious ones (such as distributing Dave’s task) add complexity to the protocol, where users no longer have to remember only their password.

Until this problem is solved, the users are recommended to fall back on a simpler encryption method.

Problem 2: Net splits

A moderately common problem with IRC networks is so called net splits, where two or more IRC servers can no longer communicate.

Assume Alice and Bob are on one server, and Carol and Dave on another. The two servers split. After this has happened, Dave changes the key. When the two servers rejoin, Alice and Bob can no longer understand Carol and Dave. In this case, they will initiate a key exchange with Dave.

When several users simultaneously perform the SRP exchange with Dave, his CPU usage may spike for an unacceptable amount of time, depending on how fast his protocol implementation is and the number of users.

Solution 2:

There are several possible solutions to this problem:

1) Dave changes the session key manually instead of automatically. In this case he can avoid changing the key during a net split.

2) The software implementation adds some kind of net split detection, and doesn’t change the session key when the network is considered unstable.

3) The problem is ignored – several users exchanging keys with Dave simultaneously is not considered a problem.

Implementation details

While the key exchange has slightly more steps than a simple Diffie-Hellman exchange such as the one used by DH1080, all individual steps are quite easy to implement given a big integer library, a base64 implementation and code for AES, CBC and SHA256. In the Python programming language, the whole key exchange can be performed in about two hundred lines (this implementation is quite slow though).

There are other considerations for implementations of the protocol, especially concerning automation of the key exchange.

1) Alice joins #friends and tells the IRC client her password. This is the only user intervention required.

2) The implementation randomly picks a Dave and initiates the exchange.

3) From here on everything happens automatically, as required.

For the second step, the implementation may look for users prefixed as channel operators, or each user saves nicknames for the Dave’s in a configuration file.

Code

Here’s some Python code, demonstrating Dave's setup, Alice setup and the key exchange itself.

>>> s, v = ircsrp_generate("alice", "passw")
>>> dave = IRCSRPCtx(True)
>>> dave.users.db["alice"] = (s, v)
>>> dave.sessionkey = urandom(32)
>>> dave.cipher = AESCBC(dave.sessionkey)
>>>
>>> alice = IRCSRPCtx()
>>> alice.username = "alice"
>>> alice.password = "passw"
>>> 
>>> ircsrp_exchange(alice)
'+srpa0 alice'
>>> ircsrp_exchange(dave, _, "alicenick")
'+srpa1 AU/DMrnF/JrccLBs4EKDW4U4fJHafqvAwIsOxTiI84Z9oeisZlO6D1a
XTuUXeslI/4957Q1kUtJURzjulRSk43bZSeDfE90u/WvcfD5ayh/d7owdaVAsh/
eN6p7dJFSqWEh0Rd8xD1FM+w/SZ+6MF+USybPAe+MFg0cWmJx2UDryvWd0y0UiG
RNvVVdtWOS5TojJ7Psk+4wXrPq+1wz18+h6C/LbElJKezhsu2H02kdpfT9vSJ+7
bQBrfgWqr7QASakO+6I3BuyuVaIirVLrX1XHSwXE3Kt91TQAHY7clLXiYrVOOs1
E27VY8VJ7imXD3tlbx1TVzJRNehrWKPqrn3Wl42PxDN62fG/NisN/YeC6hiTivs
6ivwA6btsWHX9h'
>>> ircsrp_exchange(alice, _)
'+srpa2 6kSQvvZnqioVmLMsrNG0/CPFOnMW6qutgOOHCLCPJJBvHJtbjy2Q0Ee
Al0AAJdymGXyLvADHpDt5UgVLn0TtGLEc86wx+Shf/bdXAsOHaZrRm/P8csN2yy
t4CPSTr/oIGP8bLZIVcVOjZHwPX2uF9tmOE9o9xotLzPJVbEbZoCC/uvvMjQRch
LgW4tg6gknpI5U/noi0w4cMA7OY9HnBRXgGjTbrMIm00bkE3v7YWZ0CXBZX9W7q
F2B28qJauT5l6PL3o/PAgw3I5MP28/eowWGEjNf6W1wWqmE1q2oXS88xeCEAQ6A
mU0qFiHrWBFMP04Um9WNKAuNR9e8p/zGink/nCWX6JmjbApJRbHTjx3nNa7qCCN
3Fg/QsA+uJNkbA'
>>> ircsrp_exchange(dave, _, "alicenick")
'+srpa3 KLSFN+yqid9NXBzJEydFTAJm+9U5dZcbNIGr8sjzrLoOvSWn70H652D
gU/IA8a7PlDiL3YzJOTI/mk1C1x+M8WG8cjbsx8axdlmCvFC/xpk='
>>> ircsrp_exchange(alice, _)
*** Session key is: '\xce\xa3k\xa3\x03\xfdx$=\xd1\xf1P\xa4Cw
\x16\\\xaaw!d\x9f]|\xddl\xe4q\x15%\xfcI'
True
>>> dave.sessionkey
'\xce\xa3k\xa3\x03\xfdx$=\xd1\xf1P\xa4Cw\x16\\\xaaw!d\x9f]|\xddl
\xe4q\x15%\xfcI'
>>>
>>> ircsrp_pack(alice, "Hello everyone!")
'+aes zNRmWAM1WxOedS0twJVOIoBTQKbh/7c5GzgHRnfL+KbSrLCGTLjW/3yvV
AoiPMWh'
>>> ircsrp_unpack(dave, _)
*** Sent by username: alice
*** Sent at time: Sun Jan 25 16:11:47 2009
'Hello everyone!'

Conclusion

Of the available methods for encrypting IRC, This author recommends the Blowfish-CBC method for encrypting channels, and OTR for private messages (in which case it may be easier to use instant messaging instead of IRC). For protection only against passive eavesdroppers, Blowfish-CBC+DH1080 can be used instead of OTR.

The proposed IRCSRP method may not be applicable for all usage scenarios, but when it is the authentication and session key based approach should give considerably stronger security than a shared password.

The code in this article is available here: irccrypt.py. It has a the free OpenBSD license, and can be used for pretty much any purpose.

Let’s build an MP3-decoder!

2008-10-01T16:55:00.001+02:00

Even though MP3 is probably the single most well known file format and codec on Earth, it’s not very well understood by most programmers – for many encoders/decoders is in the class of software “other people” write, like standard libraries or operating system kernels. This article will attempt to demystify the decoder, with short top-down primers on signal processing and information theory when necessary. Additionally, a small but not full-featured decoder will be written (in Haskell), suited to play around with.

The focus on this article is on concepts and the design choices the MPEG team made when they designed the codec – not on uninteresting implementation details or heavy theory. Some parts of a decoder are quite arcane and are better understood by reading the specification, a good book on signal processing, or the many papers on MP3 (see references at the end).

A note on the code: The decoder accompanying this article is written for readability, not speed. Additionally, some unusual features have been left out. The end result is a decoder that is inefficient and not standards compliant, but with hopefully readable code. You can grab the source here: mp3decoder-0.0.1.tar.gz. Scroll down to the bottom of the article or see README for build instructions.

A fair warning: The author is a hobby programmer, not an authority in signal processing. If you find an error, please drop me an e-mail. be@bjrn.se

With that out of the way, we begin our journey with the ear.

Human hearing and psychoacoustics

The main idea of MP3 encoding, and lossy audio coding in general, is removing acoustically irrelevant information from an audio signal to reduce its size. The job of the encoder is to remove some or all information from a signal component, while at the same time not changing the signal in such a way audible artifacts are introduced.

Several properties (or “deficiencies”) of human hearing are used by lossy audio codecs. One basic property is we can’t hear above 20 kHz or below 20 Hz, approximately. Additionally, there’s a threshold of hearing – once a signal is below a certain threshold it can’t be heard, it’s too quiet. This threshold varies with frequency; a 20 Hz tone can only be heard if it’s stronger than around 60 decibels, while frequencies in the region 1-5 kHz can easily be perceived at low volume.

A very important property affecting the auditory system is known as masking. A loud signal will “mask” other signals sufficiently close in frequency or time; meaning the loud signal modifies the threshold of hearing for spectral and temporal neighbors. This property is very useful: not only can the nearby masked signals be removed; the audible signal can also be compressed further as the noise introduced by heavy compression will be masked too.

This masking phenomenon happens within frequency regions known as critical bands – a strong signal within a critical band will mask frequencies within the band. We can think of the ear as a set of band pass filters, where different parts of the ear pick up different frequency regions. An audiologist or acoustics professor have plenty to say about critical bands and the subtleties of masking effects, however in this article we are taking a simplified engineering approach: for our purpose it’s enough to think of these critical bands as fixed frequency regions where masking effects occur.

Using the properties of the human auditory system, lossy codecs and encoders remove inaudible signals to reduce the information content, thus compressing the signal. The MP3 standard does not dictate how an encoder should be written (though it assumes the existence of critical bands), and implementers have plenty of freedom to remove content they deem imperceptible. One encoder may decide a particular frequency is inaudible and should be removed, while another encoder keeps the same signal. Different encoders use different psychoacoustic models, models describing how humans perceive sounds and thus what information may be removed.

About MP3

Before we begin decoding MP3, it is necessary to understand exactly what MP3 is. MP3 is a codec formally known as MPEG-1 Audio Layer 3, and it is defined in the MPEG-1 standard. This standard defines three different audio codecs, where layer 1 is the simplest that has the worst compression ratio, and layer 3 is the most complex but has the highest compression ratio and the best audio quality per bit rate. Layer 3 is based on layer 2, in turn based on layer 1. All of the three codecs share similarities and have many encoding/decoding parts in common.

The rationale for this design choice made sense back when the MPEG-1 standard was first written, as the similarities between the three codecs would ease the job for implementers. In hindsight, building layer 3 on top of the other two layers was perhaps not the best idea. Many of the advanced features of MP3 are shoehorned into place, and are more complex than they would have been if the codec was designed from scratch. In fact, many of the features of AAC were designed to be “simpler” than the counterpart in MP3.

At a very high level, an MP3 encoder works like this: An input source, say a WAV file, is fed to the encoder. There the signal is split into parts (in the time domain), to be processed individually. The encoder then takes one of the short signals and transforms it to the frequency domain. The psychoacoustic model removes as much information as possible, based on the content and phenomena such as masking. The frequency samples, now with less information, are compressed in a generic lossless compression step. The samples, as well as parameters how the samples were compressed, are then written to disk in a binary file format.

The decoder works in reverse. It reads the binary file format, decompress the frequency samples, reconstructs the samples based on information how content was removed by the model, and then transforms them to the time domain. Let’s start with the binary file format.

Decoding, step 1: Making sense of the data

Many computer users know that an MP3 are made up of several “frames”, consecutive blocks of data. While important for unpacking the bit stream, frames are not fundamental and cannot be decoded individually. In this article, what is usually called a frame we call a physical frame, while we call a block of data that can actually be decoded a logical frame, or simply just a frame.

A logical frame has many parts: it has a 4 byte header easily distinguishable from other data in the bit stream, it has 17 or 32 bytes known as side information, and a few hundred bytes of main data.

A physical frame has a header, an optional 2 byte checksum, side information, but only some of the main data unless in very rare circumstances. The screenshot below shows a physical frame as a thick black border, the frame header as 4 red bytes, and the side information as blue bytes (this MP3 does not have the optional checksum). The grayed out bytes is the main data that corresponds to the highlighted header and side information. The header for the following physical frame is also highlighted, to show the header always begin at offset 0.

The absolutely first thing we do when we decode the MP3 is to unpack the physical frames to logical frames – this is a means of abstraction, once we have a logical frame we can forget about everything else in the bit stream. We do this by reading an offset value in the side information that point to the beginning of the main data.

Why’s not the main data for a logical frame kept within the physical frame? At first this seems unnecessarily clumsy, but it has some advantages. The length of a physical frame is constant (within a byte) and solely based on the bit rate and other values stored in the easily found header. This makes seeking to arbitrary frames in the MP3 efficient for media players. Additionally, as frames are not limited to a fixed size in bits, parts of the audio signal with complex sounds can use bytes from preceding frames, in essence giving all MP3:s variable bit rate.

There are some limitations though: a frame can save its main data in several preceding frames, but not following frames – this would make streaming difficult. Also, the main data for a frame cannot be arbitrarily large, and is limited to about 500 bytes. This is limit is fairly short, and is often criticized.

The perceptive reader may notice the gray main data bytes in the image above begin with an interesting pattern (3E 50 00 00…) that resembles the first bytes of the main data in the next logical frame (38 40 00 00…). There is some structure in the main data, but usually this won’t be noticeable in a hex editor.

To work with the bit stream, we are going to use a very simple type:

data MP3Bitstream = MP3Bitstream {
    bitstreamStream :: B.ByteString,
    bitstreamBuffer :: [Word8]
}

Where the ByteString is the unparsed bit stream, and the [Word8] is an internal buffer used to reconstruct logical frames from physical frames. Not familiar with Haskell? Don’t worry; all the code in this article is only complementary.

As the bit stream may contain data we consider garbage, such as ID3 tags, we are using a simple helper function, mp3Seek, which takes the MP3Bitstream and discards bytes until it finds a valid header. The new MP3Bitstream can then be passed to a function that does the actual physical to logical unpacking.

mp3Seek :: MP3Bitstream -> Maybe MP3Bitstream
mp3UnpackFrame :: MP3Bitstream -> (MP3Bitstream, Maybe MP3LogicalFrame)

The anatomy of a logical frame

When we’re done decoding proper, a logical frame will have yielded us exactly 1152 time domain samples per channel. In a typical PCM WAV file, storing these samples would require 2304 bytes per channel – more than 4½ KB in total for a typical audio track. While large parts of the compression from 4½ KB audio to 0.4 KB frame stems from the removal of frequency content, a not insignificant contribution is thanks to a very efficient binary representation.

Before that, we have to make sense of the logical frame, especially the side information and the main data. When we’re done parsing the logical frame, we will have compressed audio and a bunch of parameters describing how to decompress it.

Unpacking the logical frame requires some information about the different parts. The 4-byte header stores some properties about the audio signal, most importantly the sample rate and the channel mode (mono, stereo etc). The information in the header is useful both for media player software, and for decoding the audio. Note that the header does not store many parameters used by the decoder, e.g. how audio samples should be reconstructed, those parameters are stored elsewhere.

The side information is 17 bytes for mono, 32 bytes otherwise. There’s lots of information in the side info. Most of the bits describe how the main data should be parsed, but there are also some parameters saved here used by other parts of the decoder.

The main data contains two “chunks” per channel, which are blocks of compressed audio (and corresponding parameters) decoded individually. A mono frame has two chunks, while a stereo frame has four. This partitioning is cruft left over from layer 1 and 2. Most new audio codecs designed from scratch don’t bother with this partitioning.

The first few bits of a chunk are the so-called scale factors – basically 21 numbers, which are used for decoding the chunk later. The reason the scale factors are stored in the main data and not the side information, as many other parameters, is the scale factors take up quite a lot of space. How the scale factors should be parsed, for example how long a scale factor is in bits, is described in the side information.

Following the scale factors is the actual compressed audio data for this chunk. These are a few hundred numbers, and take up most of the space in a chunk. These audio samples are actually compressed in a sense many programmers may be familiar with: Huffman coding, as used by zip, zlib and other common lossless data compression methods.

The Huffman coding is actually one of the biggest reasons an MP3 file is so small compared to the raw audio, and it’s worth investigating further. For now let’s pretend we have decoded the main data completely, including the Huffman coded data. Once we have done this for all four chunks (or two chunks for mono), we have successfully unpacked the frame. The function that does this is:

mp3ParseMainData :: MP3LogicalFrame -> Maybe MP3Data

Where MP3Data store some information, and the two/four parsed chunks.

Huffman coding

The basic idea of Huffman coding is simple. We take some data we want to compress, say a list of 8 bit characters. We then create a value table where we order the characters by frequency. If we don’t know beforehand how our list of characters will look, we can order the characters by probability of occurring in the string. We then assign code words to the value table, where we assign the short code words to the most probable values. A code word is simply an n-bit integer designed in such a way there are no ambiguities or clashes with shorter code words.

For example, lets say we have a very long string made up of the letters A, C, G and T. Being good programmers, we notice it’s wasteful to save this string as 8 bit characters, so we store them with 2 bits each. Huffman coding can compress the string further, if some of the letters are more frequent than others. In our example, we know beforehand ‘A’ occurs in the string with about 40% probability. We create a frequency table:

A	40%
C	35%
G	20%
T	5%

We then assign code words to the table. This is done in a specific way – if we pick code words at random we are not Huffman coding anymore but using a generic variable-length code.

A	`0`
C	`10`
G	`110`
T	`111`

Say we have a string of one thousand characters. If we save this string in ASCII, it will take up 8000 bits. If we instead use our 2-bit representation, it will only take 2000 bits. With Huffman coding however, we can save it in only 1850.

Decoding is the reverse of coding. If we have a bit string, say 00011111010, we read bits until there’s a match in the table. Our example string decodes to AAATGC. Note that the code word table is designed so there are no conflicts. If the table read

A	`0`
C	`01`

… and we encounter the bit 0 in a table, there’s no way we can ever get a C as the A will match all the time.

The standard method of decoding a Huffman coded string is by walking a binary tree, created from the code word table. When we encounter a 0 bit, we move – say – left in the tree, and right when we see a 1. This is the simplest method used in our decoder.

There’s a more efficient method to decode the string, a basic time-space tradeoff that can be used when the same code word table is used to code/decode several different bit strings, as is the case with MP3. Instead of walking a tree, we use a lookup table in a clever way. This is best illustrated with an example:

lookup[0xx] = (A, 1)
lookup[10x] = (C, 2)
lookup[110] = (G, 3)
lookup[111] = (T, 3)

In the table above, xx means all permutations of 2 bits; all bit patterns from 00 to 11. Our table thus contains all indices from 000 to 111. To decode a string using this table we peek 3 bits in the coded bit string. Our example bit string is 00011111010, so our index is 000. This matches the pair (A, 1), which means we have found the value A and we should discard 1 bit from the input. We peek another 3 bits in the string, and repeat the process.

For very large Huffman tables, where the longest code word is dozens of bits, it is not feasible to create a lookup table using this method of padding as it would require a table approximately 2ⁿ elements large, where n is the length of the longest code word. By carefully looking at a code word table however, it’s often possible to craft a very efficient lookup table by hand, that uses a method with “pointers” to different tables, which handle the longest code words.

How Huffman coding is used in MP3

To understand how Huffman coding is used by MP3, it is necessary to understand exactly what is being coded or decoded. The compressed data that we are about to decompress is frequency domain samples. Each logical frame has up to four chunks – two per channel – each containing up to 576 frequency samples. For a 44100 Hz audio signal, the first frequency sample (index 0) represent frequencies at around 0 Hz, while the last sample (index 575) represent a frequency around 22050 Hz.

These samples are divided into five different regions of variable length. The first three regions are known as the big values regions, the fourth region is known as the count1 region (or quad region), and the fifth is known as the zero region. The samples in the zero region are all zero, so these are not actually Huffman coded. If the big values regions and the quad region decode to 400 samples, the remaining 176 are simply padded with 0.

The three big values regions represent the important lower frequencies in the audio. The name big values refer to the information content: when we are done decoding the regions will contain integers in the range –8206 to 8206.

These three big values regions are coded with three different Huffman tables, defined in the MP3 standard. The standard defines 15 large tables for these regions, where each table outputs two frequency samples for a given code word. The tables are designed to compress the “typical” content of the frequency regions as much as possible.

To further increase compression, the 15 tables are paired with another parameter for a total of 29 different ways each of the three regions can be compressed. The side information contains information which of the 29 possibilities to use. Somewhat confusingly, the standard calls these possibilities “tables”. We will call them table pairs instead.

As an example, here is Huffman code table 1 (table1), as defined in the standard:

Code word	Value
`1`	(0, 0)
`001`	(0, 1)
`01`	(1, 0)
`000`	(1, 1)

And here is table pair 1: (table1, 0).

To decode a big values region using table pair 1, we proceed as follows: Say the chunk contains the following bits: 000101010... First we decode the bits as we usually decode Huffman coded strings: The three bits 000 correspond to the two output samples 1 and 1, we call them x and y.

Here’s where it gets interesting: The largest code table defined in the standard has samples no larger than 15. This is enough to represent most signals satisfactory, but sometimes a larger value is required. The second value in the table pair is known as the linbits (for some reason), and whenever we have found an output sample that is the maximum value (15) we read linbits number of bits, and add them to the sample. For table pair 1, the linbits is 0, and the maximum sample value is never 15, so we ignore it in this case. For some samples, linbits may be as large as 13, so the maximum value is 15+8191.

When we have read linbits for sample x, we get the sign. If x is not 0, we read one bit. This determines of the sample is positive or negative.

All in all, the two samples are decoded in these steps:

Decode the first bits using the Huffman table. Call the samples x and y.
If x = 15 and linbits is not 0, get linbits bits and add to x. x is now at most 8206.
If x is not 0, get one bit. If 1, then x is –x.
Do step 2 and 3 for y.

The count1 region codes the frequencies that are so high they have been compressed tightly, and when decoded we have samples in the range –1 to 1. There are only two possible tables for this region; these are known as the quad tables as each code word corresponds to 4 output samples. There are no linbits for the count1 region, so decoding is only a matter of using the appropriate table and get the sign bits.

Decode the first bits using the Huffman table. Call the samples v, w, x and y.
If v is not 0, get one bit. If 1, then v is –v.
Do step 2 for w, x and y.

Step 1, summary

Unpacking an MP3 bit stream is very tedious, and is without doubt the decoding step that requires the most lines of code. The Huffman tables alone are a good 70 kilobytes, and all the parsing and unpacking requires a few hundred lines of code too.

The Huffman coding is undoubtedly one of the most important features of MP3 though. For a 500-byte logical frame with two channels, the output is 4x576 samples (1152 per channel) with a range of almost 15 bits, and that is even before we’ve done any transformations on the output samples. Without the Huffman coding, a logical frame would require up to 4-4½ kilobytes of storage, about an eight-fold increase in size.

All the unpacking is done by Unpack.hs, which exports two functions, mp3Seek and mp3Unpack. The latter is a simple helper function that combines mp3UnpackFrame and mp3ParseMainData. It looks like this:

mp3Unpack :: MP3Bitstream -> (MP3Bitstream, Maybe MP3Data)

Decoding, step 2: Re-quantization

Having successfully unpacked a frame, we now have a data structure containing audio to be processed further, and parameters how this should be done. Here are our types, what we got from mp3Unpack:

data MP3Data = MP3Data1Channels SampleRate ChannelMode (Bool, Bool) 
                                MP3DataChunk MP3DataChunk
             | MP3Data2Channels SampleRate ChannelMode (Bool, Bool) 
                                MP3DataChunk MP3DataChunk 
                                MP3DataChunk MP3DataChunk

data MP3DataChunk = MP3DataChunk {
    chunkBlockType    :: Int,
    chunkBlockFlag    :: BlockFlag,
    chunkScaleGain    :: Double,
    chunkScaleSubGain :: (Double, Double, Double),
    chunkScaleLong    :: [Double],
    chunkScaleShort   :: [[Double]],
    chunkISParam      :: ([Int], [[Int]]),
    chunkData         :: [Int]
}

MP3Data is simply an unpacked and parsed logical frame. It contains some useful information, first is the sample rate, second is the channel mode, third are the stereo modes (more about them later). Then are the two-four data chunks, decoded separately. What the values stored in an MP3DataChunk represent will be described soon. For now it’s enough to know chunkData store the (at most) 576 frequency domain samples. An MP3DataChunk is also known as a granule, however to avoid confusion we are not going to use this term until later in the article.

Re-quantization

We have already done one of the key steps of decoding an MP3: decoding the Huffman data. We will now do the second key step – re-quantization.

As hinted in the chapter on human hearing, the heart of MP3 compression is quantization. Quantization is simply the approximation of a large range of values with a smaller set of values i.e. using fewer bits. For example if you take an analog audio signal and sample it at discrete intervals of time you get a discrete signal – a list of samples. As the analog signal is continuous, these samples will be real values. If we quantize the samples, say approximate each real valued sample with an integer between –32767 and +32767, we end up with a digital signal – discrete in both dimensions.

Quantization can be used as a form of lossy compression. For 16 bit PCM each sample in the signal can take on one of 2¹⁶ values. If we instead approximate each sample in the range –16383 to +16383, we lose information but save 1 bit per sample. The difference between the original value and the quantized value is known as the quantization error, and this results in noise. The difference between a real valued sample and a 16-bit sample is so small it’s inaudible for most purposes, but if we remove too much information from the sample, the difference between the original will soon be audible.

Let’s stop for a moment and think about where this noise comes from. This requires a mathematical insight, due to Fourier: all continuous signals can be created by adding sinusoids together – even the square wave! This means that if we take a pure sine wave, say at 440 Hz, and quantize it, the quantization error will manifest itself as new frequency components in the signal. This makes sense – the quantized sine is not really a pure sine, so there must be something else in the signal. These new frequencies will be all over the spectra, and is noise. If the quantization error is small, the magnitude of the noise will be small.

And this is where we can thank evolution our ear is not perfect: If there’s a strong signal within a critical band, the noise due to quantization errors will be masked, up to the threshold. The encoder can thus throw away as much information as possible from the samples within the critical band, up to the point were discarding more information would result in noise passing the audible threshold. This is the key insight of lossy audio encoding.

Quantization methods can be written as mathematical expressions. Say we have a real valued sample in the range –1 to 1. To quantize this value to a form suitable for a 16 bit WAV file, we multiply the sample with 32727 and throw away the fractional part: q = floor(s * 32767) or equivalently in a form many programmers are familiar with: (short)(s * 32767.0). Re-quantization in this simple case is a division, where the difference between the re-quantized sample and the original is the quantization error.

Re-quantization in MP3

After we unpacked the MP3 bit stream and Huffman decoded the frequency samples in a chunk, we ended up with quantized frequency samples between –8206 and 8206. Now it’s time to re-quantize these samples to real values (floats), like when we take a 16-bit PCM sample and turn it to a float. When we’re done we have a sample in the range –1 to 1, much smaller than 8206. However our new sample has a much higher resolution, thanks to the information the encoder left in the frame how the sample should be reconstructed.

The MP3 encoder uses a non-linear quantizer, meaning the difference between consecutive re-quantized values is not constant. This is because low amplitude signals are more sensitive to noise, and thus require more bits than stronger signals – think of it as using more bits for small values, and fewer bits for large values. To achieve this non-linearity, the different scaling quantities are non-linear.

The encoder will first raise all samples by 3/4, that is newsample = oldsample^3/4. The purpose is, according to the literature, to make the signal-to-noise ratio more consistent. We will gloss over the why’s and how’s here, and just raise all samples by 4/3 to restore the samples to their original value.

All 576 samples are then scaled by a quantity simply known as the gain, or the global gain because all samples are affected. This is chunkScaleGain, and it’s also a non-linear value.

This far, we haven’t done anything really unusual. We have taken a value, at most 8206, and scaled it with a variable quantity. This is not that much different from a 16 bit PCM WAV, where we take a value, at most 32767, and scale it with the fixed quantity 1/32767. Now things will get more interesting.

Some frequency regions, partitioned into several scale factor bands, are further scaled individually. This is what the scale factors are for: the frequencies in the first scale factor band are all multiplied by the first scale factor, etc. The bands are designed to approximate the critical bands. Here’s an illustration of the scale factor bandwidths for a 44100 Hz MP3. The astute reader may notice there are 22 bands, but only 21 scale factors. This is a design limitation that affects the very high frequencies.

The reason these bands are scaled individually is to better control quantization noise. If there’s a strong signal in one band, it will mask the noise in this band but not others. The values within a scale factor band are thus quantized independently from other bands by the encoder, depending on the masking effects.

Because of reasons that will hopefully be made more clear shortly, a chunk can be scaled in three different ways.

For one type of chunk – called “long” – we scale the 576 frequencies by the global gain and the 21 scale factors (chunkScaleLong), and leave it at that.

For another type of chunk – called “short” – the 576 samples are really three interleaved sets of 192 frequency samples. Don’t worry if this doesn’t make any sense now, we will talk about it soon. In this case, the scale factor bands look slightly different than in the illustration above, to accommodate the reduced bandwidths of the scale factor bands. Also, the scale factors are not 21 numbers, but sets of three numbers (chunkScaleShort). An additional parameter, chunkScaleSubGain, further scales the individual three sets of samples.

The third type of chunk is a mix of the above two.

When we have multiplied each sample with the corresponding scale factor and other gains, we are left with a high precision floating point representation of the frequency domain, where each sample is in the range –1 to 1.

Here’s some code, that uses almost all values in a MP3DataChunk. The three different scaling methods are controlled by the BlockFlag. There will be plenty more information about the block flag later in this article.

mp3Requantize :: SampleRate -> MP3DataChunk -> [Frequency]
mp3Requantize samplerate (MP3DataChunk bt bf gain (sg0, sg1, sg2) 
                         longsf shortsf _ compressed)
    | bf == LongBlocks  = long
    | bf == ShortBlocks = short
    | bf == MixedBlocks = take 36 long ++ drop 36 short
    where 
        long  = zipWith procLong  compressed longbands
        short = zipWith procShort compressed shortbands

        procLong sample sfb = 
            let localgain   = longsf !! sfb
                dsample     = fromIntegral sample
            in gain * localgain * dsample **^ (4/3)

        procShort sample (sfb, win) =
            let localgain = (shortsf !! sfb) !! win
                blockgain = case win of 0 -> sg0
                                        1 -> sg1
                                        2 -> sg2
                dsample   = fromIntegral sample
            in gain * localgain * blockgain * dsample **^ (4/3)
                                
        -- Frequency index (0-575) to scale factor band index (0-21).
        longbands = tableScaleBandIndexLong samplerate
        -- Frequency index to scale factor band index and window index (0-2).
        shortbands = tableScaleBandIndexShort samplerate

A fair warning: This presentation of the MP3 re-quantization step differs somewhat from the official specification. The specification presents the quantization as a long formula based on integer quantities. This decoder instead treats these integer quantities as floating point representations of non-linear quantities, so the re-quantization can be expressed as an intuitive series of multiplications. The end result is the same, but the intention is hopefully clearer.

Minor step: Reordering

Before quantizing the frequency samples, the encoder will in certain cases reorder the samples in a predefined way. We have already encountered this above: after the reordering by the encoder the “short” chunks with three small chunks of 192 samples each are combined to 576 samples ordered by frequency (sort of). This is to improve the efficiency of the Huffman coding, as the method with big values and different tables assume the lower frequencies are first in the list.

When we’re done re-quantizing in our decoder, we will reorder the “short” samples back to their original position. After this reordering, the samples in these chunks are no longer ordered by frequency. This is slightly confusing, so unless you are really interested in MP3 you can ignore this and concentrate on the “long” chunks, which have very few surprises.

Decoding, step 3: Joint Stereo

MP3 supports four different channel modes. Mono means the audio has a single channel. Stereo means the audio has two channels. Dual channel is identical to stereo for decoding purposes – it’s intended as information for the media player in case the two channels contain different audio, such as an audio book in two languages.

Then there’s joint stereo. This is like the regular stereo mode, but with some extra compression steps taking similarities between the two channels into account. This makes sense, especially for stereo music where there’s usually a very high correlation between the two channels. By removing some redundancy, the audio quality can be much higher for a given bit rate.

MP3 supports two joint stereo modes known as middle/side stereo (MS) and intensity stereo (IS). Whether these modes are in use is given by the (Bool, Bool) tuple in the MP3Data type. Additionally chunkISParam stores parameter used by IS mode.

MS stereo is very simple: instead of encoding two similar channels verbatim, the encoder computes the sum and the difference of the two channels before encoding. The information content in the “side” channel (difference) will be less than the “middle” channel (sum), and the encoder can use more bits for the middle channel for a better result. MS stereo is lossless, and is a very common mode that’s often used in joint stereo MP3:s. Decoding MS stereo is very cute:

mp3StereoMS :: [Frequency] -> [Frequency] -> ([Frequency], [Frequency])
mp3StereoMS middle side =
    let sqrtinv = 1 / (sqrt 2)
        left  = zipWith0 (\x y -> (x+y)*sqrtinv) 0.0 middle side
        right = zipWith0 (\x y -> (x-y)*sqrtinv) 0.0 middle side
    in (left, right)

The only oddity here is the division by the square root of 2 instead of simply 2. This is to scale down the channels for more efficient quantization by the encoder.

A more unusual stereo mode is known as intensity stereo, or IS for short. We will ignore IS stereo in this article.

Having done the stereo decoding, the only thing remaining is taking the frequency samples back to the time domain. This is the part heavy on theory.

Decoding, step 4: Frequency to time

At this point the only remaining MP3DataChunk values we will use are chunkBlockFlag and chunkBlockType. These are the sole two parameters that dictate how we’re going to transform our frequency domain samples to the time domain. To understand the block flag and block type we have to familiarize ourselves with some transforms, as well as one part of the encoder.

The encoder: filter banks and transforms

The input to an encoder is probably a time domain PCM WAV file, as one usually gets when ripping an audio CD. The encoder takes 576 time samples, from here on called a granule, and encodes two of these granules to a frame. For an input source with two channels, two granules per channel are stored in the frame. The encoder also saves information how the audio was compressed in the frame. This is the MP3Data type in our decoder.

The time domain samples are transformed to the frequency domain in several steps, one granule a time.

Analysis filter bank

First the 576 samples are fed to a set of 32 band pass filters, where each band pass filter outputs 18 time domain samples representing 1/32:th of the frequency spectra of the input signal. If the sample rate is 44100 Hz each band will be approximately 689 Hz wide (22050/32 Hz). Note that there’s downsampling going on here: Common band pass filters will output 576 output samples for 576 input samples, however the MP3 filters also reduce the number of samples by 32, so the combined output of all 32 filters is the same as the number of inputs.

This part of the encoder is known as the analysis filter bank (throw in the word polyphase for good measure), and it’s a part of the encoder common to all the MPEG-1 layers. Our decoder will do the reverse at the very end of the decoding process, combining the subbands to the original signal. The reverse is known as the synthesis filter bank. These two filter banks are simple conceptually, but real mammoths mathematically – at least the synthesis filter bank. We will treat them as black boxes.

MDCT

The output of each band pass filter is further transformed by the MDCT, the modified discrete cosine transform. This transform is just a method of transforming the time domain samples to the frequency domain. Layer 1 and 2 does not use this MDCT, but it was added on top of the filter bank for layer 3 as a finer frequency resolution than 689 Hz (given 44.1 KHz sample rate) proved to give better compression. This makes sense: simply dividing the whole frequency spectra in fixed size blocks means the decoder has to take several critical bands into account when quantizing the signal, which results in a worse compression ratio.

The MDCT takes a signal and represents it as a sum of cosine waves, turning it to the frequency domain. Compared to the DFT/FFT and other well-known transforms, the MDCT has a few properties that make it very suited for audio compression.

First of all, the MDCT has the energy compaction property common to several of the other discrete cosine transforms. This means most of the information in the signal is concentrated to a few output samples with high energy. If you take an input sequence, do an (M)DCT transform on it, set the “small” output values to 0, then do the inverse transform – the result is a fairly small change in the original input. This property is of course very useful for compression, and thus different cosine transforms are used by not only MP3 and audio compression in general but also JPEG and video coding techniques.

Secondly, the MDCT is designed to be performed on consecutive blocks of data, so it has smaller discrepancies at block boundaries compared to other transforms. This also makes it very suited for audio, as we’re almost always working with really long signals.

Technically, the MDCT is a so-called lapped transform, which means we use input samples from the previous input data when we work with the current input data. The input is 2N time samples and the output is N frequency samples. Instead of transforming 2N length blocks separately, consecutive blocks are overlapped. This overlapping helps reducing artifacts at block boundaries. First we perform the MDCT on say samples 0-35 (inclusive), then 18-53, then 36-71… To smoothen the boundaries between consecutive blocks, the MDCT is usually combined with a windowing function that is performed prior to the transform. A windowing function is simply a sequence of values that are zero outside some region, and often between 0 and 1 within the region, that are to be multiplied with another sequence. For the MDCT smooth, arc-like window functions are usually used, which makes the boundaries of the input block go smoothly to zero at the edges.

In the case of MP3, the MDCT is done on the subbands from the analysis filter bank. In order to get all the nice properties of the MDCT, the transform is not done on the 18 samples directly, but on a windowed signal formed by the concatenation of the 18 previous and the current samples. This is illustrated in the picture below, showing two consecutive granules (MP3DataChunk) in an audio channel. Remember: we are looking at the encoder here, the decoder works in reverse. This illustration shows the MDCT of the 0-679 Hz band.

The MDCT can either be applied to the 36 samples as described above, or three MDCT:s are done on 12 samples each – in either case the output is 18 frequency samples. The first choice, known as the long method, gives us greater frequency resolution. The second choice, known as the short method, gives us greater time resolution. The encoder selects the long MDCT to get better audio quality when the signal changes very little, and it selects short when there’s lots going on, that is for transients.

For the whole granule of 576 samples, the encoder can either do the long MDCT on all 32 subbands – this is the long block mode, or it can do the short MDCT in all subbands – this is the short block mode. There’s a third choice, known as the mixed block mode. In this case the encoder uses the long MDCT on the first two subbands, and the short MDCT on the remaining. The mixed block mode is a compromise: it’s used when time resolution is necessary, but using the short block mode would result in artifacts. The lowest frequencies are thus treated as long blocks, where the ear is most sensitive to frequency inaccuracies. Notice that the boundaries of the mixed block mode is fixed: the first two, and only two, subbands use the long MDCT. This is considered a design limitation of MP3: sometimes it’d be useful to have high frequency resolution in more than two subbands. In practice, many encoders do not support mixed blocks.

We discussed the block modes briefly in the chapter on re-quantization and reordering, and hopefully that part will make a little more sense knowing what’s going on inside the encoder. The 576 samples in a short granule are really 3x 192 small granules, but stored in such a way the facilities for compressing a long granule can be used.

The combination of the analysis filter bank and the MDCT is known as the hybrid filter bank, and it’s a very confusing part of the decoder. The analysis filter bank is used by all MPEG-1 layers, but as the frequency bands does not reflect the critical bands, layer 3 added the MDCT on top of the analysis filter bank. One of the features of AAC is a simpler method to transform the time domain samples to the frequency domain, which only use the MDCT, not bothering with the band pass filters.

The decoder

Digesting this information about the encoder leads to a startling realization: we can’t actually decode granules, or frames, independently! Due to the overlapping nature of the MDCT we need the inverse-MDCT output of the previous granule to decode the current granule.

This is where chunkBlockType and chunkBlockFlag are used. If chunkBlockFlag is set to the value LongBlocks, the encoder used a single 36-point MDCT for all 32 subbands (from the filter bank), with overlapping from the previous granule. If the value is ShortBlocks instead, three shorter 12-point MDCT:s were used. chunkBlockFlag can also be MixedBlocks. In this case the two lower frequency subbands from the filter bank are treated as LongBlocks, and the rest as ShortBlocks.

The value chunkBlockType is an integer, either 0,1,2 or 3. This decides which window is used. These window functions are pretty straightforward and similar, one is for the long blocks, one is for the three short blocks, and the two others are used exactly before and after a transition between a long and short block.

Before we do the inverse MDCT, we have to take some deficiencies of the encoder’s analysis filter bank into account. The downsampling in the filter bank introduces some aliasing (where signals are indistinguishable from other signals), but in such a way the synthesis filter bank cancels the aliasing. After the MDCT, the encoder will remove some of this aliasing. This, of course, means we have to undo this alias reduction in our decoder, prior the IMDCT. Otherwise the alias cancellation property of the synthesis filter bank will not work.

When we’ve dealt with the aliasing, we can IMDCT and then window, remembering to overlap with the output from the previous granule. For short blocks, the three small individual IMDCT inputs are overlapped directly, and this result is then treated as a long block.

The word “overlap” requires some clarifications in the context of the inverse transform. When we speak of the MDCT, a function from 2N inputs to N outputs, this just means we use half the previous samples as inputs to the function. If we’ve just MDCT:ed 36 input samples from offset 0 in a long sequence, we then MDCT 36 new samples from offset 18.

When we speak of the IMDCT, a function from N inputs to 2N outputs, there’s an addition step needed to reconstruct the original sequence. We do the IMDCT on the first 18 samples from the output sequence above. This gives us 36 samples. Output 18..35 are added, element wise, to output 0..17 of the IMDCT output of the next 18 samples. Here’s an illustration.

With that out of the way, here’s some code:

mp3IMDCT :: BlockFlag -> Int -> [Frequency] -> [Sample] -> ([Sample], [Sample])
mp3IMDCT blockflag blocktype freq overlap =
    let (samples, overlap') = 
            case blockflag of
                 LongBlocks  -> transf (doImdctLong blocktype) freq
                 ShortBlocks -> transf (doImdctShort) freq
                 MixedBlocks -> transf (doImdctLong 0)  (take 36 freq) <++>
                                transf (doImdctShort) (drop 36 freq)
        samples' = zipWith (+) samples overlap
    in (samples', overlap')
    where
        transf imdctfunc input = unzipConcat $ mapBlock 18 toSO input
            where
                -- toSO takes 18 input samples b and computes 36 time samples
                -- by the IMDCT. These are further divided into two equal
                -- parts (S, O) where S are time samples for this frame
                -- and O are values to be overlapped in the next frame.
                toSO b = splitAt 18 (imdctfunc b)
                unzipConcat xs = let (a, b) = unzip xs
                                 in (concat a, concat b)

doImdctLong :: Int -> [Frequency] -> [Sample]
doImdctLong blocktype f = imdct 18 f `windowWith` tableImdctWindow blocktype

doImdctShort :: [Frequency] -> [Sample]
doImdctShort f = overlap3 shorta shortb shortc
  where
    (f1, f2, f3) = splitAt2 6 f
    shorta       = imdct 6 f1 `windowWith` tableImdctWindow 2
    shortb       = imdct 6 f2 `windowWith` tableImdctWindow 2
    shortc       = imdct 6 f3 `windowWith` tableImdctWindow 2
    overlap3 a b c = 
      p1 ++ (zipWith3 add3 (a ++ p2) (p1 ++ b ++ p1) (p2 ++ c)) ++ p1
      where
        add3 x y z = x+y+z
        p1         = [0,0,0, 0,0,0]
        p2         = [0,0,0, 0,0,0, 0,0,0, 0,0,0]

Before we pass the time domain signal to the synthesis filter bank, there’s one final step. Some subbands from the analysis filter bank have inverted frequency spectra, which the encoder corrects. We have to undo this, as with the alias reduction.

Here are the steps required for taking our frequency samples back to time:

[Frequency] Undo the alias reduction, taking the block flag into account.
[Frequency] Perform the IMDCT, taking the block flag into account.
[Time] Invert the frequency spectra for some bands.
[Time] Synthesis filter bank.

A typical MP3 decoder will spend most of its time in the synthesis filter bank – it is by far the most computationally heavy part of the decoder. In our decoder, we will use the (slow) implementation from the specification. Typical real world decoders, such as the one in your favorite media player, use a highly optimized version of the filter bank using a transform in a clever way. We will not delve in this optimization technique further.

Step 4, summary

It’s easy to miss the forest for the trees, but we have to remember this decoding step is conceptually simple; it’s just messy in MP3 because the designers reused parts from layer 1, which makes the boundaries between time domain, frequency domain and granule less clear.

Using the decoder

Using the decoder is a matter of creating a bit stream, initializing it (mp3Seek), unpacking it to an MP3Data (mp3Unpack) and then decoding the MP3Data with mp3Decode. The decoder does not use any advanced Haskell concepts externally, such as state monads, so hopefully the language will not get in the way of the audio.

module Codec.Audio.MP3.Decoder (
    mp3Seek
   ,mp3Unpack
   ,MP3Bitstream(..)
   ,mp3Decode
   ,MP3DecodeState(..)
   ,emptyMP3DecodeState
) where
...

mp3Decode :: MP3DecodeState -> MP3Data -> (MP3DecodeState, [Sample], [Sample])

data MP3DecodeState = ...

emptyMP3DecodeState :: MP3DecodeState
emptyMP3DecodeState = ...

The code is tested with a new version of GHC. The decoder requires binary-strict, which can be found at Hackage. See README in the code for build instructions. Please note that the software is currently version 0.0.1 – it’s very, very slow, and has some missing features.

Code: mp3decoder-0.0.1.tar.gz.

Conclusion

MP3 has its peculiarities, especially the hybrid filter bank, but it’s still a nice codec with a firm grounding in psychoacoustic principles. Not standardizing the encoder was a good choice by the MPEG-1 team, and the available encoders show it’s possible to compress audio satisfactory within the constraints set by the decoder.

If you decide to play around with the source code, be sure to set your sound card to a low volume if you use headphones! Removing parts of the decoder may result in noise. Have fun.

References

CD 11172-3 Part 3 (the specification)

David Salomon, Data Compression: The Complete Reference, 3rd ed.

Davis Pan, A Tutorial on MPEG/Audio Compression

Rassol Raissi, The Theory Behind Mp3

The source code to libmad, LAME and 8Hz-mp3.

Speeding up Haskell with C – a very short introduction

2008-09-03T20:50:00.018+02:00

You are working on your project in Haskell. You notice the performance is not as great as it could be. Here’s what you can do, apart from writing more efficient Haskell code.

Profiling

First, of course, you have to profile your code to know what’s really slow and not. Using GHC, compile your program with -prof -auto-all.

$ ghc -O2 --make Project.hs -prof -auto-all

Doesn’t work? Then you probably have to recompile the extra libraries you’re using (perhaps from Hackage). These libraries most certainly use the Cabal build system. Pass --enable-library-profiling to configure then build and install as usual.

$ cd /to/the/library/source
$ runghc Setup.hs configure --enable-library-profiling
$ runghc Setup.hs build
$ runghc Setup.hs install

Don’t worry, this will not mess up the current installation, it will just add a special profiling version of the library next to the regular version.

You can now test your code.

$ Project +RTS -p

The GHC profiler is competent and can profile both time and space. With -p, we’re profiling time.

The output is a file, Project.prof. It has lots of interesting stuff in it, most importantly a list with the functions that take the most time. In our fictional example, the profiler tells us the program spends most of its time in the function dctiv, the DCT-IV transform.

dctiv :: Int -> [Double] -> [Double]
dctiv points input = ...

This is probably a typical function you want to write in a low level language. It is used a lot, and is basically number crunching. Of course, a Haskell wizard can probably whip up a DCT-IV with performance very close to C. So we use the DCT-IV only because it's simple to implement, not because it is a domain C solves better than Haskell.

Let’s write C!

Most modern programming languages have facilities for invoking code written in other languages, and so does Haskell. But we worry about that later; let’s write a small C library first.

dctiv.h

#ifndef DCTIV_H
#define DCTIV_H
void dctiv(int points, double *in, double *out);
#endif

dctiv.c

void dctiv(int points, double *in, double *out)
{
    /* What this does is not really important, other than it doesn't
       do side effects, and writes the output to out. */
    int k, n;
    double sum;

    for (k = 0; k < points; k++) {
        sum = 0.0;
        for (n = 0; n < points; n++) 
            sum += in[n] * cos((PI/points) * (n + 0.5) * (k + 0.5));
        out[k] = sum;
    }
}

Build an object file:

$ gcc -O2 -c dctiv.c

It’s probably a good idea to write a small test program and see if the little C library behaves as expected.

$ gcc -o testprogram test.c dctiv.o

Let’s write Haskell! With C!

To use our C code from Haskell we’re going to use the FFI – the Foreign Function Interface. This is an addendum to the Haskell 98 standard. Using FFI is quite pleasant, especially if we begin with pure Haskell code that we later port to C. This means we can design the C code with purity in mind.

To use FFI, we have to tell the compiler via a pragma at the top of the source code. We’ll also use a bunch of libraries with functions useful for calling foreign code. The top of the program will look like this:

> {-# LANGUAGE ForeignFunctionInterface #-}

> import Foreign.C.Types 
> import Foreign.Ptr 
> import Foreign.Marshal.Array 
> import System.IO.Unsafe

Foreign.C.Types and Foreign.Ptr have functions for working with primitive C types and pointers. Foreign.Marshal.Array is for converting between Haskell types and C arrays. System.IO.Unsafe is explained further below.

Next we’re going to import a function from the compiled C library. We can’t use the regular import keyword, instead we use foreign import, and tell Haskell which library we’re interested in, the function we want to use, and what the function looks like:

> foreign import ccall "dctiv.h dctiv"
>     c_dctiv :: CInt -> 
>                Ptr CDouble -> 
>                Ptr CDouble -> 
>                IO ()

Syntax aside the only weird stuff here is the last return value. Our C function returns void, so using () makes sense. But what about the IO? The code will explicitly change regions of memory, and passing two identical pointers may not mean the contents of the pointers are the same.

The function c_dctiv is low level, and it doesn’t look like the pure dctiv function we’re replacing at all! That’s what we’re going to fix now.

The purpose of the Foreign.Marshal.Array library is to convert between Haskell types and C arrays. We’re going to use three functions: withArray takes a Haskell list and writes it to memory, so it can be passed to C functions. allocaArray temporarily allocates memory our C-code can write to. peekArray takes some memory and returns a Haskell list. Here’s an intermediate function that calls c_dctiv for us.

> dctivIO :: Int -> [CDouble] -> IO [CDouble] -- 1
> dctivIO points input -- 2
>     = withArray   input  $ \cinput -> -- 3
>       allocaArray points $ \coutput -> -- 4
>       do c_dctiv (fromIntegral points) cinput coutput -- 5
>          peekArray points coutput -- 6

This function is short, and you can intuitively guess what it does by looking at what’s passed to the C function. The Marshal.Array functions have funky types though, and the code looks slightly arcane. Lets look at the type of withArray as an example to see what's going on:

withArray :: Storable a => [a] -> (Ptr a -> IO b) -> IO b

The Storable type class is meant for types that can be written to memory. All C types are instances of Storable, which makes sense. withArray takes a Haskell list [a], writes it into memory, and then passes the pointer to this memory to a function which is executed. allocaArray too does something with memory and passes it to another function, and these functions are chained together until we're done with the foreign code.

In English, what dctivIO does is the following:

At line 3, withArray takes the CDouble list as input and writes it to memory. It passes the memory location to a new function, that extends all the way down line 6. In this function, the memory location is bound to cinput. From now on we can treat cinput as a double*.

At line 4, allocaArray allocates an array of points elements, which will hold the output. The newly allocated array is passed to yet another function, where it gets bound to coutput when executed. allocaArray will free the memory when it’s done.

The last two lines form an IO block. At line 5 we run our C function c_dctiv, not forgetting to convert our Int to a CInt. The output of c_dctiv will be written to coutput. The last line of the function simply returns a Haskell list (in IO) from the array coutput of length points.

dctivIO is a much more useful and Haskell-like function than the raw c_dctiv, but it’s still not good enough. We want something of type Int -> [Double] -> [Double], not this hack with C-types and IO.

Despite the detail that memory is allocated within withArray, the C function dctiv is really pure. It doesn’t write to files, change state or anything; it does the exact same thing like our pure Haskell version. withArray et al doesn’t know this though, and that’s why we had to work in IO.

As we know dctiv doesn’t really have any side effects, we can purify it using unsafePerformIO from System.IO.Unsafe, which has the magical type IO a -> a. At the same time we add some code to convert between CDouble and Double.

> dctiv :: Int -> [Double] -> [Double]
> dctiv points input = 
>     let cinput  = map realToFrac input
>         coutput = unsafePerformIO (dctivIO points cinput)
>         output  = map realToFrac coutput
>     in output

And we’re done. As the Marshal.Array-functions have to consume the input lists when they are written to memory, converting from the two Double types do no significant harm.

To compile this code, GHC has to know about the C library.

$ ghc -O2 dctiv.o --make  Project.hs

You can even test your code in GHCI, like this:

$ ghci Project.hs dctiv.o

Conclusion

The speed improvement of writing parts of a program in a low level language will of course depend on the task, and how good you are at writing efficient Haskell (and C). For some applications the speed up can be substantial, and learning the basics of FFI can be a good investment.

For more information about FFI and Profiling, here are some more in-depth resources:

The excellent Real World Haskell has chapters on both profiling and the FFI.
HaskellWiki has several articles on FFI and profiling.
There's also the venerable documentation.

Lexicographic permutations using Algorithm L (STL next_permutation in Python)

2008-04-03T04:00:00.020+02:00

One of the more useful functions in the C++ Standard Library is next_permutation in <algorithm>. The STL function has a desirable property that almost every other permutation generating functions I’ve seen lack, namely lexicographic awareness of the elements being permuted.

A typical function will, given a sequence of elements such as (1, 1, 2, 2), permute on indices only. This will in our case give 4! permutations, which is often not what we want. The STL implementation will “correctly” generate only unique permutations, in our case 4! / 2!2!, and also generate them in the right order.

What’s special about most STL implementations is the use of a fairly unknown algorithm for finding permutations in lexicographic order. A canonical templated implementation is usually about 25 lines of code. It is also non-recursive and very fast.

Here’s a Python implementation of next_permutation with user-defined comparison. Use freely.

def next_permutation(seq, pred=cmp):
    """Like C++ std::next_permutation() but implemented as
    generator. Yields copies of seq."""

    def reverse(seq, start, end):
        # seq = seq[:start] + reversed(seq[start:end]) + \
        #       seq[end:]
        end -= 1
        if end <= start:
            return
        while True:
            seq[start], seq[end] = seq[end], seq[start]
            if start == end or start+1 == end:
                return
            start += 1
            end -= 1
    
    if not seq:
        raise StopIteration

    try:
        seq[0]
    except TypeError:
        raise TypeError("seq must allow random access.")

    first = 0
    last = len(seq)
    seq = seq[:]

    # Yield input sequence as the STL version is often
    # used inside do {} while.
    yield seq
    
    if last == 1:
        raise StopIteration

    while True:
        next = last - 1

        while True:
            # Step 1.
            next1 = next
            next -= 1
            
            if pred(seq[next], seq[next1]) < 0:
                # Step 2.
                mid = last - 1
                while not (pred(seq[next], seq[mid]) < 0):
                    mid -= 1
                seq[next], seq[mid] = seq[mid], seq[next]
                
                # Step 3.
                reverse(seq, next1, last)

                # Change to yield references to get rid of
                # (at worst) |seq|! copy operations.
                yield seq[:]
                break
            if next == first:
                raise StopIteration
    raise StopIteration

Example:

>>> for p in next_permutation([1, 1, 2, 2]):
 print p,

 
[1, 1, 2, 2] [1, 2, 1, 2] [1, 2, 2, 1] [2, 1, 1, 2] [2, 1, 2, 1] [2, 2, 1, 1]

An important question is: how does the code work, and why? Not surprisingly, the man with the answers is Donald Knuth. This algorithm doesn’t appear to have a proper name, so Knuth simply calls it Algorithm L. This algorithm is described in The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Tuples and Permutations. Algorithm L works like this, using a slightly different explanation than Knuth that I hope will be easier to understand.

Algorithm L: Given a sequence of n elements a₀, a₁, …, a_n-1 generate all permutations of the sequence in lexicographically correct order.

Step 1 (Step L2 in Knuth): Partition the sequence into two sequences a₀, a₁, …, a_j and a_j+1, a_j+2, …, a_n-1 such that we have already generated all permutations beginning with a₀, a₁, …, a_j. This can by done by decreasing j from n-2 until a_j < a_j+1. If j = 0 we are done.

For example, the input sequence 1, 4, 3, 2 is split into the sequence 1 and the sequence 4, 3, 2. Obviously, there are no more lexicographic permutations beginning with 1 when the second sequence is in decreasing order.

Step 2a (Step L3 in Knuth): In the second sequence a_j+1, a_j+2, …, a_n-1 working backwards, find a_m, the first value larger than a_j. We find a_m by setting m to n-1 and decreasing until a_j < a_m.

Step 2b: Swap a_j and a_m.

For example, our two sequences are 1 and 4, 3, 2. As the second sequence is decreasing because of the first step, a_m is the smallest element greater than a_j that can legitimately follow a₀, a₁, …, a_j-1 in a permutation.

Step 3 (Step L4 in Knuth): Reverse a_j+1, a_j+2, …, a_n-1.

Here are some examples of the steps on (1, 2, 3, 4) that should clarify step 2a, 2b and 3:

(1, 2, 3, 4) >> (1, 2, 3), (4) >> (1, 2, 4), (3) >> (1, 2, 4), (3) >> (1, 2, 4, 3)
(1, 2, 4, 3) >> (1, 2), (4, 3) >> (1, 3), (4, 2) >> (1, 3), (2, 4) >> (1, 3, 2, 4)
(1, 3, 2, 4) >> (1, 3, 2), (4) >> (1, 3, 4), (2) >> (1, 3, 4), (2) >> (1, 3, 4, 2)
(1, 3, 4, 2) >> (1, 3), (4, 2) >> (1, 4), (3, 2) >> (1, 4), (2, 3) >> (1, 4, 2, 3)

Here are some examples on the sequence (1, 2, 2, 3):

(1, 2, 2, 3) >> (1, 2, 2), (3) >> (1, 2, 3), (2) >> (1, 2, 3), (2) >> (1, 2, 3, 2)
(1, 2, 3, 2) >> (1, 2), (3, 2) >> (1, 3), (2, 2) >> (1, 3), (2, 2) >> (1, 3, 2, 2)
(1, 3, 2, 2) >> (1), (3, 2, 2) >> (2), (3, 2, 1) >> (2), (1, 2, 3) >> (2, 1, 2, 3)

Algorithm L is a fairly simple algorithm and it's also easy to understand and implement. Permutation generation is sometimes used as an interview question because it's difficult to get right even though the underlying problem is easy to grasp. It can thus be useful to know even for those not interested in combinatorics.

TrueCrypt explained (TrueCrypt 5 update)

2008-02-13T00:57:00.001+01:00

TrueCrypt 5 was released a few days ago. Having nothing better to write about, I might as well update the TrueCrypt code to handle TrueCrypt 5 volumes. This is fortunately quite easy to do. We must implement a new cryptographic mode, XTS, and add a new hash algorithm, SHA-512. There are also some minor changes in the code that must be made. Both the LRW mode of operation and the SHA1 hash algorithm are both considered deprecated in TrueCrypt 5, so the code in this post will only handle the new volumes. For information how to read TrueCrypt 4 volumes please read the old post: TrueCrypt Explained. Actually, you should probably read the old post anyway, otherwise this post will be very difficult to understand.

Key strengthening

First of all, let’s add SHA-512. This is already supported out of the box in Python 2.5 thanks to hashlib. If you have Python 2.4 installed, you can use a backported version of the module courtesy of krypto.org. SHA-512 is a stronger hash than SHA1 and more computationally intensive so the PBKDF2 iteration count is 1000 for SHA-512, exactly like Whirlpool. The iteration count for RIPEMD-160 is 2000 as before.

Code: keystrengthening5.py. Requires hashlib, ripemd, whirlpool.

XTS mode

A long-awaited feature of TrueCrypt 5 is system encryption. Encrypting the operating system partition adds an extra layer of security because it gets rid of the problem with traces of secret being data written to the swap file, the registry, temporary files and so on. Writing code to do this doesn’t really have anything to do with cryptography per se, rather it’s a problem of boot loaders, system drivers and so on. But as we all know, cryptography is hard, and there are all sorts of non-obvious problems to think about. One of these problems is related to the LRW-mode of operation, used in TrueCrypt 4.3, combined with system encryption.

There exist a weakness in the LRW mode that makes it unsuitable for system encryption. In this case, “weakness” and “unsuitable” are relative terms. The problem is this: if the LRW key itself is written to the volume, it can be derived. And this can happen if the cryptographic software swaps the key to disk, and the software then encrypts the swap file. This is quite frankly not a problem in practice for TrueCrypt, because even if the LRW key is known, it only makes chosen plaintext attacks easier, and these attacks are very difficult to execute. TrueCrypt also prevents keys from being written to disk: once a cipher is initialized with the key it’s overwritten in memory. For more details, see this post at IEEE.

In either case, TrueCrypt now supports the XTS mode of operation and LRW is deprecated. All new volumes created with TrueCrypt 5 will use the XTS mode. This mode has no known weaknesses that I’m aware of. The XTS mode is very similar to LRW, and is quite easy to implement. Before I describe this mode using common cryptographic notation, I will describe how to use this mode. We will write this function:

def XTSDecrypt(cipher1, cipher2, i, n, block):
    pass

As can be seen, the XTS mode uses two ciphers initialized with different keys. These two ciphers use the same algorithm. Note that unlike TrueCrypt 4.3, the cipher parameters to the mode function are primitive cipher algorithms, not chains/cascades. cipher1 and cipher2 can thus only be one of AES, Twofish or Serpent. How cascades are handled will be described later.

This function takes five arguments. cipher1 is the first cipher initialized with it’s key, cipher2 is the second cipher initialized with another key, block is a 16 byte string of ciphertext, and n and i are two integers which will be described soon. Here’s an example how the code is used:

>>> cipher1 = Rijndael("passwordpassword")
>>> cipher2 = Rijndael("wordpasswordpass")
>>> XTSDecrypt(cipher1, cipher2, 0, 0, "\x00"*16)
'\x0e\xc2\xf1\xb2\x1ce5=\xfd\xe1\xe1jA\xdbb\xc7'

n and i are intimately related. The integer n is the dataunit index. A dataunit is a large block of text, cipher text in this case. The integer i is the index of the block within the dataunit. Let’s say we have 2048 bytes of data to decrypt, and the dataunit size is 512. To decrypt the first 16 bytes in the first dataunit, n is 0 and i is 0. If we want to decrypt the 16 bytes from byte 1024 to 1040, n is 2 and i is 0. To decrypt a complete dataunit with index n, we simply iterate over i, and advance the dataunit 16 bytes each time.

In TrueCrypt 5.0, the dataunit size is always 512. Knowing this, we can write a helper function XTSDecryptMany that takes as argument a dataunit block of 512 bytes and the dataunit index, and decrypts the whole block.

def XTSDecryptMany(cipher1, cipher2, n, blocks):
    length = len(blocks)
    assert length % 16 == 0
    data = ''
    for i in xrange(length / 16):
        data += XTSDecrypt(cipher1, cipher2, i, n, blocks[0:16])
        blocks = blocks[16:]
    return data

That said, we can now describe the XTS mode using common cryptographic notation:

C = E_K1(P xor (E_K2(n) mul (a pow i))) xor (E_K2(n) mul (a pow i))

E_K1 and E_K2 are cipher1 and cipher2 respectively, and P is plaintext block and C is ciphertext block. n is the dataunit index and i is the block index within the dataunit. Note that E_K2 will always encrypt. E_K1 will decrypt for XTSDecrypt, but encrypt for XTSEncrypt (we won’t write this function, but it’s trivial to change the code).

And now for the math part. As with LRW mode, mul is multiplication in GF(2¹²⁸). XTS mode also has another finite field operator: pow is exponentiation in GF(2¹²⁸). This is not really an operator, just repeated multiplication. Finally a is simply the polynomial x, that is the number 0x2 because we represent our polynomials as bits. If this doesn’t make any sense, see the TrueCrypt Explained post.

If you remember the previous discussion, the dataunit size is 512. With a block size of 16, i will never be larger than 31. And fortunately for us, 2ⁱ for 0 ≤ i < 128 in GF(2¹²⁸) is the same as 2ⁱ for 0 ≤ i < 128 in Z. So in Python we can write (a pow i) simply as 2**i or 1<<i.

Having said that, the function XTSDecrypt can be implemented almost exactly as the function LRW. There is however a small difference implementation wise: In TrueCrypts implementation of LRW the integers, when converted to strings, are represented as big endian. TrueCrypts implementation of XTS however, represents integers as little endian. This doesn’t matter at all for security, but can be confusing if you try to reuse code.

Code: xts.py. Requires gf2n.

Putting it all together

With the XTS code working, we must now update the code that decrypts the header and decrypts the rest of the volume. Adding the SHA-512 hash is very easy, we just have to remove the SHA-1 function from the list of HMACs and add SHA-512 instead, and also make sure the iteration count is 1000 instead of 2000 for this hash.

Replacing the LRW mode support with XTS mode support is not very difficult, but there are more changes needed. The most important change is related to cascaded ciphers, such as AES-Twofish. In TrueCrypt 4.3 a cipher cascade was treated as a primitive cipher and the LRW function was therefore only used once per encrypt/decrypt. We can write this as AES-Twofish-LRW.

In TrueCrypt 5.0 however, the ciphers in a chain are treated individually. First the block is encrypted/decrypted with AES-XTS, then with Twofish-XTS, for example. This means we must add some extra code to handle all decryptions. Given a list of 2-tuples (cipher1, cipher2) we can simply do:

def Decrypt(ciphers, i, n, ciphertext):
    assert len(ciphertext) == 16
    for cipher1, cipher2 in reversed(ciphers):
        ciphertext = XTSDecrypt(cipher1, cipher2, i, n, ciphertext)
    return ciphertext

We will also modify XTSDecryptMany so it will call Decrypt instead of XTSDecrypt:

def DecryptMany(ciphers, n, blocks):
    length = len(blocks)
    assert length % 16 == 0
    data = ''
    for i in xrange(length / 16):
        data += Decrypt(ciphers, i, n, blocks[0:16])
        blocks = blocks[16:]
    return data

DecryptMany is the highest-level decryption routine we will use. For TrueCrypt volumes that use cascaded algorithms, the parameter ciphers is a list of length 2 or length 3. For volumes that use a single algorithm, ciphers is a list of length 1.

Now we can modify the class TrueCryptVolume and decrypt TrueCrypt 5 volume headers. For each hash algorithm we generate a 192 byte long key using PBKDF2. In TrueCrypt 4.3 we only had to generate a 128 byte key, but now when we use the XTS mode instead of LRW a longer key is required to handle the (at most) six keys needed, of 32 bytes each.

This 192 byte key is split into six 32 byte keys. That means two keys for every algorithm. For a three-cipher cascade, all six keys will be used. A simple volume that only uses AES, only two keys will be used.

We will now try to decrypt the volume header for each cascade in the list of possible cascades supported. These are as before:

Cascades = [
    [Rijndael],
    [Serpent],
    [Twofish],
    [Twofish, Rijndael],
    [Serpent, Twofish, Rijndael],
    [Rijndael, Serpent],
    [Rijndael, Twofish, Serpent],
    [Serpent, Twofish]
]

Let’s say the correct algorithm is Rijndael. We will create two Rijndael instances cipher1 and cipher2 (remember two instances are needed for XTS) and initialize each of them with the first and the second key from the PBKDF2 output. We can then decrypt the volume header with DecryptMany, like this:

DecryptMany([(cipher1, cipher2)], 0, header)

Should the correct cascade be [Twofish, Rijndael], the volume will be decrypted with:

DecryptMany([(ciphertwo1, ciphertwo2), (cipherrij1, cipherrij2)], 0, header)

Where ciphertwo1, ciphertwo2, cipherrij1 and cipherrij2 are initialized with their corresponding keys from the 192 long key previously generated.

Once the header has been decrypted correctly, all ciphers will be reinitialized with the new keys from the decrypted volume header. We can then use DecryptMany to decrypt the rest of the volume, via TCReadSector. We must modify the TCReadSector function slightly so the dataunit index is correct for hidden volumes. This is documented in the code.

Code: truecrypt5.py. Requires rijndael, serpent, twofish, keystrengthening5, xts.

Conclusion

TrueCrypt 5 is a very well documented project, both the documentation on the website and the source code. The TrueCrypt team deserves credit for that. If you are interested in every detail of TrueCrypt you should read the source code, it’s quite easy to understand. That said, hopefully this minimalist Python implementation and the longer article about TrueCrypt 4.3 will help you better understand the core functionality of the project. If you like TrueCrypt, please donate to the TrueCrypt project, they deserve it.

Finally the obligatory disclaimer: The author of this blog is not affiliated with the TrueCrypt project in any way. The code in this post is MIT License, so you can do pretty much anything you want with it.

TrueCrypt explained

2008-01-04T21:48:00.001+01:00

The most popular cryptographic software for Windows is probably TrueCrypt. In this article I will explain how TrueCrypt works and as a by-product a working Python implementation will be provided. This article is written from a programmer perspective and the math will be kept to the minimum. The emphasis is on how TrueCrypt uses cryptographic primitives such as AES and SHA-1, not how the primitives themselves work.

It should be noted this article was written with the current (as of writing) TrueCrypt 4.3 specification in mind, which has deprecated several cryptographic algorithms and modes of operation, although they are still in the program for compatibility reasons. This article will only describe the non-deprecated features namely AES, Serpent, Twofish and the LRW mode of operation. If you have an old TrueCrypt volume that uses for example Blowfish and CBC mode, you won’t learn how to read it in this article.

Before we get into the nitty-gritty details of TrueCrypt there are some basic things we must understand. If we remove the cryptographic part of TrueCrypt the only thing the software does is mount file systems, exactly like the program ‘mount’ on unixes and ‘daemon-tools’ (which can mount the CD and DVD file systems) on Windows. If we take an encrypted TrueCrypt-container and decrypt it, we’re left with a file system.

What we are going to do is take a TrueCrypt volume, we can call the file volume.tc, and decrypt this to a file system, say volume.fat32. We can then use user-level file system tools to extract files from the file system, just like we work with zip/tar archives. This is not very user friendly, but perfectly adequate since we are only interested in the cryptographic part of the program, not the win32-driver that does the mounting or the graphical user interface.

If you remove every part of TrueCrypt except the part that does the encryption and decryption, TrueCrypt is very easy to understand.

Preparations

In this article we are using the Python programming language. The good news is Python actually provides the SHA-1 algorithm in the standard library, and SHA-1 is one of the cryptographic primitives required to read a TrueCrypt volume.

The bad news is we don’t have Rijndael aka AES, Serpent (a cryptographic algorithm like AES), Twofish (another cryptographic algorithm like AES), RIPEMD-160 (a hash similar to SHA-1) or Whirlpool (an advanced hash algorithm). Fortunately for us, there are many third party Python modules with this functionality, such as the Python bindings for mcrypt or the Python Cryptographic Toolkit. It is also fairly straightforward to create modules from C code using Pyrex.

However, for the sake if portability and simplicity, I have provided some pure Python implementations of all the cryptographic algorithms needed. These implementations are converted directly from the C code and are thus very ugly Python, but the code works although it’s slow.

Code: serpent.py, twofish.py, rijndael.py, ripemd.py, whirlpool.py

The three ciphers have the same interface and should be easy to work with. The two hashes have the same interface as the sha and md5 module.

>>> from serpent import *
>>> cipher = Serpent()
>>> cipher.set_key('a'*32)
>>> cipher.encrypt('01234567abcdefgh')
'\xc0oN\xefw\\\xa8\x06GQG[\xcc\x94\x0e1'
>>> cipher.decrypt(_)
'01234567abcdefgh'

Please note that the cryptographic primitives above are not really part of this article. For the rest of this article I will simply assume the ciphers and hashes with the interfaces described are available. As mentioned previously, this article won’t explain how the five primitives works, just how TrueCrypt uses them.

To try the code in the article it is very useful to have a TrueCrypt volume ready. I strongly recommend against using your production volumes with all your secret files, mainly because your password may end up in log files or swap files. Here are some recommendations when you create the new volume. One. Select a small size, a few MB or so. A small volume is easier to work with. Two. Use a FAT file system. The FAT file system is much easier to read for humans than NTFS. Three. Select the SHA-1 hash. While RIPEMD-160 and Whirlpool works perfectly well, the SHA-1 code is much faster because it’s not implemented in pure Python like the other two. Four. Don’t forget to add some files to your volume! I suggest text files, which are very easy to read.

Once you have code for the cryptographic primitives and a working TrueCrypt volume, we can proceed.

High-level overview

You have a TrueCrypt volume and you want to decrypt it. The only thing you have at your disposal is the correct password and a programming language. Here is a high-level overview of what must be done. For now we will ignore hidden volumes. Those will be dealt with later.

Step 1. The password you chose when you created the TrueCrypt volume is not actually used directly, but indirectly. Using a so-called key strengthening algorithm, we use the password and some additional data to generate new better passwords (the correct terminology is keys). These keys will be needed later. The idea of key strengthening is to perform lots of time-consuming transformations on the initial password to make brute force attacks more difficult.

Step 2. Using the keys generated, the volume header will be decrypted. The volume header is the first few bytes of the TrueCrypt volume file and it contains information needed to decrypt the rest of the volume. The volume header also contains useful ancillary data such as time stamps.

Step 3. If Step 2 was successful, we now have at our disposal all the keys necessary to decrypt the actual file system.

These three steps are somewhat simplified since we don’t know which cryptographic algorithm is the correct one (when a TrueCrypt volume is created we can select between several, including chains of algorithms). We will deal with this later.

Key Strengthening

Do you remember when you created your last TrueCrypt volume? One of the steps required when creating the volume is you have to choose a hash algorithm, one of SHA-1, RIPEMD-160 and Whirlpool. This algorithm is used for the key strengthening. But I’m getting ahead of myself here.

As mentioned previously, key strengthening is the conceptual method of creating strong key(s) from a weak key. There are many methods to do this. A very simple method is to hash the key thousands of times. Consider for example the following Python code.

import sha
def strengthen(key):
    for i in xrange(50000):
        key = sha.new(key).digest()
    return key

>>> strengthen("password")
'+\xdeF\xb4\xd9\xb6\xf3\x1f\xde\xb9!*\xe6\xec\xa2K\xac!\xbbv'

An attacker armed with a dictionary will soon be discouraged because for each password he tries he has to perform several thousand additional computations. If the attacker has a list of 2000 potential passwords, he has to perform 2000 * 50000 SHA-1 computations if we are using the example above! Notice there is no way to parallelize our strengthening algorithm. This is by design.

As a clever by-product most key strengthening algorithms can also be used to derive several keys from one key. Lets say you already have a 128 bytes long key. In that case you can create several shorter keys by splitting it, for example into four 32 bytes long keys. But what if your key is short? Instead of repeatedly hashing the output as above, we can for example concatenate the output to the input and thus create a very long stream of bytes, which can be used to create several keys of various lengths. The important thing to realize is by some small modification to the strengthen routine we can return a key much longer than 20 bytes, which can be split into several keys, if needed.

import sha
def strengthen(key):
    for i in xrange(50000):
        key += sha.new(key).digest()
        if len(key) >= 128:
            key = key[-128:]
    return key

>>> strengthen("password")
'\xab$\xa1\x83l\xba\xa6Y\x94wi\x00\xe11\x96I\xb1\xc7\x84\x14\xcd'
'\x8fo\x97\xd64\xafw\xd0?J\xbd\xd6n\x80#X\xb0\x1cY%\x0f\xd4\x99\xf0'
'\x8c\x14\x885u?\xdd\xa2*\x87\x96\xd0\x97\xdf\xd9\x08\xc0\xe4\xd5k'
'\x9e=\xa3\xb5\r\xec\x88h\x12\xf9}\xb1A\x02\x20:\xd7<\xf4\x04\xce#7m'
'\xd5\x20\r\x86\x1a\x1b\xe0\xd7\x9ai\xc3(5\x8b\xe6\x1c\xf1\xa9:'
'\xfa\x94H\xbbV\x9f\xa3#^\x9f\xa3:\x9a/\xcb.A\xb0\xc3J'

TrueCrypt uses an advanced key strengthening algorithm known as PBKDF2, described in RFC 2898. It is not completely necessary for us to understand how PBKDF2 works; it is a cryptographic primitive like AES and SHA-1. We only have to know how to use it. The function PBKDF2 takes as arguments a hash-function, a password, a salt, an iteration count and a desired length of the output. (Note that the hash function is not the function directly but a HMAC version. This doesn’t change anything for us since it’s just another cryptographic primitive well understood). Here’s an example:

>>> PBKDF2(HMAC_RIPEMD160, "password", "\x12\x34\x56\x78", 5, 4)
'\x7a\x3d\x7c\x03'

The first argument is the HMAC version of the RIPEMD-160 hash algorithm. The second argument is our password. The third argument is the salt. A salt is a randomly generated value, more about the salt later. The fourth argument is the iteration count. In this case we have selected 5 iterations, but more is better (but slower). This is basically the same thing as the 50000 iterations in our simple example above. The fifth argument is the desired length of the output. As seen, the output is 4 bytes long.

With our new acquaintance the cryptographic primitive PBKDF2 we can proceed. For each hash algorithm TrueCrypt uses we are now going to generate a 128 bytes long key. This key is needed later. We already know the password; after all we created the TrueCrypt volume. The iteration count is part of the TrueCrypt specification and is 2000 for SHA1, 2000 for RIPEMD-160 and 1000 for Whirlpool (because Whirlpool is more advanced and computationally heavy than the two others). Why generate a 128 byte long key and not a shorter or longer one? We only need 128 bytes, which we will see in the next step.

What then is the salt? Now we can finally start working with the actual TrueCrypt volume file. The salt is the 64 bytes at the beginning of the volume file. Open the TrueCrypt volume, read the first 64 bytes (don’t forget to read the file in binary mode) and we have the salt. Notice that the salt is not encrypted and it doesn’t need to be! Intuitively this doesn’t make any sense but it will become clear later in the article.

It should be noted that we are not supposed to know which of the three hash algorithms was used when the volume was created. We simply create three 128 bytes keys, one for each of SHA-1, RIPEMD-160 and Whirlpool, and find the correct key in the next step by trial-and-error (assuming, of course, we used the correct password). With that said, the only thing we have to do before proceeding to the next step is get the salt and generate the three keys.

password = "password"

fileobj = file("volume.tc", "rb")
salt = fileobj.read(64)
fileobj.close()

for hmac in [HMAC_SHA1, HMAC_RIPEMD160, HMAC_WHIRLPOOL]:
    iterations = 2000
    if hmac == HMAC_WHIRLPOOL:
        iterations = 1000
    key = PBKDF2(hmac, password, salt, iterations, 128)
    print repr(key)

Code: keystrengthening.py. Requires ripemd, whirlpool.

Decrypting the header

The good news is: from here on everything will be relatively easy. None of the hash algorithms are used after we have created the keys, and no new keys will be generated. Once we have the three 128 byte keys from the previous step we can forget everything about hash algorithms, key strengthening, Whirlpool, HMACs, PBKDF2 and what not. The bad news is this part is heavy on theory.

In this step we have to work very closely with the TrueCrypt volume file. More specifically, we have to work very closely with the first 512 bytes of the TrueCrypt volume file. Open your TrueCrypt volume, read the first 512 bytes, and forget about the volume file because this is the only thing we need for now.

If you remember the previous step, the first 64 bytes of the volume file is the salt. We don’t need the salt, so we can discard it. So what we are actually interested in is the 448 bytes from byte 64 to byte 512.

>>> data = file("volume.rc", "rb").read(512)
>>> header = data[64:512]
>>> len(header)
448

So, just to remind us, we now have a 448 bytes long encrypted header and three 128 bytes long keys. We are going to use one of the keys to decrypt the header. How do we know we are successful? Fortunately for us, the decrypted header begins with the magic bytes ‘TRUE’. So if we use one of our keys, decrypt the header with it and find that the first four bytes are ‘TRUE’, we have probably provided the correct password to the key strengthening algorithm. To make sure we have the correct password and the magic bytes were not just a coincidence, there are some additional data in the header we can use to verify the correctness of the header. This is not important now.

So what to do? Simple, you say: Just try the three keys in each cryptographic algorithm and see if the magic bytes are what we’re looking for when decrypting the header. Not so fast. TrueCrypt doesn’t use the cryptographic algorithm directly on the data. This is why: All the cryptographic algorithms we are using are so called block ciphers. They encrypt and decrypt blocks that are a certain number of bytes large. Let’s say the block size is 16 bytes. This means the cipher can only encrypt and decrypt data of length 16. If we have a large block of data we split it into 16 bytes long blocks and encrypt/decrypt them separately.

Here is the problem. What if two blocks contains the exact same data, for example file headers? Then the encrypted blocks will look exactly the same for an attacker looking at the cipher text (that is, the TrueCrypt volume file). This is highly undesirable, especially for such large files as file systems where there’s plenty of room for redundant information. If each block in the volume was decrypted separately with the same key an attacker would easily be able to guess the contents of the volume, based on the regularities alone.

Enter cryptographic modes. A cryptographic mode is a “scheme” how data should be encrypted and decrypted to remove weaknesses, such as the weakness we have described. A very simple cryptographic mode works like this: Fill the first block, block 0, with random bytes. The remaining blocks should contain the actual data. However before encrypting block one, XOR the data with the random bytes from block 0. Before encrypting block two, XOR the data with the encrypted data from block one, and so on. As each block is “chained” to the previous block, the cipher text will not look the same for two blocks with the same plain text. As with the salt in the previous steps, the random bytes don’t have to be encrypted. Using the common cryptographic notation, this mode can be described as:

C_i = E_K1(C_i-1 xor P_i)

Where C_i is the cipher text, E is the encryption algorithm, K1 is the key, C_i-1 is the previous cipher text block and P_i is the plain text.

The mode described above is a simple form of the so-called CBC-mode, where CBC stands for Cipher Block Chaining. For the untrained eye CBC-mode seems plausible and useful but in fact it has many non-obvious weaknesses. For this reason, a new mode was designed. This is known as the LRW-mode from the authors Liskov, Rivest (the guy who designed MD5), Wagner, and it’s the mode TrueCrypt use.

The LRW-mode is one of the few cryptographic primitives very useful to understand how it works to better understand TrueCrypt. With the same notation as above, we can completely describe the LRW mode:

C_i = E_K1(P_i xor (K2 mul i)) xor (K2 mul i)

If you understood the notation describing CBC above it should not be too difficult to understand the description of LRW. There are three problems. First of all, K2 is a second key, we can call this the LRW key. This LRW key is actually part of the 128 bytes long key we generated in the previous step, exactly like the cipher key K1. We know both K1 and K2. Second, the above mode mentions i. What is i? It’s just the block index, that is the index of the block P_i.

Now, unfortunately, comes the abstract algebra. In the description above, mul means multiplication in the Galois field GF(2¹²⁸). This is the mathematician’s way of describing a kind of advanced bit hacking operation. Addition and subtraction in GF(2ⁿ) is the same thing as our old friend xor. In fact, the formal definition of LRW doesn’t “xor”, it “performs addition in GF(2ⁿ)”. (Note that n can be something other than 128, but it is 128 for the ciphers in this article.)

I will not describe exactly how this kind of multiplication is performed, I will just refer to the following Wikipedia article: Wikipedia: Finite File Arithmetic and give these encouraging words: if you can multiply large numbers with a pen and paper like you learned in elementary school, you can perform a multiplication in GF(2¹²⁸).

We can now write some code! First of all we have to write a short module for arithmetic in the Galois fields GF(2ⁿ), because we need to do this where n=128 for the LRW mode and the ciphers TrueCrypt uses. There are many clever ways to quickly multiply numbers in this field but I will for simplicity go for a schoolbook implementation, it’s less than 100 lines. For example:

def gf2n_mul(a, b, mod):
    """Multiplication in GF(2^n)."""
    pass

def gf2pow128mul(a, b):
    """Multiplication in GF(2^128)."""
    return gf2n_mul(a, b, mod128)

>>> gf2pow128mul(0xb9623d587488039f1486b2d8d9283453, 0xa06aea0265e84b8a)
0xfead2ebe0998a3da7968b8c2f6dfcbd2

Code: gf2n.py.

Using the module we just wrote, we can now write some code for the LRW mode. The main function will look like this (see lrw.py for the actual implementation, it's just in code what we explained above):

def LRW(cipherfunc, lrwkey, i, block):
    pass

Where cipherfunc is a cryptographic function initialized with a key (K1) that performs either encryption or decryption, lrwkey is the LRW key K2, i is the block index and block is the block of plain text or cipher text, depending on whether we want to encrypt or decrypt some data.

The informed reader will notice a multiplication in our Galois field will at most output 16 bytes. This means the length of the block must be 16, or a multiple of 16. A helper function LRWMany will be provided which simply splits the block into 16 byte blocks and calls the LRW function for each block.

def LRWMany(cipherfunc, lrwkey, i, blocks):
    length = len(blocks)
    assert length % 16 == 0
    data = ''
    for b in xrange(length / 16):
        data += LRW(cipherfunc, lrwkey, i + b, blocks[0:16])
        blocks = blocks[16:]
    return data

Code: lrw.py. Requires gf2n.

We can now finally decrypt the TrueCrypt volume header. Remember the 128 bytes long key? The first 16 bytes of the key is the LRW key. The bytes from 32 to 128 are keys for the cryptographic algorithms themselves. More precisely byte 32-64 is cipher key 1, byte 64-96 is cipher key 2 and byte 96-128 is cipher key 3. For TrueCrypt volumes using multiple cascaded ciphers two, or all three, cipher keys are used. For volumes using a single cipher only the first key is used. In Python:

lrwkey = key[0:16]
key1 = key[32:64]
key2 = key[64:96]
key3 = key[96:128]

Let us for a short moment ignore cascaded ciphers and say we have a TrueCrypt volume using the Twofish algorithm. To decrypt the header we simply do the following:

>>> cipher = Twofish()
>>> cipher.set_key(key1)
>>> decryptedheader = LRWMany(cipher.decrypt, lrwkey, 1, header)

Where header is the 448 byte header and 1 is the block index. The header has block index 1. Here’s an example how the header may look like before and after decrypting it.

+\xf0\xbb\x0e\xeb\x82K\x0bQ\xfa\rs\x11\xb8c=\x1f!\xc6`2\xa5\xb8\xefZ\xd7\xf6\x82~\x1b
\xe0\x18Cl\x1eM\xffW\x85H\xc6k\xba\xb4\xb9\x86{G\x05(\xa4\x8b\x0c\x9b\x06\x13\xa5\x8d
\xe9\x06E\xa7\x01pz\x97\x87\x02\x8a\xb6;\xf9O\x8e\xfa\x14\x3ea\xd1\x911\xa2w\xfe?C\x1e&
\xd5\xfe\xdc\xceu\xd9\x04\xfb\xf2\xa3\xde\xa0-\xee\xdc\xb8\xe7\x91\x80[N\xc6)#l\x08q
\x89\xe0=\xa1\xd7\x0e\x90nO\xc9s\xde\x92\xbbv\x14\t\xeb_\x92\x85_\x18\xb9\xba\x1b]
\xb7\xf8\x17\x03\xc5\x86\xac\xcd]\xd7\xc5\xcb\x9f!\xc4\xec\x05\x85\xc9\x91\xb5Pm\xdb
\xdc\xc7\x14_\xe3I\x86k\xdcA\x3ef\xb9\x10"\xcd\x14I\xd1\xe3|\xd0\x95\xd0\xcb\x12\x83a,
\xa3K\xac\x0b+\x1b\x02-?g\xf8\xad\x93\x01\x96p^\xd6\xa3\x8dY\xa4\xc6\xa5+u\x80\x19
\xcexX\x8d\xd0\xa6\xb0\x0b\x97\x19\xa2h\xe8/D\xef,\xc4\x1dy\x7f\xdc\xf6\n\xe4\xafw\x80
\x97\x18\xb5jPT\x81m\xbc\x9a\x94\x1b\x3e\xa9\xf6\xe0\xa9\xc1M\xff\xed)D\x8f\x16\x8e\x9al
\x0c\xdbhl\xf7\xbb\xfc\x04%\x96\xd2\xa1n\xf3\xe9K\x96\xfe\xe1\x20\xec\'\xc4\xce\x89
\xbcG\x1d\xc9\x8e\xcd\xb3?\x15{_\xff^f9\xc5-\xdd\x9c6\xf5b\xa4\xb5\xf08v\x93\xed\xee
\xc1K\xd6\x95R7\xe6\x1c\xcad\xee\xe9\xa7\x10\xf7\xe0\x90\xaa\x81\x04\xb2c\xa2d\xd8T
\xe5\xa3\xe3f\xbcG;\xb7\xf8\x3e\xe2\xad\xcc5\r\xf0Z\xec\x98\xc9\xd9\xb6\xf3\x12\xa1fY
\xa0S\xd9E\xe5\xbc\x86p(&\x0c-\xf3\x8a\x80\x81\xb5\x95j\x9c\rfQ\xb2\xdcu\xfa=\x18\x01
\x96[\x1a\xab[\xb1{8\xaf{\x1e\xb2\\\xaf\x11(!\x00h\xb1\xe7\x85\xd4\n\xc5\x3c\x90\xb0QQ\xd5

'TRUE\x00\x02\x04\x10\x19\xe8\x89\x0c\x01\xc8=\xbes\xcd\xd9\x10\x01\xc8=\xbes\xcd\xd9
\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xcb\r\xf0_CRw
\x84\xa2\x81!:\x1d\xbd\x90\x16\x95\x99WS\x1do\xc5\xf4\xdd#\xe7\x10\xfc\x18\x86\xddl
\xbf/\xd6\x02M\x13\x11\xc7\x86H\xbe"\x05\x19\xe4\x1e\xf2fO4\xa2_\x15Lg\x80\x99~\xd1
\xd1K\x14H\xb7\xd7\xa0B\xf8\xc4\xb9`\xfcc\x14j\xa9\xd7\xe2\xcc\x9a\x81C\xa3\xc3\xfb~
\xfc\xc0\x83z\xb3\x8c\x10\x05z\x98\xaf\xcd,+\x91N\x1c#\xf6\xe3\x83\x88\x1b\x1e\x89r
\xba\r;3\xc4\x0b.\'\x15\x3cn9\xb3\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00'

As we can see, the decrypted header has the magic bytes ‘TRUE’ in the beginning, as promised. Before we go over the content of the header, we should be able to handle cascaded ciphers. This is actually trivial to implement because we don’t have to change the LRW functions, just supply a decrypt or encrypt function as we did before. Consider the following code, with (almost) the same interface as the cipher algorithms:

class CipherChain:
    def __init__(self, ciphers):
        self.ciphers = [ciph() for ciph in ciphers]
    def set_key(self, keys):
        i = 0
        for cipher in self.ciphers:
            cipher.set_key(keys[i])
            i += 1
    def encrypt(self, data):
        for cipher in self.ciphers:
            data = cipher.encrypt(data)
        return data
    def decrypt(self, data):
        for cipher in reversed(self.ciphers):
            data = cipher.decrypt(data)
        return data
    def get_name(self):
        return '-'.join(reversed([cipher.get_name() for cipher in self.ciphers]))

Ponder the code for a while. Especially consider the encrypt and decrypt function and the use of the Python function “reversed()”. Now consider a TrueCrypt volume using a three cipher cascade instead of simply Twofish. The cascade Rijndael, Twofish, Serpent is known as “Serpent-Twofish-AES” in the TrueCrypt user interface.

>>> cipher = CipherChain([Rijndael, Twofish, Serpent])
>>> cipher.set_key([key1, key2, key3])
>>> decryptedheader = LRWMany(cipher.decrypt, lrwkey, 1, header)

Please note that CipherChain([Rijndael, Twofish, Serpent]) is not the same cascade as CipherChain([Serpent, Twofish, Rijndael]).

In this section of the article we have covered lots of ground and we have taken some detours into the realms of abstract algebra and cryptographic modes. We can now do something that is probably more familiar to the reader: parse binary data. We can now temporarily forget everything above and concentrate fully on the decrypted header (such as the one above).

There are basically two parts of the decrypted header. First we have information. This is the data in the very beginning of the header. Then we have the curious bytes in the middle of the header, with all the zero bytes surrounding it. These bytes are very important because they are the keys for the rest of the volume. These keys are needed to decrypt the actual file system. We will call this key the master key.

Let’s get back to the information in the beginning of the decrypted header. The first 4 bytes are as previously mentioned the magic bytes TRUE (assuming, of course, we have decrypted the header correctly. If we used the wrong cipher or the wrong hash or the wrong password, the first four bytes will probably be garbage). Following the magic bytes are some big endian integers. The complete layout is as follows:

4 bytes: The string ‘TRUE’
2 bytes: A 16 bit big endian integer. The version number of the volume.
2 bytes: A 16 bit big endian integer: The minimum version of TrueCrypt required to read the volume.
4 bytes: A 32 bit big endian integer: A checksum.

In our decrypted header above, we can parse the first bytes by hand here and now. The version number is 2, the minimum TrueCrypt version is 0x0410 and the checksum is 0x19e8890c. This checksum is used to make sure we really have a correctly decrypted header. Although it’s highly improbable, decrypting a volume with the wrong password may result in the first four bytes being ‘TRUE’. The checksum is added to verify the decrypted header is really valid. If the checksum matches the CRC32 of byte 192-448 of the header, we have a valid header (with extremely high probability).

Following the checksum are three 64 bit big endian integers. The first two are timestamps and the third is the size of the volume if this is a hidden volume (ignore this for now). The first timestamp is the date and time the volume was created. The second timestamp is the date and time the header was modified.

Knowing all this, we can now write some code to decrypt the header using only the password we chose when the volume was created. First lets write some pseudo-code:

for each hash
    key = keystrengthening(hash, password)
    for each cipher
        decryptedheader = decrypt(key, header)
        if magic bytes matches and crc32 matches checksum:
            success!

As a recap, in the first step in this article we took the TrueCrypt volume password and generated keys needed to decrypt the header. These keys, in turn, we used in the second step to decrypt the volume header. In the middle of the decrypted volume header lays the master key needed to decrypt the rest of the volume.

Here’s the complete code for this and the previous step. This code will work like this:

fileobj = file("aestwoser.tc", "rb")
tc = TrueCryptVolume(fileobj, "passwordpasswordpassword", Log)
TCPrintInformation(tc)
fileobj.close()

TrueCryptVolume is a class representing a TrueCrypt volume and tc is the object representing the volume aestwoser.tc specifically, which has the password as shown. The constructor of TrueCryptVolume will generate the strengthened keys from password, as described in the first step in this article. It will then decrypt the header by trying each cipher and cascade. In essence, it works exactly like the pseudo code just written. A TrueCryptVolume object has some useful member variables, especially “master_lrwkey” which is the master LRW key extracted from the decrypted header, and “cipher”, which is the cipher algorithm relinitilized with the master keys also extracted from the decrypted header.

from rijndael import Rijndael
from serpent import Serpent
from twofish import Twofish
from lrw import *
from keystrengthening import *

#
# Utilities.
#

import struct
import time
import binascii

def Log(message):
    print >> sys.stderr, "Progress:", message

def CRC32(data):
    crc = binascii.crc32(data)
    # Convert from signed to unsigned word32.
    return crc % 0x100000000

def BE16(x):
    return struct.unpack(">H", x)[0]

def BE32(x):
    return struct.unpack(">L", x)[0]

def BE64(x):
    a, b = struct.unpack(">LL", x)
    return (a<<32) | b

def Win32FileTime2UnixTime(filetime):
    return filetime / 10000000 - 11644473600

class CipherChain:
    assert not "Code as above in the article"

Cascades = [
    [Rijndael],
    [Serpent],
    [Twofish],
    [Twofish, Rijndael],
    [Serpent, Twofish, Rijndael],
    [Rijndael, Serpent],
    [Rijndael, Twofish, Serpent],
    [Serpent, Twofish]
]

HMACs = [
    (HMAC_SHA1, "SHA-1"),
    (HMAC_RIPEMD160, "RIPEMD-160"),
    (HMAC_WHIRLPOOL, "Whirlpool")
]

class TrueCryptVolume:
    """Object representing a TrueCrypt volume."""
    def __init__(self, fileobj, password, progresscallback=lambda x: None):

        self.fileobj = fileobj
        self.decrypted_header = None
        self.cipher = None
        self.master_lrwkew = None
        self.hidden_size = 0

        salt = fileobj.read(64)
        header = fileobj.read(448)
        
        assert len(salt) == 64
        assert len(header) == 448

        for hmac, hmac_name in HMACs:
            # Generate the keys needed to decrypt the volume header.
            iterations = 2000
            if hmac == HMAC_WHIRLPOOL:
                iterations = 1000

            info = ''
            if hmac_name in "RIPEMD-160 Whirlpool":
                info = ' (this will take a while)'
            progresscallback("Trying " + hmac_name + info)
            
            header_keypool = PBKDF2(hmac, password, salt, iterations, 128)
            header_lrwkey = header_keypool[0:16]
            header_key1 = header_keypool[32:64]
            header_key2 = header_keypool[64:96]
            header_key3 = header_keypool[96:128]

            for cascade in Cascades:
                # Try each cipher and cascades and see if we can successfully
                # decrypt the header with it.
                cipher = CipherChain(cascade)
                
                progresscallback("..." + cipher.get_name())
                
                cipher.set_key([header_key1, header_key2, header_key3])
                decrypted_header = LRWMany(cipher.decrypt, header_lrwkey, 1, header)
                if TCIsValidVolumeHeader(decrypted_header):
                    # Success.
                    self.decrypted_header = decrypted_header
                    
                    master_keypool = decrypted_header[192:]
                    master_lrwkey = master_keypool[0:16]
                    master_key1 = master_keypool[32:64]
                    master_key2 = master_keypool[64:96]
                    master_key3 = master_keypool[96:128]

                    self.master_lrwkey = master_lrwkey
                    self.cipher = cipher
                    self.cipher.set_key([master_key1, master_key2, master_key3])
                    self.hidden_size = BE64(decrypted_header[28:28+8])
                    self.format_ver = BE16(decrypted_header[4:6])

                    # We don't really need the information below but we save
                    # it so it can be displayed by print_information()
                    self.info_hash = hmac_name
                    self.info_headerlrwkey = hexdigest(header_lrwkey)
                    self.info_headerkey = hexdigest(header_keypool[32:128])
                    self.info_masterkey = hexdigest(master_keypool[32:128])

                    progresscallback("Success!")
                    return
        # Failed attempt.
        raise KeyError, "incorrect password (or not a truecrypt volume)"

def TCIsValidVolumeHeader(header):
    magic = header[0:4]
    checksum = BE32(header[8:12])
    return magic == 'TRUE' and CRC32(header[192:448]) == checksum

def TCPrintInformation(tc):
    if not tc.decrypted_header:
        return

    header = tc.decrypted_header
    program_ver = BE16(header[6:8])
    volume_create = Win32FileTime2UnixTime(BE64(header[12:12+8]))
    header_create = Win32FileTime2UnixTime(BE64(header[20:20+8]))

    print "="*60
    print "Raw Header"
    print "="*60
    print repr(tc.decrypted_header)
    print "="*60
    print "Parsed Header"
    print "="*60    
    print "Hash          :", tc.info_hash
    print "Cipher        :", tc.cipher.get_name()
    if tc.hidden_size:
        print "Volume Type   : Hidden"
        print "Hidden size   :", tc.hidden_size
    else:
        print "Volume Type   : Normal"
    print "Header Key    :", tc.info_headerkey
    print "Header LRW Key:", tc.info_headerlrwkey
    print "Master Key    :", tc.info_masterkey
    print "Master LRW Key:", hexdigest(tc.master_lrwkey)
    print "Format ver    :", hex(tc.format_ver)
    print "Min prog. ver :", hex(program_ver)
    print "Volume create :", time.asctime(time.localtime(volume_create))
    print "Header create :", time.asctime(time.localtime(header_create))
    print "="*60

TCPrintInformation will display some information about the volume, such as the hash algorithm used, the cipher(s) used and the information in the header. It may look like this:

============================================================
Raw Header
============================================================
'TRUE\x00\x02\x04\x10\x19\xe8\x89[snip]
============================================================
Parsed Header
============================================================
Hash          : SHA-1
Cipher        : Rijndael-Twofish-Serpent
Volume Type   : Normal
Header Key    : dced57ba7bc3e11ab047e747d2c4a3a6e205f3d36dafd766fe3bc
                77e61597df12781597b7d8c4092bba7c0bfd78e3e58ca90eb6c63
                e68090bae282dbed0ca90fc85f408277ed3b7e26ba2f0d782dfdf
                a04985d4c49caf5d80d59b3f3d3d51567
Header LRW Key: e0977ad2c02991d1e9eda77e9df88583
Master Key    : 6cbf2fd6024d1311c78648be220519e41ef2664f34a25f154c678
                0997ed1d14b1448b7d7a042f8c4b960fc63146aa9d7e2cc9a8143
                a3c3fb7efcc0837ab38c10057a98afcd2c2b914e1c23f6e383881
                b1e8972ba0d3b33c40b2e27153c6e39b3
Master LRW Key: cb0df05f43527784a281213a1dbd9016
Format ver    : 0x2
Min prog. ver : 0x410
Volume create : Thu Dec 13 20:29:17 2007
Header create : Thu Dec 13 20:29:17 2007
============================================================

Reading the data

A short recap of what we have done. In step one we took our TrueCrypt volume and a password. We used this password to derive several new keys using a key strengthening algorithm. These keys were in turn used to decrypt the TrueCrypt volume header. In the volume header we found the keys for the rest of the volume. These are the keys we are going to use to decrypt the actual file system.

Fortunately for us, since we have already implemented the LRW functions, this step will be very easy! We just reinitialize the cipher we used to decrypt the header with the keys from the header and use the new master LRW key to decrypt the rest of the volume. We only have to add one new function to the code above and other than some math to compute the LRW block indices the code should not be too difficult to understand. Given an index starting from 1, the function TCReadSector will read 512 bytes long blocks of data. 512 bytes is the sector size defined in the TrueCrypt specification (note that the sector size doesn’t have anything to do with the underlying file system, it is just the TrueCrypt terminology for the smallest block of data we work with when decrypting the file system).

def TCReadSector(tc, index):
    """Read a sector from the volume."""
    assert index > 0
    tc.fileobj.seek(0, 2)
    file_len = tc.fileobj.tell()

    # The LRW functions work on blocks of length 16. Since a TrueCrypt
    # sector is 512 bytes each call to LRWMany will decrypt 32 blocks,
    # and each call to this function must therefore advance the block
    # index 32. The block index also starts at 1, not 0. index 1
    # corresponds to lrw_index 1, index 2 corresponds to lrw_index 33 etc.
    lrw_index = (index - 1) * 32 + 1

    # For a regular (non-hidden) volume the file system starts at byte
    # 512 (after salt+header).
    seekto = TC_SECTOR_SIZE * index

    # last_sector_offset is the beginning of the last sector relative
    # the end of the file. For a regular non-hidden volume this is simply
    # 512 bytes from the end of the file.
    last_sector_offset = TC_SECTOR_SIZE
    if seekto > file_len - last_sector_offset:
        return ''

    tc.fileobj.seek(seekto)
    data = tc.fileobj.read(TC_SECTOR_SIZE)
    
    return LRWMany(tc.cipher.decrypt, tc.master_lrwkey, lrw_index, data)

While we’re at it it’s useful to know how many sectors we can read. The function TCSectorCount works pretty much the same as the code above.

def TCSectorCount(tc):
    """How many sectors can we read with TCReadSector?"""
    tc.fileobj.seek(0, 2)
    volume_size = tc.fileobj.tell()
    # Minus the salt+header.
    volume_size -= 512
    return volume_size / TC_SECTOR_SIZE

The first block may look like this, for a FAT volume:

\xeb\x3c\x90MSDOS5.0\x00\x02\x01\x08\x00\x02\x00\x02\xffO\xf8P\x00\x01\x00\x01\x00?
\x00\x00\x00\x00\x00\x00\x00\x00\x00)\xe2m`GNO NAME    FAT16   \x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00U\xaa

We have now successfully decrypted a TrueCrypt volume and printed the first 512 bytes of the underlying file system! We can now write a small command line program that will write the complete file system to disk, like this:

$ truecrypt.py volume.tc password filesystem.fat

Here's the code.

def cmdline():
    scriptname = sys.argv[0]
    try:
        path, password, outfile = sys.argv[1:]
    except ValueError:
        print >> sys.stderr, "%s volumepath password outfile" % scriptname
        sys.exit(1)

    if os.path.exists(outfile):
        print >> sys.stderr, "outfile %s already exists. use another " \
              "filename and try again (we don't want to overwrite " \
              "files by mistake)" % outfile
        sys.exit(1)

    try:
        fileobj = file(path, "rb")
    except IOError:
        print >> sys.stderr, "file %s doesn't exist" % path
        sys.exit(1)

    try:
        tc = TrueCryptVolume(fileobj, password, Log)
    except KeyError:
        print >> sys.stderr, "incorrect password or not a TrueCrypt volume"
        fileobj.close()
        sys.exit(1)
    except KeyboardInterrupt:
        print >> sys.stderr, "aborting"
        fileobj.close()
        sys.exit(1)

    TCPrintInformation(tc)

    outfileobj = file(outfile, "ab")
    num_sectors = TCSectorCount(tc)
    num_written = 0
    try:
        for i in xrange(1, num_sectors + 1):
            if i % 100 == 0:
                Log("Decrypting sector %d of %d." % (i, num_sectors))
            outfileobj.write(TCReadSector(tc, i))
            num_written += 1
    except KeyboardInterrupt:
        print "Aborted decryption."
        pass
    outfileobj.close()
    print "Wrote %d sectors (%d bytes)." % (num_written,
                                            num_written * TC_SECTOR_SIZE)
    fileobj.close()
    sys.exit(0)

if __name__ == '__main__':
    cmdline()

This program is of course not very useful in practice but it is at least more user friendly than looking at the file system in the Python top loop. It should be mentioned: you should not use the above program on a “production” volume with all your secret files on them. Your shell may log the password to file, which is undesirable. Additionally you do not want your encrypted file system written to disk because traces may be left of your secret files even if you delete the extracted file system. Traces of your secret files on a non-encrypted filesystem is extremely undesirable.

Finally, don’t worry if the written file system is full of garbage – TrueCrypt writes random bytes to the empty areas of the file system during formatting. If you create a new volume with a FAT file system and then add some text files, you can find your files in the beginning of the file system.

Wrapping it up: hidden volumes

By now we have some code that can successfully decrypt most TrueCrypt volumes. We still have to handle hidden volumes, though. Fortunately for us this is very easy. Before we do that, here is a short explanation of hidden volumes.

A hidden volume is a second TrueCrypt volume residing in the same file as another TrueCrypt volume. Note that the hidden volume is not inside the regular volume, but beside – it is not necessary to decrypt the regular volume to access the hidden volume. A TrueCrypt volume with a hidden volume is basically the same thing as two regular TrueCrypt volumes concatenated together, but in a clever way so the file system in the first volume believes it is as large as the TrueCrypt file itself (it would be very suspicious if the TrueCrypt file was say 10 gigabytes, but the regular volume was only say 6 gigabytes when mounted).

The reason TrueCrypt supports hidden volumes is plausible deniability. If the user is forced for some reason to give up his password, he will give the adversary the password to the regular volume. There is no way whatsoever to tell if there is a hidden volume in the TrueCrypt file so from the adversary’s point of view the TrueCrypt volume just mounted looks like any other TrueCrypt volume, with or without a hidden volume. In fact, when the regular volume is mounted, the hidden volume may become overwritten if new files are added. Not even TrueCrypt knows there’s a hidden volume when the regular volume is mounted. (TrueCrypt does however know when a hidden volume is mounted, but this is not necessary for plausible deniability – we are not giving up the password to the hidden volume when we can give up the password to the regular volume!)

The only way to mount a hidden volume is to provide the correct password. This password must, of course, be different than the password to the regular volume; otherwise the regular volume would be mounted all the time regardless of our intentions. Here’s what happens when we use the hidden volume password instead of the regular password. Obviously, all the steps we have described previously will fail: the hidden volume password is not the same as the regular password. This is because we haven’t written any code for hidden volumes yet. The question is then: what should we do?

This is fortunately very straightforward. When we fail to decrypt the header for the regular volume, we will now go back to step 1, that is we are going to generate new keys using the key-strengthening algorithm. But instead of reading the 64 bytes salt from offset 0 in the TrueCrypt file we are going to read the salt from offset 1536 from the end of the TrueCrypt file. We now have new keys and we will try to decrypt the header. Not the same header, but another header from offset 1536+64 from the end of the TrueCrypt file.

In the code we have already written to read regular TrueCrypt volumes, we have the following two lines to read the salt and the header:

salt = fileobj.read(64)
header = fileobj.read(448)

To read a hidden volume, we only have to add a line to seek to 1536 bytes from the end:

fileobj.seek(-1536, 2)
salt = fileobj.read(64)
header = fileobj.read(448)

Because we should be able to handle both regular and hidden volumes, we only have to add a loop outside the code that decrypts the header:

for volume_type in ["normal", "hidden"]:
    fileobj.seek(0)
    if volume_type == "hidden":
        fileobj.seek(-1536, 2)

We can now decrypt both regular and hidden volume headers by using different passwords. The only thing remaining is to fix the TCReadSector function: the first sector is obviously not at the same offset (in the TrueCrypt file) as the regular volume. The first sector of a regular volume is 512. The first sector for the hidden volume depends on the size of the file and the size of the hidden volume.

def TCReadSector(tc, index):
    """Read a sector from the volume."""
    assert index > 0
    tc.fileobj.seek(0, 2)
    file_len = tc.fileobj.tell()

    # The LRW functions work on blocks of length 16. Since a TrueCrypt
    # sector is 512 bytes each call to LRWMany will decrypt 32 blocks,
    # and each call to this function must therefore advance the block
    # index 32. The block index also starts at 1, not 0. index 1
    # corresponds to lrw_index 1, index 2 corresponds to lrw_index 33 etc.
    lrw_index = (index - 1) * 32 + 1 

    # For a regular (non-hidden) volume the file system starts at byte
    # 512. However for a hidden volume, the start of the file system
    # is not at byte 512. Starting from the end of the volume, namely
    # byte file_len, we subtract the hidden volume salt+header (at offset
    # 1536 from the end of the file). We then subtract the size of the
    # hidden volume.
    mod = 0
    last_sector_offset = TC_SECTOR_SIZE
    if tc.hidden_size:
        mod = file_len - tc.hidden_size - TC_HIDDEN_VOLUME_OFFSET
        # We subtract another sector from mod because the index starts
        # at 1 and not 0.
        mod -= TC_SECTOR_SIZE
        last_sector_offset = TC_SECTOR_SIZE + TC_HIDDEN_VOLUME_OFFSET
    seekto = mod + TC_SECTOR_SIZE * index

    # last_sector_offset is the beginning of the last sector relative
    # the end of the file. For a regular non-hidden volume this is simply
    # 512 bytes from the end of the file. However for hidden volumes we
    # must not read past the headers, so the last sector begins 512 bytes
    # before the header offset.
    if seekto > file_len - last_sector_offset:
        return ''

    tc.fileobj.seek(seekto)
    data = tc.fileobj.read(TC_SECTOR_SIZE)
    
    return LRWMany(tc.cipher.decrypt, tc.master_lrwkey, lrw_index, data)

def TCSectorCount(tc):
    """How many sectors can we read with TCReadSector?"""
    volume_size = 0
    if tc.hidden_size:
        volume_size = tc.hidden_size
    else:
        tc.fileobj.seek(0, 2)
        volume_size = tc.fileobj.tell()
        # Minus the salt+header.
        volume_size -= 512
    return volume_size / TC_SECTOR_SIZE

The complete program – truecrypt.py

## truecrypt.py - partial TrueCrypt implementation in Python.
## Copyright (c) 2008 Bjorn Edstrom <be@bjrn.se>
##
## Permission is hereby granted, free of charge, to any person
## obtaining a copy of this software and associated documentation
## files (the "Software"), to deal in the Software without
## restriction, including without limitation the rights to use,
## copy, modify, merge, publish, distribute, sublicense, and/or sell
## copies of the Software, and to permit persons to whom the
## Software is furnished to do so, subject to the following
## conditions:
##
## The above copyright notice and this permission notice shall be
## included in all copies or substantial portions of the Software.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
## EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
## OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
## NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
## HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
## WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
## OTHER DEALINGS IN THE SOFTWARE.
## --
## Changelog
## Jan 4 2008: Initial version. Plenty of room for improvements.

import struct
try:
    import psyco
    psyco.full()
except ImportError:
    pass

import sys
import os

from rijndael import Rijndael
from serpent import Serpent
from twofish import Twofish
from lrw import *
from keystrengthening import *

#
# Utilities.
#

import struct
import time
import binascii

def Log(message):
    print >> sys.stderr, "Progress:", message

def CRC32(data):
    """Compute CRC-32."""
    crc = binascii.crc32(data)
    # Convert from signed to unsigned word32.
    return crc % 0x100000000

def BE16(x):
    """Bytes to 16 bit big endian word."""
    return struct.unpack(">H", x)[0]

def BE32(x):
    """Bytes to 32 bit big endian word."""
    return struct.unpack(">L", x)[0]

def BE64(x):
    """Bytes to 64 bit big endian word."""
    a, b = struct.unpack(">LL", x)
    return (a<<32) | b

def Win32FileTime2UnixTime(filetime):
    """Converts a win32 FILETIME to a unix timestamp."""
    return filetime / 10000000 - 11644473600

#
# Ciphers.
#

class CipherChain:
    def __init__(self, ciphers):
        self.ciphers = [ciph() for ciph in ciphers]
    def set_key(self, keys):
        i = 0
        for cipher in self.ciphers:
            cipher.set_key(keys[i])
            i += 1
    def encrypt(self, data):
        for cipher in self.ciphers:
            data = cipher.encrypt(data)
        return data
    def decrypt(self, data):
        for cipher in reversed(self.ciphers):
            data = cipher.decrypt(data)
        return data
    def get_name(self):
        return '-'.join(reversed([cipher.get_name() for cipher in self.ciphers]))

Cascades = [
    [Rijndael],
    [Serpent],
    [Twofish],
    [Twofish, Rijndael],
    [Serpent, Twofish, Rijndael],
    [Rijndael, Serpent],
    [Rijndael, Twofish, Serpent],
    [Serpent, Twofish]
]

HMACs = [
    (HMAC_SHA1, "SHA-1"),
    (HMAC_RIPEMD160, "RIPEMD-160"),
    (HMAC_WHIRLPOOL, "Whirlpool")
]

#
# TrueCrypt.
#

TC_SECTOR_SIZE = 512
TC_HIDDEN_VOLUME_OFFSET = 1536

class TrueCryptVolume:
    """Object representing a TrueCrypt volume."""
    def __init__(self, fileobj, password, progresscallback=lambda x: None):

        self.fileobj = fileobj
        self.decrypted_header = None
        self.cipher = None
        self.master_lrwkew = None
        self.hidden_size = 0

        for volume_type in ["normal", "hidden"]:
            fileobj.seek(0)
            if volume_type == "hidden":
                fileobj.seek(-TC_HIDDEN_VOLUME_OFFSET, 2)

            progresscallback("Is this a " + volume_type + " volume?")
            
            salt = fileobj.read(64)
            header = fileobj.read(448)
            
            assert len(salt) == 64
            assert len(header) == 448

            for hmac, hmac_name in HMACs:
                # Generate the keys needed to decrypt the volume header.
                iterations = 2000
                if hmac == HMAC_WHIRLPOOL:
                    iterations = 1000

                info = ''
                if hmac_name in "RIPEMD-160 Whirlpool":
                    info = ' (this will take a while)'
                progresscallback("Trying " + hmac_name + info)
                
                header_keypool = PBKDF2(hmac, password, salt, iterations, 128)
                header_lrwkey = header_keypool[0:16]
                header_key1 = header_keypool[32:64]
                header_key2 = header_keypool[64:96]
                header_key3 = header_keypool[96:128]

                for cascade in Cascades:
                    # Try each cipher and cascades and see if we can successfully
                    # decrypt the header with it.
                    cipher = CipherChain(cascade)
                    
                    progresscallback("..." + cipher.get_name())
                    
                    cipher.set_key([header_key1, header_key2, header_key3])
                    decrypted_header = LRWMany(cipher.decrypt, header_lrwkey, 1, header)
                    if TCIsValidVolumeHeader(decrypted_header):
                        # Success.
                        self.decrypted_header = decrypted_header
                        
                        master_keypool = decrypted_header[192:]
                        master_lrwkey = master_keypool[0:16]
                        master_key1 = master_keypool[32:64]
                        master_key2 = master_keypool[64:96]
                        master_key3 = master_keypool[96:128]

                        self.master_lrwkey = master_lrwkey
                        self.cipher = cipher
                        self.cipher.set_key([master_key1, master_key2, master_key3])
                        self.hidden_size = BE64(decrypted_header[28:28+8])
                        self.format_ver = BE16(decrypted_header[4:6])

                        # We don't really need the information below but we save
                        # it so it can be displayed by print_information()
                        self.info_hash = hmac_name
                        self.info_headerlrwkey = hexdigest(header_lrwkey)
                        self.info_headerkey = hexdigest(header_keypool[32:128])
                        self.info_masterkey = hexdigest(master_keypool[32:128])

                        progresscallback("Success!")
                        return
        # Failed attempt.
        raise KeyError, "incorrect password (or not a truecrypt volume)"

    def __repr__(self):
        if not self.decrypted_header:
            return "<TrueCryptVolume>"
        return "<TrueCryptVolume %s %s>" % (self.cipher.get_name(), self.info_hash)

def TCIsValidVolumeHeader(header):
    magic = header[0:4]
    checksum = BE32(header[8:12])
    return magic == 'TRUE' and CRC32(header[192:448]) == checksum

def TCReadSector(tc, index):
    """Read a sector from the volume."""
    assert index > 0
    tc.fileobj.seek(0, 2)
    file_len = tc.fileobj.tell()

    # The LRW functions work on blocks of length 16. Since a TrueCrypt
    # sector is 512 bytes each call to LRWMany will decrypt 32 blocks,
    # and each call to this function must therefore advance the block
    # index 32. The block index also starts at 1, not 0. index 1
    # corresponds to lrw_index 1, index 2 corresponds to lrw_index 33 etc.
    lrw_index = (index - 1) * 32 + 1 # LRWSector2Index(index)

    # For a regular (non-hidden) volume the file system starts at byte
    # 512. However for a hidden volume, the start of the file system
    # is not at byte 512. Starting from the end of the volume, namely
    # byte file_len, we subtract the hidden volume salt+header (at offset
    # 1536 from the end of the file). We then subtract the size of the
    # hidden volume.
    mod = 0
    last_sector_offset = TC_SECTOR_SIZE
    if tc.hidden_size:
        mod = file_len - tc.hidden_size - TC_HIDDEN_VOLUME_OFFSET
        # We subtract another sector from mod because the index starts
        # at 1 and not 0.
        mod -= TC_SECTOR_SIZE
        last_sector_offset = TC_SECTOR_SIZE + TC_HIDDEN_VOLUME_OFFSET
    seekto = mod + TC_SECTOR_SIZE * index

    # last_sector_offset is the beginning of the last sector relative
    # the end of the file. For a regular non-hidden volume this is simply
    # 512 bytes from the end of the file. However for hidden volumes we
    # must not read past the headers, so the last sector begins 512 bytes
    # before the header offset.
    if seekto > file_len - last_sector_offset:
        return ''

    tc.fileobj.seek(seekto)
    data = tc.fileobj.read(TC_SECTOR_SIZE)
    
    return LRWMany(tc.cipher.decrypt, tc.master_lrwkey, lrw_index, data)          

def TCSectorCount(tc):
    """How many sectors can we read with TCReadSector?"""
    volume_size = 0
    if tc.hidden_size:
        volume_size = tc.hidden_size
    else:
        tc.fileobj.seek(0, 2)
        volume_size = tc.fileobj.tell()
        # Minus the salt+header.
        volume_size -= 512
    return volume_size / TC_SECTOR_SIZE

def TCPrintInformation(tc):
    if not tc.decrypted_header:
        return

    header = tc.decrypted_header
    program_ver = BE16(header[6:8])
    volume_create = Win32FileTime2UnixTime(BE64(header[12:12+8]))
    header_create = Win32FileTime2UnixTime(BE64(header[20:20+8]))

    print "="*60
    print "Raw Header"
    print "="*60
    print repr(tc.decrypted_header)
    print "="*60
    print "Parsed Header"
    print "="*60    
    print "Hash          :", tc.info_hash
    print "Cipher        :", tc.cipher.get_name()
    if tc.hidden_size:
        print "Volume Type   : Hidden"
        print "Hidden size   :", tc.hidden_size
    else:
        print "Volume Type   : Normal"
    print "Header Key    :", tc.info_headerkey
    print "Header LRW Key:", tc.info_headerlrwkey
    print "Master Key    :", tc.info_masterkey
    print "Master LRW Key:", hexdigest(tc.master_lrwkey)
    print "Format ver    :", hex(tc.format_ver)
    print "Min prog. ver :", hex(program_ver)
    print "Volume create :", time.asctime(time.localtime(volume_create))
    print "Header create :", time.asctime(time.localtime(header_create))
    print "="*60

def cmdline():
    scriptname = sys.argv[0]
    try:
        path, password, outfile = sys.argv[1:]
    except ValueError:
        print >> sys.stderr, "%s volumepath password outfile" % scriptname
        sys.exit(1)

    if os.path.exists(outfile):
        print >> sys.stderr, "outfile %s already exists. use another " \
              "filename and try again (we don't want to overwrite " \
              "files by mistake)" % outfile
        sys.exit(1)

    try:
        fileobj = file(path, "rb")
    except IOError:
        print >> sys.stderr, "file %s doesn't exist" % path
        sys.exit(1)

    try:
        tc = TrueCryptVolume(fileobj, password, Log)
    except KeyError:
        print >> sys.stderr, "incorrect password or not a TrueCrypt volume"
        fileobj.close()
        sys.exit(1)
    except KeyboardInterrupt:
        print >> sys.stderr, "aborting"
        fileobj.close()
        sys.exit(1)

    TCPrintInformation(tc)

    outfileobj = file(outfile, "ab")
    num_sectors = TCSectorCount(tc)
    num_written = 0
    try:
        for i in xrange(1, num_sectors + 1):
            if i % 100 == 0:
                Log("Decrypting sector %d of %d." % (i, num_sectors))
            outfileobj.write(TCReadSector(tc, i))
            num_written += 1
    except KeyboardInterrupt:
        print "Aborted decryption."
        pass
    outfileobj.close()
    print "Wrote %d sectors (%d bytes)." % (num_written,
                                            num_written * TC_SECTOR_SIZE)
    fileobj.close()
    sys.exit(0)

if __name__ == '__main__':
    cmdline()

Code: truecrypt.py. Requires rijndael, serpent, twofish, keystrengthening, lrw.

Conclusion

There we have it, a couple of hundred lines of code describing the essence of TrueCrypt. We have not fully implemented TrueCrypt – in fact there are some features missing such as key files not to mention the ability to read and write files to volumes directly – but we have described the most important parts of the specification. Hopefully the code and the article will be useful for some purpose.

Finally some technicalities: One: the author is not affiliated with the TrueCrypt foundation in any way. Two: see the source code for code licenses. Most of the primitives are public domain or almost public domain and most of the TrueCrypt specific code is MIT License. The article you are now reading, minus the code, is licensed under Creative Commons (Attribution-Share Alike 3.0).