Saturday, January 04, 2020

Non-realtime publishing for censorship resistance

“Because the Internet is so new, we still don't really understand what it is. We mistake it for a type of publishing or broadcasting, because that's what we're used to. So people complain that there's a lot of rubbish online” -- Douglas Adams
In this blog post I’m going to make the case that “real time communication” should not be a given requirement when designing censorship resistant services, if resiliency is a key feature. By making real-time communication optional but not required, a service can keep functioning even in the presence of interference from very resourceful attackers in control of the service implementation, or underlying infrastructure (like the Internet itself).

To make this concrete I’m going to propose a class of services I call zines. I’m going to apply the concept to three famous examples of services that try to be more or less censorship resistant: WikiLeaks (a journalistic publishing platform), Silk Road (an anonymous market), and The Pirate Bay (a torrent website).

What is a service anyway?

WikiLeaks, Silk Road and The Pirate Bay are both ideas and technical implementations. WikiLeaks is the idea that a person can publish something anonymously. Silk Road is the idea that a person can buy/sell stuff without external interference. The Pirate Bay is a search index serving small pieces of data.

When people think of these three services, however, they typically think of the implementation. WikiLeaks is an actual website running on a server somewhere. Silk Road was also a website, implementing eBay-like functionality (with some twists) on a server running on top of the Tor network. The Pirate Bay: a website and server as well.

All three of these services are implemented this way because that is the most practical way to achieve real time communication in the usability sense of the term. As a curious citizen I can visit WikiLeaks in my web browser and immediately be served their website and content. This takes full advantage of one of the main features of the Internet.

This is a weakness if censorship resistance is considered very important. The servers can be shut down, as happened with Silk Road and I believe The Pirate Bay as well. The WikiLeaks administrators seem to run a game of whack-a-mole where they are shut down by authorities and move servers elsewhere.

Services are out, Publishing is in

Here is a key realization: Let’s go back in time and learn from the old days of content distribution. By removing the real time communication requirement, we could get rid of all the servers and instead turn these “services” into a publishing problem. A neat thing with this kind of publishing, like me publishing this blog post, is that it is agnostic to the medium. For convenience I am publishing this blog post here on my website, but due to the nature of the “protocol” (the English language) I could also print this blog post on paper and give it to my neighbors. If I wanted to be anonymous and sneaky I could just sign it with my private key, hide the text (steganography) inside some image or music file and publish it on every single social media website on the Internet. That is fairly strong resiliency.

Contrast this with a service like Silk Road, that is so tied to the technical implementation that removal of the service infrastructure means complete service disruption. The Silk Road “protocol” was never meant to be agnostic to the platform it was running on and therefore it did not survive after shutting down the servers it was implemented on top of. This kind of centralized infrastructure is a weakness.

Zine: offline censorship resistance

Zine is an abstract framework, or way of thinking, for designing certain classes of resilient “services”. It sacrifices real time communication to gain resiliency from distribution potential.

The main idea is that you have one or more owners. Think of an owner as a public/private keypair. The owner manipulates (offline) a collection of files (a “zine”). This collection is encrypted, signed and published periodically. It doesn’t matter where it is published: it could be anywhere on the Internet, in an actual magazine, or maybe on a Tor hidden service.

The cryptographic signing here is important. It is the one proof that the zine has not been tampered with. The three example services get a certain degree of “identity proof” from the underlying infrastructure, for example the WikiLeaks domain name and the TLS certificate. Because Zine is agnostic to all this, it relies on its own signing.
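
The sign-then-verify step can be sketched in a few lines of Python. This is only an illustration and not part of any existing Zine tooling: the manifest layout, the `issue` field and every function name below are assumptions. To stay within the standard library the sketch uses a Lamport one-time signature built from SHA-256, where each key pair may sign only a single message; a real owner would more likely use an established scheme such as Ed25519 from a maintained library. Encryption of the collection is left out entirely.

```python
import hashlib
import json
import secrets
from typing import Dict, List, Tuple

KeyPairs = List[Tuple[bytes, bytes]]


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_manifest(files: Dict[str, bytes], issue: int) -> bytes:
    """Canonical bytes for one issue: issue number plus one hash per file.

    The layout is hypothetical; any deterministic encoding would do.
    """
    entries = {name: _h(content).hex() for name, content in files.items()}
    doc = {"issue": issue, "files": entries}
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode()


def _bits(digest: bytes) -> List[int]:
    # 256 bits of the digest, most significant bit of each byte first.
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]


def keygen() -> Tuple[KeyPairs, KeyPairs]:
    """Return (private, public): 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(private: KeyPairs, message: bytes) -> List[bytes]:
    """Reveal one secret per bit of the message hash. Use each key only once."""
    return [private[i][bit] for i, bit in enumerate(_bits(_h(message)))]


def verify(public: KeyPairs, message: bytes, signature: List[bytes]) -> bool:
    bits = _bits(_h(message))
    if len(signature) != len(bits):
        return False
    return all(_h(s) == public[i][bit] for i, (s, bit) in enumerate(zip(signature, bits)))


# Owner side: build and sign one issue.
private_key, public_key = keygen()
manifest = build_manifest({"index.txt": b"hello readers"}, issue=1)
signature = sign(private_key, manifest)

# Reader side: the check works no matter where the copy was found.
print(verify(public_key, manifest, signature))
```

The reader needs nothing but the owner's public key, the manifest and the signature, which is what makes the check independent of the medium the zine travelled over.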

The users can perform actions on the zine by relaying (somehow) requests to the owners. The owners may then choose to apply the action and thus manipulate/update the collection, and then publish a new collection/zine for consumption by the users. In practice they will probably have a little application for manipulating the zine, like a command line tool, text editor, or something similar, that handles the formatting/marshalling.

The actions are highly dependent on product. WikiLeaks may have an “upload” action for example, whereas The Pirate Bay may have an “add-torrent” action for adding a torrent to the index.

An aspect of censorship resistance is anonymity. For the sake of generality, we can say that actions may in most cases be signed by the user key-pair, but in certain applications it may be okay to not do that, or to use throw-away key-pairs if even pseudonymity or traceability of previous actions is undesirable.

On the owner side, when they (somehow) get a request, they will probably first inspect it (including the user public key) and then they may apply the resulting mutation. The result may go into a new zine, or mutate state that is not published, but maintained. That depends on the application and “business logic”. For example, an application may be completely stateless, i.e. everything except the owner's private key is embedded (encrypted) in the zine. In other cases, the owner may have an auxiliary database or similar that is used during zine generation but not published.
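
The owner-side flow described above (inspect a request, maybe apply it, then publish a new issue) could look roughly like the following Python sketch. All names here (`Request`, `OwnerState`, `apply_request`, `publish`, the `banned_keys` set) are hypothetical, the only action handled is “add-torrent”, and verifying the user's signature on the request is deliberately left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Request:
    user_key: str  # identifier of the user's public key (encoding is an assumption)
    action: str    # for example "add-torrent"
    payload: str


@dataclass
class OwnerState:
    issue: int = 0
    entries: List[str] = field(default_factory=list)    # ends up in the published zine
    banned_keys: Set[str] = field(default_factory=set)  # auxiliary state, never published


def apply_request(state: OwnerState, req: Request) -> bool:
    """Inspect one request and mutate the state if it is accepted."""
    if req.user_key in state.banned_keys:
        return False
    if req.action == "add-torrent":
        if req.payload in state.entries:
            return False  # duplicate, nothing to do
        state.entries.append(req.payload)
        return True
    return False  # unknown action for this application


def publish(state: OwnerState, pending: List[Request]) -> Dict[str, object]:
    """Apply pending requests offline and produce the next issue."""
    applied = sum(1 for req in pending if apply_request(state, req))
    state.issue += 1
    return {"issue": state.issue, "entries": list(state.entries), "applied": applied}


state = OwnerState(banned_keys={"mallory"})
inbox = [
    Request("alice", "add-torrent", "magnet:?xt=urn:btih:" + "ab" * 20),
    Request("mallory", "add-torrent", "magnet:?xt=urn:btih:" + "cd" * 20),
]
print(publish(state, inbox))
```

The split between `entries` and `banned_keys` mirrors the distinction between state that is embedded in the zine and state the owner maintains privately.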

I believe this framework is enough to implement a plethora of different types of applications that may benefit from censorship resistance.

There are obvious drawbacks:
  • As can be seen, we have immediately sacrificed the usability of real-time updates, for stronger resilience.
  • The user will also initially (and maybe in the future) have the problem of discovery. Where is the latest collection anyway?
  • The distribution is left as an exercise to the reader and could be a real problem for larger data sets. Distributing smaller files should be an easier problem to solve.
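
One way a reader's tool might soften the discovery problem is to collect every copy it can find, discard the ones that fail verification, and keep the highest issue number. The Python sketch below assumes each zine carries an issue number covered by the owner's signature; `Candidate`, `pick_latest` and the `is_authentic` callback are hypothetical names, and the actual signature check is abstracted away.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    source: str     # where this copy was found (forum post, mirror, ...)
    issue: int      # issue number claimed inside the zine
    payload: bytes  # the signed collection itself


def pick_latest(
    candidates: Iterable[Candidate],
    is_authentic: Callable[[Candidate], bool],
) -> Optional[Candidate]:
    """Return the authentic candidate with the highest issue number, if any."""
    valid = [c for c in candidates if is_authentic(c)]
    return max(valid, key=lambda c: c.issue, default=None)


found = [
    Candidate("forum post", 3, b"older issue"),
    Candidate("paste site", 9, b"forged"),
    Candidate("mirror", 5, b"newer issue"),
]
# Stand-in for a real signature check against the owner's public key.
latest = pick_latest(found, lambda c: c.payload != b"forged")
print(latest.source if latest else "nothing usable found")
```

This only picks the newest copy the reader happened to find; it cannot tell whether an even newer issue exists somewhere else.
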
There are some benefits though:
  • The Zine can be published anywhere and is therefore much harder to take down, be it by denial of service or infrastructure takeover. Once the cat is out of the bag it will live a life of its own, unlike services that are backed by real-time (online) databases and such.
  • All interactions with the Zine are offline (in the sense that there is no client/server interaction), which could lead to fewer vulnerabilities.
  • The owners only have to care about their keypair and no other sensitive things (servers etc.).
Note that the owner can of course be automated, for near real time updates. Any existing service could also publish a zine as an out of band side channel, next to the real time “usable” implementation. Should the real time implementation be taken down, the zine can live on.

Example: WikiLeaks

An important feature of WikiLeaks, as with many other popular websites, is curation. WikiLeaks publishes certain types of data, not any data. However, the current technical implementation of WikiLeaks has to care a lot about distribution: they publish large, searchable archives of material. This improves usability.

To make a Zine out of the WikiLeaks idea, the owner could publish a main collection that is basically an index that points to more data/collections, that could be published anywhere. All of this is just optimizations though: “theoretically” the entirety of WikiLeaks, or certain parts of it, could be just published on any underlying platform and not their own servers.

Example actions: upload.

Example: Silk Road

I think this is my favorite example because the offline implementation is interesting. The owner will have to periodically publish a collection. The collection will contain the public keys of vetted users (buyers/sellers), a listing of items being sold, and auction state (that is, the current state of bids and bid history). To protect users from each other, parts of the collection will have to be encrypted cleverly. The owner(s) will have to maintain other offline state as well. There are a lot of devils in the details here, but of course that is also true for the online (real time) implementation.

Example actions: send-message-to-user, bid, add-listing.

The owner may have some backing infrastructure for managing transactions.

Example: The Pirate Bay

The collection will be an index of “magnet links” pointing to the BitTorrent Distributed Hash Table (so no tracker server is needed).
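
As a rough sketch of what such an index could contain, the Python below builds magnet links from BitTorrent v1 info hashes (40 hex characters) and searches the index locally, with no server involved. The entry layout and the function names `magnet_link` and `search` are assumptions for illustration only.

```python
from typing import Dict, List
from urllib.parse import quote

HEX_DIGITS = set("0123456789abcdefABCDEF")


def magnet_link(info_hash_hex: str, name: str) -> str:
    """Build a magnet URI from a 40-character hex info hash and a display name."""
    if len(info_hash_hex) != 40 or not set(info_hash_hex) <= HEX_DIGITS:
        raise ValueError("expected a 40-character hex info hash")
    return f"magnet:?xt=urn:btih:{info_hash_hex.lower()}&dn={quote(name, safe='')}"


def search(index: List[Dict[str, str]], term: str) -> List[Dict[str, str]]:
    """Case-insensitive substring search over a downloaded index."""
    needle = term.lower()
    return [entry for entry in index if needle in entry["name"].lower()]


# A tiny index as it might appear inside one issue of the zine (made-up hashes).
index = [
    {"name": "Public Domain Film", "magnet": magnet_link("ab" * 20, "Public Domain Film")},
    {"name": "Free Music Album", "magnet": magnet_link("cd" * 20, "Free Music Album")},
]

for hit in search(index, "film"):
    print(hit["magnet"])
```

Because the whole index is a file the reader already holds, searching it requires no contact with the owner at all.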

Example actions: add-torrent, dmca-takedown-request.

Conclusion

Designers and operators of censorship resistant services should consider whether real time communication is strictly necessary. Do the benefits of a publishing approach like Zine, in their particular use case, outweigh the obvious drawbacks?

Thanks to Gunnar Kreitz for feedback on this post.