Tuesday, July 07, 2015

Let’s construct an elliptic curve: Introducing Crackpot2065

This article is meant to demystify some aspects of elliptic curve cryptography. The target audience is people who already have a basic understanding of applied cryptography but would like to close some gaps in knowledge when it comes to elliptic curves in general, and elliptic curve domain parameters in particular.

The first half of the article will lay the foundation on how ECC works, with examples from the “sage” mathematical software system. It will cover elliptic curves, curve equations, finite fields, addition/multiplication within the system, the Elliptic Curve Discrete Logarithm Problem (ECDLP), the group laws and the meaning of orders and subgroup orders. It will briefly mention some systems using the curves, such as ECDH and ECDSA, but the main focus is on the curves themselves and their properties.

The second half will focus on how to create a curve from scratch, suitable for cryptographic use. I will introduce a curve I call Crackpot2065 which is “safe” according to one criteria. I do not advise anyone to use the curve I’ve created for real purposes as there are many more well studied curves that have better security guarantees and implementations. I am just doing this for educational/expository reasons and that because I think it’s fun. With that said...

mad-hatter-2.jpg

Part 1: How Does Elliptic Curve Cryptography Work?


At a high level, Elliptic Curve Cryptography can be split in two broad areas. The first broad area concerns the mathematical “environment” in which we work, that is the elliptic curves themselves. The second broad area concerns the algorithms and systems using these structures to perform some interesting cryptographic operation, for example signing or key agreement. To namedrop some terms that you may have heard before: The first broad area is about structures such as the elliptic curve Curve25519 or NIST P-256. The second broad area is about algorithms such as ECDH and ECDSA.

Informally we can describe the “environment” as follows: ECC uses a mathematical structure called an elliptic curve over a finite field. An elliptic curve is a set of (x, y) points satisfying an equation that typically looks something like y^2 = x^3 + ax + b, where a and b are curve parameters that have constraints. The finite field is often the field of integers mod a prime number. That means that the set of points on the curve are also on integer coordinates. It is possible to define addition, and by extension multiplication, on these elliptic curve points. ECC works because if Alice has a secret integer s and a public key point s*P, then Eve cannot easily find s from the published point s*P. This is called the Elliptic Curve Discrete Logarithm Problem and it is believed that this is a very difficult problem to solve. The ECDLP problem form the basis of the security of the algorithms.

Let us understand the above paragraph by some examples. The examples below will use the computer algebra system Sage:

sage: ec = EllipticCurve(GF(2**255-19), [0,486662,0,1,0])
sage: ec
Elliptic Curve defined by y^2 = x^3 + 486662*x^2 + x over Finite Field of size 57896044618658097711785492504343953926634992332820282019728792003956564819949
First of all we have (as seen above) created an elliptic curve over a finite field, called “ec”. The number 2**255-19 is a big prime number. The equation is an elliptic curve where the equation is given on a Montgomery form. The three equation forms that we will discuss in this article are:

The short Weierstrass form: y^2 = x^3 + ax + b, where 4a^3+27b^2 is nonzero in GF(p).
The Montgomery form By^2 = x^3 + Ax^2 + x, where B(A^2-4) is nonzero in GF(p).
The Edwards form x^2 + y^2 = 1 + dx^2y^2, where d(1-d) is nonzero in GF(p).

All of the above equations are elliptic curves. It is possible to convert a curve written on one form to the other. Normally in the cryptographic literature the most common way to express an elliptic curve is on the short Weierstrass form. Many modern curves are using the other forms for performance reasons.

sage: Point = ec([9, 14781619447589544791020593568409986887264606134616475288964881837755586237401])
sage: Point
(9 : 14781619447589544791020593568409986887264606134616475288964881837755586237401 : 1)
Next we create a point on the curve on (x, y) coordinates. We have chosen a specific point with x coordinate 9. Using the (x, y)-coordinates directly in the obvious way are called “affine coordinates”. For performance reasons it is often desirable to use other coordinate systems instead. On the “projective coordinate system” the curve points are instead written as a 3-tuple (X : Y : Z) where x=X/Z and y=Y/Z which means that (x, y) = (x : y : 1). These are just two examples of coordinate systems that can be used when computing on elliptic curves.

sage: 80 * Point
(23817343013128792117854356851130396784845341481516229275406643496458380679609 : 35116296009168191515727253524688666433668212002350538291544025151813503173711 : 1)
sage: Point + Point
(14847277145635483483963372537557091634710985132825781088887140890597596352251 : 8914613091229147831277935472048643066880067899251840418855181793938505594211 : 1)

It is possible to do multiplication and addition on the points, as shown above, which will be discussed soon. What is not shown above is that elliptic curves often have a special point called the “point at infinity”. When dealing with curves on Montgomery or short Weierstrass form, the point at infinity exists. Edwards curves do not have a point at infinity.

What all curves have though, regardless of form, is a neutral element. Addition over the curve forms an abelian group. Consider the addition operation on the curve, then the laws for the abelian group hold (shamelessly copied from Wikipedia):

Closure: For all a, b in A, the result of the operation a • b is also in A.
Associativity: For all a, b and c in A, the equation (a • b) • c = a • (b • c) holds.
Identity element: There exists an element e in A, such that for all elements a in A, the equation e • a = a • e = a holds.
Inverse element: For each a in A, there exists an element b in A such that a • b = b • a = e, where e is the identity element.
Commutativity: For all a, b in A, a • b = b • a.

Lets have a look at this.

sage: ec(0) # The point at infinity is the neutral point for our curve
(0 : 1 : 0)
sage: Point + ec(0) == Point # Identity
True
sage: some_point = ec.lift_x(4)
sage: Point + some_point == some_point + Point # Commutativity
True
sage: -Point + Point # Additive inverse
(0 : 1 : 0)
sage: some_point_2 = ec.lift_x(1)
sage: Point + (some_point + some_point_2) == (Point + some_point) + some_point_2 # Associativity
True

Multiplication is defined as repeated addition. That is, 3*P = P+P+P. Multiplication can be implemented programmatically using repeated doubling and addition using an algorithm called double-and-add, or, more preferably, a similar approach called a Montgomery ladder that is similar but can be made to run in constant time instead, preventing timing attacks leaking information about the scalar (which is often the secret key).

sage: 3*Point == Point+Point+Point
True

As mentioned briefly above, the two primitive algorithms that must be implemented in an ECC system to get multiplication are double(P) and add(P, Q). The addition function often has the constraint that P != Q, which is why the specialization of addition, double(P), exists instead of just doing add(P, P) directly. If you go out of the way to try to derive an addition formula from the curve equation in short Weierstrass or Montgomery form, you will find that there can be a special “division by zero” step in the add(P, P) case. This is why most treaties on ECC always include algorithms for both add and double, to avoid making add(P, Q) riddled with special cases (the naive add(P, Q) for Short Weierstrass curves has many special cases as it is).

Exactly what is the meaning of addition on an elliptic curve, and how are the algorithms for add(P, Q) and double(P) implemented? Unlike pretty much all other ECC introduction articles on the Internet, I’m gonna leave this outside the scope of the document for the simple reason that I believe it will make the topic more confusing to understand. I can recommend Wikipedia for pointers and cool graphical illustrations. For the remainder of this article it’s enough to trust that addition follow the group laws above.

Elliptic curves on finite fields will by definition have a finite number of points. We call this the order of the curve. Counting the number of points on the curve can be done somewhat efficiently using somewhat complicated algorithms called Schoof’s algorithm or the more efficient improved algorithm Schoof-Elkies-Atkin, aka SEA.

sage: ec.order() # the number of valid points on the curve
57896044618658097711785492504343953926856930875039260848015607506283634007912

Consider an arbitrary point on the curve, for example the point “Point” in the sage example above. Repeated addition using this point forms a subgroup on it’s own: addition is cyclic in the this group which means that if you just add a point to itself (that is 0P, 1P, 2P, 3P…) you will eventually end up at the identity element 0P=0 where you started. We say that the point Point generates a cyclic subgroup using the point as the generator. This subgroup has an order on it’s own (the order in this case is the smallest n>0 such that nP=0):

sage: Point.order()
7237005577332262213973186563042994240857116359379907606001950938285454250989
sage: Point.order() * Point # order*P == 0*P
(0 : 1 : 0)

Due to Lagranges Theorem of group theory, the order of the subgroup will divide the order of the group.

sage: ec.order() / Point.order()
8

The above number is called the cofactor.
In Elliptic Curve Cryptography, the “domain parameters” of the curve include a base point which generates a subgroup of a large order. It is within this subgroup that we perform our curve operations with the assumption that the reversal of the public key s*P is a hard problem. The domain parameters must include a base point as it’s required to agree upon which subgroup to work in prior to perform cryptographic operations.

(Briefly) Introducing Curve25519 and ECDH


The elliptic curve given above, that is the equation y^2 = x^3 + 486662*x^2 + x over Finite Field GF(2**255-19) together with the base point (9, <large number>) gives the domain parameters for an elliptic curve called Curve25519, constructed by famous crypto guy Daniel J Bernstein. Curve25519 defines a public key as the x-coordinate of the point s*P where s is the secret key and P is the base point. The secret key is a random 255 bit number with the constraints that the number is a multiple of the cofactor (to prevent leaking a few bits of info during a certain class of attack called a small subgroup attack). The secret key is also constructed such that the highest bit is always set. This together with the Montgomery ladder ensure that no timing information will leak when using the secret key.

Curve25519 is not a general purpose curve in the sense that you can just take the curve and apply it to any algorithm. It is specifically constructed to be used with the Elliptic Curve Diffie Hellman algorithm. Some other curves (with “curve” here I mean the domain parameters including the base point, not just the equation) are more general purpose and can be used for other algorithms as well, such as ECDSA or others. You cannot take djb:s definition of Curve25519 as is and apply it to ECDSA because those algorithms require both (x, y) coordinate to work, and Curve25519 is only defined for the x-coordinate. y can be computed from x using the curve equation however this will result in two options due to the equations including a square root: either you will get the point A, or the points inverse -A. Curve25519 does not define which point to use in that situation. djb:s similar curve used for signing, Ed25519, does define which point to use in that case, and is as such more generic.

ECDH is the elliptic curve version of the Diffie Hellman key agreement. Alice and Bob would like to establish a shared secret. Alice has a secret key a and a public key A. Bob has a secret key b and a public key B. They both know each other public key. Alice computes aB and Bob computes bA. They use the x coordinate of the result as a shared secret. Because of the group laws, the result is the same (A is a*P and B is b*P so it follows that b*a*P = a*b*P).

sage: aliceSecretKey = 2**254 + 8 * randint(1, 2**251-1)
sage: A = aliceSecretKey * Point
sage: bobSecretKey = 2**254 + 8 * randint(1, 2**251-1)
sage: B = bobSecretKey * Point

sage: (aliceSecretKey * B).xy()[0]
16006640369170327653803989998811672408911551588929921312340351204923053532044
sage: (bobSecretKey * A).xy()[0]
16006640369170327653803989998811672408911551588929921312340351204923053532044

The observing attacker cannot recover the secret without solving the ECDLP, so ECDH is safe under the assumption that ECDLP is hard. ECDH is typically the least complicated Elliptic Curve Cryptography schema to understand. Other algorithms includes ECDSA which is a variant of DSA and it’s used for signing. Daniel J Bernstein has also created an algorithm called EdDSA, which is also an algorithm for cryptographic signatures with some other properties than ECDSA. It’s a neat algorithm where signatures are deterministic and they do not have the same catastrophic failure of ECDSA in case random numbers (ECDSA, as DSA, requires a random number “k” for signatures) are re-used by accident, which is what broke the Playstation protection a few years back.

This concludes our short overview of Elliptic Curve Cryptography fundamentals. The rest of this article will describe how to generate your own domain parameters, that is the prime for the finite field, the curve equation, and the base point.

Part 2: Let us create our own domain parameters!


For reasons that I have trouble to explain to my friends I got the strange idea to construct my own elliptic curve. That is I wanted to find a suitable prime for the finite field, a suitable equation and a suitable base point. Thankfully the SafeCurves project define a plethora of gotchas and requirements for a curve to be considered “safe”. I figured a good goal would be to create a curve that meets the SafeCurves requirements.

Step 1: Let’s choose a prime number


Because I have limited computational resources and limited patience, I quickly redefined my goal to find a curve that meets the requirements just enough. The smaller the numbers are, the faster I can find them with my 2009-era budget workstation. One of the main computational definitions of security in the SafeCurve criteria is that the work required to solve ECDLP (using Rho’s method) should be at least 2**100 operations. That means that the subgroup order O should be roughly 0.886*sqrt(O) > 2**100. Armed with this knowledge I set out to find good prime numbers around 2**200.

For performance reasons, most curves in practice have prime numbers that are very close to a power of 2. This means that the field operations, especially the modular step, can be implemented quickly using bit hacks. I enumerated all the prime numbers of the form 2**e-c where c is as small as possible and found some possibilities:

2**200 - 75
2**201 - 55
2**202 - 183
2**203 - 159
2**204 - 167
2**205 - 81
2**206 - 5
2**207 - 91
2**208 - 299
2**209 - 33

Using common reasoning I figured that the first few would be too small. So I settled for 2**206 - 5 just because the constant was the smallest which was aesthetically pleasing to me. It may be possible that the 204 or 203 one also meet the Rho requirement, but I made my choice.

It is highly desirable to pick prime numbers of the form p == 3 (mod 4) or p == 5 (mod 8) because in those cases it’s efficient to compute the square root in the finite field. This is beneficial for point compression: Given the point (x, y) you can typically “compress” that to just one of the numbers: the “receiver” can compute it on his side of the wire: for example given x you can compute y. Doing this requires the square root calculation.

The prime number I chose happens to be 3 (mod 4) which lends to quick square rooting.

Step 2: Let’s find a curve equation


Okay, now it gets a bit more complicated. What we want to do now is to, given the finite field we’ve chosen, find a curve equation that will given a base point in the curve generate a subgroup of very large order.

To make this easier, I decided to go with an Edwards curve. Edwards curves are parametrized on “d” only. By brute forcing all “d” I tried to find a curve that has a curve order that is a very large prime factor. That means I can later choose a base point that will (due to Lagranges theorem) generate a subgroup of the large prime factor order.

The pseudo code for sage would look something like this:

for all d = 2, 3, 4, …
   convert “d” to a curve on short weierstrass or montgomery form (so we can use the EllipticCurve constructor)
   find the order of the group using Schoof or SEA
   factor the number above
   pick a d such as the factors above contain one very large prime factor and some very small ones

The SafeCurves criteria additionally contains a requirement that the “twist” of the curve - which is to say another elliptic curve that is related to the one at hand - should have a large order as well. This is outside the scope of this article, but can be checked in a similar fashion. In fact, the SafeCurves criteria contain many other requirements that I’ve decided not to cover in this article at this point.

After around 8-12 hours on my slow computer I found the following:

d = 15688 gives
curve order = 2^2 * 25711008708143844408671393477455870117170824866529792321311457
twist order = 2^2 * 25711008708143844408671393477461333163539670934519578408332573

Alright! Those orders give a Rho cost of approximately 2**101.8 which is (slightly) greater than the required 2**100. The cofactor is a multiple of 4 which is another of the SafeCurves requirements, which is good for our purpose.

Step 3: Let’s find a base point


Given the curve x^2 + y^2 = 1 + 15688x^2y^2 over GF(2**206-5) we want to find a base point with the order given above, that is 25711008708143844408671393477455870117170824866529792321311457. Again remember that Lagranges theorem states that the order of the subgroup will be a divisor of the parent group order. So we:

let L = desired order 25711008708143844408671393477455870117170824866529792321311457
let Possibilities = {divisors of curve order}
for all y = 2, 3, 4, …
   find point P = (x, y) on the curve, P != neutral
   check that L*P = neutral and that a L is the smallest integer in Possibilities where L*P = neutral

Fortunately the search was pretty quick because the first y I tried, y=2, happened to be one that had the desired order.

That gives us the base point
(21562982200376473978929601187519256941293801462850163935410726, 2). Actually there is another choice: the inverse of this point. But we go for this one because it has the smallest x-coordinate of the two choices. A common convention is to define the “negative” point to be the one with the larger integer, similar to the convention with signed integers having the high bit set.

Step 4: Let’s sanity check against SafeCurves


The SafeCurves project actually contains a verification script that will try all the criterias and output the result. Lets try that on the new curve (before we do this we have to find a point generating the entire curve, similar to above, and also we have to supply lots of prime numbers so the SafeCurves program doesn’t have to do the factoring itself. I used msieve to compute the factorizations).

$ sage verify.sage data/curves/crackpot2065/
$ cat data/curves/crackpot2065/verify-safecurve
True

Of course, just because this curve meets the requirements doesn’t mean it’s safe: that requires that the implementation of the curve is actually safe too, and that it’s used sanely with the algorithm (ECDH, ECDSA, EdDSA etc.). And just because the curve is small doesn’t mean it’s fast either. In practice, if you want to use a “small” curve go with Curve25519 or Ed25519, depending on use case, or use something such as NaCL or libsodium.

Step 5: Now what?


The above description barely scratches the surface of finding good elliptic curve domain parameters. Even assuming the parameters are safe in a sense, there are still problems that need to be solved in order to use a curve in practice. One big issue would be to implement the curve in a safe and high performing way. Another big issue would be to use the curve in a correct way with the available algorithms. Is it tempting to look at the curves separate from the algorithms because that will give a warm fuzzy feeling for software engineers who like composability. However, there’s a danger in just taking a curve and “plugging it in” in any arbitrary algorithm. That’s a part that requires some knowing what you are doing.

With that said, in practice, it is often possible to compose a curve and an algorithm and end up at a “system” (i.e. a curve and an algorithm together in a specified way) that actually works. This is the case with some of the NIST families of curves and ECDSA. In other cases more info and care needs to be taken, such as with Curve25519, which is really the “system” of the curve defined in the Curve25519 paper, together with the ECDH algorithm.

Can my Crackpot2065 curve be used in practice? There are two sides to this. One the negative side, using this curve would require more work, such as defining the secret key space (which should probably be a multiple of 4 with a high bit set), and a good implementation, and some clear thought on how to use it with the algorithms without doing something moronic. On the positive side, if everyone used the same curves then an attacker with lots of resources (or at least the theoretical attacker) would be inclined to study or do pre-computations on those curves, which would be an argument for a more diverse set of domain parameters than are commonly used today.

As a fun thought experiment, a friend of mine mentioned a scenario where curve parameters can be generated so quickly that two parties can compute them together before commencing with the rest of the protocol in question. This is of course not very practical, but imagine a situation where generating domain parameters can be done very quickly, instead of requiring hours of computations.

Conclusion


My goal with this article was to demystify the elliptic curve domain parameters. By choosing how one might go about and pick those, I hope that this has cleared up at least some confusion on the topic. I have basically written the article I would have liked to read myself before I started this project. That would have saved me about of week of work.

If you are more interested in domain parameters, please check out the excellent SafeCurves project which includes lots of good information and references on parameter selection.

… Having said that, I may have lied a bit when I told you that my curve is “safe” according to the SafeCurves criteria. Because the verify.sage script has an input called “rigid”, which is up to the person running the verification script to decide. I decided to call Crackpot2065 “fully rigid”, according to the SafeCurves definition: “Fully rigid: The curve-generation process is completely explained.” This blog post is hopefully rigid enough. :-)

Have fun and don’t shoot yourself in the foot with your new crypto gun.

Obligatory caveat: The author of this article, Björn, is not a real cryptographer. There may be mistakes in this article. I apologize for any inconvenience caused by this. If you find a problem or have questions, just email me at be@bjrn.se.