Cryptography and Number Theory for Digital Cash

by J. Orlin Grabbe


Note: In order to read the following text properly, your web browser must be able to handle subscripts and superscripts; i.e. the html commands “sub” and “sup”. These commands are used sparingly, but they are used.

The ability to attack or misuse a digital cash system depends a lot on the cryptography and cryptographic protocols involved. A good cash system uses good cryptography. Good cryptography requires strong algorithms, keys of adequate length, and secure key management. Without these, there can be little privacy.

A good cash system uses good protocols. The digital cash system of Stefan Brands, for example, is largely based on digital signatures--the Schnorr signature scheme, in particular. The system, properly implemented, is thus only as secure as the underlying signature system. If there are flaws in the latter, the attractive features of anonymous digital cash disappear also--features like untraceability, unlinkability, and security for all parties.

If we are looking for privacy and security in banking, we can't ignore cryptography and we can't ignore mathematics. The ability of digital cash systems to give some measure of security and privacy has more to do with number theory than banking regulations.

A. Some Preliminary Definitions

Encryption is the process of transforming data or a message into some unintelligible form, but in such a way that the transformation is reversible, so that it is possible to restore the original data or message. The process of encryption involves some recipe, or set of instructions, which gives the step-by-step procedure for transforming the data. This recipe is called an algorithm. An encryption algorithm must be associated with a reverse procedure that restores the original message. The reverse process is called decryption. Taken together, the techniques of encryption and decryption are called cryptography.

Algorithms--such as DES or RSA--are, in general, publicly known. This has the advantage that any weaknesses in an algorithm can be found by researchers. Any attempt at "security through obscurity", by keeping an algorithm secret, is usually self-defeating, because it will most likely simply mask weaknesses in the algorithm itself which could be exploited by anyone who has figured out what the algorithm is. But when a good publicly-known algorithm is used, security depends on the length of the key, which may be thought of as a string of 0s and 1s used in the algorithm to transform the data.

One example of a key transformation is in connection with the XOR function used in binary arithmetic. The XOR function follows these rules of binary addition:

0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0

Because these addition rules are different than those in ordinary arithmetic (in particular the rule that says 1+1 = 0), we will use “+” to denote XOR addition. Using XOR addition, any string of zeros and ones can be used as a key to encrypt and decrypt data. Suppose we have a "message" which is a string of four binary digits, M = 1011. Suppose we also have a four digit key k = 1010. We can now encrypt the message M, using the key k, by adding the two of them to get C = M “+” k:

M:1011
k:1010
C:0001

The encrypted message C is 0001. Anyone looking at this value will not know the original message M, because it could have been any one of sixteen possible combinations of four zeros or ones.

But, knowing the key, it is easy to restore the original message. All we have to do is add the key k to the encrypted message C, following the XOR rules of addition, and we are left with the original message M:

C:0001
k:1010
M:1011

In general, M = (M “+” k) “+” k. Performing an XOR addition with the same key (or similar block of data) twice restores the original message.

Security depends on the length of the key k. Here there are sixteen possible keys of length four. The first binary digit can be either zero or one, so there are 2 possibilities. The second digit can also be either of two values, so there are 2x2 = 4 possible keys of length two. There are 2x2x2 = 8 possibilities for a key of length three. A key of length four has sixteen possibilities. Sixteen is 2 to the 4th power. In general, for a key of length N, there are 2 to the Nth power (2^N) possible keys.

Commercial DES uses keys of length 56, so the key space has 2^56 (2 to the 56th power) possibilities. As large as this key space is, it is no longer large enough for security, given current techniques of cryptanalysis. It is even possible to mount a "brute force" attack on a message. A brute force attack is one that simply tries out every possible key. On the other hand, a key space of 2^128 is quit secure--at least with respect to a brute force attack. A key space of 2^128 is 2^72 times as large as the common DES key space of 2^56.

XOR addition is used in two DES encryption modes. Electronic code-book (ECB) mode encrypts each 64-bit text block individually. Two other, more secure, modes, however, make each encryption block dependent on each of the preceding text blocks. Cipher Block Chaining (CBC) is a mode of DES encryption in which the current 64-bit plaintext block is XORed with the previous cipherblock before encryption with the 56-bit DES key k. If C(t) is the t-th 64-bit cipher block, and M(t) is the current 64-bit message block, then in CBC,

C(t) = Ek(M(t) “+” C(t-1)). (CBC)

Cipher Feedback (CFB) is a mode of DES encryption in which each plaintext block is XORed with the previous cipherblock text after the latter is encrypted with the 56-bit DES key. In CFB,

C(t) = M(t) “+” Ek(C(t-1)). (CFB)

(The block sizes can be smaller than 64-bits in CFB.)

The key k used in the previous XOR example was symmetric. The same key was used to encrypt and decrypt. DES and IDEA are two well-known symmetric key algorithms. Symmetric key algorithms tend to be somewhat ad hoc-- clever cookbook recipes that work. Public key, or asymmetric, systems, by contrast, use two different keys for encryption and decryption. RSA is a well-known asymmetric system. Public key systems are typically based on relationships from number theory, and the process of encryption works by raising the message M to some power a: C = M^a. Decryption works by raising the encrypted message C to another power b: M = C^b. The first key, or power, a will be known to everyone--hence it is called a "public" key-- while the key b will be kept secret by the user--hence it is called a "secret" or "private" key

B. Some Preliminary Mathmatical Notation

A message to be encrypted or sent will be generally denoted as M. Remember that, to the computer, the string of 1s and 0s that make up M can be treated as a binary number, whether M is “I love you” or “You owe me $500” or “Bank number 437695B.” Encryption may involve raising this number to a power. The notation E(x) will denoted an encryption algorithm or function, while D(x) will denote a decryption algorithm or function. An encrypted message would thus be designated as E(M), while a decryption of this encrypted message would be shown as D(E(M)). Of course this latter term is just the original message, so D(E(M)) = M. If a message is encrypted or decrypted using the symmetric key k, then the notation Ek(x) or Dk(x) will be used. If a public-key system is involved, then the public key will be denoted pk, while the private (secret) key will be denoted sk.

In a symmetric encryption system, Dk(Ek(M)) = M, because the same key is used to decrypt the message that was used to encrypt it. In a public-key system, one key must be used to encrypt and the other to decrypt. Hence Dsk(Epk(M)) = M and also Dpk(Esk(M)) = M. In the first case, the message is encrypted with Alice's public key, then decrypted by Alice using her secret key. In the second case, common in digital signatures, Alice encrypts the message with her secret key, and then it is decrypted by someone else using Alice's public key.

The sign “^” will be used to mean “raised to the power of”; for example, 2^3 = 8, since 2 raised to the power of 3 equals 8. (Note that in the computer language Fortran, b^y would be written b**y.) A single *, by contrast, denotes multiplication: “three times b equals 12” would be written “3*b = 12”, or perhaps simply as “3b = 12”. The notation log_b (y) will denote the logarithm of y with respect to base b. For example, log_2 (8) = 3, since the logarithm of 8 (to the base 2) is 3.

Finally, mod p will denote “arithmetic done modulo p”. By this we will mean "divide by p, and keep the remainder r, 0<= r < p."

Modulo 3, for example, will divide any positive whole number by 3, and take the remainder (which will be either 0, 1, or 2). For example, 7 mod 3 = 1, since 3 goes into 7 twice, with 1 left over. The number we divide by (3, in this example) is called the "modulus". Similarly, 62 mod 25 = 12. That is, when 25 is the modulus, 62 = 25*2 + 12 = 12. But if 3 were the modulus, 62 = 3*20 + 2 = 2.

In general, we will write a = b mod n, meaning that a mod n = b mod n.

For example, 67 = 11 mod 7, because 67 mod 7 = 4 and also 11 mod 7 = 4. Hence

a= b mod n

means for some integers k1 and k2,

a = k1*n + r (0<= r < n)
b = k2*n + r .

Hence also

a - b = (k1-k2)*n,

so that n divides into a-b a whole number of times (namely, k1-k2 times). Restated, “n divides a-b”, which may also be written
n|(a-b)” .

Thus, in the previous example,

67 = 11 mod 7

means for integers k1 = 9 and k2 = 1,

67 = 9*7 + 4 (0<= 4 < 7)
11 = 1*7 + 4 .

Hence also

67-11 = (k1-k2)*7 = (9-1)*7 = 8*7,

so that 7 divides into 67- 11 a whole number of times (8 times). We could also write 7|(67-11) or 7|56.

If we do calculations this way, using whole numbers, dividing by the modulus, and throwing away everything except the remainder, we are doing modular arithmetic. Most computers are not constructed to do modular arithmetic very well, because they aren't set up to keep track of whole numbers beyond a certain length: they would much rather stick in a decimal point somewhere, and keep track of only a few numbers (floating point arithmetic). Consider, for example, 3.7B3579D4F6821 x 16^73. This is a very large number, but the computer essentially just keeps track of 16 hexadecimal numbers (64 bits): namely 3,7,B,3,5,7,9,D,4,F,6,8,2,1 and the digits in the power: 7 & 3. But public key cryptography may use "keys" whose length is 2048 bits or more. Therefore most computer implementations of public key cryptography involve computer software which works around the hardware limitations, or else use specially constructed cryptographic chips.

Remember also that by convention exponentiations take place before multiplications. Hence 3*5^2 = 3*25 = 75. But (3*5)^2 = 15^2 = 225. Note also that (3*5)^2 = 3^2*5^2 = 9*25 = 225. That is, in general, (x*y)^z = x^z*y^z.

C. Modular Arithmetic and the Groups Z(p)* and G(q).

In most of the calculations involving public key cryptography and digital cash, we will be dealing with the set of integers from 0 to p-1, where p is some large prime. This follows from the fact that we will be doing multiplication mod p. We will also be using a set of integer powers from 1 to q, where q is some large prime number that divides p-1. That is, multiplication and exponentiation (taking powers) will take place within the groups Z(p)* and G(q), which we will define and explain in this section.

The two groups Z(p)* and G(q) are very important for public key cryptography and digital cash. They play roles in Diffie-Hellman key exchange, in the Schnorr signature scheme, in the Digital Signature Algorithm, and in the digital cash system of Stefan Brands. That is, the arithmetic is done in Z(p)* or in a group G(q) of powers of prime order q, where q divides p-1. So it is important to understand what these groups are.

The set Z(p)* = {1, 2, 3, 4, ..., p-2, p-1}. If we multiply any two numbers in this set together, and reduce the product mod p, the result is a number in the set. So the set is closed under multiplication. In addition, we if take any number k from the set, there exists another number k^-1, such that k*k^-1 = 1 mod p. That is, any number in the set has a multiplicative inverse. These two characteristics mean that Z(p)* is a group under multiplication mod p. Sometimes the term "multiplicative group" is used. Since Z(p)* is a group under multiplication, it is also a group under exponentiation (taking powers), since the n-th power of a number is simply the multiplication of a number by itself n times. (Note that 0 is omitted from Z(p)* because it doesn't have a multiplicative inverse. If we add 0 to the set Z(p)*, we get the set Z(p), which consists of all remainders mod p, including 0.)

For example, Z(11)* = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. If we multiply 5 and 8 from the set, we have 5*8 = 40 = 7 mod 11, and 7 is an element in the set. Also we have 5*9 = 45 = 1 mod 11, so that 9 is the multiplicative inverse of 5. Similarly, 5 is the multiplicative inverse of 9. If k=5, then k^-1 = 9. Similarly, 2 and 6 are multiplicative inverses, as are 3 and 4. What is the multiplicative inverse of 10? (Answer: 10 is its own multiplicative inverse, since 10*10 = 100 = 1 mod 11.) Also, if we exponentiate a number from the set, say 6 to the third power, we have 6^3 = 6*6*6 = 216 = 7 mod 11, we are again left with an element in the set. The set is closed under multiplication and exponentiation mod 11, and each element has an inverse, so Z(11)* is a group.

Each element has a multiplicative inverse in Z(p)* because p is a prime number. And since p is prime, the only common divisor of p and each of the numbers in the set Z(p)* = {1, 2, 3, ..., p-1} is 1. Restated, the greatest common divisor (gcd) of p and any number in the set is 1. That is, gcd(1,p) =1, gcd(2,p) = 1, gcd(3,p) = 1, . . . , gcd(p-1,p) = 1.

The same is not true if we do modular multiplication with a composite number (i.e. a number which is the product of at least two numbers, each greater than 1). For example, the number 15 = 3*5, so it is composite. Suppose we do mutiplication mod 15, using numbers from the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}. What is the inverse of the number 6 from this set? Answer: there is no inverse for 6, as we can see by multiplying 6 by all numbers less than 15:

6*0 = 6*5 = 6*10 = 0 mod 15
6*1 = 6*6 = 6*11 = 6 mod 15
6*2 = 6*7 = 6*12 = 12 mod 15
6*3 = 6*8 = 6*13 = 3 mod 15
6*4 = 6*9 = 6*14 = 9 mod 15.

There is no number which multiplied by 6 equals 1 mod 15. In addition, we obtain 0 as the result of some multiplications, so the set is not closed. (6 and 5 are in the set, but 6*5 mod 15 = 0, which is not in the set.) Hence, this set (the set of whole numbers from 1 to 14) is not a multiplicative group mod 15.

Returning to the set Z(p)*, we can define some other operations, in addition to multiplication and exponentiation. We can define division by k as multiplication by the inverse of k, namely k^-1. Thus 8/k = 8*k^-1, by definition. If k = 9 in Z(11)*, then 8/9 = 8*9^-1 = 8*5 = 40 = 7 mod 11. Similarly, 3/10 = 3*10^-1 = 3*10 = 30 = 8 mod 11.

Let g be a member of Z(p)*. Then g is said to be a generator mod p if the set of powers of g, namely the set {g^1 mod p, g^2 mod p, . . ., g^(p-1) mod p}, contains, in some order, all the members of Z(p)*:

{g, g^2, g^3, ... , g^(p-2), g^(p-1)} mod p

= {1, 2, 3, ... , p-2, p-1}, in some order.

That is, the set Z(p)* = {1, 2, ... , p-1} represents a rearrangement of {g, g^2, g^3, ... , g^(p-1)}, when all calculations are done mod p. (For convenience we write mod p outside the brackets when it applies to each element in the set, or omit it altogether, if it is understood.)

For example, 3 is a generator of Z(7)*, since

3^1 = 3 mod 7
3^2 = 2 mod 7
3^3 = 6 mod 7
3^4 = 4 mod 7
3^5 = 5 mod 7
3^6 = 1 mod 7.

Or, restated,

{3, 3^2, 3^3, 3^4, 3^5, 3^6} = {1, 2, 3, 4, 5, 6}

when calculations are done mod 7. The two sets have the same elements, although not necessarily in the same order. A rearrangement of the order of the elements of a set is called a permutation. The powers of the generator 3 give a permutation of Z(7)*.

A generator-tuple mod p is a set of k generators, which are all different. That is, {g1, ... , gk} is a generator-tuple if each gi is an generator mod p, and also gi is not equal to gj , if i is not equal to j.

For example, {3, 5} is a generator-tuple of Z(7)*, because both 3 and 5 are generators of Z(7)*. Each element in Z(7)* can be represented both as a power of 3 and a power of 5:

1 = 3^6 mod 7 = 5^6 mod 7
2 = 3^2 mod 7 = 5^4 mod 7
3 = 3^1 mod 7 = 5^5 mod 7
4 = 3^4 mod 7 = 5^2 mod 7
5 = 3^5 mod 7 = 5^1 mod 7
6 = 3^3 mod 7 = 5^3 mod 7.

The number 2 is not a generator mod 7, because the powers of 2 only yield 1, 2, or 4, mod 7:

{2, 2^2, 2^3} = {1, 2, 4}.

Note, however, that the powers of 2 mod 7 yield a subset of Z(7)*. The set {1, 2, 4} is a subset of {1, 2, 3, 4, 5, 6} = Z(7)*. The number 2 is thus said to "generate the subgroup G(3)" mod 7. The designation "G(3)" means there are 3 elements in the group. Alternatively stated, 3 is the lowest power of 2 that yields 1 mod 7. G(3) is a group, because it is closed under multiplication mod 7:

           
	1*1 = 1 mod 7        2*1 = 2 mod 7          4*1 = 4 mod 7      
	1*2 = 2 mod 7        2*2 = 4 mod 7          4*2 = 1 mod 7           
	1*4 = 4 mod 7        2*4 = 1 mod 7          4*4 = 2 mod 7           

and because each element in G(3) also has an inverse.

Note that the number 4 is also a generator of G(3) = {1, 2, 4} mod 7, since

{4, 4^2, 4^3} = {4, 2, 1} mod 7 = {1, 2, 4}.

A group generated by an element g is said to have order q mod p provided q is the lowest power such that g^q = 1 mod p.

The two generators of Z(7)* that we saw previously, namely 3 and 5, are said to have order 6 mod 7, because 6 is the smallest power of 3 or 5 that gives 1 mod 7. That is, 1=3^6=5^6 mod 7, and no smaller power has this property. By contrast, the generators of G(3), namely 2 and 4, have order 3, because 2^3 = 4^3 = 1 mod 7, and no lower power of 2 or 4 equals 1.

In general, for q prime, 1< q < p, we define G(q) as the group (or subgroup) of prime order q, mod p, if for some generator g, 1 < g < p, we have that {g, g^2, g^3, ..., g^q} is a subset of Z(p)*. That is, the powers of g yield each of the elements in the subgroup. Note, by definition, q is the lowest power of g that gives 1; hence, g^q = 1 mod p. Thus powers larger than q simply start over and run through the same set of numbers. If g^q =1 mod p, then g^(q+1) = g mod p, g^(q+2) = g^2 mod p, and so on.

For example, 2^3 = 1 mod 7. So higher powers of 2 yield the same numbers over and over:

2^4 = 2^1 = 2 mod 7
2^5 = 2^2 = 4 mod 7
2^6 = 2^3 = 1 mod 7
etc.

Note that if g is an element of the group Z(p)*, then g is a generator of Z(p)* if g is an element of order p-1. That is, if g^(p-1) = 1, and no lower power equals 1. For then, it would necessarily follow that the powers of g mod p--namely g^1, g^2, ..., g^(p-1)--run through all the numbers 1, 2, ..., p-1.


Fermat's theorem says that for any prime p, and number k not divisible by p, we have

k^(p-1) = 1 mod p

Of course, the integers 1, 2, 3, . . ., p-1 are not divisible by p, so any of these integers raised to the power p-1 equals 1 mod p, by Fermat’s theorem.

Hence for k an element of Z(p)*, we have k^(p-1) = 1 mod p.

For example, for Z(11)*, we have p-1 = 10, and a check shows that, mod 11,

1^10 = 2^10 = 3^10 = 4^10 = 5^10 = 6^10 = 7^10 = 8^10 = 9^10 = 10^10 = 1.

Note that we are not saying that any number k<p has order p-1 mod p. The order of k may be smaller than this. For example, 2 has the order 3 mod 7, as 2^3 =1 mod 7. But it is also true that 2^(7-1) = 2^6 = 1 mod 7, as required by Fermat's theorem. And obviously 2^6 = (2^3)^2 = (1)^2 = 1 mod 7.


It is easy to see that, as a consequence of Fermat's theorem, the order q of any element of a multiplicative group mod p must divide p-1. This is known as Lagrange's theorem.

For example, in the case p = 7, we have p-1 = 7-1 = 6, so the order of any element must divide into 6 a whole number of times. We saw previously that 3 and 5 have order 6 in Z(7)*, and 6|(7-1). Similarly, we saw that 2 and 4 have order 3 mod 7, and 3|(7-1).

The reason it works this way is because if an element g is of order q mod p, then g^q =1 mod p. But it's also true by Fermat's theorem that g^(p-1) = 1 mod p. So if q didn't divide p-1 a whole number of times, then for some number k, p-1 = k*q + r, where 0 <r <q. Thus we would have that 1 = g^(p-1) = g^(k*q+r) = (g^q)^k*g^r = 1*g^r, which implies g^r = 1. Which in turn implies g has order r less than q, a contradiction.


Euler's totient function t(n) for a positive integer n is the set of numbers less than n that are relatively prime to n. That is, the number of positive integers k, 0<k<n, with gcd(k,n) =1.

If n = p is prime, then all positive numbers less than p are relatively prime to p, so t(p) = p-1.

For example, for n = p = 7, the numbers 1, 2, 3, 4, 5, and 6 are all relatively prime to 7, so t(7) = 6. For n = 4, the numbers 1, 3 are relatively prime to 4, so t(4) = 2. For n = 15, the numbers 1, 2, 4, 7, 8, 11, 13, 14 are relatively prime to 15, so t(15) = 8. (The other numbers, namely 3, 5, 6, 9, 10, 12, have divisors in common with 15.)


Euler's theorem states that for any number n and any number k relatively prime to n, we have

k^t(n) = 1 mod n

Note that Euler's theorem applies to composite numbers n, as well as prime numbers. For example let n = 15. The number 2 is relatively prime to 15, so by Euler's theorem we have

2^t(15) = 2^8 = 1 mod 15.

We will use Euler's theorem later when we look at the RSA crypto-system. RSA uses large numbers n which are composite (namely the product of two primes), and hence Fermat's theorem does not apply: it is not true, in general, that k^(n-1) = 1 mod n, for n composite, even if k is relatively prime to n. For example, 2^(15-1) = 2^14 = 4 mod 15. That is, the order of k does not necessarily divide n-1, for n composite. The number 2 is relatively prime to 15, but it has order 4, as 2^4 = 1 mod 15. And 4 does not divide 15-1 = 14. However, the order 4 does divide t(15) = 8.


Another result from number theory (the proof of which will not be explored here) is that, for p prime, the number of generators mod p is t(p-1).

For example, the number of generators mod 7 is t(7-1) = t(6). Now t(6) is by definition the number of positive integers less than 6 that are relatively prime to 6. There are two such numbers; namely, 1 and 5. So there are a total of two generators mod 7. Both of these generators (namely, 3 and 5) were shown previously. (The significance of the numbers 1 and 5 here is that if we have a generator g, then both g^1 and g^5 will be generators. Thus 3 and 3^5 = 5 mod 7 are generators. Alternatively, since 5 is a generator, both 5 and 5^5 = 3 mod 7 are generators.)

Consider now the group G(q) of prime order q mod p (i.e., both p and q are prime). Since the order of any element mod p must divide p-1, it follows that q must divide p-1. How many generators of the subgroup G(q) are there? The answer is that for each m that divides p-1, there are t(m) generators of order m, where t is Euler's totient function. In the case m=q is prime, there are thus t(q) = q-1 generators. That is, there are q-1 generators of the subgroup G(q) of prime order q mod p. This fact assures us that for large q we have plenty of generators to choose from.

For example, since 3 divides 7-1 = 6, there are t(3) = 2 generators of order 3 mod 7. That is, there are two generators of the subgroup G(3) = {1, 2, 4}. (Note that both 2 and 4 are generators of G(3).) Similarly, since 2 divides 7-1 = 6, there are t(2) = 1 generator of order 2 mod 7. (Check that 6 is a generator of order 2 mod 7, yielding the subgroup G(2) = {1, 6}.)

Notice that since 2, 3, and 6 all divide p-1 = 7-1, the possible subgroups mod 7 are these:

Subgroups of Z(7)*
Generators gi mod 7
G(2) = {1, 6}6
G(3) = {1, 2, 4}2, 4
G(6) = {1, 2, 3, 4, 5, 6}3, 5

We see that G(6)=Z(7)* has only two generators (and not 6-1 = 5), because 6 is not a prime factor of p-1. Rather, t(6) = 2.

For p = 23, and q = 11, there are t(11) = 10 generators of G(11) = {2, 4, 8, 16, 9, 18, 13, 3, 6, 12, 1}. The number 2 is one such generator mod 23. Any member of G(11), except the integer 1, is a generator of the group.

Note in this last example that 2 and 22 also divide p-1 = 23-1. There are thus t(2) = 1 generator of G(2), and t(22) = 10 generators of G(22)=Z(23)*. Thus, of the 22 elements in Z(23)*, 10 are of order 22, 10 of order 11, and 1 is of order 2. (Check that G(2) = {1, 22}.)

Subgroups of Z(23)*
Generators gi mod 23
G(2) = {1, 22}22
G(11) = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18}2, 3, 4, 6, 8, 9, 12, 13, 16, 18
G(22) = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22} 5, 7, 10, 11, 14, 15, 17, 19, 20, 21

We can define discrete logarithms in either Z(p)* or G(q) by the following mechanism. Let g be a generator of one of these two groups, and let y be the x-th power of g, modulo p:

y = g ^ x mod p, or y = gx mod p

Then x is the (discrete) logarithm of y to the base g, modulo p:

x = log_g (y) mod p, or x = logg (y) mod p

For example, 2^5 = 9 mod 23, so 5 = log2(9) mod 23. The integer 5 is the log of 9 (to the base 2) mod 23. Similarly, 3^6 = 1 mod 7, so 6 = log_3 (1) mod 7. The integer 6 is the log of 1 (to the base 3) mod 7.

Note for a generator g of G(q) mod p, that since g^q =1, we always have q = log(1) mod p, for any base (generator) g. Hence, for a group of order q, q plays the role of zero, as g^q = g^0 = 1 mod p, and hence q = log(1) = 0 mod q.

So for any power x of g in G(q), g^x mod p may be reduced mod q:

g^x mod p = g^(x mod q) mod p.

For example, we saw 2 had order q = 3 mod 7, since 2^3 mod 7 = 1. Hence for a power larger than 3, say 8: 2^8 mod 7 = 2^(8 mod 3) mod 7 = 2^2 mod 7 = 4. Once we get to the q-th power of g, g^q = 1, so for x = k*q+r, we have g^(k*q+r) = g^(k*q)*g^r = (g^q)^k*g^r = 1^k*g^r = g^r. So we can reduce x by dividing by q, and keeping only the remainder r.

For future reference, note the following fact about powers of g. If we have two numbers X = g^x mod p and Y = g^y mod p, then

X*Y = g^x*g^y = g^(x+y) mod p.

By contrast to this, we have

X^y = (g^x)^y = g^(x*y) mod p,

while

Y^x = (g^y)^x = g^(x*y) mod p.

Knowledge of the first result, g^(x+y) mod p, doesn't tell us anything about the latter result, g^(x*y) mod p. Only if we first took logs to the base g, and calculated x = log X mod p or y = log Y mod p, could we calculate g^(x*y).

For a simple example using small numbers, suppose g = 2 and p = 25307. Suppose also you observe X = 6113 and Y = 7984. Thus you know that

X*Y = 6113*7984 = 2^(x+y) mod 25307.

That is,

14296 = 2^(x+y) mod 25307.

But what is g^(x*y) = 2^(x*y) mod 25307?

To answer this question, you need to know x = log2 6113 mod 25307 or y = log2 7984 mod 25307. The difference between g^(x+y) and g^(x*y) brings us to Diffie-Hellman key agreement.

D. Diffie-Hellman & Discrete Logarithms

Diffie-Hellman key agreement is a key negotiation technique created by Whitfield Diffie and Martin Hellman in 1976. It provides a secure way for two parties to agree on (negotiate) a shared symmetric key by which to encrypt their current exchange of messages. Diffie-Hellman key exchange has as its practical basis the fact that it is easy to calculate powers in modular arithmetic, but hard to extract logarithms. That is, it takes considerable computer time and money to calculate discrete logarithms relative to the calculation of powers.

Discrete logarithms are believed to be examples of one-way functions. A one-way function is one which can be calculated or computed relatively efficiently in one direction, as compared to the reverse direction. That is, a function f(x) is a one-way function if it is easy to calculate y = f(x), given x, but it is hard to calculate or find x given y.

For a generator g of Z(p)*, calculation of = g^x mod p from x is easy, but computing x from y is difficult--at least as long as the prime p is sufficiently large. "Sufficiently large" in 1997 meant at least 768 bits, but preferably 1024 bits (which would allows for prime numbers less than 2^1024). There are about 10^305 primes less than 2^1024, so there are plenty to choose from. (The number of primes less than or equal to n is asymptotic to n/ln(n). So a good approximation to the number of primes less than n is simply n/ln(n), as long as n is large. Here and elsewhere "ln" is the natural logarithm, the logarithm to the base e = 2.718281828....)

Now suppose two people want to exchange infomation, but they want to do so in private. The first thing they have to do is agree on a cryptographic key to encrypt their messages. Diffie-Hellman key agreement is a way for two parties to decide on a cryptographic key for their use in the current message-exchange session, without fear that a third party will be able to intercept this key, and consequently decrypt their traffic. It relies on the fact that the two parties who are exchanging messages only have to compute powers in modular arithmetic in order to know what the key they are agreeing on is, but the eavesdropper will have to compute discrete logarithms to obtain the same key from their initial message exchange. (A different attack, called “man in the middle,” will not be discussed at this point.)

An example of a Diffie-Hellman calculation is given in Table 1. It might be useful to read through the steps in the Table, prior to reading the following explanation of this technique.

Suppose Alice and I decide to use Diffie-Hellman key agreement to establish a session key. This key will be used to encrypt the messages we send each other, for this occasion only. Each of us has previously agreed on a generator g of Z(p)* and a prime number p. As noted in the previous section, the generator g (1<g<p) has the property that g, g^2, g^3, ... , g^(p-1), yield--in some order--all the numbers 1, 2, 3, ..., p-1, when calculations are done mod p.

Table 1: Steps in Negotiating Diffie-Hellman Key Agreement

Diffie- Hellman
Example
A and B agree on two numbers, a generator g of Z(p)* and a prime p.Alice and I agree g = 7, while p = 23.
A chooses a secret random integer x.Alice chooses x = 5.
B chooses a secret random integer y.I choose y = 8.
A computes X = g ^ x mod p.Alice computes X = 7 ^ 5 mod 23 = 17.
B computes Y = g ^ y mod p.I compute Y = 7 ^ 8 mod 23 = 12.
A sends X to B, and B sends Y to A.Alice sends 17 to me; I send 12 to Alice.
A computes K = Y ^ x mod pAlice computes K = 12 ^ 5 mod 23 = 18.
B computes K = X ^ y mod p I compute K = 17 ^ 8 mod 23 = 18.
K is the encryption key for this communication session.K = 18 is the encryption key for this communication session.

For example, Alice can simply choose g and p herself, and send them to me. It doesn't matter if anyone else knows what these are. Alice now decides to send me a message. She needs an encryption key to encrypt this message. So to start the process, Alice first chooses a random number x, where 1< x < p-1. Alice keeps x secret and sends me X instead:

X = g^x mod p.

I, in turn, choose a random number y (where 1 < y < p-1) and send Y,

Y = g^y mod p,

back to Alice. Now Alice computes

K = Y^x mod p = g^(y*x) mod p

by raising the number she got from me to the power x, and taking the result mod p. Meanwhile, I compute

K = X^y mod p = g^(x*y) mod p

by raising the number I got from her to the power y, and taking the result mod p. Since the numbers we each calculate are equal, we both possess the same session key:

K = session key = g^(x*y) mod p = g^(y*x) mod p .

For a sufficiently large g and p, there is no known computationally feasible way for anyone who does not know either x or y to compute g^(x*y) mod p, unless they first tackle the difficult problem of taking the logarithm of g^x mod p or that of g^y mod p.

If p is a prime about 1000 bits in length (about 300 decimal digits), only about 2000 multiplications of 1000-bit numbers are required to compute the exponentiation of g. By contrast, the fastest techniques for taking logarithms in arithmetic modulo p currently demand more than 2^100 (or approximately 10^30) operations. Using the best supercomputer would take many millions of years to compute the logarithm of g^x mod p.

The length of the Diffie-Hellman key K itself is typically about 128 bytes (1024 bits). This is often further processed (via a hash function, explained later) down to 8 bytes (64 bits) to obtain a DES encryption key.

E. RSA Public-Key System

RSA Data Security, headquartered in Redwood City, CA, is a company founded in 1982 by the inventors of the RSA system (Ron Rivest, Adi Shamir, and Leonard Adleman). The RSA encryption technology is found in Microsoft Windows, Netscape Navigator, Intuit's Quicken, and Lotus Notes. It is also found in Apple, Sun, and Novell operating systems, and on Ethernet network cards. RSA is part of the SWIFT (Society for Worldwide Interbank Financial Telecommunications) standard for international financial transfers, as well as the ANSI X9.31 draft standard for the US banking industry.

In the RSA public-key cryptosystem used for electronic cash, both encryption and decryption are done by raising the message M (and recall that any computer message can be viewed simply as a binary number) to a power that is the appropriate key. As noted previously, these exponentiations are done in modular arithmetic, which saves only the result of division by a fixed number called the modulus. This modulus needs to be quite large, usually at least 230 decimal digits (768 binary digits). The example in Table 2 uses small numbers for illustration.

When the system is set up, the key-making entity (bank, customer, merchant) generates two large primes p and q. Their product n = p*q will be the modulus of the exponentiations. Euler's totient function for n is t(n) = t(p*q) = (p-1)*(q-1). The basis of the RSA system is Euler's Theorem (covered previously), which says that for any number x relatively prime to n, we have

x^t(n) = x^[(p-1)*(q-1)] = 1 mod n.

Or, setting t = (p-1)*(q-1),

x^t = 1 mod n,

provided x is divisible neither by p nor by q. Recall that the quantity t corresponds to the number of positive integers less than, and relatively prime to, n = p*q.

Next, the keymaker chooses an e and d with

e*d = 1 mod t, [that is, e*d = k*t + 1, for some positve integer k],

where (e, n) will be its public key and (d, n) its private key.

Consequently, any admissible x encrypted with e can be decrypted with d, since, calculated mod n,

(x^e)^d = x^(e*d) = x^(k*t + 1) = x*x^(k*t) = x*(x^t)^k = x*(1^k) = x.

The keymaker disseminates the public key e to all users, together with the modulus n = p*q, but never reveals p, q, or the private key d.

In general, then, in the RSA system, for plaintext message M and its associated ciphertext message C:

M^(e*d) = (M^e)^d = (M^d)^e = M mod n

Encryption: Ee(M) = C = M^e mod n

Decryption: Dd(C) = C^d mod n = (M^e)^d mod n = M mod n.

Notice how, in principle, RSA signatures work. (We will simplify by deferring discussion of hash functions until later.) A bank whose secret key is d will sign a message M by encrypting it with the secret key d, which simply means raising M to the power d:

Signature: Ed(M) = M^d mod n.

This message can be verified, and decrypted, by anyone by raising it to the power e, which is the bank's public key:

De(Ed(M)) = (M^d)^e mod n = M mod n.

Table 2: RSA Public Key System

RSA Public Key System
Example
Choose p and q, where both p and q are prime. Choose 11 and 13, which are prime numbers.
Find their product n = pq.Calculate n = 11*13 = 143.
Calculate Euler's totient function t = (p-1)(q-1). Calculate t = (11-1)*(13-1) = 120.
Select an integer e, where e<t, and the greatest common divisor of e and t is 1.Let e = 7. This will work because 7<120, and their greatest common divisor is 1.
Calculate d such that e*d = 1 mod t.We want 7*d = 1 mod 120.

Thus d = 103, as 7*103 = 721 = 1 mod 120.

The public key is (e, n).The public key is (7, 143).
The private key is (d, n).The private key is (103, 143).
Plaintext can be any number M, where M < n, and neither p nor q divides M Let the numerical representation of M be M = 5, for example.
The ciphertext is C = M^e mod n.The ciphertext is C = 5^7 mod 143 = 47.
The plaintext is C^d mod n = (M^e)^d mod n = M. The plaintext is 47^103 mod 143 = 5.

To see this last step, note the following, mod 143:

47^103 = (5^7)^103 = 5^721
= 5*[5^720] =5*[(5^120)^6]
= 5*[ 1^6] = 5.

Here we have used the result that

5^120 = 5^t = 1 mod 143 by Euler's theorem.

Or, more simply, x^(e*d) = x; hence, 5^721 = 5.

The RSA system is patented (US Patient 4,405,829). The patent expires Sept. 29, 2000.

The security of RSA depends on the difficulty of factoring large numbers. For, given n, it is not easy to find the prime factors p and q such that n = p*q. This difficulty is crucial, for if p and q were known, then given the public key e also, finding d--the private key--such that e*d =1 mod (p- 1)(q-1) would not be difficult.

For public key cryptography, it was (in 1997) believed that a modulus of 768 bits (made up of two prime factors each of around 384 bits) was sufficient for security until the year 2004. That is, a "large enough" modulus was n = 2^768 approximately, with p, q primes each in the neighborhood of 2^384. The fastest known factoring algorithm was the number field sieve, which had running time, for factoring a large number of size n, of order

O(exp[1.9223*(ln n)^(1/3)*(ln ln n)^(2/3)]).

To factor n in the neighborhood of n = 2^1024 would require about 2^87 steps. (Substitute n = 2^1024 in the expression.) If a computer does 1 million instructions per second, it would do 31,536,000 x 1,000,000 instructions in a year (a "mips-year"). Since this equals about 2^45 steps in a year, it would take about 2^(87-45) = 2^42 years to factor the number n, or 4*10^12 years. That's for a machine that does 1 million instructions per second. A 100-megahertz Pentium does about 50 million instructions per second, so the number could be factored on this machine in about 8*10^10 years.

RSA uses public keys to encrypt symmetric session keys. The latter are often insecure. A message encrypted with a 40-bit symmetric session key, using RSA's RC4 stream encryption algorithm, takes an average of 64 mips-years to break. That is, a computer operating at 64 million instructions per second would take a year to break the message's encryption. This size (40-bits) was until recently the maximum symmetric-key size allowed for U.S. export software (under the International Traffic in Arms Regulations), although 128-bit keys could be used in domestic software. As of 1997, symmetric keys less than 90-bits in length were not considered secure.

F. Hash Functions & Digital Signatures

Often when one wants to sign something, such as a long message or document, it is inconvenient to sign the entire message. One reason is public key cryptography is slow. So the solution is to create a message digest, which is a number or string of fixed length which represents the message. The message may be, for example, as long as 50,000 bits, but the message digest may be only 128 bits. The idea is to sign the message digest, instead of the message.

For example, the sender of the message can also create and sign a message digest, encrypting the latter with her private key. The encrypted digest is sent along with the message. Then the receiver of the combined message and encrypted message digest can 1) recompute the message digest from the message received, 2) decrypt the encrypted message digest using the sender's public key, and 3) compare that the two are the same. If they are, then in all probability the message received was the same as the message sent, and moreover one can be confident that it was sent by the owner of the public key used to decrypt the message digest. This process of recreating a message digest from the received message, and comparing it to the decrypted version of the received message digest, is sometimes referred to as a message authenticity check.

Hash functions are used to create message digests. A hash function H(x) is a mapping from any variable-length string or message x = M to a string (binary number) of fixed length h:

h = H(M).

The fixed string h is called the hash value. A "message digest" is simply a hash value for a given message. Different hash functions H(x) will yield different hash values. A typical hash string length is 128-bits, such as found in RSA's MD5 (Message Digest 5). This means that any message is mapped to one of 2^128 outcomes. So the probability that two messages will map to the same number is very low. Essentially, we want hash outcomes to be uniformly distributed over the space of 2^128 outcomes, because if a subset of the outcomes were favored, there would be greater likelihood of two different messages having the same hash value.

If two different messages do map to the same hash value, this is referred to as a collision. So if the hash function uniformly distributes outcomes over the outcome space, it is relatively "collision-free". Of course, no function which maps a larger space into a smaller space can avoid all collisions, so a better term is "collision-resistant". Using collision-resistant hash functions helps prevent message repudiation. No one will be able to deny having signed a given message (repudiate the message) by producing a different message that hashes to the same value, and claiming that this different message was the one intended. For if you could find a second (pseudo) message that hashes to the same value as a signed message (a collision), then you could remove the signature from the signed message, attach it to the pseudo-message, and claim the person had signed the pseudo-message.

In addition to being collision-resistant, a good hash function must also be one-way: one should not be able to feasibly compute the original message given its hash value (message digest). So if you can't compute the original message from its message digest, and you can't find another message that gives the same hash value, then you won't be able to falsify a signed message in transit or in storage.

Hash functions and secret keys are combined to produce digital signatures. First a hash h of the message M is produced. Then this value is encrypted with the sender's secret key, to form Esk(h). This encrypted hash value is sent along with the message itself. The message recipient uses the public key of the sender to decrypt the encrypted hash value: Dpk(Esk(h)) = h. The message recipient then also recomputes h directly from the message M to obtain h = H(M). If the two computations yield the same value for h, then the message must have come from the owner of the public key pk, and moreover the message could not have been altered.

Why? Because if the message has been altered to, say, M+, then the hash value h+ = H(M+) will not be correct (except with a vanishing probability of about 1/(2^128), for a 128-bit hash). Also, if the encrypted hash value was not signed with the supposed sender's secret key sk, but rather with some other secret key sk+, then the decryption will not yield a correct value: Dpk(Esk+(h)) is not equal to h. Thus a digital signature is superior to a handwritten signature, because it verifies both the identity of the signer of the message and the message contents. The slightest change in the message M will cause the signature process to fail.

Two well-known hash functions are RSA's MD5 (Message Digest 5) and NIST's SHA-1 (Secure Hash Algorithm). The hash values for MD5 are 128-bits long, while those of SHA-1 are 160-bits. Two other hash functions are RIPEMD-128 and RIPEMD-160, which yield respectively 128- and 160-bit hashes.

Are these hash functions collision-resistant? In 1994 P. van Oorschot and M. Wiener estimated that for $10 million one could build a special purpose computer to look for MD5 collisions, and these could be found in 24 days on average. MD5 was based on an earlier MD4, which can be considered broken. Neither does RSA recommend use of MD2, an earlier hash function designed for 8-bit machines. As for MD5, RSA recently stated: "Existing signatures formed using MD5 are not at risk and while MD5 is still suitable for a variety of applications (namely those which rely on the one-way property of MD5 and on the random appearance of the output) as a precaution it should not be used for future applications that require the hash function to be collision-resistant". For the present, SHA-1, RIPEMD-128, RIPEMD-160 can all be considered secure.

G. Blind Signatures

Suppose a customer wants a bank to sign a piece of digital cash with the serial number x, but does not want the bank to know which piece of cash it is signing. This can be achieved with a blind signature protocol. A simple example, using the RSA public key system, is as follows (see also Table 3):

1. The customer chooses a blinding factor r, chosen randomly and independently of x, and presents the bank with

x*r^e mod n,

where (e, n) is the bank's public RSA key. That is, the customer raises the blinding factor r to the power e, multiplies this by the serial number x, and sends the bank the result mod n.

2. The bank signs the digital cash (the blinded number) with its private RSA key d:

(x*r^e)^d = (x^d)*(r^(e*d)) = r*x^d mod n.

Recall here that r^(ed) = r, by a previous result. The bank signs the piece of digital cash by raising it to a power, namely to the power of its private key d, and takes the result mod n.The bank returns the coin to the customer.

3. The customer divides out the blinding factor:

(r*x^d)/r= x^d mod n.

The customer keeps the signed digital cash, x^d, for later use.

Since r is random, the bank cannot determine x separately from the product originally presented to it, and thus cannot connect the signing for a particular customer with the serial number of the digital cash when it is subsequently spent. The spending of the coin is untraceable. When presented with the coin, the bank can verify that it is valid, but it cannot say to whom the coin was issued.

Table 3: Blind Signature Protocol

Blind Signature Protocol
Example
A piece of digital cash has the serial number x.A piece of digital cash has the serial number x = 8.
The customer chooses a blinding factor r.The customer chooses r = 5.
The customer calculates x*r^e mod n, where (e, n) is the bank's RSA public key. The customer calculates 8*5^7 mod 143, using the bank's public key (7, 143). This gives 90 as the result.
The bank signs the customer-submitted number with its private key d, giving, mod n, r*x^d . The bank signs 90 with its private key 103, which gives, mod 143, 90^103 = (8*5^7)^103 = 8^103*5^721 = 5*8^103.
The customer divides out the blinding factor r, giving (r*x^d)/r = x^d mod n.The customer divides out the blinding factor 5, giving (8^103*5)/5 = 8^103 mod 143.
That is, the digital cash bearing the serial number 8 has now been signed with the bank's private key of 103. Anyone can verify that the serial number is 8 by using the bank's public key of (7, 143) to calculate (8^103)^7 mod 143 = 8^721 mod 143 = 8*(8^120)^6 = 8*(1)^6 = 8.

To summarize the blind signature protocol, for any message M, the customer presents the bank with

r^e * M mod n.

The bank returns

r * M^d mod n.

The customer divides out the random factor r, which leaves

M^d mod n

as the signed message.

The blind signature protocol was announced by David Chaum in 1982. There are other examples of signature protocols that are conditionally blind--"restrictive blind signatures"-- which reveal the identity of the digital cash owner if he spends it more than once. Restrictive blind signatures will be discussed in connection with Stefan Brands' system of digital cash, as well as next, on double-spending identification.

H. Double-spending Identification

An off-line system of digital cash that allows for buyer anonymity has the problem that if a buyer is committing a fraudulent transaction by multiple-spending (trying to spend the same piece of digital cash more than once), the bank and the seller have no ordinary recourse to take action against the buyer, since they don't know who he is. (In an on-line system, by contrast, the coin is typically checked against a spent-coin data base before the good or service is released to the buyer.) One way the double-spending problem can be handled in off-line systems is to require the buyer to reveal some information about himself when the cash is withdrawn. This information will be segmented in such a way that no one piece of the information will reveal the buyer's identity, but two pieces will. So if the buyer double-spends, the bank will acquire two pieces of information, the buyer's identity will be revealed, and the bank can initiate legal action against the buyer.

1. Cut and Choose

In the cut-and-choose method, the buyer creates N pairs of numbers that are associated with a digitally signed coin at the time it is withdrawn. Any pair can identify the buyer. The bank is able to check that the N pairs are present.

When the buyer offers the coin to the seller, the seller sends a "challenge", which is a sequence of N random bits: 10001011 . . . If a "0" is present the first number of the corresponding number pair is sent, while if a "1" is present, the second number of the pair is sent. Note what this means. If the buyer later offers the same coin to this or another seller, and the seller challenges the buyer with another sequence of N random bits, it is almost certain some of the bits in the two sequences will correspond. Hence the buyer will be identified. In the example in Table 4 we see that three bits differ between 10001011 and 11011010, hence any one of these three pairs may be used to determined the buyer's identity.

The problem with this cut-and-choose method is its slowness, and relative inefficiency in implementation, because of the large amount of information that is involved in the sequence of number pairs. The prevention of double-spending in the Chaum-Fiat-Naor (1990) scheme is by cut and choose and is inefficient for this reason. Greater efficiency can be achieved by means of the Schnorr Protocols, covered next.

Table 4: Cut and Choose Matching

Number Pairs
10001011
11011010
Pair Revealed?
(328, 956)956956No
(1124, 333)1124333Yes
(516, 2251)516516No
(7, 895)7895Yes
(486, 585)585585No
(228, 61)228228No
(1591, 1592)15921592No
(825, 667)667825Yes

2. Zero-knowledge Proofs & Schnorr Protocols

A zero-knowledge proof is one whereby, through some protocol or procedure, you are able to show knowledge of something (usually a number or set of numbers) without having to actually make that something known to the other party.

For example, consider a bank vault with a single door, and with a numeric combination which can be entered from either side of the door. Alice does not know the combination. To prove to Alice that you do know it, you allow Alice to lock you in the vault. You then enter the combination from your side, open the door, and exit the vault. Your reappearance from within the vault proves you know the combination, without your having to reveal anything about the combination to Alice. You have enacted a zero- knowledge proof.

One example of a zero-knowledge proof is found in the Schnorr authentication (identification) protocol. Schnorr authentication is another example of a system based on discrete logarithms. Suppose a customer is proving to a merchant that she--the customer-- knows her own secret key x. That is, prove that she is who she claims to be, as no one else knows this secret key.

First, the customer sends the merchant a random number w. The merchant responds with a challenge: namely, a number c. Then, to prove her identity, the customer responds with a point y on the line

y = w + x*c mod q, for some mutally agreed prime q.

That is, when the merchant sends the challenge c, the customer rapidly fires back y, which she can do because she knows her secret key x.

Of course, the procedure can't be done exactly like this, because the merchant--now knowing c, w, and y would also be able to determine x. We want to disguise things a little so the merchant can determine that y is the proper response to the challenge c, without the merchant being able to determine x itself.

To do this, we exponentiate the equation y = w + x*c by some generator g in G(q), and take the result modulo some prime p (where q divides p-1):

g^y = g^w*g^(x*c) mod p.

Now g^w = a will be a random number the customer sends the merchant. Let h = g^x mod p. Call h the public key of the customer. This will also be known to the merchant. So the last equation reads

g^y = a*h^c mod p.

To recap, the customer sends the value a = g^w to the merchant. The merchant accepts a, sends the challenge c, and receives back y. The merchant, knowing p, q, the public key h, and the base or generator g, can now check the identity

g^y=? a*h^c mod p.

The step-by-step details of this procedure are illustrated in Table 5.

Table 5: Schnorr Authentication Protocol

Schnorr Authentication Protocol
Example
Preliminary
Choose two primes p, q, where q is a prime factor of p-1.Choose p = 23, q = 11, where 11 is a prime factor of 22 = 23-1.
Choose g (not equal to 1) such that g^q = 1 mod p.Choose g such that g^11 = 1 mod 23. Let g = 2, since 2^11 = 2048 = 1 mod 23.
Generate a secret key by choosing x < q.Generate a secret key, say x = 9, since 9<11.
Generate a public key by calculating h, where h = g^x mod p. Generate a public key by calculating h, where h = 2^9 mod 23 = 6.
Note that p, q, g, and h are all public. Only x is secret.Note that p = 23, q =11, g = 2, and h = 6 are all public. Only x = 9 is secret.
Authentication
Customer picks a random number w < q, and computes a = g^w mod p.Customer picks w = 3 < 11, and computes a = g^w = 2^3 mod 23 = 8 mod 23 = 8.
Customer sends a to merchant.Customer sends a = 8 to merchant.
Merchant sends customer a random number c, where c is between 0 and 2^t - 1, where t is about t = 72.Merchant sends customer c=5.
Customer calculates y = w + x*c mod q, and returns y to merchant.Customer calculates y = 3+9*5 mod 11 = 48 mod 11 = 4, and returns y = 4 to the merchant.
Merchant verifies g^y = a*h^c mod pMerchant calculates a*h^c mod p = 8*6^5 mod 23 = 62208 mod 23 = 16. Merchant also calculates g^y mod p = 2^4 = 16. These are the same so the merchant accepts that the customer knows x.

Why does this procedure work? Because in order find y without knowledge of the secret key, the customer would have to take discrete logarithms:

y = log_g [(g^w)*(g^x)^c] mod p = w + x*c mod q.

This cannot be done under any feasible constraints on budget and time, as long as p is about 768 bits, q is about 140 bits, and the random number w is chosen over a range between zero and 2^72. Hence the customer has demonstrated knowledge of the customer's own secret key, but has not revealed that secret key to the merchant. The customer has enacted a zero-knowledge proof.

The Schnorr signature protocol adds a hash function to this process. Instead of the merchant sending a challenge c, the customer calculates c via the hash function. Suppose the customer wants to send a message M to the merchant. First, she chooses a random number w, and computes a = g^w mod p, as before. She then concatenates M and a, and calculates a hash value c:

c = H(M,a)

using a hash function H( ). She then calculates y = w + x*c mod q, and sends the signature (c, y) along with a and the message M to the merchant. The signature tuple (c, y) is equivalent to the challenge and response we saw previously.

The merchant calculates,

g^y = a*h^c mod p

then also calculates the hash value H(M,a), and verifies that

c = H(M,a).

If they are the same, the signature is valid.

In this case, both parties are using the same hash function H. The customer has an incentive to randomize w, because otherwise the paired values of y and c she sends to the merchant from time to time would reveal her secret key x. For two different paired values (c1,y1) and (c2,y2), but with the same w, the merchant can calculate x = (y1- y2)/(c1-c2).

On the other hand, the hash function H must be collision-resistant, because otherwise the customer might be able to manipulate the value of c.

This signature protocol is illustrated in Table 6.

We will see later how the Schnorr signature protocol forms a significant part of Stefan Brands' system of anonymous digital cash.

Table 6: Schnorr Signature Protocol

Schnorr Signature Protocol
Example
Customer wants to send a message M.Customer wants to send a message M = 5.
Customer picks a random number w < q, and computes a = g^w mod p.Customer picks w = 3 < 11, and computes

a = g^w = 2^3 mod 23 = 8 mod 23 = 8.

Customer has a hash function H( ).For simplicity, let H(k, j) = (k*j)^7 mod 17. (This is not a good hash function.)
Customer concatenates M and a, and calculates a hash value c = H(M,a).Customer concatenates M = 5 and a =8, and calculates c = H(5,8) = (5*8)^7 mod 17 = 14.
Customer calculates y = w + x*c mod q, and sends (c, y), the signature, along with a and the message M to merchant.Customer calculates y = 3 + 9*14 mod 11 = 8, and sends (c, y) = (14,8) to the merchant, along with M = 5 and a = 8.
Merchant checks if g^y = a*h^c mod p.Merchant computes g^y mod 23 = 2^8 mod 23 = 3. Merchant computes a*h^c mod 23 = 8*6^14 mod 23 = 3. The two sides are equal.
Merchant computes H(M,a).Merchant computes H(5,8) = (5*8)^7 mod 17 = 14.
Merchant accepts the signature if c = H(M,a).Merchant accepts the signature because c = 14 = H(5,8).

I. The Representation Problem

As we saw above, Diffie-Hellman key agreement and Schnorr signatures all hinge on the practical difficulty of calculating discrete logarithms. So also does the U.S. government's Digital Signature Standard. The representation problem extends this computational difficulty to the calculation of a set of numbers related to discrete logarithms. The ability to draw on a whole set of hard-to-calculate numbers, instead of a single logarithm, gives us an important degree of flexibility in creating anonymous digital cash systems.

Let g be an generator of the subgroup G(q) in Z(p)* (g can be any element from G(q) other than 1). Recall that the discrete logarithm problem involves finding some power a such that, for a given h in G(q),

g^a = h mod p .

Otherwise stated, a = log h mod p, to the base g. The exponent a is the discrete logarithm of h. The representation problem is a generalization of the concept of finding a discrete logarithm a. It involves finding a set of numbers, called an "index-tuple," instead of a single number.

Let k>= 2, and 1<=ai<= q, for i = 1 to k. The representation problem for a given h in G(q) is to find an index-tuple {a1, ... , ak}, for a generator-tuple {g1, ... , gk} from G(q) such that

g1^a1*g2^a2 *...*gk^ak = h mod p.

The index-tuple {a1, ... , ak} is called a representation. Note that we exclude the case ai = 0, because letting ai = q, whereby gi^q = 1 mod p, serves the same purpose as letting ai = 0 and having gi^0 = 1.

For example, we saw previously that {2, 3} are two of the numbers that generate G(11) in Z(23)*. So one example of a representation problem is to find a1 and a2 such that for h = 13,

2^a1*3^a2 = 13 mod 23.

This is equivalent to finding a weighted average of discrete logarithms which equals another discrete logarithm:

a1*log (2) + a2*log (3) = log (13) mod 23.

One solution is a1 = 2 and a2 = 2, since 2^2*3^2 = 4*9 = 36 = 13 mod 23. Another solution is a1 = 7 and a2 = 11, since 2^7*3^11 = 128*1 = 13 mod 23. Yet another solution is a1 = 3 and a2 = 6, as 2^3*3^6 = 8*729 = 13 mod 23.

Notice that if there are k elements in the representation, {a1, ... , ak}, then there are

q^(k-1) representations of h.

This follows from the simple observation that a1 can be chosen from any of the q powers {1, 2, 3, ..., q}, a2 from any of these q powers, etc., up to ak-1, for a total of q^(k-1) . The last element, ak, however, must be chosen so that g1^a1*g2^a2 *...*gk^ak = h mod p.

Thus for k=2 in G(11), there are 11^(2-1) = 11 representations of each number h in G(11). For k=3, there are 11^(3-1) = 121 representations of, say, 4 = 2^a1*3^a2*13^a3 mod 23, because 2, 3, 4, 13 are all elements of G(11) in Z(23)*.

J. Proving Knowledge of a Representation

Since knowledge of a representation cannot be gained without taking discrete logarithms (which is not practically possible for numbers of sufficient size), one can thus use one's knowledge of a representation to verify one's identity. The following protocol is designed to prove knowledge of a representation.

Suppose you, the prover P, know the representation of h = g1^a1*g2^a2 *...*gk^ak mod p, where h is an element of G(q). That is, given {g1, ... , gk}, you know {a1, ... , ak}, and you want to prove that fact, without at the same time revealing what {a1, ... , ak} are. How do you do this? You want a zero-knowledge proof whereby you do not give away what you know, but the other party can be nevertheless assured that you do know {a1, ... , ak}.

Let's call the other party V, for verifier, and assume that V, who also knows h and {g1, ... , gk}, is seeking assurance that you know {a1, ... , ak}.

Step 1: You, the prover P, generate k random numbers {w1 , ... , wk} chosen from G(q). You send to V the number z = g1^w1*g2^w2 *...*gk^wk mod p.

Step 2: V receives z and generates a random number, called a "challenge", c, and sends it back to you.

Step 3: You compute, for i = 1 to k, ri = ai + c*wi mod q, and send these k numbers back to V. (Note that the numbers are reduced mod q, because gi ^q = 1 mod p.)

Step 4: V calculates z*h^c and compares this to g1^r1*g2^r2 *...*gk^rk mod p, performed as a separate calculation. (Note that V is only calculating exponentials, which is not hard to do.) If the two numbers are the same, V accepts that you know the representation.

The two numbers will be the same if you know ai, and will otherwise be different, because

z*h^c = (g1^w1*g2^w2 *...*gk^wk) * ( g1^a1*g2^a2 *...*gk^ak)^c

= (g1^[w1+ c*a1]*g2^[w2+ c*a2]*...*gk^[wk+ c*ak])

= g1^r1*g2^r2 *...*gk^rk mod p .

This process using just two generators is shown again for illustration in Table 7.

Table 7: Proving Knowledge of a Representation

Prover P
Verifier V
Knows representation of h = g1^a1*g2^a2 ; namely, knows a1, a2.Only knows h and g1, g2; doesn't know a1, a2.
Generates two random numbers w1, w2 .
Computes z = g1^w1*g2^w2 . Sends z to V.Receives z from P. Generates challenge c.
Sends challenge c to P.
Receives c. Computes ri = ai + c*wi mod q, for i = 1 to 2. Sends these to V.Receives ri. Checks that z*h^c = g1^r1*g2^r2 mod p. If the equality holds, the Verifier accepts that the Prover knows the representation (still unknown to the Verifier) of h--namely, a1, a2.

K. Certification Authorities and Digital Certificates

The Department of Motor Vehicles is an example of a certification authority. It issues a certificate, called a driver's license, which certifies the connection between a person's photograph and the personal information on the license.

Similarly, a digital certificate verifies the connection between someone's public key and that person's personal identification (however the latter is defined). A digital certificate is typically issued by a trusted third party, who is thus called a certification authority (CA). One example of a private CA is VeriSign Inc., which runs a CA service for commercial World-Wide Web browsers. Companies have to give VeriSign certain notarized business documents before obtaining a VeriSign digital certificate. Other commercial CA services include GTE's CyberTrust and IBM's Net.Registry. Software packages for CA management include Nortel's WebCA and BBN's SafeKeyper Certificate Management System.

A digital certificate is used with a digital signature in the following way. When Bob signs a message with his private key and sends it to another party, Bob sends a certificate along with the signature. The message recipient can then verify the message signature using Bob's public key, but also looks at the certificate for additional security that the public key in fact belongs to Bob. That is, "Bob" has signed his message with his secret key, which can be verified with "Bob's" public key. The certificate is intended to show that "Bob" is really Bob.

The certification authority, in turn, does its job by binding Bob's identity with Bob's public key, and signing the joint information with the CA's own private key. This signed message (digital certificate) from the CA can be read using the CA's public key. Everyone is assumed to have a tamper-resistant copy of the CA's public key so they can verify signatures on digital certificates.

If the CA's secret key is sk-ca, while Bob's public key is pk-bob, then the CA creates a certificate for Bob of the form:

CA certificate = Esk-ca{Bob's identity, pk-bob}.

The certification can then be checked using the CA's public key pk-ca:

verification = Dpk-ca(Esk-ca{Bob's identity, pk-bob}) = {Bob's identity, pk-bob}.

This mechanism, of course, simply passes the buck to a higher level authority. The CA may have certified the association between Bob's identity and Bob's public key, but how does one know that the association between the CA and the CA's public key is valid? Can you trust the CA? So the use of certification authorities results in a chain of trust. At the top is a certification authority which everyone knows and trusts (that's the idea, anyway), and this certification authority certifies the public keys of other certification authorities. And so on down the line until some CA verifies that Bob's key belongs to Bob. The public and private keys of top-level CAs are called root keys. A top-level CA's public root key will be widely distributed and might also be published regularly in a major newspaper.

Meanwhile, the value of credentials signed with the CA's private key is only as good as the security of the CA's private key itself. A private (secret) root key of a CA may be split into several parts for security. It might, for example, be split into five parts, any two of which are sufficient to determine the key. (This is similar to a requirement that two people sign a check over a certain amount.)

To understand the splitting of a key, think of a line in a two-dimensional plane. Any two points on the line determine its slope. Suppose the slope represents the secret key. One point on the line will not reveal the secret key (the slope), but any two points will. So, to split up the key, one simply chooses five different, but otherwise arbitrary, points on the line, and gives the location of each point to a different person. Any two of them can determined the secret key needed for signing. Such a process is referred to as secret sharing.

Digital certificates (and CAs) may be used for other things than proofs of identity. A certificate may, for example, verify Bob's spending limit on a particular credit card. So, in the general sense, a digital certificate is a contract between two or more parties that verifies or specifies certain information.

One format for a digital certificate is laid out in ITU-T X.509. An X.509 certification message includes the following fields:

Version: Version number of the X.509 certificate specifications. Version 1 was published in 1988, version 2 in 1993, and version 3 proposed in 1994.

Serial Number: The certificate serial number.

Signature Algorithm ID: The algorithm that will be used to sign the certificate. For example, the Digital Signature Algorithm (DSA). The DSA uses the group of prime order G(q) in Z(p)*, so the prime p, the prime q, and a generator g must all be specified in this part of the certificate.

Issuer Name: The name of the CA body issuing the certificate.

Validity Period: How long the current certificate is good for.

Subject or User Name: The person or entity on whose behalf the present certificate is being issued.

Subject Public Key Information: The public key of the person or entity.

Subject Unique Identifier: Information about the person or entity that is being bound to the person or entity's public key. For example: account XYZ123 at UBS, Berne.

Signature on the Above Fields: The CA's signature.

The X.509 certificate standard is supported by many security protocols, such as S-HTTP, SSL, PEM, and PKCS-7. Version 3 certificates are also used in SET.

Posted here October 10, 1997


J. Orlin Grabbe is the author of International Financial Markets, and is an internationally recognized derivatives expert. He has recently branched out into cryptology, banking security, and digital cash. His home page is located at http://www.aci.net/kalliste/homepage.html.

-30-